  1. 78e9dbc [FlashAttention] Adapt attention for tiling + implement tiling/decompose (#15217) by Abhishek Varma · 1 year, 5 months ago
  2. 3323519 Port samples/dynamic_shapes/ to PyTorch using SHARK-Turbine. (#15255) by Scott Todd · 1 year, 5 months ago
  3. 20e2112 Cleaning up hal.executable.variant syntax. (#15254) by Ben Vanik · 1 year, 5 months ago
  4. a95a28a Move expected output URL into the parameter of `iree_run_module_test` (#15248) by Jerry Wu · 1 year, 5 months ago
  5. eb9b8b6 Revert "Improvements to e2e matmul tests" (#15252) by bjacob · 1 year, 5 months ago
  6. 71c22da Improvements to e2e matmul tests (#15243) by bjacob · 1 year, 5 months ago
  7. 5a20dce bf16: select appropriate tile sizes on x86 and Arm, and enable in x86 bitcode build (#15244) by bjacob · 1 year, 5 months ago
  8. 02e34b0 Optimize `moveOp[Up,Down]InBlock` functions in `SimplifyGlobalAccesses`. (#15245) by Scott Todd · 1 year, 5 months ago
  9. df00df9 Disable flaky metal `e2e/*_ops` tests. (#15240) by Scott Todd · 1 year, 5 months ago
  10. fed372c Make iree_benchmark_suite_module_test easier to run locally (#15238) by Jerry Wu · 1 year, 5 months ago
  11. 63381a8 Switching external resources to be device-local only. (#14016) by Ben Vanik · 1 year, 5 months ago
  12. 87c968c Lint fix trailing space in debugging/releases.md. (#15239) by Scott Todd · 1 year, 5 months ago
  13. 3a70dda Improving the `*.tensor.trace` op to carry shapes/encodings. (#15228) by Ben Vanik · 1 year, 5 months ago
  14. c232aeb Run markdownlint on some files under `docs/developers/*`. (#15168) by Scott Todd · 1 year, 5 months ago
  15. b13037b Fix build after #15151 (#15236) by Andrzej Warzyński · 1 year, 5 months ago
  16. 4db7c1a Avoid complaints about missing bc file if compiling for CUDA (#15232) by Jacques Pienaar · 1 year, 5 months ago
  17. 479f4ed [mlir][Flow] Reland https://github.com/openxla/iree/pull/14656 but keep `CollapseReductionDims` pass (#15219) by MaheshRavishankar · 1 year, 5 months ago
  18. add9417 [Bindings] Implement alloc + copy to local host when map is unavailable. (#14997) by Stanley Winata · 1 year, 5 months ago
  19. 8b1af38 Add support for converting bf16 to uint16 on func ops. (#15231) by Han-Chung Wang · 1 year, 5 months ago
  20. 82611a9 Making execution region results queue-ordered allocas. (#15149) by Ben Vanik · 1 year, 5 months ago
  21. cda49ca [rocm] Print GPU information when dumping device info (#15230) by Lei Zhang · 1 year, 5 months ago
  22. e8f184d Revert "[LLVMGPU] Splitting TensorCoreVectorization to two passes." (#15225) by Han-Chung Wang · 1 year, 5 months ago
  23. a9d7aa5 [Cleanup] Retire filter-based vectorization patterns. (#15185) by Han-Chung Wang · 1 year, 5 months ago
  24. 6b5b989 Remove unnecessary pragma (#15224) by bjacob · 1 year, 5 months ago
  25. f84545e Implementing stack trace capture on iree_status_t for Win/Mac. (#15151) by Ben Vanik · 1 year, 5 months ago
  26. e918678 [LLVMGPU] Splitting TensorCoreVectorization to two passes. (#15184) by Han-Chung Wang · 1 year, 5 months ago
  27. 87a1cc6 Fork CUDA and ROCm guides into separate pages. (#15196) by Scott Todd · 1 year, 5 months ago
  28. 0b2997e Tune compiler tracing usage of 'frame's. (#15216) by Scott Todd · 1 year, 5 months ago
  29. bb087a1 [ConstEval] Make ConstExprMaxSizeIncreaseThreshold be controlled by API. (#15183) by Han-Chung Wang · 1 year, 5 months ago
  30. 00b75cb [spirv] Account for dynamic dimensions when computing parallelism (#15211) by Lei Zhang · 1 year, 5 months ago
  31. d87a97e Delete a.out. by Scott Todd · 1 year, 5 months ago
  32. 58afd02 Bump TF to 2.16.0.dev20231013 (#15198) by Jerry Wu · 1 year, 5 months ago
  33. b7fd668 Move passes implementation to GlobalOptimization. (#15206) by Han-Chung Wang · 1 year, 5 months ago
  34. b2fc30d [docs] Add instructions for printing iree SHA from venv (#15212) by Jakub Kuderski · 1 year, 5 months ago
  35. ba3a6e0 Remove GPT-2 benchmarks on Andreno and Mali by mariecwhite · 1 year, 5 months ago
  36. 1af382f Update RISCV prebuild toolchains to support manylinux_2_28_x86_64 (#15170) by CindyLiu · 1 year, 5 months ago
  37. 1fa8b48 Replace more uses of `cc_library` with `iree_runtime_cc_library`. (#15204) by Scott Todd · 1 year, 5 months ago
  38. 46a2305 Integrate llvm 20231012 (#15163) by Stella Laurenzo · 1 year, 5 months ago
  39. acddd0c Adding hal.dispatch.extern op. (#15193) by Ben Vanik · 1 year, 5 months ago
  40. d7f97ba Drop references to an already-fixed test and some no-longer-existing files. (#15203) by bjacob · 1 year, 5 months ago
  41. d0dac3f Add vscode workspace files to `.gitignore` (#15202) by bjacob · 1 year, 5 months ago
  42. bc6643b Fix `ireeCompilerLoadBinary` on Apple: pass unprefixed symbol name to `dlsym` (#15201) by bjacob · 1 year, 5 months ago
  43. 9d7a4ba Data-tiling encodings: take the element types out of the enums. (#15182) by bjacob · 1 year, 5 months ago
  44. 2b5e61f Bump Nvidia libraries in docker to 535.113.01 (#15200) by Jerry Wu · 1 year, 5 months ago
  45. 15fe302 [Target][ROCM] Linking OCLC params as global instead of device BC. (#15169) by Stanley Winata · 1 year, 5 months ago
  46. e33613c Fix allocator crashes on Apple (#15199) by bjacob · 1 year, 5 months ago
  47. 467c86e [spirv] Generalize transposed batch matmul op (#15197) by Lei Zhang · 1 year, 5 months ago
  48. 193c132 [GPU] Fix hoisting after upstream change disallowing view ops (#15192) by Lei Zhang · 1 year, 5 months ago
  49. d3a152b Rebase PyTorch guide on SHARK-Turbine. (#15181) by Scott Todd · 1 year, 5 months ago
  50. ebdb098 [experimental][regression] Add ROCM Regression test. (#14861) by Stanley Winata · 1 year, 5 months ago
  51. fd63b3f Remove CUDA from benchmark_large suite by mariecwhite · 1 year, 5 months ago
  52. e190dd9 Ensure that compiler output is sufficiently aligned. (#15188) by Stella Laurenzo · 1 year, 5 months ago
  53. 95bece1 Add an API to set global compiler CL options. (#15190) by Stella Laurenzo · 1 year, 5 months ago
  54. 3eaface Add an API to disable nanobind's leak checker. (#15189) by Stella Laurenzo · 1 year, 5 months ago
  55. 7a8d8f7 [ROCM] Replace rocm sdk ld.lld with iree-lld for compile-time linkage. (#15187) by Stella Laurenzo · 1 year, 5 months ago
  56. e023ea7 [rocm] Bundle HIP headers into a submodule and use that by default. (#15186) by Stella Laurenzo · 1 year, 5 months ago
  57. 89031fc Delete duplicate canonicalization patterns from LinalgExt dialect. (#12124) by Han-Chung Wang · 1 year, 5 months ago
  58. b6c7c62 Limit the latency of e2e matmul tests (#15180) by bjacob · 1 year, 5 months ago
  59. c397258 Add pytorch_aot_simple sample Colab notebook using SHARK-Turbine. (#15166) by Scott Todd · 1 year, 5 months ago
  60. 3f001b6 [NFC] Switch CUDA to use GenericVectorization pass. (#15176) by Han-Chung Wang · 1 year, 5 months ago
  61. d11c5e5 Add a transform to hoist unrolled vector ops out of scf.for ops. (#14281) by Han-Chung Wang · 1 year, 5 months ago
  62. 4cdfb91 Update Nvidia host driver to 535.113.01 (#15179) by Jerry Wu · 1 year, 6 months ago
  63. 29b8197 Splitting dispatch annotation from outlining. (#15177) by Ben Vanik · 1 year, 6 months ago
  64. 2a42fd3 `avx512bf16` ukernel: work around yet another Clang-16 crash (#15164) by bjacob · 1 year, 6 months ago
  65. ef25b0c [metal] Fix creating device via registry path (#15142) by Lei Zhang · 1 year, 6 months ago
  66. 374f854 Fixing warnings on windows build (#15167) by Han-Chung Wang · 1 year, 6 months ago
  67. 2617126 Rename Colab notebooks to highlight frameworks used. (#15162) by Scott Todd · 1 year, 6 months ago
  68. b1263d1 Add pkgci artifacts to .gitignore (#15165) by Kunwar Grover · 1 year, 6 months ago
  69. dd26475 [PkgCI] Add recipe for correctness on CPU (#15131) by Kunwar Grover · 1 year, 6 months ago
  70. 9c424c4 [LinalgExt] Remove LinalgExt::Softmax and use upstream linalg::softmax (#15021) by Abhishek Varma · 1 year, 6 months ago
  71. b3cd60a Add pytorch_jit sample Colab notebook using SHARK-Turbine. (#15146) by Scott Todd · 1 year, 6 months ago
  72. f782069 [CPU] Share SCF tile size computation logic with `-iree-llvmcpu-tile` (#15154) by Benjamin Maxwell · 1 year, 6 months ago
  73. aef639b Add TF GPT2 benchmarks by mariecwhite · 1 year, 7 months ago
  74. 4e06f74 Re-enable PyTorch Benchmarks by mariecwhite · 1 year, 6 months ago
  75. 8380941 [NFC][SPIRV] Cleanup after deprecating LinalgVectorizationPattern. (#15156) by Han-Chung Wang · 1 year, 6 months ago
  76. 4983668 `mmt4d` ukernel for the `bf16*bf16->f32` case using AVX-512-BF16 (#15089) by bjacob · 1 year, 6 months ago
  77. a7a784e Update RISCV prebuilt toolchains (#15153) by CindyLiu · 1 year, 6 months ago
  78. 02e1625 Prune mobile benchmark suites (#14923) by Jerry Wu · 1 year, 6 months ago
  79. dec2949 [NFC] Switch to use upstreamed memref methods and transform ops. (#14871) by Han-Chung Wang · 1 year, 6 months ago
  80. 5b92b73 [Flow] Add naming heuristic for possible slow memory copies (#15150) by Kunwar Grover · 1 year, 6 months ago
  81. ba3e6a7 [CPU][SVE] Enable scalable vectorization and tiling for non-padded matmuls (#15108) by Benjamin Maxwell · 1 year, 6 months ago
  82. 729bb75 [spirv] NFC: Restructure TileAndVectorizeToCooperativeOps pass (#15138) by Lei Zhang · 1 year, 6 months ago
  83. 77a8741 Fixing TRACY_NO_EXIT on MacOS and supporting MacOS tracy builds. (#15143) by Ben Vanik · 1 year, 6 months ago
  84. 9181525 [CPU] Enable fast min/max ops in CPU codegen (#15130) by Diego Caballero · 1 year, 6 months ago
  85. 94e7e23 Revert "Fixing TRACY_NO_EXIT on MacOS. (#15139)" (#15140) by mariecwhite · 1 year, 6 months ago
  86. 198af34 Fixing TRACY_NO_EXIT on MacOS. (#15139) by Ben Vanik · 1 year, 6 months ago
  87. b5bbea2 [Codegen] Add the `bitcast -> extui` to `shuffle` folding patterns to EmulateNarrowTypes pass. (#15102) by MaheshRavishankar · 1 year, 6 months ago
  88. 17e758b [Reducer] Fix ReduceOptimizationBarrierDelta (#15129) by Kunwar Grover · 1 year, 6 months ago
  89. d545d82 [spirv] NFC: Restructure the vector lowering pass (#15134) by Lei Zhang · 1 year, 6 months ago
  90. 6baf83f [GlobalOptimization] Add RaiseSpecialOps earlier in the pipeline (#15125) by Quinn Dawkins · 1 year, 6 months ago
  91. 46d1860 Fix SSA use-def violation created by Tile and fuse pass. (#15133) by MaheshRavishankar · 1 year, 6 months ago
  92. 91dfbf4 Adds torch-mlir dequant ops to the lowering pipeline (#15128) by Daniel Garvey · 1 year, 6 months ago
  93. 5121c3f [Flow] Add peephole optimization for partial negation and reverse (#15121) by Quinn Dawkins · 1 year, 6 months ago
  94. 2dcd04d Integrate llvm 20231005 4 (#15119) by Stella Laurenzo · 1 year, 6 months ago
  95. 4b9cd62 [Reducer] Add bytecode support to iree-reduce (#15079) by Kunwar Grover · 1 year, 6 months ago
  96. 8e1befd [Flow] Optionally fuse the fill in the dequant fusion pass (#15124) by Quinn Dawkins · 1 year, 6 months ago
  97. 8651777 [gpu] Add basic heuristics for better reduction occupancy (#15120) by Lei Zhang · 1 year, 6 months ago
  98. 9e9aff0 [PkgCI] Add llama2_7b_i4 recipe for correctness testing on cuda (#15113) by Kunwar Grover · 1 year, 6 months ago
  99. 58b5670 [rocm] Enable the ROCM compiler target backend by default. (#15111) by Stella Laurenzo · 1 year, 6 months ago
  100. c64b31f [PkgCI] Add tqdm bar while downloading artifacts (#15112) by Kunwar Grover · 1 year, 6 months ago