  1. 597629e [Codegen] Limit async scope in pipelining (#24350) by Lukas Sommer · 2 hours ago main
  2. 98ddf0c [VectorExt] Change `to_layout` `shared_memory_conversion` (#24377) by Lukas Sommer · 3 hours ago
  3. c0f5d4b [Codegen] Move remaining pipelines to `iree_codegen` attrs (#24398) by Jakub Kuderski · 8 hours ago
  4. 8cc05f5 [CPU] Improve tiling config for elementwise ops with dynamic shapes. (#24383) by Han-Chung Wang · 10 hours ago
  5. 16191ce Revert "[Codegen] Enable DMA by default for F16/BF16 Gemm on gfx950 (#24373)" (#24395) by Zhewen Yu · 10 hours ago
  6. 0fe0ca8 [Torch][LinalgExt] Support GQA in torch.hop_flex_attention lowering (#24313) by Keshav Vinayak Jha · 12 hours ago
  7. d62d69b [Compiler] Use Repeated<T> for repeated value ranges. NFC. (#24392) by Jakub Kuderski · 13 hours ago
  8. c4da71c [Codegen][CPU] Add a type-polymorphic generic-scalar MMA fallback. (#24389) by Benoit Jacob · 13 hours ago
  9. fac9d3d [INTEGRATION] Bump llvm to 0f3ca6bb9 (#24390) by Alan Li · 14 hours ago
  10. f4fb944 [ROCM] Workaround LLVM #194924 partial-unroll regression (#24379) by Alan Li · 15 hours ago
  11. 1c14508 [Codegen] Canonicalize transfer_{read,write} vector<1xT> (#24382) by Erick Ochoa Lopez · 21 hours ago
  12. 4030534 [INTEGRATION] Bump llvm to 3ed76d05a78d (#24376) by Alan Li · 35 hours ago latest-snapshot
  13. ac49ab6 [ROCm] Drop deprecated --iree-hip flag aliases (#24381) by Jakub Kuderski · 2 days ago
  14. 4f99043 Reapply "[Codegen] Enable DMA by default for F16/BF16 Gemm on gfx950 (#24117)" (#24235) (#24373) by Zhewen Yu · 2 days ago
  15. 5fbfe29 [LinalgExt] Fix attention NaN for fully-masked rows (#24178) by Keshav Vinayak Jha · 2 days ago
  16. d2f4f44 Bump iree-org/torch-mlir@46cbd27f7c (#24380) by Keshav Vinayak Jha · 2 days ago
  17. 9092be0 [InputConversion] Lower AtenArgmax/AtenArgmin to iree_linalg_ext.arg_compare (#24291) by Bangtian Liu · 2 days ago
  18. 6efc2ca Generalize FoldMaskedTransferRaw and add FoldTransferReadOfEmptyTensor (#24301) by Erick Ochoa Lopez · 2 days ago
  19. ca7e063 [VectorDistribute] Lower and distribute `async_dma` (#24299) by Lukas Sommer · 2 days ago
  20. c6525dd [Codegen] Duplicate operations in tile size analysis (#24246) by Lukas Sommer · 2 days ago
  21. 2608330 [IREEGPU] Bufferize `async_dma` (#24300) by Lukas Sommer · 2 days ago
  22. 2725310 [AMDGPU] Roll-up of AMDGPU HAL improvements for CDNA support (#24359) by Ben Vanik · 2 days ago
  23. a1b8b72 [test][nfc] Add regression tests about strided vector.gather back. (#24370) by Han-Chung Wang · 2 days ago
  24. f9562fe [HAL] Preserve tensor import effects when folding (#24364) by Ben Vanik · 2 days ago
  25. 915c6ea [HAL] Add executable global lookup buffers (#24336) by Ben Vanik · 2 days ago
  26. b63db90 [LLVMGPU] Add TileAndFuse fallback for iree_linalg_ext.arg_compare (#24347) by Bangtian Liu · 2 days ago
  27. b014947 [INTEGRATION] bump llvm @ 8be29edc2 (#24363) by Alan Li · 3 days ago
  28. 020f6be [Codegen][CPU] Lower data-tiled inner_tiled in VirtualVectorLoweringPass. (#24358) by Benoit Jacob · 3 days ago
  29. 4f6cd98 [DispatchCreation][LinalgExt] Add OnlineAttentionOp support in dispatch formation and reshape fusion (#24068) by Keshav Vinayak Jha · 3 days ago
  30. d7eed39 [LinalgExt] Fix attention index remap after unit-dim folding (#24349) by Keshav Vinayak Jha · 3 days ago
  31. 76a8215 Bump dawidd6/action-download-artifact from 20 to 21 in the github-actions group (#24360) by dependabot[bot] · 3 days ago
  32. 18a49cb [Codegen] NFC: Lift InnerTiledOp unroll pattern to Codegen. (#24357) by Benoit Jacob · 4 days ago
  33. 2ba8b6f Revert "[LLVMGPU] Fall back to scalar lowering for tiny attention shapes (#24239)" (#24356) by Nirvedh Meshram · 4 days ago
  34. f2b0897 [Codegen] Fix iree-compile --debug crash on CPU/GPU codegen pass options (#24190) by Han-Chung Wang · 4 days ago
  35. e623d00 [INTEGRATION] Bump llvm to f306525759 (#24354) by Alan Li · 4 days ago
  36. d097bad [DispatchCreation] Prevent fill->scatter cloning (#24214) by Ian Wood · 4 days ago
  37. 1192630 [Codegen] NFC: Lift InnerTiledOp lower & drop-unit-dim patterns to Codegen. (#24351) by Benoit Jacob · 4 days ago
  38. d358e81 [Codegen][CPU] Lower inner_tiled to llvm.call_intrinsic. (#24345) by Benoit Jacob · 4 days ago
  39. d4e04f7 compiler/plugins/input/TOSA: fix: TOSA arith lowering must handle apply scale introduced by linalg lowering (#24121) by Florian Walbroel · 4 days ago
  40. 81f4dec [LLVMGPU] Fall back to scalar lowering for tiny attention shapes (#24239) by Keshav Vinayak Jha · 4 days ago
  41. fdf1392 [DispatchCreation] Allow fusion of multi-result producers (#24169) by Keshav Vinayak Jha · 4 days ago
  42. 0935008 [DispatchCreation] Tighten scatter-skip predicate in CollapseDimensions (#24334) by Vivian Zhang · 6 days ago
  43. f4d1908 [Codegen][DMA] Fix unaligned swizzle offset computation in gather-to-lds lowering (#24241) by Zhewen Yu · 6 days ago
  44. 7098bdf [LLVMGPU][nfc] Modernize the rest of LLVMGPU pipeline tests. (#24341) by Han-Chung Wang · 7 days ago
  45. 316c1c1 [LLVMGPU][nfc] Modernize vector distribution pipeline tests. (#24340) by Han-Chung Wang · 7 days ago
  46. 8fc32e0 [DispatchCreation] Refactor and add low-parallelism split reduction parameter set (#24293) by Vivian Zhang · 7 days ago
  47. 3be9dc6 Refactor vector.multi_reduction into flattening, unrolling, and lowering passes. (#24183) by Erick Ochoa Lopez · 7 days ago
  48. d13374f Bump llvm to llvm-project@88e5eeb292f (#24339) by Nirvedh Meshram · 7 days ago
  49. dd5a6e3 [Codegen] Support pack/unpack/linalg generic transpose in CombineLayoutTransformation (#24273) by Muzammiluddin Syed · 7 days ago
  50. c40c7a3 [Codegen][CPU] Teach the lowering strategy about inner_tiled. (#24328) by Benoit Jacob · 7 days ago
  51. 174808a [LinalgExt] Fix ArgCompareOp::generateResultTileValue for producer fusion (#24317) by Bangtian Liu · 7 days ago
  52. 83a30bb Update CODEOWNERS for spreading review responsibility (#24332) by Han-Chung Wang · 7 days ago
  53. 404b958 [Codegen] NFC: Lift DataTiledMMA inner_tiled lowering helpers into MMAUtils. (#24326) by Benoit Jacob · 7 days ago
  54. 0a73681 [Codegen][CPU] Fix RHS indexing map in materialize-encoding inner_tiled lowering. (#24325) by Benoit Jacob · 7 days ago
  55. 01c52eb [DispatchCreation] Fuse scalar reductions with their parallel consumers (#24166) by Abhishek Varma · 7 days ago
  56. 64031dd Reapply "[Codegen] Use local binders for optimization flags in codegen (#24220)" (#24333) by Han-Chung Wang · 7 days ago
  57. e6139f6 [CI] Ease contention on self hosted machines (#24316) by Erick Ochoa Lopez · 7 days ago
  58. d055923 Bump stablehlo to stablehlo@806a6844dfd92cca (#24330) by Nirvedh Meshram · 8 days ago
  59. 7247601 [LLVMGPU][ROCDL] Add pass to group global loads for better instruction scheduling (#24247) by Max191 · 8 days ago
  60. ce12fef Bump iree-org/torch-mlir@d2768f876d (#24320) by Rob Suderman · 8 days ago
  61. 967b794 [CPU] Add ContiguousMemrefGather1DToConditionalLoads vector lowering. (#24327) by Han-Chung Wang · 8 days ago
  62. fcbd569 Bump LLVM to llvm-project@6f1e6e47bdf (#24314) by Nirvedh Meshram · 8 days ago
  63. 9f7a14e [CI] Update iree-test-suite ref (#24304) by Erick Ochoa Lopez · 8 days ago
  64. 5872dc2 [Codegen][CPU] Pick inner-tiled unroll factors from a register budget. (#24303) by Benoit Jacob · 8 days ago
  65. 0380544 [IREEGPU] Define and expand `subgroup_scan` (#24188) by Lukas Sommer · 8 days ago
  66. a79bb7b [LLVMGPU] Remove unused `--iree-codegen-llvmgpu-use-unaligned-gemm-vector-distribution` flag (#24308) by Vivian Zhang · 8 days ago
  67. dfe8134 [HAL/AMDGPU] Initial host-side AMDGPU HAL implementation (#24298) by Ben Vanik · 8 days ago
  68. b42f44c [HAL/AMDGPU] Use status matcher in notification test by Ben Vanik · 9 days ago
  69. bf7d2b5 [HAL/AMDGPU] Disable queue upload rings by default by Ben Vanik · 9 days ago
  70. 3ac4164 [HAL/AMDGPU] Remove dynamic binding slot sidecars by Ben Vanik · 12 days ago
  71. 8d437bd [HAL/AMDGPU] Specialize all-dynamic dispatch replay by Ben Vanik · 12 days ago
  72. 237efe6 [HAL/AMDGPU] Compact dynamic binding pointer replay by Ben Vanik · 12 days ago
  73. 34c5ba3 [HAL/AMDGPU] Bake dynamic binding slots into command buffers by Ben Vanik · 13 days ago
  74. 2fc0cac [HAL/AMDGPU] Sample counter ranges on a profile queue by Ben Vanik · 13 days ago
  75. 85117aa [HAL/AMDGPU] Fix patched-template profile metadata by Ben Vanik · 14 days ago
  76. 6ebc9c9 [HAL/AMDGPU] Add queue-range counter profiling by Ben Vanik · 10 days ago
  77. d664e03 [HAL/AMDGPU] Initialize queue upload rings by Ben Vanik · 10 days ago
  78. 7953fbd [HAL/AMDGPU] Track upload ring reclaim positions by Ben Vanik · 10 days ago
  79. 593764a [HAL/AMDGPU] Add queue upload ring primitive by Ben Vanik · 10 days ago
  80. 4871281 [HAL/AMDGPU] Record mixed dynamic kernarg templates by Ben Vanik · 10 days ago
  81. c6caae6 [HAL/AMDGPU] Test external buffer fail-loud contracts by Ben Vanik · 10 days ago
  82. f60c162 [HAL/AMDGPU] Centralize physical topology edge selection by Ben Vanik · 10 days ago
  83. 1d27820 [HAL/AMDGPU] Split device metrics source sampling by Ben Vanik · 10 days ago
  84. fc1dcfe [HAL/AMDGPU] Abstract profile device clock sampling by Ben Vanik · 10 days ago
  85. f8505e9 [HAL] Extract profile event ring utility by Ben Vanik · 10 days ago
  86. b383a52 [HAL/AMDGPU] Stage generated inputs through coarse memory by Ben Vanik · 10 days ago
  87. 14c82d9 [HAL/AMDGPU] Document command-buffer fence policy by Ben Vanik · 10 days ago
  88. c156d33 [HAL/AMDGPU] Mark grant-required peer memory by Ben Vanik · 10 days ago
  89. 55eea4a [HAL/AMDGPU] Separate SVM facts from peer flags by Ben Vanik · 10 days ago
  90. 9b634df [HAL/AMDGPU] Split kernarg benchmark counters by Ben Vanik · 10 days ago
  91. fb363f6 [HAL/AMDGPU] Report prepublished kernarg replay counters by Ben Vanik · 10 days ago
  92. 75fbea9 [HAL/AMDGPU] Record prepublished kernarg totals by Ben Vanik · 10 days ago
  93. a81bd5e [HAL/AMDGPU] Document profiling and replay workflows by Ben Vanik · 14 days ago
  94. 2a36353 [HAL/AMDGPU] Split device-library target selection by Ben Vanik · 10 days ago
  95. b905b7d [HAL/AMDGPU] Test executable target inference by Ben Vanik · 10 days ago
  96. a809ed5 [HAL/AMDGPU] Name ISA commonality agents by Ben Vanik · 10 days ago
  97. cdf2f71 [HAL/AMDGPU] Model target feature support by Ben Vanik · 10 days ago
  98. dfe4644 [HAL/AMDGPU] Table vendor packet capabilities by Ben Vanik · 10 days ago
  99. 00760ee [HAL/AMDGPU] Split physical-device capability policy by Ben Vanik · 10 days ago
  100. 40a269e [HAL/AMDGPU] Name prepublished kernarg storage by Ben Vanik · 10 days ago