  1. 278e63a Move gcc dangling-reference check to GCC>=13. (#19772) by Scott Todd · 4 hours ago main
  2. a430695 [GPU] Add pattern to fuse tensor.collapse_shape into forall producer (#19295) by Max191 · 7 hours ago
  3. ba30557 [runtime][hip] Cast IREE_[HOST|DEVICE]_MAX_SIZE to iree_host_size_t type (#19766) by Han-Chung Wang · 7 hours ago
  4. 03c5a0f [LLVMGPUVectorDistribute] Refactor vector.contract distribute (#19631) by Manupa Karunaratne · 11 hours ago
  5. 6933c39 [Codegen] add mi308x target (#19756) by Bangtian Liu · 18 hours ago
  6. 26dcb8e Fixes for CMake 3.31 policy changes. (#19759) by Scott Todd · 18 hours ago
  7. cac390d Fix or work around gcc-14 warnings/errors. (#19758) by Scott Todd · 18 hours ago
  8. 525389c [LLVMGPU] Enable forall distr on the gpuvectorization pass pipeline (#19753) by Prashant Kumar · 19 hours ago
  9. 8c7eeca [DispatchCreation] Enable Rope computation fusion with attention. (#19745) by MaheshRavishankar · 21 hours ago
  10. eb21715 Bump StableHLO to openxla/stablehlo@23d7f60. (#19754) by Scott Todd · 26 hours ago
  11. 1cd62fd [DT][GPU] Permute cross-thread dims of TileSwizzle to outermost (#19734) by Max191 · 26 hours ago
  12. 3e15a5a [Global Opt] Add option to generalize matmul ops (#19741) by Ian Wood · 27 hours ago
  13. b47fbdf [LinalgExt] Update scatter to allow dropping unit dims (#19704) by Ian Wood · 28 hours ago
  14. 4c0ba9c [DispatchCreation] Add constant expression hoisting (#19750) by Quinn Dawkins · 29 hours ago
  15. 21d5db3 [Codegen] Use linearize_index op when swapping slice and expand (#19730) by Max191 · 29 hours ago
  16. e154e8b [hip] Enable caching in the hip async allocator (#19667) by Andrew Woloszyn · 30 hours ago
  17. 2e21a9a [LLVMGPU] Enable scf.forall distr. on vectorDistribute Pipeline (#19420) by Prashant Kumar · 31 hours ago
  18. 38ca3be [GPU] Add SwapExpandShapeWithSlice pattern to loop fusion pass (#19729) by Max191 · 31 hours ago
  19. 4a7af87 Integrate llvm 1_20_2025 (#19740) by Nirvedh Meshram · 2 days ago
  20. 8ba420d Bump sarisia/actions-status-discord from 1.15.1 to 1.15.2 in the github-actions group (#19744) by dependabot[bot] · 2 days ago
  21. 88887e7 Add ubuntu-24.04-arm runtime and runtime_tracing CI jobs. (#19724) by Scott Todd · 2 days ago
  22. 4c600ca [Codegen] Refactor RemoveSingleIteratinLoop to use ValueBoundsOpInte… (#19678) by Krzysztof Drewniak · 2 days ago
  23. 9aae362 [Codegen] Sprinkle in PropagateDispatchSizeBounds passes (#19677) by Krzysztof Drewniak · 2 days ago
  24. a64d713 Implement ValueBoundsOpInterface on HAL ID/count ops, util.assume.int (#19676) by Krzysztof Drewniak · 2 days ago
  25. db129e5 Update torch-mlir to llvm/torch-mlir@f42c7e4 (#19736) by zjgarvey · 2 days ago
  26. be8e3d2 Convert barriers into copies during allocation (#19735) by Rob Suderman · 5 days ago latest-snapshot
  27. 6052a1d Bump to llvm/llvm-project@3f1486f (#19683) by MaheshRavishankar · 5 days ago
  28. b4c7de6 [Python] Enable building Python bindings as editable wheels, document it (#19716) by Krzysztof Drewniak · 5 days ago
  29. b08d152 [HAL] Use util.assume.int for memref alignments (#19691) by Krzysztof Drewniak · 5 days ago
  30. 4d3f06a [VectorDistribution] Clone vector.step on layout conflict (#19732) by Kunwar Grover · 5 days ago
  31. c5bc37f [Util] Fix OptimizeIntArithmetic pattern failure condition (#19731) by Kunwar Grover · 5 days ago
  32. e1f010c [Dispach] Clone chain of ops into dispatch (#19723) by Ian Wood · 5 days ago
  33. f31cc72 Update resource placement and transfer for barrier operations (#19725) by Rob Suderman · 5 days ago
  34. 75c9e86 [GPU] Avoid fusing slices of already tiled ops (#19404) by Max191 · 5 days ago
  35. c1cc4cc [LLVMGPU] Add pass to distribute undistributed copies to threads (#19715) by Quinn Dawkins · 5 days ago
  36. dde5992 [Codegen] Allow memref type propagation through collapse_shape (#19400) by Max191 · 5 days ago
  37. 36c2353 [Codegen] Push up the extract slice op (#19680) by Prashant Kumar · 6 days ago
  38. 08b44e2 [hip] Try again to fix the semaphore busy loop. (#19712) by Andrew Woloszyn · 6 days ago
  39. 6f33cd4 [Stream] Specialize encoding for TensorPhaseOp that have result_encoding (#19707) by Han-Chung Wang · 6 days ago
  40. 3e8c81c Remove legacy sync path (#19714) by Rob Suderman · 6 days ago
  41. f6f6388 [Codegen] Add workgroups reordering to distribute using forall (#19681) by Prashant Kumar · 6 days ago
  42. 5ee9b27 Clean up encoding-related code. NFC. (#19717) by Jakub Kuderski · 7 days ago
  43. 3032df2 Fix newlines in markdown mermaid.js diagrams. (#19657) by Scott Todd · 7 days ago
  44. c285d58 Copy sample code into samples/dynamic_shapes/README.md. (#19699) by Scott Todd · 7 days ago
  45. 3c95042 Re-enable MI250 workflows. (#19705) by saienduri · 8 days ago
  46. 27e7a90 [DT][Encoding] Use layouts to calculate storage size when it is present. (#19686) by Han-Chung Wang · 8 days ago
  47. a953763 Temporarily Disable MI250 workflow due to machine outage (#19702) by Akansha Bansal · 8 days ago
  48. c320935 Bump dawidd6/action-download-artifact from 3.1.4 to 7 in the github-actions group (#19692) by dependabot[bot] · 8 days ago
  49. 6fd0fd0 [LinalgExt] Implement PartialReductionOpInterface for OnlineAttentionOp (#19684) by Kunwar Grover · 8 days ago
  50. 3c963dd Update PyTorch sample notebooks using latest iree-turbine code. (#19658) by Scott Todd · 8 days ago
  51. 01c9f14 [LLVMGPUVectorDistribute] Add support for inter-subgroup multi_reduction (#19596) by Manupa Karunaratne · 9 days ago
  52. 21b0101 [GPU] Disable prefetching for loops with no computation (#19695) by Nirvedh Meshram · 9 days ago
  53. 8d1d867 [GPU] Add thread tile size inference for scatter (#19694) by Quinn Dawkins · 9 days ago
  54. 158c636 Revert "Increase default threshold of TileLargeTensor pass (#19671)" (#19693) by Nirvedh Meshram · 9 days ago
  55. 3e34e03 Bump the github-actions group with 8 updates (#19689) by dependabot[bot] · 9 days ago
  56. 3978ce6 Increase default threshold of TileLargeTensor pass (#19671) by Nirvedh Meshram · 9 days ago
  57. 2452b22 [Codegen][GPU] Let integer range optimization narrow GPU computations to i32 (#19473) by Krzysztof Drewniak · 9 days ago
  58. 2b29155 Update GH actions with Dependabot (#19663) by Marius Brehler · 9 days ago
  59. 9b35412 Run on schedule in iree-org only (#19685) by Marius Brehler · 9 days ago
  60. d90c505 Reshape propagation to enable broadcast(transpose) -> attention(q, kt, vt) fusion. (#19661) by MaheshRavishankar · 9 days ago
  61. cac7a96 Update IREE test suite to use iree-org/iree-test-suites@c47d13c (#19617) by MaheshRavishankar · 9 days ago
  62. 40c19e3 Better support multidevice placement with `stream.async.barrier` (#19651) by Rob Suderman · 9 days ago
  63. 88d5f59 Update PkgCI test_amd to use MI300x conductor cluster (#19517) by yamiyysu · 9 days ago
  64. ae50c5e [DOCS] Update VectorExt::NestedLayoutAttr docs (#19246) by Manupa Karunaratne · 9 days ago
  65. 1441caa Enable macOS Tracy CI build. (#19668) by Scott Todd · 9 days ago
  66. a583b25 [GPU] Teach GPUApplyTilingLevel PartialReduction tiling (#19682) by Kunwar Grover · 10 days ago
  67. 9f93691 [LLVMGPU] Use LLVMGPUDistribute for small input scatters (#19670) by Quinn Dawkins · 12 days ago
  68. f7a2157 Remove Upcasting schedule from TileAndFuse (#19669) by Nirvedh Meshram · 12 days ago
  69. 039b8b4 Using tracy::GetQueue instead of the sketchy static variable reference. (#19653) by Ben Vanik · 12 days ago
  70. 1d91bec Supporting file descriptors in iree_io_stream_open. (#19665) by Ben Vanik · 12 days ago
  71. 106371d Bump torch-mlir to f92c587cb6150e73078f32cf847dc3892be16f93 (#19659) by jinchen · 12 days ago
  72. a88555c Add macOS workflow running on M1 (#19656) by Marius Brehler · 12 days ago
  73. e64cb12 Increase strictness of global isel use for ROCM (#19247) by Tres · 12 days ago
  74. 2aca091 [Codegen][Nearly NFC] Move PropagateDispatchSizeBounds to Common/ (#19650) by Krzysztof Drewniak · 12 days ago
  75. 6245db1 [Stream] Attach layouts to tensor ops in encoding specialization pass. (#19649) by Han-Chung Wang · 13 days ago
  76. c793f90 [i1] Implement `packed_storage` layout encoding attribute (#19354) by lialan · 13 days ago
  77. 801e2c1 Expand runtime_tracing job to include Windows and macOS. (#19655) by Scott Todd · 13 days ago
  78. 7d21c5d Revert (2nd) of "Propagate reshapes through generics with reduction" (#19647) by MaheshRavishankar · 13 days ago
  79. b3ff1ed Rename `unroll_{m,n,k}` to `intrinsics_{m,n,k}` (#19652) by Benoit Jacob · 13 days ago
  80. 6d6bd6e [runtime] Fix runtime tracing compile failure on gcc (#19642) by Ian Wood · 13 days ago
  81. bb1c561 Erase all address spaces and get inlined ukernels (#19646) by Benoit Jacob · 13 days ago
  82. a7bac5d [Flow] Fix dispatch naming for dynamic shaped fusions (#19439) by Quinn Dawkins · 13 days ago
  83. 9055c9d [hip] Fix race in the cleanup of queue read operations. (#19645) by Andrew Woloszyn · 13 days ago
  84. 82e37d6 Fix (cross) compiling for 32-bit targets (#19644) by Marius Brehler · 13 days ago
  85. 02d145e [Stream] Implement SpecializeEncodings pass (1/n) (#19502) by Han-Chung Wang · 14 days ago
  86. 74f8d3c [LinalgExt] Scatter fusion by expansion 3/3 (#19588) by Ian Wood · 14 days ago
  87. 2347d9f Supporting (and renaming) IREE_HAL_WHOLE_BUFFER in binding table resolve. (#19640) by Ben Vanik · 14 days ago
  88. 126f0ac Add docs for updating release git tags manually. (#19637) by Scott Todd · 14 days ago
  89. af416b3 Bump version to 3.2.0 after releasing 3.1.0. (#19638) by Scott Todd · 2 weeks ago
  90. c484058 [GPU] Add barriers when resolving GPUMappedForall to fix race condition (#19635) by Nirvedh Meshram · 2 weeks ago
  91. 9b4906e [DispatchCreation] Drop fusion restriction for stride != 1 conv (#19634) by Quinn Dawkins · 2 weeks ago
  92. c75b686 [GPU][Codegen] Allowing mfma for narrow problem config sizes (#19615) by Zhuoran Yin · 2 weeks ago
  93. 7b9aa28 When dumping intermediates, dump how to reproduce the `.optimized.ll` (#19633) by Benoit Jacob · 2 weeks ago
  94. be75a30 Update minor Python versions used to build packages (#19632) by Marius Brehler · 2 weeks ago
  95. fb21dd6 Adding experimental Tracy API for TLS-less event recording. (#19625) by Ben Vanik · 2 weeks ago
  96. a5c3879 Reapply "Propagate reshapes through generics with reduction… (#18968) by Ian Wood · 2 weeks ago
  97. 80cbf6b [GPU] Add a pass to convert accumulating GEMMs to GEMMs (#19587) by Nirvedh Meshram · 2 weeks ago
  98. 550d88e [GPU] Add lowering configuration logic for scatter (#19624) by Quinn Dawkins · 2 weeks ago
  99. 349026b Add explicit tolerances to SDXL benchmark test times. (#19628) by Scott Todd · 2 weeks ago
  100. 9a83239 [GPU] Add chained reshape support for scf.forall expand destination pattern (#19597) by Nirvedh Meshram · 2 weeks ago