  1. 8adae37 [cuda][hip] Add support for semaphore multi wait (#16638) by Lei Zhang · 1 year, 2 months ago
  2. 9d6d99f faster narrow mmt4d ukernels on x86 (#16655) by Benoit Jacob · 1 year, 2 months ago
  3. 4f1f055 mmt4d ukernel: use fewer magic macros to generate tile-functions M0-variants (#16645) by Benoit Jacob · 1 year, 2 months ago
  4. b994b72 Reenable accidentally disabled architecture-specific parts of `mmt4d_test` (#16654) by Benoit Jacob · 1 year, 2 months ago
  5. f433fd2 Using iree.abi.name consistently for arg/result names. (#16635) by Ben Vanik · 1 year, 2 months ago
  6. fe5e69a [cuda][hip] Shorten deferred queue worker name (#16642) by Lei Zhang · 1 year, 2 months ago
  7. 9dfc612 [cuda][hip] Fix worker thread and device host callback synchronization (#16621) by Boian Petkantchin · 1 year, 2 months ago
  8. f66d7f2 Fix enablement of mmt4d ukernel test cases based on ISA code paths built (#16637) by Benoit Jacob · 1 year, 2 months ago
  9. 5180ede mmt4d ukernel: simplification in generic tile funcs: stop using a stack array (#16633) by Benoit Jacob · 1 year, 2 months ago
  10. 8959b90 Make ukernels fallback opt-in and add a `mmt4d_info` ukernel to query the mmt4d implementation. (#16631) by Benoit Jacob · 1 year, 2 months ago
  11. 6ff9a3d Refactor how llvm-cpu check tests interface with ASan/TSan. (#16452) by Scott Todd · 1 year, 2 months ago
  12. e6397cb Change ukernels calling convention to default (#16541) by Benoit Jacob · 1 year, 2 months ago
  13. e991798 Unroll fixed-trip-count loops within mmt4d ukernel tile functions. (#16626) by Benoit Jacob · 1 year, 2 months ago
  14. 88b1d4d Replace std::iterator with our custom iterator typedefs (#16423) (#16583) by Peyman Barazandeh · 1 year, 2 months ago
  15. 9dc8ae4 [cuda][hip] Fix launch host func and worker thread state update (#16568) by Lei Zhang · 1 year, 2 months ago
  16. 862a031 Adding --task_abort_on_failure flag/API. (#16565) by Ben Vanik · 1 year, 2 months ago
  17. 23f2828 Adding iree-benchmark-executable tool. (#16550) by Ben Vanik · 1 year, 2 months ago
  18. c15b610 [EmitC] Remove the forked emitter and generate all the code in the conversion pass (#16357) by Simon Camphausen · 1 year, 2 months ago
  19. d500494 Add s8s4s32 dotprod microkernel (#16473) by mariecwhite · 1 year, 2 months ago
  20. c3b3d96 Adding hal.device.id queries to HAL devices. (#16495) by Ben Vanik · 1 year, 2 months ago
  21. 6d293af Retrying try-lock in synchronization_test to avoid arm64 flakes. (#16436) by Ben Vanik · 1 year, 2 months ago
  22. 4463f8d [python] Enable building of 3.12 wheels on Linux. (#16424) by Stella Laurenzo · 1 year, 2 months ago
  23. 1f3e907 ukernels: update README.md (#16358) by Benoit Jacob · 1 year, 2 months ago
  24. d1e1d05 [python] Add a couple more async APIs. (#16419) by Stella Laurenzo · 1 year, 2 months ago
  25. 00aa173 [hip] Add missing source locations and fix parsing (#16418) by Lei Zhang · 1 year, 2 months ago
  26. d32609e Add s8s4s32 ukernel for ARM (#16259) by mariecwhite · 1 year, 2 months ago
  27. c02b89e [cuda][hip] Guard against NULL cleanup callbacks (#16403) by Lei Zhang · 1 year, 2 months ago
  28. 7c2ec73 Fix a bug in the fastpath of iree_hal_task_semaphore_multi_wait which was doing a spurious wait. (#16404) by Stella Laurenzo · 1 year, 2 months ago
  29. 60ac333 [python] Add a HalDeviceLoop class for routing runtime events to futures. (#16385) by Stella Laurenzo · 1 year, 2 months ago
  30. c70bf22 [HAL] Remove pool assert during allocator creation (#16388) by Nithin Meganathan · 1 year, 2 months ago
  31. 14927d1 Replacing the ancient vm_util with function_io/function_util. (#16351) by Ben Vanik · 1 year, 3 months ago
  32. 9aabcb3 Add conversions for FP8 types (F8E5M2 and F8E4M3) (#16374) by Benoit Jacob · 1 year, 3 months ago
  33. 30901f5 Replacing the ancient vm_util with function_io/function_util. by Ben Vanik · 1 year, 3 months ago
  34. 49f8a61 Adding iree_io_vec_stream_t. by Ben Vanik · 1 year, 3 months ago
  35. 29a7462 Adding iree_io_stdio_stream_t. by Ben Vanik · 1 year, 3 months ago
  36. 0a2483a Splitting iree_io_memory_stream_t from iree/io/stream.h. by Ben Vanik · 1 year, 3 months ago
  37. 9234f42 Add a number of runtime python bindings and refine the HalFence.wait() behavior. (#16371) by Stella Laurenzo · 1 year, 3 months ago
  38. 87bf971 Fixing implicit casting that caused 4GB fill/copy limits in local-task. (#16364) by Ben Vanik · 1 year, 3 months ago
  39. 10fd98b Fixes to enable clang-cl compilation of compiler/runtime. (#16299) by Ben Vanik · 1 year, 3 months ago
  40. 065e04a Adding support for outputting binary files from tooling. (#16291) by Ben Vanik · 1 year, 3 months ago
  41. 406626b [Vulkan][SPIRV] Introduce `address` vulkan device property (#16282) by Jakub Kuderski · 1 year, 3 months ago
  42. ef79e51 [doc] Add README in CUDA and Metal HAL drivers directory (#16275) by Lei Zhang · 1 year, 3 months ago
  43. a8c1e17 [doc] Expose CUDA and Metal HAL driver doc to the website (#16256) by Lei Zhang · 1 year, 3 months ago
  44. 5d4c0ba Nothing is unreachable (#16261) by Benoit Jacob · 1 year, 3 months ago
  45. 1c83020 [doc] Update docs about the CUDA HAL driver (#16234) by Lei Zhang · 1 year, 3 months ago
  46. 41583ca [cuda] NFC: rename cuda2 to cuda (#16232) by Lei Zhang · 1 year, 3 months ago
  47. cfe865f [cuda] Drop cuda1 HAL implementation code (#16188) by Lei Zhang · 1 year, 3 months ago
  48. a724043 Adding util inlining policy attr interface and always/never attrs. by Ben Vanik · 1 year, 3 months ago
  49. 37e65de Freeze all statuses emitted by calls into dynamic modules (#16066) by Quinn Dawkins · 1 year, 3 months ago
  50. 3b3cef9 [cuda] Switch cuda2 on and cuda1 off by default (#16107) by Lei Zhang · 1 year, 3 months ago
  51. a2df2cc [cuda] Collect tracing events after command buffer completion (#16158) by Lei Zhang · 1 year, 3 months ago
  52. dd787a7 Removing trace_replay infra and the libyaml dependency. by Ben Vanik · 1 year, 3 months ago
  53. c0379f3 Removing python binding support for tracing. by Ben Vanik · 1 year, 3 months ago
  54. ac94b2a Removing iree-run-trace/iree-benchmark-trace tools. by Ben Vanik · 1 year, 3 months ago
  55. 7d736b5 Ukernels: simplify the architecture-specific bitcode build. (#16126) by Benoit Jacob · 1 year, 3 months ago
  56. 91803de Allow specifying multiple --device= flags in tooling. (#16132) by Ben Vanik · 1 year, 3 months ago
  57. 4b1b8e2 Fixing task worker utilization tracing plot. (#16131) by Ben Vanik · 1 year, 3 months ago
  58. e3db254 Simplify how mmt4d ukernels deal with the K=0 case. (#16137) by Benoit Jacob · 1 year, 3 months ago
  59. 17e9529 [spirv][vulkan] Refine device query to be more descriptive (#16101) by Lei Zhang · 1 year, 4 months ago
  60. 869e505 Disable const-eval for parameters unit test (#16089) by Max191 · 1 year, 4 months ago
  61. 171e31c [cuda] Move to hal/drivers and wire up BUILD files (#14620) by Lei Zhang · 1 year, 4 months ago
  62. a7a7ad6 Add vm.buffer.hash and util.buffer.hash ops (#16003) by Quinn Dawkins · 1 year, 4 months ago
  63. c8ecc1c Reland "[spirv][vulkan] Enable device query generation and execution" (#16075) by Lei Zhang · 1 year, 4 months ago
  64. 282ab77 Revert "[spirv][vulkan] Enable device query generation and execution" (#16077) by Han-Chung Wang · 1 year, 4 months ago
  65. 852684a [spirv][vulkan] Enable device query generation and execution (#15977) by Lei Zhang · 1 year, 4 months ago
  66. b55ba25 Fixing/silencing some warnings that have crept in over time. (#16072) by Ben Vanik · 1 year, 4 months ago
  67. d21a99c Check for source location resolution function in dynamic modules (#16065) by Quinn Dawkins · 1 year, 4 months ago
  68. d6dad12 ukernel: unroll the s16u4 VNNI ukernel, and drop the unused N0=16 variant (#16047) by Benoit Jacob · 1 year, 4 months ago
  69. c35d8e9 Standardizes CMake setup of C directory trees behind a macro. (#16011) by Stella Laurenzo · 1 year, 4 months ago
  70. 15c306f Build functioning dev packages for IREECompiler and IREERuntime. (#16008) by Stella Laurenzo · 1 year, 4 months ago
  71. 92df2b4 Make iree.compiler.api.Output.map_memory() retain its backing reference. (#15975) by Stella Laurenzo · 1 year, 4 months ago
  72. f97aa4d [HIP] Adds support for native executable and cache (#15937) by Nithin Meganathan · 1 year, 4 months ago
  73. 46d9347 Tweaks to e2e matmul tests (#15930) by bjacob · 1 year, 5 months ago
  74. f81f361 Removing transfer_range from the HAL device vtable. (#15919) by Ben Vanik · 1 year, 5 months ago
  75. 80e70ca Replacing hal.ex.shared_device with hal.devices.* ops. (#15916) by Ben Vanik · 1 year, 5 months ago
  76. 605aca9 [CPU][ArmSME] Add (initial) tiling and lowering pipeline for ArmSME (#15794) by Benjamin Maxwell · 1 year, 5 months ago
  77. 62c4f98 Replacing cpuinfo on Mac and adding support for E/P cores. (#15891) by Ben Vanik · 1 year, 5 months ago
  78. fbbccde Moving memory info queries to iree/base/internal/memory.h. (#15882) by Ben Vanik · 1 year, 5 months ago
  79. 8d01698 [vulkan] Enable initial executable linking (#15802) by Lei Zhang · 1 year, 5 months ago
  80. 69398bc [python] Add python bindings for creating IRPA files. (#15868) by Stella Laurenzo · 1 year, 5 months ago
  81. 51a9225 Cleanup parts of the Bazel build and document usage. (#15727) by Scott Todd · 1 year, 5 months ago
  82. 38878ee Relax NCCL version constraints (#14633) by Boian Petkantchin · 1 year, 5 months ago
  83. 31a5dcb Fix embedded builds of IREE. (#15761) (#15768) by Thomas Preud'homme · 1 year, 5 months ago
  84. ac9d6d5 Adding optional exported function declaration string to bytecode modules. (#15782) by Ben Vanik · 1 year, 5 months ago
  85. 888040b Make metal depends on flags. (#15762) by Rechie Kho · 1 year, 5 months ago
  86. 3266925 ukernel: add support for s16xs8 data types for ukernel (#15771) by Lun Dong · 1 year, 5 months ago
  87. 4973ef2 Batching parameter load operations and cleaning up gather/scatter. (#15706) by Ben Vanik · 1 year, 5 months ago
  88. fce839f Adding IREE parameter archive format and tooling support. (#15670) by Ben Vanik · 1 year, 5 months ago
  89. 2bb8019 Adding iree_io_stream_t and memory stream implementation. (#15668) by Ben Vanik · 1 year, 5 months ago
  90. dc6f0cd Adding multiple_modules sample (and fixing bugs). (#15653) by Ben Vanik · 1 year, 5 months ago
  91. f83ca74 Add newline in parameter help print (#15647) by Quinn Dawkins · 1 year, 5 months ago
  92. 5b2cb64 Fix intermittent failure - functions with `_try_` in their name may fail spuriously. (#15636) by bjacob · 1 year, 5 months ago
  93. 916dae9 Removing io_parameters.read/write in favor of gather/scatter. (#15607) by Ben Vanik · 1 year, 5 months ago
  94. 1012586 Fix MSVC build by disabling AVX-512-BF16 in non-latest MSVC versions. (#15589) by bjacob · 1 year, 6 months ago
  95. 0908ff8 ukernels: add `bf16 * bf16 -> bf16` optimized tile functions for x86 and arm64. (#15543) by bjacob · 1 year, 6 months ago
  96. 16e4346 ukernel test improvements (#15542) by bjacob · 1 year, 6 months ago
  97. 9393f94 Avoid stack allocation for VM->HAL iree_hal_fence_join calls. (#15569) by Ben Vanik · 1 year, 6 months ago
  98. dc506b8 Fix redundant IREE_UK_STATIC_ASSERT macro definition (#15567) by bjacob · 1 year, 6 months ago
  99. 4546b95 Simplify ukernel headers now that C++ is out of the picture (#15564) by bjacob · 1 year, 6 months ago
  100. 199ecee Simplify ukernel headers now that out-of-line asm is out of the picture (#15563) by bjacob · 1 year, 6 months ago