  1. a5a532a Fix arm64 inline asm, was still referencing hardcoded register as in old out-of-line asm. (#13845) by bjacob · 1 year, 11 months ago
  2. 041b4e8 Separate architecture generic<->specific bitcode (#13825) by bjacob · 1 year, 11 months ago
  3. 67555c0 Drop conditionals and configured headers from the ukernels build (#13834) by bjacob · 1 year, 11 months ago
  4. bd5174b `iree_c_embed_data` improvements (#13814) by bjacob · 1 year, 11 months ago
  5. a8a70fb Add Promise wait API and loop_emscripten wait_* cmds. (#13669) by Scott Todd · 1 year, 11 months ago
  6. c1d499e Use correct pyobject for ref counting in `VmModule` pybindings (#13759) by Kojo Acquah · 1 year, 11 months ago
  7. cd7293e Reimplement ukernel arch-specific code path fallbacks as weak symbols. (#13715) by bjacob · 2 years ago
  8. d42d1d4 set -target, not -march (following up on #13708) (#13709) by bjacob · 2 years ago
  9. bd806c6 More fixes post #13460, #13703. (#13708) by bjacob · 2 years ago
  10. 6928af8 Removing errant printf in NCCL version check. by Ben Vanik · 2 years ago
  11. 81dcabe Print NCCL warning to stderr and add a newline. (#13707) by Stella Laurenzo · 2 years ago
  12. 2496f8d Windows and macOS fixes following #13460. (#13703) by Scott Todd · 2 years ago
  13. 29647b3 CPU ukernels as bitcode (x86-only for now) (#13460) by MaheshRavishankar · 2 years ago
  14. aa28b4a Add missing `inline` keywords to public header functions (#13689) by Niklas Haas · 2 years ago
  15. 90ed2d0 Adding util.cast/!util.object and lowering to vm.cast.* ops. (#13687) by Ben Vanik · 2 years ago
  16. 7016b8c Support mhlo.collective_permute with NCCL (#13502) by Trevor Morris · 2 years ago
  17. 6f81ceb Add module dependencies via python bindings (#13472) by Eugene Zhulenev · 2 years ago
  18. a50bc65 Adding export attribute reflection in native VM modules. (#13617) by Ben Vanik · 2 years ago
  19. f396c05 Moving cached rodata buffers to bytecode modules. (#13616) by Ben Vanik · 2 years ago
  20. 26d9eb8 Removing frame requirement from iree_vm_module_resolve_source_location. (#13618) by Ben Vanik · 2 years ago
  21. 41af5a1 return 0 in ukernels (#13613) by bjacob · 2 years ago
  22. 9e9d709 Fixing vm.switch.* op encoding. (#13611) by Ben Vanik · 2 years ago
  23. e7b8111 Swapping context/params order on CPU import functions. (#13600) by Ben Vanik · 2 years ago
  24. bb21d92 Fix many broken links across code and docs. (#13592) by Scott Todd · 2 years ago
  25. 17bcb02 Adding collective channel splitting to flow/stream/hal. (#13578) by Ben Vanik · 2 years ago
  26. dd977b1 Bumping NCCL to 2.18.1 in order to get ncclCommSplit. (#13569) by Ben Vanik · 2 years ago
  27. ef2bb52 Adding a VM implementation detail around expected import signatures. (#13562) by Ben Vanik · 2 years ago
  28. 03110de Skip command buffer copy/fill/dispatch when they are known no-op. (#13540) by Ben Vanik · 2 years ago
  29. cc0c7a8 Adding vm.round.fXX.even op. (#13525) by Ben Vanik · 2 years ago
  30. 4133b6e Removing VM verifier checks on return registers. (#13511) by Ben Vanik · 2 years ago
  31. 3dc368e Builtin ukernels as system/standalone plugins (#13433) by bjacob · 2 years ago
  32. 9e58489 Cleanup MPI error handling. (#13315) by Calin Cascaval · 2 years ago
  33. e040486 [NCCL] check version first before loading symbols (#13432) by Okwan Kwon · 2 years ago
  34. 8aa35a4 Removing asserts from the exported CUDA device methods. (#13429) by Ben Vanik · 2 years ago
  35. 93781b3 Pass IREE_UK_FLAG_MMT4D_ACCUMULATE_BIT_POS as immediate (#13410) by bjacob · 2 years ago
  36. 75cbdf8 Removing iree_hal_command_buffer_dyn_cast from the HAL. (#13408) by Ben Vanik · 2 years ago
  37. 2fb5a54 Add presubmit check for BUILD.bazel files (#13380) by Tori Baker · 2 years ago
  38. 7520cad ukernel/mmt4d/arm64: convert out-of-line asm to intrinsics and inline asm. (#13383) by bjacob · 2 years ago
  39. dc9728a Support mhlo.all_to_all with NCCL (#13326) by Trevor Morris · 2 years ago
  40. 64373fa Fix heap buffer overflow with clang-14 on Arm Ubuntu 22.04 (#13013) by Per Åstrand · 2 years ago
  41. afff73a Adding vulkan api.h methods for buffer/semaphore types. (#13364) by Ben Vanik · 2 years ago
  42. 5363ea3 Adding IREE_EXTERNAL_TOOLING_MODULES cmake flag. (#13367) by Ben Vanik · 2 years ago
  43. a2bf490 [mpi] pass correct pointers to rank and count (#13356) by Okwan Kwon · 2 years ago
  44. b3942a3 Move flatbuffer schemas into `iree.hal.*` namespaces. (#13352) by Scott Todd · 2 years ago
  45. 8738c2d Plumb source MLIR locs to SPIR-V and CUDA executables. (#13333) by Scott Todd · 2 years ago
  46. cc2ccc3 Use our private copy of Vulkan header files explicitly. (#13346) by Scott Todd · 2 years ago
  47. 935b110 [hal][cts] Add more tests for drivers device creation APIs (#12064) by Lei Zhang · 2 years ago
  48. 1cbb0fe benchmark: doubling batch count, set bytes processed (#13269) by bjacob · 2 years ago
  49. b8a8f5c Update python bindings for `iree-benchmark-module` to use `--module=-` (#13345) by Kojo Acquah · 2 years ago
  50. 72c6169 Making iree_hal_channel_provider_t a ref object and exposing on devices. (#13317) by Ben Vanik · 2 years ago
  51. 50a6c17 Add prefetches to fix performance regression on ARM Cortex-X2 (#13342) by bjacob · 2 years ago
  52. d318c54 Rework iree-run-mlir to operate against the IREE compiler C API. (#12715) by Stella Laurenzo · 2 years ago
  53. c792591 nonfatal failures in ukernel tests (#13316) by bjacob · 2 years ago
  54. 02f85ea Moving MPI library loading to hal/utils. (#13152) by Calin Cascaval · 2 years ago
  55. c6ba2a8 polish ukernel test cpu features (#13266) by bjacob · 2 years ago
  56. 26f9cdf Unify typed VMVX entry points into untyped functions. (#13270) by bjacob · 2 years ago
  57. 49d0123 ukernels: pack: move the generation of i64 padding_value to codegen (#13264) by bjacob · 2 years ago
  58. 1fd449b ukernels: fold type enums into flags (#13260) by bjacob · 2 years ago
  59. 9a8e63e Ukernel interface: take offsets, reorder fields (#13235) by bjacob · 2 years ago
  60. 5f16489 ukernels: let `pack` take `padding_value` by value (#13233) by bjacob · 2 years ago
  61. 6bd3211 ukernels: drop the unused `i8` case in `unpack`. (#13231) by bjacob · 2 years ago
  62. 4bca308 Ukernels: separate public vs internal headers (#13230) by bjacob · 2 years ago
  63. 772a335 [runtime] Fix std::array parameter unpacking (#13222) by Eugene Zhulenev · 2 years ago
  64. 978754a tidy up elementwise ukernels (#13204) by bjacob · 2 years ago
  65. e1a4a2b Remove the `matmul` ukernel (#13175) by bjacob · 2 years ago
  66. eafc042 Fix ASAN issue casting to uint32 (#13193) by Tori Baker · 2 years ago
  67. 6a7f69f Fixing vm::ref operator& after type consistency changes. (#13178) by Ben Vanik · 2 years ago
  68. 81cf28c Simplifying iree-run-mlir by making it run only a single function. (#13149) by Ben Vanik · 2 years ago
  69. 9461d3b Adding support for loading VM modules from dynamic libraries. (#13112) by Ben Vanik · 2 years ago
  70. 27179e2 Use MPI for NCCL unique ID exchange by default (#12902) by Okwan Kwon · 2 years ago
  71. 0c3a30e Revert "Reorder ukernel operands to match what `ukernel.generic_raw` can generate" (#13136) by bjacob · 2 years ago
  72. e19fc8e Adding a local executable plugin mechanism. (#12625) by Ben Vanik · 2 years ago
  73. 3f1c154 Reorder ukernel operands to match what `ukernel.generic_raw` can generate (#13103) by bjacob · 2 years ago
  74. b798319 Fix MSVC warning: wrong pointer type in `_mm_prefetch` (#13102) by bjacob · 2 years ago
  75. 09630d6 Finally moving VM type registration to iree_vm_instance_t. (#12650) by Ben Vanik · 2 years, 1 month ago
  76. 27b4b5b [runtime] Add iree::vm::make_ref helper (#12985) by Eugene Zhulenev · 2 years, 1 month ago
  77. e25916e Minor tweaks to ukernel/common.h (#12934) by bjacob · 2 years, 1 month ago
  78. d576bc9 Fixing iree_runtime_session_* module memory management. (#12997) by Ben Vanik · 2 years, 1 month ago
  79. 9600a73 [vm] Corrently handle float and double types in native module args (#12986) by Eugene Zhulenev · 2 years, 1 month ago
  80. fa9f46b [CUDA] Add a function to get CUDA context wrapper from the CUDA device (#12909) by Eugene Zhulenev · 2 years, 1 month ago
  81. ec5f9a0 add prefetch instructions to avx512 float mmt4d kernel (#12937) by bjacob · 2 years, 1 month ago
  82. 3e4f872 added support for complex numbers in python bindings (#12872) by Eliasj42 · 2 years, 1 month ago
  83. be0f1e1 Roll-up of changes needed to support the nvgpu out of tree project. (#12888) by Stella Laurenzo · 2 years, 1 month ago
  84. 51dbeb8 Fix Python dtype conversion for int64 on Windows. (#12880) by Scott Todd · 2 years, 1 month ago
  85. be62a3c e2e matmul benchmark as standalone C calling pack, mmt4d, unpack ukernels (#12848) by bjacob · 2 years, 1 month ago
  86. cf8b214 Avoid more sanitizer test timeouts (#12887) by bjacob · 2 years, 1 month ago
  87. f749adb Simplify handling of CPU features in ukernel tests (#12847) by bjacob · 2 years, 1 month ago
  88. 15b0fae Fix ukernel/x86 issues discovered by #12818 (#12846) by bjacob · 2 years, 1 month ago
  89. 41453bb Upgrade releases and metadata to Python >= 3.8. (#12849) by Stella Laurenzo · 2 years, 1 month ago
  90. d65dbe9 Fix MSVC compilation of AVX2 float mmt4d ukernel that was slow on AMD Zen2 (#12826) by bjacob · 2 years, 1 month ago
  91. 8e4ebcf Refreshing local stack state after VM import calls. (#12809) by Ben Vanik · 2 years, 1 month ago
  92. 0635b09 Adding FatELF support to the embedded ELF loader. (#12624) by Ben Vanik · 2 years, 1 month ago
  93. 788f6d5 Ukernel cleanups (standard CPU feature sets, consistently including config.h files) (#12790) by bjacob · 2 years, 1 month ago
  94. 199c9c8 x86 ukernels for pack/unpack (#12789) by bjacob · 2 years, 1 month ago
  95. c966f36 x86 ukernels for mmt4d (#12750) by bjacob · 2 years, 1 month ago
  96. deac6b4 Generalize `iree_uk_cpu_features_list_t` for x86 (#12749) by bjacob · 2 years, 1 month ago
  97. 9cf99ae `ukernel/arch/arm_64`: simplify build, allow non-GCC-compatible toolchains (#12700) by bjacob · 2 years, 1 month ago
  98. 4e2c85d removed channel_provider from iree_hal_cuda_device_t (#12776) by Okwan Kwon · 2 years, 1 month ago
  99. 76aaedc Reworking CUDA channel creation and plumbing group/ID. (#12695) by Ben Vanik · 2 years, 1 month ago
  100. e7e662d Ukernel tools: port to C and generalize (#12662) by bjacob · 2 years, 1 month ago