- a5a532a Fix arm64 inline asm, was still referencing hardcoded register as in old out-of-line asm. (#13845) by bjacob · 1 year, 11 months ago
- 041b4e8 Separate architecture generic<->specific bitcode (#13825) by bjacob · 1 year, 11 months ago
- 67555c0 Drop conditionals and configured headers from the ukernels build (#13834) by bjacob · 1 year, 11 months ago
- bd5174b `iree_c_embed_data` improvements (#13814) by bjacob · 1 year, 11 months ago
- a8a70fb Add Promise wait API and loop_emscripten wait_* cmds. (#13669) by Scott Todd · 1 year, 11 months ago
- c1d499e Use correct pyobject for ref counting in `VmModule` pybindings (#13759) by Kojo Acquah · 1 year, 11 months ago
- cd7293e Reimplement ukernel arch-specific code path fallbacks as weak symbols. (#13715) by bjacob · 2 years ago
- d42d1d4 set -target, not -march (following up on #13708) (#13709) by bjacob · 2 years ago
- bd806c6 More fixes post #13460, #13703. (#13708) by bjacob · 2 years ago
- 6928af8 Removing errant printf in NCCL version check. by Ben Vanik · 2 years ago
- 81dcabe Print NCCL warning to stderr and add a newline. (#13707) by Stella Laurenzo · 2 years ago
- 2496f8d Windows and macOS fixes following #13460. (#13703) by Scott Todd · 2 years ago
- 29647b3 CPU ukernels as bitcode (x86-only for now) (#13460) by MaheshRavishankar · 2 years ago
- aa28b4a Add missing `inline` keywords to public header functions (#13689) by Niklas Haas · 2 years ago
- 90ed2d0 Adding util.cast/!util.object and lowering to vm.cast.* ops. (#13687) by Ben Vanik · 2 years ago
- 7016b8c Support mhlo.collective_permute with NCCL (#13502) by Trevor Morris · 2 years ago
- 6f81ceb Add module dependencies via python bindings (#13472) by Eugene Zhulenev · 2 years ago
- a50bc65 Adding export attribute reflection in native VM modules. (#13617) by Ben Vanik · 2 years ago
- f396c05 Moving cached rodata buffers to bytecode modules. (#13616) by Ben Vanik · 2 years ago
- 26d9eb8 Removing frame requirement from iree_vm_module_resolve_source_location. (#13618) by Ben Vanik · 2 years ago
- 41af5a1 return 0 in ukernels (#13613) by bjacob · 2 years ago
- 9e9d709 Fixing vm.switch.* op encoding. (#13611) by Ben Vanik · 2 years ago
- e7b8111 Swapping context/params order on CPU import functions. (#13600) by Ben Vanik · 2 years ago
- bb21d92 Fix many broken links across code and docs. (#13592) by Scott Todd · 2 years ago
- 17bcb02 Adding collective channel splitting to flow/stream/hal. (#13578) by Ben Vanik · 2 years ago
- dd977b1 Bumping NCCL to 2.18.1 in order to get ncclCommSplit. (#13569) by Ben Vanik · 2 years ago
- ef2bb52 Adding a VM implementation detail around expected import signatures. (#13562) by Ben Vanik · 2 years ago
- 03110de Skip command buffer copy/fill/dispatch when they are known no-op. (#13540) by Ben Vanik · 2 years ago
- cc0c7a8 Adding vm.round.fXX.even op. (#13525) by Ben Vanik · 2 years ago
- 4133b6e Removing VM verifier checks on return registers. (#13511) by Ben Vanik · 2 years ago
- 3dc368e Builtin ukernels as system/standalone plugins (#13433) by bjacob · 2 years ago
- 9e58489 Cleanup MPI error handling. (#13315) by Calin Cascaval · 2 years ago
- e040486 [NCCL] check version first before loading symbols (#13432) by Okwan Kwon · 2 years ago
- 8aa35a4 Removing asserts from the exported CUDA device methods. (#13429) by Ben Vanik · 2 years ago
- 93781b3 Pass IREE_UK_FLAG_MMT4D_ACCUMULATE_BIT_POS as immediate (#13410) by bjacob · 2 years ago
- 75cbdf8 Removing iree_hal_command_buffer_dyn_cast from the HAL. (#13408) by Ben Vanik · 2 years ago
- 2fb5a54 Add presubmit check for BUILD.bazel files (#13380) by Tori Baker · 2 years ago
- 7520cad ukernel/mmt4d/arm64: convert out-of-line asm to intrinsics and inline asm. (#13383) by bjacob · 2 years ago
- dc9728a Support mhlo.all_to_all with NCCL (#13326) by Trevor Morris · 2 years ago
- 64373fa Fix heap buffer overflow with clang-14 on Arm Ubuntu 22.04 (#13013) by Per Åstrand · 2 years ago
- afff73a Adding vulkan api.h methods for buffer/semaphore types. (#13364) by Ben Vanik · 2 years ago
- 5363ea3 Adding IREE_EXTERNAL_TOOLING_MODULES cmake flag. (#13367) by Ben Vanik · 2 years ago
- a2bf490 [mpi] pass correct pointers to rank and count (#13356) by Okwan Kwon · 2 years ago
- b3942a3 Move flatbuffer schemas into `iree.hal.*` namespaces. (#13352) by Scott Todd · 2 years ago
- 8738c2d Plumb source MLIR locs to SPIR-V and CUDA executables. (#13333) by Scott Todd · 2 years ago
- cc2ccc3 Use our private copy of Vulkan header files explicitly. (#13346) by Scott Todd · 2 years ago
- 935b110 [hal][cts] Add more tests for drivers device creation APIs (#12064) by Lei Zhang · 2 years ago
- 1cbb0fe benchmark: doubling batch count, set bytes processed (#13269) by bjacob · 2 years ago
- b8a8f5c Update python bindings for `iree-benchmark-module` to use `--module=-` (#13345) by Kojo Acquah · 2 years ago
- 72c6169 Making iree_hal_channel_provider_t a ref object and exposing on devices. (#13317) by Ben Vanik · 2 years ago
- 50a6c17 Add prefetches to fix performance regression on ARM Cortex-X2 (#13342) by bjacob · 2 years ago
- d318c54 Rework iree-run-mlir to operate against the IREE compiler C API. (#12715) by Stella Laurenzo · 2 years ago
- c792591 nonfatal failures in ukernel tests (#13316) by bjacob · 2 years ago
- 02f85ea Moving MPI library loading to hal/utils. (#13152) by Calin Cascaval · 2 years ago
- c6ba2a8 polish ukernel test cpu features (#13266) by bjacob · 2 years ago
- 26f9cdf Unify typed VMVX entry points into untyped functions. (#13270) by bjacob · 2 years ago
- 49d0123 ukernels: pack: move the generation of i64 padding_value to codegen (#13264) by bjacob · 2 years ago
- 1fd449b ukernels: fold type enums into flags (#13260) by bjacob · 2 years ago
- 9a8e63e Ukernel interface: take offsets, reorder fields (#13235) by bjacob · 2 years ago
- 5f16489 ukernels: let `pack` take `padding_value` by value (#13233) by bjacob · 2 years ago
- 6bd3211 ukernels: drop the unused `i8` case in `unpack`. (#13231) by bjacob · 2 years ago
- 4bca308 Ukernels: separate public vs internal headers (#13230) by bjacob · 2 years ago
- 772a335 [runtime] Fix std::array parameter unpacking (#13222) by Eugene Zhulenev · 2 years ago
- 978754a tidy up elementwise ukernels (#13204) by bjacob · 2 years ago
- e1a4a2b Remove the `matmul` ukernel (#13175) by bjacob · 2 years ago
- eafc042 Fix ASAN issue casting to uint32 (#13193) by Tori Baker · 2 years ago
- 6a7f69f Fixing vm::ref operator& after type consistency changes. (#13178) by Ben Vanik · 2 years ago
- 81cf28c Simplifying iree-run-mlir by making it run only a single function. (#13149) by Ben Vanik · 2 years ago
- 9461d3b Adding support for loading VM modules from dynamic libraries. (#13112) by Ben Vanik · 2 years ago
- 27179e2 Use MPI for NCCL unique ID exchange by default (#12902) by Okwan Kwon · 2 years ago
- 0c3a30e Revert "Reorder ukernel operands to match what `ukernel.generic_raw` can generate" (#13136) by bjacob · 2 years ago
- e19fc8e Adding a local executable plugin mechanism. (#12625) by Ben Vanik · 2 years ago
- 3f1c154 Reorder ukernel operands to match what `ukernel.generic_raw` can generate (#13103) by bjacob · 2 years ago
- b798319 Fix MSVC warning: wrong pointer type in `_mm_prefetch` (#13102) by bjacob · 2 years ago
- 09630d6 Finally moving VM type registration to iree_vm_instance_t. (#12650) by Ben Vanik · 2 years, 1 month ago
- 27b4b5b [runtime] Add iree::vm::make_ref helper (#12985) by Eugene Zhulenev · 2 years, 1 month ago
- e25916e Minor tweaks to ukernel/common.h (#12934) by bjacob · 2 years, 1 month ago
- d576bc9 Fixing iree_runtime_session_* module memory management. (#12997) by Ben Vanik · 2 years, 1 month ago
- 9600a73 [vm] Correctly handle float and double types in native module args (#12986) by Eugene Zhulenev · 2 years, 1 month ago
- fa9f46b [CUDA] Add a function to get CUDA context wrapper from the CUDA device (#12909) by Eugene Zhulenev · 2 years, 1 month ago
- ec5f9a0 add prefetch instructions to avx512 float mmt4d kernel (#12937) by bjacob · 2 years, 1 month ago
- 3e4f872 added support for complex numbers in python bindings (#12872) by Eliasj42 · 2 years, 1 month ago
- be0f1e1 Roll-up of changes needed to support the nvgpu out of tree project. (#12888) by Stella Laurenzo · 2 years, 1 month ago
- 51dbeb8 Fix Python dtype conversion for int64 on Windows. (#12880) by Scott Todd · 2 years, 1 month ago
- be62a3c e2e matmul benchmark as standalone C calling pack, mmt4d, unpack ukernels (#12848) by bjacob · 2 years, 1 month ago
- cf8b214 Avoid more sanitizer test timeouts (#12887) by bjacob · 2 years, 1 month ago
- f749adb Simplify handling of CPU features in ukernel tests (#12847) by bjacob · 2 years, 1 month ago
- 15b0fae Fix ukernel/x86 issues discovered by #12818 (#12846) by bjacob · 2 years, 1 month ago
- 41453bb Upgrade releases and metadata to Python >= 3.8. (#12849) by Stella Laurenzo · 2 years, 1 month ago
- d65dbe9 Fix MSVC compilation of AVX2 float mmt4d ukernel that was slow on AMD Zen2 (#12826) by bjacob · 2 years, 1 month ago
- 8e4ebcf Refreshing local stack state after VM import calls. (#12809) by Ben Vanik · 2 years, 1 month ago
- 0635b09 Adding FatELF support to the embedded ELF loader. (#12624) by Ben Vanik · 2 years, 1 month ago
- 788f6d5 Ukernel cleanups (standard CPU feature sets, consistently including config.h files) (#12790) by bjacob · 2 years, 1 month ago
- 199c9c8 x86 ukernels for pack/unpack (#12789) by bjacob · 2 years, 1 month ago
- c966f36 x86 ukernels for mmt4d (#12750) by bjacob · 2 years, 1 month ago
- deac6b4 Generalize `iree_uk_cpu_features_list_t` for x86 (#12749) by bjacob · 2 years, 1 month ago
- 9cf99ae `ukernel/arch/arm_64`: simplify build, allow non-GCC-compatible toolchains (#12700) by bjacob · 2 years, 1 month ago
- 4e2c85d removed channel_provider from iree_hal_cuda_device_t (#12776) by Okwan Kwon · 2 years, 1 month ago
- 76aaedc Reworking CUDA channel creation and plumbing group/ID. (#12695) by Ben Vanik · 2 years, 1 month ago
- e7e662d Ukernel tools: port to C and generalize (#12662) by bjacob · 2 years, 1 month ago