- 78e9dbc [FlashAttention] Adapt attention for tiling + implement tiling/decompose (#15217) by Abhishek Varma · 1 year, 5 months ago
- 3323519 Port samples/dynamic_shapes/ to PyTorch using SHARK-Turbine. (#15255) by Scott Todd · 1 year, 5 months ago
- 20e2112 Cleaning up hal.executable.variant syntax. (#15254) by Ben Vanik · 1 year, 5 months ago
- a95a28a Move expected output URL into the parameter of `iree_run_module_test` (#15248) by Jerry Wu · 1 year, 5 months ago
- eb9b8b6 Revert "Improvements to e2e matmul tests" (#15252) by bjacob · 1 year, 5 months ago
- 71c22da Improvements to e2e matmul tests (#15243) by bjacob · 1 year, 5 months ago
- 5a20dce bf16: select appropriate tile sizes on x86 and Arm, and enable in x86 bitcode build (#15244) by bjacob · 1 year, 5 months ago
- 02e34b0 Optimize `moveOp[Up,Down]InBlock` functions in `SimplifyGlobalAccesses`. (#15245) by Scott Todd · 1 year, 5 months ago
- df00df9 Disable flaky metal `e2e/*_ops` tests. (#15240) by Scott Todd · 1 year, 5 months ago
- fed372c Make iree_benchmark_suite_module_test easier to run locally (#15238) by Jerry Wu · 1 year, 5 months ago
- 63381a8 Switching external resources to be device-local only. (#14016) by Ben Vanik · 1 year, 5 months ago
- 87c968c Lint fix trailing space in debugging/releases.md. (#15239) by Scott Todd · 1 year, 5 months ago
- 3a70dda Improving the `*.tensor.trace` op to carry shapes/encodings. (#15228) by Ben Vanik · 1 year, 5 months ago
- c232aeb Run markdownlint on some files under `docs/developers/*`. (#15168) by Scott Todd · 1 year, 5 months ago
- b13037b Fix build after #15151 (#15236) by Andrzej Warzyński · 1 year, 5 months ago
- 4db7c1a Avoid complaints about missing bc file if compiling for CUDA (#15232) by Jacques Pienaar · 1 year, 5 months ago
- 479f4ed [mlir][Flow] Reland https://github.com/openxla/iree/pull/14656 but keep `CollapseReductionDims` pass (#15219) by MaheshRavishankar · 1 year, 5 months ago
- add9417 [Bindings] Implement alloc + copy to local host when map is unavailable. (#14997) by Stanley Winata · 1 year, 5 months ago
- 8b1af38 Add support for converting bf16 to uint16 on func ops. (#15231) by Han-Chung Wang · 1 year, 5 months ago
- 82611a9 Making execution region results queue-ordered allocas. (#15149) by Ben Vanik · 1 year, 5 months ago
- cda49ca [rocm] Print GPU information when dumping device info (#15230) by Lei Zhang · 1 year, 5 months ago
- e8f184d Revert "[LLVMGPU] Splitting TensorCoreVectorization to two passes." (#15225) by Han-Chung Wang · 1 year, 5 months ago
- a9d7aa5 [Cleanup] Retire filter-based vectorization patterns. (#15185) by Han-Chung Wang · 1 year, 5 months ago
- 6b5b989 Remove unnecessary pragma (#15224) by bjacob · 1 year, 5 months ago
- f84545e Implementing stack trace capture on iree_status_t for Win/Mac. (#15151) by Ben Vanik · 1 year, 5 months ago
- e918678 [LLVMGPU] Splitting TensorCoreVectorization to two passes. (#15184) by Han-Chung Wang · 1 year, 5 months ago
- 87a1cc6 Fork CUDA and ROCm guides into separate pages. (#15196) by Scott Todd · 1 year, 5 months ago
- 0b2997e Tune compiler tracing usage of 'frame's. (#15216) by Scott Todd · 1 year, 5 months ago
- bb087a1 [ConstEval] Make ConstExprMaxSizeIncreaseThreshold be controlled by API. (#15183) by Han-Chung Wang · 1 year, 5 months ago
- 00b75cb [spirv] Account for dynamic dimensions when computing parallelism (#15211) by Lei Zhang · 1 year, 5 months ago
- d87a97e Delete a.out. by Scott Todd · 1 year, 5 months ago
- 58afd02 Bump TF to 2.16.0.dev20231013 (#15198) by Jerry Wu · 1 year, 5 months ago
- b7fd668 Move passes implementation to GlobalOptimization. (#15206) by Han-Chung Wang · 1 year, 5 months ago
- b2fc30d [docs] Add instructions for printing iree SHA from venv (#15212) by Jakub Kuderski · 1 year, 5 months ago
- ba3a6e0 Remove GPT-2 benchmarks on Andreno and Mali by mariecwhite · 1 year, 5 months ago
- 1af382f Update RISCV prebuild toolchains to support manylinux_2_28_x86_64 (#15170) by CindyLiu · 1 year, 5 months ago
- 1fa8b48 Replace more uses of `cc_library` with `iree_runtime_cc_library`. (#15204) by Scott Todd · 1 year, 5 months ago
- 46a2305 Integrate llvm 20231012 (#15163) by Stella Laurenzo · 1 year, 5 months ago
- acddd0c Adding hal.dispatch.extern op. (#15193) by Ben Vanik · 1 year, 5 months ago
- d7f97ba Drop references to an already-fixed test and some no-longer-existing files. (#15203) by bjacob · 1 year, 5 months ago
- d0dac3f Add vscode workspace files to `.gitignore` (#15202) by bjacob · 1 year, 5 months ago
- bc6643b Fix `ireeCompilerLoadBinary` on Apple: pass unprefixed symbol name to `dlsym` (#15201) by bjacob · 1 year, 5 months ago
- 9d7a4ba Data-tiling encodings: take the element types out of the enums. (#15182) by bjacob · 1 year, 5 months ago
- 2b5e61f Bump Nvidia libraries in docker to 535.113.01 (#15200) by Jerry Wu · 1 year, 5 months ago
- 15fe302 [Target][ROCM] Linking OCLC params as global instead of device BC. (#15169) by Stanley Winata · 1 year, 5 months ago
- e33613c Fix allocator crashes on Apple (#15199) by bjacob · 1 year, 5 months ago
- 467c86e [spirv] Generalize transposed batch matmul op (#15197) by Lei Zhang · 1 year, 5 months ago
- 193c132 [GPU] Fix hoisting after upstream change disallowing view ops (#15192) by Lei Zhang · 1 year, 5 months ago
- d3a152b Rebase PyTorch guide on SHARK-Turbine. (#15181) by Scott Todd · 1 year, 5 months ago
- ebdb098 [experimental][regression] Add ROCM Regression test. (#14861) by Stanley Winata · 1 year, 5 months ago
- fd63b3f Remove CUDA from benchmark_large suite by mariecwhite · 1 year, 5 months ago
- e190dd9 Ensure that compiler output is sufficiently aligned. (#15188) by Stella Laurenzo · 1 year, 5 months ago
- 95bece1 Add an API to set global compiler CL options. (#15190) by Stella Laurenzo · 1 year, 5 months ago
- 3eaface Add an API to disable nanobind's leak checker. (#15189) by Stella Laurenzo · 1 year, 5 months ago
- 7a8d8f7 [ROCM] Replace rocm sdk ld.lld with iree-lld for compile-time linkage. (#15187) by Stella Laurenzo · 1 year, 5 months ago
- e023ea7 [rocm] Bundle HIP headers into a submodule and use that by default. (#15186) by Stella Laurenzo · 1 year, 5 months ago
- 89031fc Delete duplicate canonicalization patterns from LinalgExt dialect. (#12124) by Han-Chung Wang · 1 year, 5 months ago
- b6c7c62 Limit the latency of e2e matmul tests (#15180) by bjacob · 1 year, 5 months ago
- c397258 Add pytorch_aot_simple sample Colab notebook using SHARK-Turbine. (#15166) by Scott Todd · 1 year, 5 months ago
- 3f001b6 [NFC] Switch CUDA to use GenericVectorization pass. (#15176) by Han-Chung Wang · 1 year, 5 months ago
- d11c5e5 Add a transform to hoist unrolled vector ops out of scf.for ops. (#14281) by Han-Chung Wang · 1 year, 5 months ago
- 4cdfb91 Update Nvidia host driver to 535.113.01 (#15179) by Jerry Wu · 1 year, 6 months ago
- 29b8197 Splitting dispatch annotation from outlining. (#15177) by Ben Vanik · 1 year, 6 months ago
- 2a42fd3 `avx512bf16` ukernel: work around yet another Clang-16 crash (#15164) by bjacob · 1 year, 6 months ago
- ef25b0c [metal] Fix creating device via registry path (#15142) by Lei Zhang · 1 year, 6 months ago
- 374f854 Fixing warnings on windows build (#15167) by Han-Chung Wang · 1 year, 6 months ago
- 2617126 Rename Colab notebooks to highlight frameworks used. (#15162) by Scott Todd · 1 year, 6 months ago
- b1263d1 Add pkgci artifacts to .gitignore (#15165) by Kunwar Grover · 1 year, 6 months ago
- dd26475 [PkgCI] Add recipe for correctness on CPU (#15131) by Kunwar Grover · 1 year, 6 months ago
- 9c424c4 [LinalgExt] Remove LinalgExt::Softmax and use upstream linalg::softmax (#15021) by Abhishek Varma · 1 year, 6 months ago
- b3cd60a Add pytorch_jit sample Colab notebook using SHARK-Turbine. (#15146) by Scott Todd · 1 year, 6 months ago
- f782069 [CPU] Share SCF tile size computation logic with `-iree-llvmcpu-tile` (#15154) by Benjamin Maxwell · 1 year, 6 months ago
- aef639b Add TF GPT2 benchmarks by mariecwhite · 1 year, 7 months ago
- 4e06f74 Re-enable PyTorch Benchmarks by mariecwhite · 1 year, 6 months ago
- 8380941 [NFC][SPIRV] Cleanup after deprecating LinalgVectorizationPattern. (#15156) by Han-Chung Wang · 1 year, 6 months ago
- 4983668 `mmt4d` ukernel for the `bf16*bf16->f32` case using AVX-512-BF16 (#15089) by bjacob · 1 year, 6 months ago
- a7a784e Update RISCV prebuilt toolchains (#15153) by CindyLiu · 1 year, 6 months ago
- 02e1625 Prune mobile benchmark suites (#14923) by Jerry Wu · 1 year, 6 months ago
- dec2949 [NFC] Switch to use upstreamed memref methods and transform ops. (#14871) by Han-Chung Wang · 1 year, 6 months ago
- 5b92b73 [Flow] Add naming heuristic for possible slow memory copies (#15150) by Kunwar Grover · 1 year, 6 months ago
- ba3e6a7 [CPU][SVE] Enable scalable vectorization and tiling for non-padded matmuls (#15108) by Benjamin Maxwell · 1 year, 6 months ago
- 729bb75 [spirv] NFC: Restructure TileAndVectorizeToCooperativeOps pass (#15138) by Lei Zhang · 1 year, 6 months ago
- 77a8741 Fixing TRACY_NO_EXIT on MacOS and supporting MacOS tracy builds. (#15143) by Ben Vanik · 1 year, 6 months ago
- 9181525 [CPU] Enable fast min/max ops in CPU codegen (#15130) by Diego Caballero · 1 year, 6 months ago
- 94e7e23 Revert "Fixing TRACY_NO_EXIT on MacOS. (#15139)" (#15140) by mariecwhite · 1 year, 6 months ago
- 198af34 Fixing TRACY_NO_EXIT on MacOS. (#15139) by Ben Vanik · 1 year, 6 months ago
- b5bbea2 [Codegen] Add the `bitcast -> extui` to `shuffle` folding patterns to EmulateNarrowTypes pass. (#15102) by MaheshRavishankar · 1 year, 6 months ago
- 17e758b [Reducer] Fix ReduceOptimizationBarrierDelta (#15129) by Kunwar Grover · 1 year, 6 months ago
- d545d82 [spirv] NFC: Restructure the vector lowering pass (#15134) by Lei Zhang · 1 year, 6 months ago
- 6baf83f [GlobalOptimization] Add RaiseSpecialOps earlier in the pipeline (#15125) by Quinn Dawkins · 1 year, 6 months ago
- 46d1860 Fix SSA use-def violation created by Tile and fuse pass. (#15133) by MaheshRavishankar · 1 year, 6 months ago
- 91dfbf4 Adds torch-mlir dequant ops to the lowering pipeline (#15128) by Daniel Garvey · 1 year, 6 months ago
- 5121c3f [Flow] Add peephole optimization for partial negation and reverse (#15121) by Quinn Dawkins · 1 year, 6 months ago
- 2dcd04d Integrate llvm 20231005 4 (#15119) by Stella Laurenzo · 1 year, 6 months ago
- 4b9cd62 [Reducer] Add bytecode support to iree-reduce (#15079) by Kunwar Grover · 1 year, 6 months ago
- 8e1befd [Flow] Optionally fuse the fill in the dequant fusion pass (#15124) by Quinn Dawkins · 1 year, 6 months ago
- 8651777 [gpu] Add basic heuristics for better reduction occupancy (#15120) by Lei Zhang · 1 year, 6 months ago
- 9e9aff0 [PkgCI] Add llama2_7b_i4 recipe for correctness testing on cuda (#15113) by Kunwar Grover · 1 year, 6 months ago
- 58b5670 [rocm] Enable the ROCM compiler target backend by default. (#15111) by Stella Laurenzo · 1 year, 6 months ago
- c64b31f [PkgCI] Add tqdm bar while downloading artifacts (#15112) by Kunwar Grover · 1 year, 6 months ago