commit 316275a3646181a89b6bc4b29453333336c8e6e2
author:    Geoffrey Martin-Noble <gcmn@google.com>  Wed Feb 10 11:04:18 2021 -0800
committer: Geoffrey Martin-Noble <gcmn@google.com>  Wed Feb 10 11:04:18 2021 -0800
tree       06f112c04d5997d683e1457f668d68fce681f845
parent     8283e02a460c7e43330809d4a0d465ba44abfed8
parent     707cf08c913f222ab73a641f5434b3ccc6f4341e
Merge main -> google (#4800)

* 707cf08c9 Merge pull request #4798
* 8e5112d58 Move dispatch region creation for Linalg on tensors after all HLO passes. (#47..
* 59dcd0db4 Bump CMake minimum to 3.13.4 and remove dead code (#4792)
* 8a55a927a Adapt dispatch region creation to always clone metadata-like/simple operations..
* 46963f778 Disabling failing reverse linalg-on-tensors test. (#4788)
* 52a1fe6af Enable passing e2e/xla_ops linalg-on-tensors tests (#4779)
* 11a73b8af [CodeGen] Pack scf.for vector<8xf16> loop-carried values (#4775)
* 21b9520db [codegen] Enable vectorizing convolution with output height != 1 (#4758)
* 9d1e3261f Add logic for better ordering of fusion. (#4772)
* d0cc95f4c Add CUDA SDK headers to third_party (#4776)
* d436f8345 Use stack allocation on CPU side for Linalg on tensors path. (#4770)
* 3c3c5b499 Drop lowering for tensor::GenerateOp. (#4769)
* 0281bb72f Convert mhlo.const -> const when running ConvertHLOToLinalgOnTensorsPass (#4763)
* 82ef99e7f Merge google -> main (#4755)
* 110a2f5f0 Merge pull request #4760 from google/benvanik-hal-simplification
* c6393544b Resolve merge conflicts between main and google
* a299a320e Removing runtime hal.buffer.read_data/write_data/copy_data. The ops still exis..
* bc3b45aa4 Removing runtime hal.buffer_view.compute_offset/compute_range. These are now c..
* 0498257ba Removing runtime hal.buffer_view.dims.* and hal.buffer_view.subview. These are..
* 28d2cc48d Rematerialize DispatchWorkgroupsOp constant arguments (#4756)
* 056b0167d Fixed iree tflite import to include required dialects. (#4754)
* f168c55c5 Force CPU benchmark to run on single thread. (#4751)
* a7c1bd277 Fixes to enable executing matmul on GPU. (#4745)
* 2325b050a Merge pull request #4743 from google/benvanik-reordering-1
* ff46a1e00 Add missing dependency for emitc shift test (#4747)
* 02704ab7a Fix include of EmitC generated headers (#4746)
* 4e09bd731 Unify patterns in the vm to emitc conversion (#4724)
* a3c888c81 Reordering hal.buffer_view.create/iree_hal_buffer_view_create args.
* 9743fa8e4 Reordering operands of flow.tensor.trace/hal.buffer_view.trace.
* 43b756275 Simplifying hal.command_buffer.execution_barrier. It was ignoring its args and..
* 7fd240f6f Remove unneeded ref_ptr dep.
* b0f8fd090 Reordering params in iree_hal_executable_layout_create. Progress on #4736.
* e511e25d1 Retain the backing device with python mapped array memory.
* c85a426ab Enable lowering of Linalg on tensors on GPU. (#4688)
* 893979361 Rename ".module" to ".vmfb" in generated IREE bytecode modules. (#4729)
* 29c1c3822 Removing abseil dep in tracing.h to make it ok for API exposure. (#4721)
* 915723075 Add VM tests for native integer arithmetics (#4725)
* a89c9d821 Add tests for log, exp, maximum, minimum, comparison, ceil and floor lowerings..

PiperOrigin-RevId: 356627846
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler that lowers ML models to a unified IR optimized for real-time mobile/edge inference against heterogeneous hardware accelerators. IREE also provides flexible deployment solutions for the compiled ML models.
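To make that concrete, below is a hedged sketch of the kind of tensor program that enters the compiler; the function name and shapes are invented for illustration, and the matmul is written as an MHLO op in MLIR's generic form:

```mlir
// A toy model fragment as it might enter IREE: a single dense matmul
// written in MHLO form. Function name and tensor shapes are illustrative.
func @predict(%input: tensor<1x4xf32>, %weights: tensor<4x8xf32>) -> tensor<1x8xf32> {
  // The generic-form mhlo.dot below is progressively lowered (e.g. through
  // Linalg, as several commits above mention) toward target-specific code.
  %0 = "mhlo.dot"(%input, %weights)
      : (tensor<1x4xf32>, tensor<4x8xf32>) -> tensor<1x8xf32>
  return %0 : tensor<1x8xf32>
}
```

From IR like this, the compiler emits a deployable artifact, e.g. a .vmfb bytecode module as referenced in the commit list above.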
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any communication channels!
Python packages are published on the releases page. See the colab/ directory for examples.
IREE can be built from source using both Bazel and CMake on Windows and Linux. We also have experimental macOS support.
Please see the Getting Started pages on IREE's documentation hub to configure, compile, and run IREE in your favorite development environment!
IREE hosts all its documentation and project status dashboards on GitHub Pages. We are still building up the website; please feel free to create issues for the documentation you'd like to see!
We also have some public talks that explain IREE's concepts and architecture.
IREE adopts a holistic approach towards ML model compilation: the IR produced contains both the scheduling logic, required to communicate data dependencies to low-level parallel pipelined hardware/API like Vulkan, and the execution logic, encoding dense computation on the hardware in the form of hardware/API-specific binaries like SPIR-V.
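To sketch that split (hedged: the ops below are generic-form placeholders, not IREE's actual flow/hal dialect syntax), the scheduling side is a host program whose SSA data flow makes dependencies explicit:

```mlir
// Illustrative placeholder ops; IREE's real dialects use different names
// and carry richer operands (workloads, workgroup counts, etc.).
// Scheduling logic: records which dispatch runs and what it depends on.
func @main(%arg: tensor<16xf32>) -> tensor<16xf32> {
  // The second dispatch consumes %0, so it cannot start until the first
  // completes; that dependency is what gets communicated to the underlying
  // API (e.g. as Vulkan pipeline barriers).
  %0 = "flow.dispatch"(%arg) {entry_point = @double} : (tensor<16xf32>) -> tensor<16xf32>
  %1 = "flow.dispatch"(%0) {entry_point = @double} : (tensor<16xf32>) -> tensor<16xf32>
  return %1 : tensor<16xf32>
}
```

The execution logic behind the hypothetical @double entry point is compiled separately into a hardware/API-specific binary, such as a SPIR-V blob for Vulkan.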
IREE's architecture is best illustrated by the architecture diagram on the project's documentation hub.
Being compilation-based means IREE does not have a traditional runtime that dispatches “ops” to their fat kernel implementations. What IREE provides is a toolbox for different deployment scenarios. It scales from running generated code on a particular API (such as emitting C code calling external DSP kernels), to a HAL (Hardware Abstraction Layer) that allows the same generated code to target multiple APIs (like Vulkan and Direct3D 12), to a full VM allowing runtime model loading for flexible deployment options and heterogeneous execution.
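As a hedged illustration of the HAL layer (again using placeholder generic-form ops rather than the real hal dialect syntax), one executable can package binaries for several targets, and the runtime picks whichever matches the device it finds:

```mlir
// Illustrative placeholder ops; real hal dialect syntax differs.
// One executable carrying two target-specific binaries: the same compiled
// program runs on whichever API/device is present at runtime.
"hal.executable"() ({
  "hal.executable.binary"() {format = "SPIR-V"} : () -> ()  // Vulkan path
  "hal.executable.binary"() {format = "DYLIB"} : () -> ()   // native CPU path
}) {sym_name = "double"} : () -> ()
```

At the lighter end of the spectrum the same dispatch could instead be emitted as C source (the EmitC work in the commit list above); at the heavier end, modules are loaded through the VM for runtime model loading and heterogeneous execution.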
IREE aims to:
IREE is in the early stages of development and not yet ready for broad adoption. Check out the long-term design roadmap to get a sense of where we're headed.
We plan on a quarterly basis using OKRs (Objectives and Key Results). Review our latest objectives to get a sense of what we're up to in the near term.
We use GitHub Projects to track progress on IREE components and specific efforts. We use GitHub Milestones to track the work associated with plans for each quarter.
IREE is licensed under the terms of the Apache 2.0 License. See LICENSE for more information.