blob: 60e9497f94718c72f34116acac48e827cfdee90e [file] [view]
# IREE
IREE (**I**ntermediate **R**epresentation **E**xecution **E**nvironment[^1]) is
an [MLIR](https://mlir.llvm.org/)-based end-to-end compiler and runtime that
lowers Machine Learning (ML) models to a unified IR that scales up to meet the
needs of the datacenter and down to satisfy the constraints and special
considerations of mobile and edge deployments.
## Key features
- [x] Ahead-of-time compilation of scheduling and execution logic together
- [x] Support for dynamic shapes, flow control, streaming, and other advanced
model features
- [x] Optimized for many CPU and GPU architectures
- [x] Low overhead, pipelined execution for efficient power and resource usage
- [x] Binary size as low as 30KB on embedded systems
- [x] Debugging and profiling support
## Support matrix
IREE supports importing from a variety of ML frameworks:
- [x] TensorFlow
- [x] TensorFlow Lite
- [x] JAX
- [ ] PyTorch (planned)
- [ ] ONNX (hoped for)
The IREE compiler tools run on :fontawesome-brands-linux: Linux,
:fontawesome-brands-windows: Windows, and :fontawesome-brands-apple: macOS
and can generate efficient code for a variety of runtime platforms:
- [x] Linux
- [x] Windows
- [x] Android
- [ ] macOS (planned)
- [ ] iOS (planned)
- [x] Bare metal
- [ ] WebAssembly (planned)
and architectures:
- [x] ARM
- [x] x86
- [x] RISC-V
Support for hardware accelerators and APIs is also included:
- [x] Vulkan
- [x] CUDA
- [ ] Metal (planned)
- [ ] WebGPU (planned)
## Project architecture
IREE adopts a _holistic_ approach towards ML model compilation: the IR produced
contains both the _scheduling_ logic, required to communicate data dependencies
to low-level parallel pipelined hardware/API like
[Vulkan](https://www.khronos.org/vulkan/), and the _execution_ logic, encoding
dense computation on the hardware in the form of hardware/API-specific binaries
like [SPIR-V](https://www.khronos.org/spir/).
![IREE Architecture](assets/images/iree_architecture.svg)
## Workflow overview
Using IREE involves these general steps:
1. **Import your model**
Work in your [framework of choice](./ml-frameworks), then run your model
through one of IREE's import tools.
2. **Select your [deployment configuration](./deployment-configurations)**
Identify your target platform, accelerator(s), and other constraints.
3. **Compile your model**
Compile through IREE, picking compilation targets based on your
deployment configuration.
4. **Run your model**
Use IREE's runtime components to execute your compiled model.
### Importing models from ML frameworks
IREE supports importing models from a growing list of ML frameworks and model
formats:
* [TensorFlow](ml-frameworks/tensorflow.md)
* [TensorFlow Lite](ml-frameworks/tensorflow-lite.md)
* [JAX](ml-frameworks/jax.md)
### Selecting deployment configurations
IREE provides a flexible set of tools for various deployment scenarios.
Fully featured environments can use IREE for dynamic model deployments taking
advantage of multi-threaded hardware, while embedded systems can bypass IREE's
runtime entirely or interface with custom accelerators.
* What platforms are you targeting? Desktop? Mobile? An embedded system?
* What hardware should the bulk of your model run on? CPU? GPU?
* How fixed is your model itself? Can the weights be changed? Do you want
to support loading different model architectures dynamically?
IREE supports the full set of these configurations using the same underlying
technology.
### Compiling models
Model compilation is performed ahead-of-time on a _host_ machine for any
combination of _targets_. The compilation process converts from layers and
operators used by high level frameworks down into optimized native code and
associated scheduling logic.
For example, compiling for
[GPU execution](deployment-configurations/gpu-vulkan.md) using Vulkan generates
SPIR-V kernels and Vulkan API calls. For
[CPU execution](deployment-configurations/cpu-dylib.md), native code with
static or dynamic linkage and the associated function calls are generated.
### Running models
IREE offers a low level C API, as well as several specialized sets of
[bindings](./bindings) for running IREE models using other languages:
* [C API](bindings/c-api.md)
* [Python](bindings/python.md)
* [TensorFlow Lite](bindings/tensorflow-lite.md)
## Communication channels
* :fontawesome-brands-github:
[GitHub issues](https://github.com/google/iree/issues): Feature requests,
bugs, and other work tracking
* :fontawesome-brands-discord:
[IREE Discord server](https://discord.gg/26P4xW4): Daily development
discussions with the core team and collaborators
* :fontawesome-solid-users: [iree-discuss email list](https://groups.google.com/forum/#!forum/iree-discuss):
Announcements, general and low-priority discussion
## Roadmap
IREE is in the early stages of development and is not yet ready for broad
adoption. Check out the
[long-term design roadmap](https://github.com/google/iree/blob/main/docs/developers/design_roadmap.md)
to get a sense of where we're headed.
We plan on a quarterly basis using [OKRs](https://en.wikipedia.org/wiki/OKR).
Review our latest
[objectives](https://github.com/google/iree/blob/main/docs/developers/objectives.md) to
see what we're up to.
We use [GitHub Projects](https://github.com/google/iree/projects) to track
progress on IREE components and specific efforts and
[GitHub Milestones](https://github.com/google/iree/milestones) to track the
work associated with plans for each quarter.
[^1]:
Pronounced "eerie" and often styled with the :iree-ghost: emoji
*[IR]: Intermediate Representation