IREE (Intermediate Representation Execution Environment[^1]) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
IREE supports importing from a variety of ML frameworks:
The IREE compiler tools run on :fontawesome-brands-linux: Linux, :fontawesome-brands-windows: Windows, and :fontawesome-brands-apple: macOS and can generate efficient code for a variety of runtime platforms:
and architectures:
Support for hardware accelerators and APIs is also included:
IREE adopts a holistic approach to ML model compilation. The IR it produces contains both the scheduling logic, which communicates data dependencies to low-level parallel pipelined hardware and APIs such as Vulkan, and the execution logic, which encodes dense computation on the hardware as hardware- or API-specific binaries such as SPIR-V.
Specific examples outlining IREE's workflow can be found in the User Getting Started Guide. Using IREE involves the following general steps:
1. **Import your model**: Develop your program using one of the supported frameworks, then run your model using one of IREE's import tools.
2. **Select your deployment configuration**: Identify your target platform, accelerator(s), and other constraints.
3. **Compile your model**: Compile through IREE, picking compilation targets based on your deployment configuration.
4. **Run your model**: Use IREE's runtime components to execute your compiled model.
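
As a rough illustration of steps 2 through 4, the sketch below assembles the command lines for IREE's `iree-compile` and `iree-run-module` tools without executing them. The `DeploymentConfig` class and both helper functions are hypothetical names introduced here. The flag spellings (`--iree-hal-target-backends`, `--device`, `--module`, `--function`, `--input`) and the file names are assumptions based on commonly documented usage, and they may differ between IREE releases, so check the tools' `--help` output before relying on them.

```python
from dataclasses import dataclass
import shlex


@dataclass(frozen=True)
class DeploymentConfig:
    """Step 2: the chosen target (hypothetical helper, values are assumptions)."""
    compiler_backend: str  # e.g. "llvm-cpu" or "vulkan-spirv"
    runtime_device: str    # e.g. "local-task" or "vulkan"


def compile_command(imported_mlir: str, config: DeploymentConfig, output: str) -> list[str]:
    """Step 3: build (but do not run) an iree-compile invocation."""
    return [
        "iree-compile",
        f"--iree-hal-target-backends={config.compiler_backend}",
        imported_mlir,
        "-o",
        output,
    ]


def run_command(module: str, config: DeploymentConfig, function: str, inputs: list[str]) -> list[str]:
    """Step 4: build (but do not run) an iree-run-module invocation."""
    cmd = [
        "iree-run-module",
        f"--device={config.runtime_device}",
        f"--module={module}",
        f"--function={function}",
    ]
    # One --input flag per function argument, passed through unchanged.
    cmd += [f"--input={value}" for value in inputs]
    return cmd


# Example: a CPU deployment. File and function names are placeholders.
cpu = DeploymentConfig(compiler_backend="llvm-cpu", runtime_device="local-task")
print(shlex.join(compile_command("model.mlir", cpu, "model.vmfb")))
print(shlex.join(run_command("model.vmfb", cpu, "abs", ["f32=-2"])))
```

Keeping the deployment configuration in one object means the compiler backend and the runtime device are chosen together, mirroring how step 2 drives both step 3 and step 4.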
IREE supports importing models from a growing list of ML frameworks and model formats:
IREE provides a flexible set of tools for various deployment scenarios. Fully featured environments can use IREE for dynamic model deployments taking advantage of multi-threaded hardware, while embedded systems can bypass IREE's runtime entirely or interface with custom accelerators.
IREE supports the full set of these configurations using the same underlying technology.
Model compilation is performed ahead-of-time on a host machine for any combination of targets. The compilation process lowers the layers and operators used by high-level frameworks into optimized native code and the associated scheduling logic.
For example, compiling for GPU execution using Vulkan generates SPIR-V kernels and Vulkan API calls. For CPU execution, native code with static or dynamic linkage and the associated function calls are generated.
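
The two examples above can be summarized as a small lookup table. This is only a sketch of the pairing described in the text: the dictionary, the `describe` helper, and the backend names used as keys (`vulkan-spirv`, `llvm-cpu`) are assumptions for illustration and are not defined on this page.

```python
# What the compiler emits per target, as described in the text above.
# Keys are assumed backend names; the values restate the prose.
GENERATED_BY_TARGET = {
    "vulkan-spirv": {
        "execution": "SPIR-V kernels",
        "scheduling": "Vulkan API calls",
    },
    "llvm-cpu": {
        "execution": "native code (static or dynamic linkage)",
        "scheduling": "associated function calls",
    },
}


def describe(backend: str) -> str:
    """Return a one-line summary of the artifacts generated for a backend."""
    parts = GENERATED_BY_TARGET[backend]
    return f"{backend}: {parts['execution']} + {parts['scheduling']}"


for name in GENERATED_BY_TARGET:
    print(describe(name))
```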
IREE offers a low-level C API, as well as several specialized sets of bindings for running IREE models from other languages:
IREE is in the early stages of development and is not yet ready for broad adoption. Check out the long-term design roadmap to get a sense of where we're headed.
We plan on a quarterly basis using OKRs. Review our latest objectives to see what we're up to.
We use GitHub Projects to track progress on IREE components and specific efforts and GitHub Milestones to track the work associated with plans for each quarter.
[^1]: Pronounced “eerie” and often styled with the :iree-ghost: emoji
*[IR]: Intermediate Representation