| # IREE: An Experimental MLIR Execution Environment |
| |
| **DISCLAIMER**: This is not an officially supported Google product. It's an |
| experimental playground for low-level/tightly integrated machine learning |
| libraries that make use of modern hardware acceleration APIs and techniques (see |
| [non goals](#non-goals)). |
| |
| ## Table of Contents |
| |
| - [Quickstart](#quickstart) |
| - [Project Goals](#project-goals) |
| - [Milestones](#milestones) |
| - [Status](#status) |
| - [Dependencies](#dependencies) |
| - [License](#license) |
| |
| <a name="quickstart"></a> |
| |
| ## Quickstart |
| |
More coming soon! Performing full model translation may require a few steps
(such as ensuring you have a working TensorFlow build); however, we'll provide
pre-translated example models that allow testing the runtime portions
independently.
| |
| * [Getting Started on Windows](docs/getting_started_on_windows.md) |
| * [Getting Started on Linux](docs/getting_started_on_linux.md) |
| * [Getting Started](docs/getting_started.md) |
| |
| See also: |
| |
| * [Using Colab](docs/using_colab.md) |
| * [Vulkan and SPIR-V](docs/vulkan_and_spirv.md) |
| |
| <a name="project-goals"></a> |
| |
| ## Project Goals |
| |
| IREE (**I**ntermediate **R**epresentation **E**xecution **E**nvironment, |
| pronounced as "eerie") is an experimental compiler backend for |
| [MLIR](https://github.com/tensorflow/mlir) that lowers ML models to an IR that |
| is optimized for real-time mobile/edge inference against heterogeneous hardware |
| accelerators. |
| |
The IR produced contains the sequencing information required to communicate
pipelined data dependencies and parallelism to low-level hardware APIs like
Vulkan, and to embed hardware/API-specific binaries such as SPIR-V or compiled
ARM code. As the IR is specified against an abstract execution environment,
there are many potential ways to run a compiled model; one such way is included
as an example and testbed for runtime optimization experiments.
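
To make that concrete, below is a minimal sketch of how a compiler-derived data
dependency between two dispatches is expressed to the GPU. This is plain
Vulkan, not IREE's API, and the pipeline/descriptor handles are assumed to have
been created elsewhere:

```c++
// Sketch only: records two dependent compute dispatches into a Vulkan
// command buffer. The pipelines, layout, and descriptor set are assumed
// to have been created elsewhere.
#include <vulkan/vulkan.h>

void RecordPipelinedDispatches(VkCommandBuffer cmd, VkPipeline producer,
                               VkPipeline consumer, VkPipelineLayout layout,
                               VkDescriptorSet descriptors) {
  vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layout,
                          /*firstSet=*/0, /*descriptorSetCount=*/1,
                          &descriptors, /*dynamicOffsetCount=*/0, nullptr);

  // First dispatch produces an intermediate buffer.
  vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, producer);
  vkCmdDispatch(cmd, /*groupCountX=*/64, /*groupCountY=*/1, /*groupCountZ=*/1);

  // The dependency the compiler derived becomes an explicit barrier: the
  // consumer's reads must wait on the producer's writes.
  VkMemoryBarrier barrier = {};
  barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
  barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
  barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
  vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                       VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                       /*dependencyFlags=*/0, 1, &barrier, 0, nullptr, 0,
                       nullptr);

  // Second dispatch consumes the intermediate buffer.
  vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, consumer);
  vkCmdDispatch(cmd, /*groupCountX=*/64, /*groupCountY=*/1, /*groupCountZ=*/1);
}
```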
| |
The included layered runtime scales from generated code for a particular API
(such as emitting C code that calls external DSP kernels), to a HAL
(**H**ardware **A**bstraction **L**ayer) that allows the same generated code to
target multiple APIs (like Vulkan and Direct3D 12), to a full VM allowing
runtime model loading for flexible deployment options and heterogeneous
execution. Consider both the compiler and the included runtime a toolbox for
making it easier, via the versatility of MLIR, to take ML models from their
source to whatever degree of integration your application requires.
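
As a sketch of the lowest layer, generated code for a particular API might look
something like the following. Every name here (`dsp_matmul_f32`,
`mnist_predict`) is hypothetical and invented for illustration; this is not
IREE's actual generated API:

```c++
// Hypothetical sketch of "runtimeless" output: the compiler emits a plain
// function per model entry point that directly calls into an external kernel
// library. No VM, no HAL, no dynamic dispatch.
#include <stddef.h>

// Assumed to be provided by a vendor DSP kernel library.
extern "C" void dsp_matmul_f32(const float* lhs, const float* rhs, float* out,
                               size_t m, size_t k, size_t n);

// A compiled model entry point reduces to straight-line kernel calls.
extern "C" void mnist_predict(const float* input /*1x784*/,
                              const float* weights /*784x10*/,
                              float* logits /*1x10*/) {
  dsp_matmul_f32(input, weights, logits, /*m=*/1, /*k=*/784, /*n=*/10);
}
```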
| |
| ### Demonstrate MLIR |
| |
IREE has been developed alongside MLIR and is used as an example of how
non-traditional ML compiler backends and runtimes can be built: it focuses more
on the math being performed and how that math is scheduled than on graphs of
"ops", and in some cases allows doing away with a runtime entirely. It seeks to
show how more holistic approaches that exploit the MLIR framework and its
various dialects can be both easy to understand and powerful in the
optimizations they enable to code size, runtime complexity, and performance.
| |
| ### Demonstrate Advanced Models |
| |
| By using models with much greater complexity than the usual references (such as |
| MobileNet) we want to show how weird things can get when model authors are |
| allowed to get creative: dynamic shapes, dynamic flow control, dynamic |
| multi-model dispatch (including models that conditionally dispatch other |
| models), streaming models, tree-based search algorithms, etc. We are trying to |
build IREE from the ground up to enable these models and run them efficiently
on
| modern hardware. Many of our example models are sequence-to-sequence language |
| models from the [Lingvo](https://github.com/tensorflow/lingvo) project |
| representing cutting edge speech recognition and translation work. |
| |
| ### Demonstrate ML-as-a-Game-Engine |
| |
An observation that has driven the development of IREE is that ML workloads are
not much different from traditional game rendering workloads: math is performed
on buffers with varying levels of concurrency and ordering in a pipelined
fashion against accelerators designed to make such operations fast. In fact,
most ML is performed on the same hardware that was designed for games! Our
approach is to use the compiler to transform ML workloads into ones that look
eerily _(pun intended)_ similar to what a game performs in per-frame render
workloads, to optimize for low-latency and predictable execution, and to
integrate well into existing systems for both batched and interactive usage.
The IREE runtime is designed to feel more like game engine middleware than a
standalone ML inference system, though we still have much work to do towards
that goal. This should make it easy to use existing tools for
high-performance/low-power optimization of GPU workloads, identify driver or
system issues introducing latency, and help improve the ecosystem overall.
| |
| ### Demonstrate Standards-based ML via Vulkan and SPIR-V |
| |
With the above observation that ML can look like games from the systems
perspective, it follows that APIs and technologies good for games should
probably also be good for ML. In many cases we've identified only a few key
differences, and just as extensions and API improvements have been added to
graphics/compute standards for decades, we hope to demonstrate and evaluate
small, tactical changes that can have large impacts on ML performance through
these standard APIs. We would love for hardware vendors to be able to make ML
efficient on their hardware without the need for bespoke runtimes and special
access, such that any ML workload produced by any tool runs well. We'd consider
the IREE experiment a success if what resulted was some worked examples that
help advance the entire ecosystem!
| |
| <a name="non-goals"></a> |
| |
| ## Non-Goals |
| |
* Replacing parts of the supported TensorFlow ecosystem of tools: The authors
| within Google work closely with TensorFlow and contribute to it regularly. |
| However, IREE is exploring some different angles of the problem and is |
| experimental. We will seek to leverage anything of value that we learn in an |
| appropriate way to make TensorFlow better over time, but the two should not |
| be conflated. |
| * Providing an [SLA](https://en.wikipedia.org/wiki/Service-level_agreement) of |
| any kind: IREE is infrastructure research, not a supported product. If it |
| gains mind-share or traction, we would revisit that in conjunction with |
| finding a more permanent way to align it with the broader constellation of |
| ML tooling. |
| |
| <a name="milestones"></a> |
| |
| ## Milestones |
| |
| We are currently just at the starting line, with basic |
| [MNIST MLP](https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py) |
| running end-to-end on both a CPU interpreter and Vulkan. As we scale out the |
| compiler we will be increasing the complexity of the models that can run and |
| demonstrating more of the optimizations we've found useful in previous efforts |
| to run them efficiently on edge devices. |
| |
A short-term
[Roadmap](https://github.com/google/iree/blob/master/docs/roadmap.md) is
available describing the major areas we are focusing on, in addition to the
more infrastructure-focused work listed below.
| |
| We'll be setting up GitHub milestones with issues tracking major feature work we |
| are planning. For now, our areas of work are: |
| |
| * Allocation tracking and aliasing in the compiler |
| * Pipelined scheduler in the VM for issuing proper command buffers |
| * New CPU interpreter that enables lightweight execution on ARM and x86 |
| * C code generator and API to demonstrate "runtimeless" mode |
| * Quantization using the MLIR quantization framework |
| * High-level integration and examples when working with TensorFlow 2.0 |
| * Switching from IREE's XLA-to-SPIR-V backend to the general MLIR SPIR-V |
| backend |
| |
Things we are interested in but don't yet have in progress:
| |
| * Ahead-of-time compiled ARM NEON backend (perhaps via |
| [SPIRV-LLVM](https://github.com/KhronosGroup/SPIRV-LLVM-Translator/), |
| [SPIRV-to-ISPC](https://github.com/GameTechDev/SPIRV-Cross), or some other |
| technique) |
| * HAL backends for Metal 2 and Direct3D 12 |
| * Profile-guided optimization support for scheduling feedback |
| |
| <a name="status"></a> |
| |
| ## Current Status |
| |
| ### Documentation |
| |
| Coming soon :) |
| |
| ### Build System and CI |
| |
| * We support Bazel for builds of all parts of the project. |
| * We also maintain a CMake build for a subset of runtime components designed |
| to be used in other systems. |
| |
| ### Code and Style |
| |
The project is currently _very_ early, and much of the code was written prior
to many of the more recent ergonomics improvements in MLIR and its TableGen
support. Future changes will replace the legacy code style with prettier forms
and simplify the project structure to make it easier to separate the different
components. Some entire portions of the code (such as the CPU interpreter) will
likely be dropped or rewritten. For now, assume churn!
| |
The compiler portions of the code (almost exclusively under `iree/compiler/`)
follow the LLVM style guide and have the same system requirements as MLIR
itself. In general this requires a more modern C++ compiler.
| |
The runtime portions vary, but most are designed to work with C++11 and use
[Abseil](https://github.com/abseil/abseil-cpp) to backfill C++14 and C++17
features.
| |
| ### Hardware Support |
| |
We are mostly targeting Vulkan and Metal on recent mobile devices, and as such
have limited our usage of hardware features and vendor extensions to those we
have broad access to on such devices. This is mainly just to keep our focus
tight and does
| not preclude usage of features outside the standard sets or for other hardware |
| types (in fact, we have a lot of fun ideas for |
| `VK_NVX_device_generated_commands` and Metal 2.1's Indirect Command Buffers!). |
| |
| <a name="dependencies"></a> |
| |
| ## Dependencies |
| |
NOTE: during the initial open source release we are still cleaning things up.
If there are weird dependencies or layering issues that make life difficult for
your particular use case, please file an issue so we can fix them.
| |
| ### Compiler |
| |
The compiler has several layers that allow scaling the dependencies required
based on the source and target formats. In all cases
[MLIR](https://github.com/tensorflow/mlir) is required, and for models not
originating from TensorFlow (or already in XLA HLO format) it is the only
dependency. When targeting the IREE Runtime VM and HAL,
[FlatBuffers](https://google.github.io/flatbuffers/) is required for
serialization. Converting from TensorFlow models requires a dependency on
TensorFlow (though only the parts required for conversion).
| |
| ### Runtime VM |
| |
The VM providing dynamic model deployment and advanced scheduling behavior
requires [Abseil](https://github.com/abseil/abseil-cpp) for its common types;
however, contributions are welcome to make it possible to replace Abseil with
other libraries via shims/forwarding. The core types used by the runtime
(excluding command line flags and such in tools) are limited to types coming in
future C++ versions (variant, optional, string_view, etc.), cheap types
(absl::Span), or simple containers (absl::InlinedVector).
[FlatBuffers](https://google.github.io/flatbuffers/) is used to load compiled
modules.
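
For illustration, here is the kind of C++11-compatible code this implies, using
only the Abseil types named above. The functions themselves are hypothetical
and not part of the IREE API:

```c++
// Illustrative only: C++11-compatible signatures using Abseil stand-ins for
// future-standard types. The functions are hypothetical, not IREE API.
#include "absl/container/inlined_vector.h"
#include "absl/strings/string_view.h"
#include "absl/types/optional.h"
#include "absl/types/span.h"

// Cheap, non-owning views as parameters; absl::optional instead of
// sentinel values for "not found".
absl::optional<size_t> FindBuffer(absl::string_view name,
                                  absl::Span<const absl::string_view> names) {
  for (size_t i = 0; i < names.size(); ++i) {
    if (names[i] == name) return i;
  }
  return absl::nullopt;
}

// InlinedVector keeps small results on the stack, avoiding heap allocation
// in the common case.
absl::InlinedVector<int, 4> CollectValidBindings(absl::Span<const int> all) {
  absl::InlinedVector<int, 4> result;
  for (int binding : all) {
    if (binding >= 0) result.push_back(binding);
  }
  return result;
}
```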
| |
| ### Runtime HAL |
| |
| The HAL and the provided implementations (Vulkan, etc) also use |
| [Abseil](https://github.com/abseil/abseil-cpp). Contributions are welcome to |
| allow other types to be swapped in. A C99 HAL API is planned for code generation |
| targets that will use no dependencies. |
| |
| ### Testing and Tooling |
| |
[SwiftShader](https://github.com/google/swiftshader) is used to provide fast,
hardware-independent testing of the Vulkan and SPIR-V portions of the
toolchain.
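
As a rough illustration of how this slots in: nothing IREE-specific is
required, since SwiftShader installs as a normal Vulkan ICD and shows up during
standard device enumeration as a CPU-type device. A minimal sketch using plain
Vulkan:

```c++
// Sketch: enumerate Vulkan physical devices. With SwiftShader installed as an
// ICD it appears here as a VK_PHYSICAL_DEVICE_TYPE_CPU device, letting the
// same Vulkan/SPIR-V tests run without GPU hardware.
#include <cstdio>
#include <vector>
#include <vulkan/vulkan.h>

int main() {
  VkApplicationInfo app = {};
  app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
  app.apiVersion = VK_API_VERSION_1_0;
  VkInstanceCreateInfo create_info = {};
  create_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
  create_info.pApplicationInfo = &app;
  VkInstance instance = VK_NULL_HANDLE;
  if (vkCreateInstance(&create_info, nullptr, &instance) != VK_SUCCESS) {
    std::fprintf(stderr, "No Vulkan loader/ICD available\n");
    return 1;
  }
  uint32_t count = 0;
  vkEnumeratePhysicalDevices(instance, &count, nullptr);
  std::vector<VkPhysicalDevice> devices(count);
  vkEnumeratePhysicalDevices(instance, &count, devices.data());
  for (VkPhysicalDevice device : devices) {
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(device, &props);
    std::printf("%s (deviceType=%d)\n", props.deviceName,
                static_cast<int>(props.deviceType));
  }
  vkDestroyInstance(instance, nullptr);
  return 0;
}
```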
| |
| <a name="license"></a> |
| |
| ## License |
| |
IREE is licensed under the terms of the Apache 2.0 license. See
[LICENSE](LICENSE) for more information.