# IREE Roadmap
## Winter 2019
Our goal for the end of the year is to have depth in a few complex examples
(such as streaming speech recognition) and breadth in platforms. We hope this
will allow contributions from both Googlers and external developers that broaden
platform support and optimizations, and that it will prove out some of the core
IREE concepts.
### Frontend: SavedModel/TF2.0
MLIR work to get SavedModels importing and lowering through the new MLIR-based
tf2xla bridge. This will give us a clean interface for writing stateful sample
models for both training and inference. The primary work on the IREE-side is
adding support for global variables to the sequencer IR and sequencer runtime
state tracking.
### Coverage: XLA HLO Ops
A majority of XLA HLO ops (what IREE works with) already lower to both
the IREE interpreter and the SPIR-V backend. A select few ops, such as
ReduceWindow and Convolution, are not yet implemented. They need to be plumbed
through the HLO dialect and the IREE lowering process and then implemented in
the backends.
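
For readers unfamiliar with ReduceWindow, the following is a minimal reference sketch of what the op computes, restricted to one dimension with no padding or dilation. It is not IREE or XLA code: the function name `reduce_window_1d` and its parameters are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence


def reduce_window_1d(
    values: Sequence[float],
    init: float,
    window: int,
    stride: int,
    reducer: Callable[[float, float], float],
) -> List[float]:
    """Reduce each `window`-sized slice of `values`, stepping by `stride`.

    Illustrative only: no padding is applied, so only windows that fit
    entirely inside `values` produce an output element.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    out: List[float] = []
    for start in range(0, len(values) - window + 1, stride):
        acc = init  # every window starts from the same initial value
        for x in values[start:start + window]:
            acc = reducer(acc, x)
        out.append(acc)
    return out


# Sliding sum with window 2, stride 1.
sums = reduce_window_1d([1.0, 2.0, 3.0, 4.0, 5.0], 0.0, 2, 1, lambda a, b: a + b)

# Max pooling with window 2, stride 2.
maxes = reduce_window_1d([1.0, 5.0, 2.0, 4.0, 3.0, 0.0], float("-inf"), 2, 2, max)
```

Pooling layers are the most common use of this pattern, which is part of why the op matters for model coverage. A real backend implementation would also need to handle multiple dimensions, padding, and dilation.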
### Sequencer: IR Refactoring
The current sequencer IR is a placeholder designed to test the HAL backends and
needs to be reworked to its final (initial) form. This means rewriting the IR
description files, implementing lowerings, and rewriting the runtime dispatching
code. This will enable future work on codegen, binary size evaluation,
performance evaluation, and compiler optimizations around memory aliasing and
batching.
### Sequencer: Dynamic Shapes
Supporting dynamic shapes requires a decent amount of work on the MLIR side to
flesh out the tf2xla bridge such that we can get input IR that has dynamic
shapes at all. The shape inference dialect also needs to be designed and
implemented so that we have shape math in a form we can lower. As both of these
are in progress, we plan to mostly design and experiment with how the runtime
portions of dynamic shaping will function in IREE.
### HAL: Dawn Implementation
To better engage with the WebGPU and WebML efforts we will be implementing a
[Dawn](https://dawn.googlesource.com/dawn/) backend that uses the same generated
SPIR-V kernels as the Vulkan backend but enables us to target Metal, Direct3D
12, and WebGPU. The goal is to get something working in place (even if
suboptimal) such that we can provide feedback to the various efforts.
### HAL: SIMD Dialect and Marl Implementation
By reusing most of the SPIR-V lowering, we can implement a simple SIMD dialect
for both codegen and JITing. We're likely to start with the
[WebAssembly SIMD spec](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md)
for the dialect (with the goal of being trivially compatible with WASM and to
avoid bikeshedding). Once we have at least one lowering to executable code
(either via codegen or JITing) we can use [Marl](https://github.com/google/marl)
to provide the work scheduling. This should be roughly equivalent in performance
to SwiftShader, but with far less overhead. The ultimate goal is to be able
to delete the current IREE interpreter.
## Spring 2020
With the foundation laid in winter 2019 we'll be looking to expand support,
continue optimizations and tuning, and implement the cellular batching
techniques at the core of the IREE design.