# IREE Roadmap
## Design
Though many of the core dialects are now complete enough for correctness
testing, a large majority of the features we are most excited to demonstrate are
still TODO and will be coming over the next few quarters. You can find a
highlighted set of upcoming features in the [design roadmap](roadmap_design.md).
## Spring/Summer 2020 Focus Areas
IREE is able to run many foundational models, and more are expected to come
online this spring. Much of the work so far has gone into infrastructure and
getting the code into a place that allows rapid parallel development; work is
now ramping up on op coverage and completeness. There's still some core work to
be done on the primary IREE dialects (`flow` and `hal`) before we begin the
low-hanging-fruit optimization burn-down, but we're getting close!
### Frontend: Enhanced SavedModel/TF2.0 Support
We are now able to import SavedModels written in the TF2.0 style with resource
variables and some simple uses of TensorList (`tf.TensorArray`, etc.).
### Coverage: XLA HLO Ops
A select few ops - such as ReduceWindow - are not yet implemented and need to be
plumbed through the HLO dialect and the IREE lowering process, as well as
implemented in the backends. Work is ongoing to complete the remaining ops so
that we can focus on higher-level model usage semantics.
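For reference, the semantics an op like ReduceWindow requires can be illustrated with a toy 1-D version in plain Python. This is an explanatory sketch only - not IREE or XLA code - and it omits padding and dilation, which the real op also supports:

```python
def reduce_window_1d(xs, window, stride, init, fn):
    """Apply `fn` over each length-`window` slice of `xs`, stepping by `stride`.

    `init` seeds each window's accumulator, mirroring the init value of
    the HLO ReduceWindow op (valid-padding, 1-D case only).
    """
    out = []
    for start in range(0, len(xs) - window + 1, stride):
        acc = init
        for x in xs[start:start + window]:
            acc = fn(acc, x)
        out.append(acc)
    return out
```

For example, a sum over windows of 2 with stride 1 on `[1, 2, 3, 4]` yields `[3, 5, 7]`; lowering this pattern efficiently per backend is the work described above.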
### Scheduler: Dynamic Shapes
Progress is underway on dynamic shape support throughout the stack. The tf2xla
effort is adding shape propagation/inference upstream and we have a decent
amount of glue mostly ready to accept it.
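The flavor of shape propagation involved can be sketched with a toy rule for a matmul, where `None` marks a dynamic dimension. The names here are illustrative assumptions, not actual IREE or tf2xla APIs:

```python
def matmul_result_shape(a_shape, b_shape):
    """Propagate shapes through a matmul: [m, k] x [k, n] -> [m, n].

    `None` marks a dimension unknown until runtime; known dimensions are
    checked for consistency, and unknown ones flow through to the result.
    """
    m, k1 = a_shape
    k2, n = b_shape
    if k1 is not None and k2 is not None and k1 != k2:
        raise ValueError(f"contraction dims disagree: {k1} vs {k2}")
    return [m, n]
```

A dynamic batch dimension simply flows through: `matmul_result_shape([None, 4], [4, 8])` gives `[None, 8]`, which is the kind of information the glue code is ready to accept from upstream inference.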
### HAL: Marl CPU Scheduling
We want to plug in [marl](https://github.com/google/marl) to provide
[CPU-side work scheduling](roadmap_design.md#gpu-like-cpu-scheduling) that
matches GPU semantics. This will enable improved CPU utilization and allow us to
verify the approach with benchmarks.
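The GPU-like model in question (dispatch a grid of workgroups, then join them all before the next dispatch) can be sketched with standard-library threads; marl layers cooperatively scheduled fibers and work stealing on top of this basic idea. A minimal sketch, not marl's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(grid_size, tile_fn):
    """Run `tile_fn` once per workgroup id in [0, grid_size), in parallel,
    and join all workgroups before returning - mimicking a GPU dispatch."""
    with ThreadPoolExecutor() as pool:
        # map preserves workgroup order in the collected results.
        return list(pool.map(tile_fn, range(grid_size)))
```

For example, `dispatch(4, lambda i: i * i)` returns `[0, 1, 4, 9]` after all four workgroups complete, matching the dispatch/join semantics the HAL expects from GPU queues.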
### Codegen: Full Linalg-based Conversion
A large part of the codegen story for both CPU (via LLVM IR) and GPU (via
SPIR-V) relies on the upstream
[Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/) and associated
lowerings. We are contributing here and have partial end-to-end demonstrations
of the conversion. By the end of summer we should have fully switched over to
this path and can remove the index-propagation-based SPIR-V lowering in favor of
the more generalized solution.
## Beyond
### HAL: Dawn Implementation
To better engage with the WebGPU and WebML efforts, we will be implementing a
[Dawn](https://dawn.googlesource.com/dawn/) backend that uses the same generated
SPIR-V kernels as the Vulkan backend, enabling us to target Metal, Direct3D 12,
and WebGPU. The goal is to get something working in place (even if suboptimal)
so that we can provide feedback to the various efforts.