
 # IREE Roadmap

 ## Design

 Though many of the core dialects are now far enough along for correctness
 testing, a large majority of the features we are most excited to demonstrate
 are still TODO and will arrive over the next few quarters. You can find a
 highlighted set of upcoming features in the [design roadmap](roadmap_design.md).

 ## Spring/Summer 2020 Focus Areas

 IREE is able to run many foundational models, and more are expected to come
 online this spring. Much of the work so far has gone into infrastructure and
 getting the code into a state that allows rapid parallel development; work is
 now ramping up on op coverage and completeness. There's still some core work
 to be done on the primary IREE dialects (`flow` and `hal`) before beginning the
 burn-down of low-hanging-fruit optimizations, but we're getting close!

 ### Frontend: Enhanced SavedModel/TF2.0 Support

 We are now able to import SavedModels written in the TF2.0 style with resource
 variables and some simple uses of TensorList (`tf.TensorArray`, etc.).

 ### Coverage: XLA HLO Ops

 A select few ops, such as ReduceWindow, are not yet implemented. They need to
 be plumbed through the HLO dialect and the IREE lowering process and
 implemented in the backends. Work is ongoing to complete the remaining ops so
 that we can focus on higher-level model usage semantics.
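
For readers unfamiliar with the op, the sketch below illustrates what a ReduceWindow-style operation computes, restricted to one dimension with no padding or dilation. It is not IREE's or XLA's implementation; the function name `reduce_window_1d` and its parameters are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence


def reduce_window_1d(
    values: Sequence[float],
    init: float,
    reducer: Callable[[float, float], float],
    window_size: int,
    stride: int = 1,
) -> List[float]:
    """Reduce each sliding window of `values` to one element (no padding)."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    out = []
    # Only windows that fit entirely inside the input are produced.
    for start in range(0, len(values) - window_size + 1, stride):
        acc = init  # every window starts from the same initial value
        for x in values[start:start + window_size]:
            acc = reducer(acc, x)
        out.append(acc)
    return out


# Sliding sum over windows of two elements.
window_sums = reduce_window_1d([1, 2, 3, 4, 5], 0, lambda a, b: a + b, window_size=2)

# Non-overlapping max over windows of two elements (a 1-D "max pool").
window_maxes = reduce_window_1d(
    [1, 5, 2, 4, 3, 0], float("-inf"), max, window_size=2, stride=2
)
```

The real op generalizes this to N-dimensional windows with per-dimension strides and padding, which is part of why it has to be handled at each layer of the stack rather than in one place.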

 ### Scheduler: Dynamic Shapes

 Work is underway on dynamic shape support throughout the stack. The tf2xla
 effort is adding shape propagation/inference upstream, and we have a decent
 amount of glue mostly ready to accept it.

 ### HAL: Marl CPU Scheduling

 We want to plug in [marl](https://github.com/google/marl) to provide
 [CPU-side work scheduling](roadmap_design.md#gpu-like-cpu-scheduling) that
 matches GPU semantics. This will enable improved CPU utilization and allow us to
 verify the approach with benchmarks.

 ### Codegen: Full Linalg-based Conversion

 A large part of the codegen story for both CPU (via LLVM IR) and GPU (via
 SPIR-V) relies on the upstream
 [Linalg dialect](https://mlir.llvm.org/docs/Dialects/Linalg/) and associated
 lowerings. We are contributing here and have partial end-to-end demonstrations
 of conversion. By the end of summer we should be fully switched over to this
 path and can remove the index-propagation-based SPIR-V lowering approach in
 favor of the more generalized solution.

 ## Beyond

 ### HAL: Dawn Implementation

 To better engage with the WebGPU and WebML efforts, we will be implementing a
 [Dawn](https://dawn.googlesource.com/dawn/) backend that uses the same generated
 SPIR-V kernels as the Vulkan backend, which enables us to target Metal, Direct3D
 12, and WebGPU. The goal is to get something working (even if suboptimal) so
 that we can provide feedback to the various efforts.