docs/roadmap.md - 3p/openxla/iree - Git at Google

 # IREE Roadmap

 ## Winter 2019

 Our goal for the end of the year is to have depth in a few complex examples
 (such as streaming speech recognition) and breadth in platforms. This should
 hopefully allow for contributions both from Googlers and externally to enable
 broader platform support and optimizations as well as prove out some of the core
 IREE concepts.

 ### Frontend: SavedModel/TF2.0

 MLIR work to get SavedModels importing and lowering through the new MLIR-based
 tf2xla bridge. This will give us a clean interface for writing stateful sample
 models for both training and inference. The primary work on the IREE-side is
 adding support for global variables to the sequencer IR and sequencer runtime
 state tracking.

 ### Coverage: XLA HLO Ops

 A majority of XLA HLO ops (what IREE works with) are already lowering to both
 the IREE interpreter and the SPIR-V backend. A select few ops - such as
 ReduceWindow and Convolution - are not yet implemented and need to be both
 plumbed through the HLO dialect and the IREE lowering process as well as
 implemented in the backends.

 ### Sequencer: IR Refactoring

 The current sequencer IR is a placeholder designed to test the HAL backends and
 needs to be reworked to its final (initial) form. This means rewriting the IR
 description files, implementing lowerings, and rewriting the runtime dispatching
 code. This will enable future work on codegen, binary size evaluation,
 performance evaluation, and compiler optimizations around memory aliasing and
 batching.

 ### Sequencer: Dynamic Shapes

 Dynamic shapes requires a decent amount of work on the MLIR-side to flesh out
 the tf2xla bridge such that we can get input IR that has dynamic shapes at all.
 The shape inference dialect also needs to be designed and implemented so that we
 have shape math in a form we can lower. As both of these are in progress we plan
 to mostly design and experiment with how the runtime portions of dynamic shaping
 will function in IREE.

 ### HAL: Dawn Implementation

 To better engage with the WebGPU and WebML efforts we will be implementing a
 [Dawn](https://dawn.googlesource.com/dawn/) backend that uses the same generated
 SPIR-V kernels as the Vulkan backend but enables us to target Metal, Direct3D
 12, and WebGPU. The goal is to get something working in place (even if
 suboptimal) such that we can provide feedback to the various efforts.

 ### HAL: SIMD Dialect and Marl Implementation

 Reusing most of the SPIR-V lowering we can implement a simple SIMD dialect for
 both codegen and JITing. We're likely to start with the
 [WebAssembly SIMD spec](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md)
 for the dialect (with the goal of being trivially compatible with WASM and to
 avoid bikeshedding). Once we have at least one lowering to executable code
 (either via codegen to JITing) we can use [Marl](https://github.com/google/marl)
 to provide the work scheduling. This should be roughly equivalent to performance
 to Swiftshader however with far less overhead. The ultimate goal is to be able
 to delete the current IREE interpreter.

 ## Spring 2020

 With the foundation laid in winter 2019 we'll be looking to expand support,
 continue optimizations and tuning, and implement the cellular batching
 techniques at the core of the IREE design.
	# IREE Roadmap

	## Winter 2019

	Our goal for the end of the year is to have depth in a few complex examples
	(such as streaming speech recognition) and breadth in platforms. This should
	hopefully allow for contributions both from Googlers and externally to enable
	broader platform support and optimizations as well as prove out some of the core
	IREE concepts.

	### Frontend: SavedModel/TF2.0

	MLIR work to get SavedModels importing and lowering through the new MLIR-based
	tf2xla bridge. This will give us a clean interface for writing stateful sample
	models for both training and inference. The primary work on the IREE-side is
	adding support for global variables to the sequencer IR and sequencer runtime
	state tracking.

	### Coverage: XLA HLO Ops

	A majority of XLA HLO ops (what IREE works with) are already lowering to both
	the IREE interpreter and the SPIR-V backend. A select few ops - such as
	ReduceWindow and Convolution - are not yet implemented and need to be both
	plumbed through the HLO dialect and the IREE lowering process as well as
	implemented in the backends.

	### Sequencer: IR Refactoring

	The current sequencer IR is a placeholder designed to test the HAL backends and
	needs to be reworked to its final (initial) form. This means rewriting the IR
	description files, implementing lowerings, and rewriting the runtime dispatching
	code. This will enable future work on codegen, binary size evaluation,
	performance evaluation, and compiler optimizations around memory aliasing and
	batching.

	### Sequencer: Dynamic Shapes

	Dynamic shapes requires a decent amount of work on the MLIR-side to flesh out
	the tf2xla bridge such that we can get input IR that has dynamic shapes at all.
	The shape inference dialect also needs to be designed and implemented so that we
	have shape math in a form we can lower. As both of these are in progress we plan
	to mostly design and experiment with how the runtime portions of dynamic shaping
	will function in IREE.

	### HAL: Dawn Implementation

	To better engage with the WebGPU and WebML efforts we will be implementing a
	[Dawn](https://dawn.googlesource.com/dawn/) backend that uses the same generated
	SPIR-V kernels as the Vulkan backend but enables us to target Metal, Direct3D
	12, and WebGPU. The goal is to get something working in place (even if
	suboptimal) such that we can provide feedback to the various efforts.

	### HAL: SIMD Dialect and Marl Implementation

	Reusing most of the SPIR-V lowering we can implement a simple SIMD dialect for
	both codegen and JITing. We're likely to start with the
	[WebAssembly SIMD spec](https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md)
	for the dialect (with the goal of being trivially compatible with WASM and to
	avoid bikeshedding). Once we have at least one lowering to executable code
	(either via codegen to JITing) we can use [Marl](https://github.com/google/marl)
	to provide the work scheduling. This should be roughly equivalent to performance
	to Swiftshader however with far less overhead. The ultimate goal is to be able
	to delete the current IREE interpreter.

	## Spring 2020

	With the foundation laid in winter 2019 we'll be looking to expand support,
	continue optimizations and tuning, and implement the cellular batching
	techniques at the core of the IREE design.