Fixing roadmap TOC.

Closes https://github.com/google/iree/pull/1296

COPYBARA_INTEGRATE_REVIEW=https://github.com/google/iree/pull/1296 from google:benvanik-roadmap-toc d7e8c726117954aacffac43fdeb3af76683999b0
PiperOrigin-RevId: 303392309
diff --git a/docs/roadmap_design.md b/docs/roadmap_design.md
index 6987a01..3063585 100644
--- a/docs/roadmap_design.md
+++ b/docs/roadmap_design.md
@@ -1,6 +1,8 @@
 # IREE Design Roadmap
-<a id="markdown-iree-design-roadmap" name="iree-design-roadmap"></a>
+<a id="markdown-IREE%20Design%20Roadmap" name="IREE%20Design%20Roadmap"></a>
+
+<!-- WARNING: DO NOT EDIT THIS FILE IN AN EDITOR WITH AUTO FORMATTING -->
 
 A not-so-concise walkthrough of various IREE features that are in the design
 process and planned for future versions. A lot of the questions around how the
@@ -51,11 +53,11 @@
 ## Input Dialects
-<a id="markdown-input-dialects" name="input-dialects"></a>
+<a id="markdown-Input%20Dialects" name="Input%20Dialects"></a>
 
 ### Future MLIR XLA HLO Replacement
-<a id="markdown-future-mlir-xla-hlo-replacement" name="future-mlir-xla-hlo-replacement"></a>
+<a id="markdown-Future%20MLIR%20XLA%20HLO%20Replacement" name="Future%20MLIR%20XLA%20HLO%20Replacement"></a>
 
 IREE's current input dialect is the XLA HLO dialect representing operations on
 tensors. This was a pragmatic decision based on having HLO already defined and
@@ -69,7 +71,7 @@
 ### `linalg`: High-level Hierarchical Optimization
-<a id="markdown-linalg-high-level-hierarchical-optimization" name="linalg-high-level-hierarchical-optimization"></a>
+<a id="markdown-%60linalg%60%3A%20High-level%20Hierarchical%20Optimization" name="%60linalg%60%3A%20High-level%20Hierarchical%20Optimization"></a>
 
 It's required that IREE inputs are all in tensor form (and not in-place memref
 updates) in order to perform a large majority of the `flow` transformations.
@@ -84,7 +86,7 @@
 ### XLA HLO: Canonicalizations
-<a id="markdown-xla-hlo-canonicalizations" name="xla-hlo-canonicalizations"></a>
+<a id="markdown-XLA%20HLO%3A%20Canonicalizations" name="XLA%20HLO%3A%20Canonicalizations"></a>
 
 Very little effort has been applied to `xla_hlo` optimizations and there are a
 significant number of missing folders, canonicalizers, and simple
@@ -106,7 +108,7 @@
 ### XLA HLO: Tensor to Primitive Conversion
-<a id="markdown-xla-hlo-tensor-to-primitive-conversion" name="xla-hlo-tensor-to-primitive-conversion"></a>
+<a id="markdown-XLA%20HLO%3A%20Tensor%20to%20Primitive%20Conversion" name="XLA%20HLO%3A%20Tensor%20to%20Primitive%20Conversion"></a>
 
 HLO only operates on tensor values - even for simple scalars - and this presents
 a problem when attempting to determine which code should be specified to run on
@@ -171,7 +173,7 @@
 ### Quantization
-<a id="markdown-quantization" name="quantization"></a>
+<a id="markdown-Quantization" name="Quantization"></a>
 
 It's assumed that any work related to quantization/compression has happened
 prior to lowering into IREE dialects.
 Our plan is to use the proposed
@@ -189,7 +191,7 @@
 ## `flow`: Data- and Execution-Flow Modeling
-<a id="markdown-flow-data--and-execution-flow-modeling" name="flow-data--and-execution-flow-modeling"></a>
+<a id="markdown-%60flow%60%3A%20Data-%20and%20Execution-Flow%20Modeling" name="%60flow%60%3A%20Data-%20and%20Execution-Flow%20Modeling"></a>
 
 The `flow` dialect is designed to allow us to extract as much concurrency as
 possible from a program and partition IR into the scheduling and execution
@@ -202,7 +204,7 @@
 ### Avoiding Readbacks with `flow.stream`
-<a id="markdown-avoiding-readbacks-with-flowstream" name="avoiding-readbacks-with-flowstream"></a>
+<a id="markdown-Avoiding%20Readbacks%20with%20%60flow.stream%60" name="Avoiding%20Readbacks%20with%20%60flow.stream%60"></a>
 
 A majority of the readbacks we have today (manifested as `flow.tensor.load.*`
 ops) will be removed when we have an
@@ -241,7 +243,7 @@
 ### Threading `flow.stream` through the CFG
-<a id="markdown-threading-flowstream-through-the-cfg" name="threading-flowstream-through-the-cfg"></a>
+<a id="markdown-Threading%20%60flow.stream%60%20through%20the%20CFG" name="Threading%20%60flow.stream%60%20through%20the%20CFG"></a>
 
 The current `flow.ex.stream.fragment`, as denoted by the `ex`perimental tag, is
 a temporary implementation designed to get the concept of streams lowered to the
@@ -288,7 +290,7 @@
 ### Predication of `flow.dispatch`
-<a id="markdown-predication-of-flowdispatch" name="predication-of-flowdispatch"></a>
+<a id="markdown-Predication%20of%20%60flow.dispatch%60" name="Predication%20of%20%60flow.dispatch%60"></a>
 
 While the
 [`flow.stream` threading through the CFG](#threading-flowstream-through-the-cfg)
@@ -322,7 +324,7 @@
 ### Deduping `flow.executable`s
-<a id="markdown-deduping-flowexecutables" name="deduping-flowexecutables"></a>
+<a id="markdown-Deduping%20%60flow.executable%60s" name="Deduping%20%60flow.executable%60s"></a>
 
 While still in the `flow` dialect, the executables are
 target-agnostic. This makes simple IR tree diffing a potential solution to
 deduplication. Since most
@@ -335,7 +337,7 @@
 ### Rematerializing CSE'd Expressions
-<a id="markdown-rematerializing-csed-expressions" name="rematerializing-csed-expressions"></a>
+<a id="markdown-Rematerializing%20CSE'd%20Expressions" name="Rematerializing%20CSE'd%20Expressions"></a>
 
 Common subexpression elimination is performed many times during lowering,
 however there comes a point where the CSE can introduce false dependencies and
@@ -379,7 +381,7 @@
 ### Device Placement
-<a id="markdown-device-placement" name="device-placement"></a>
+<a id="markdown-Device%20Placement" name="Device%20Placement"></a>
 
 While still within the `flow` dialect we have the ability to easily split
 streams and safely shuffle around operations. Target execution backends can opt
@@ -393,7 +395,7 @@
 ## `hal`: Hardware Abstraction Layer and Multi-Architecture Executables
-<a id="markdown-hal-hardware-abstraction-layer-and-multi-architecture-executables" name="hal-hardware-abstraction-layer-and-multi-architecture-executables"></a>
+<a id="markdown-%60hal%60%3A%20Hardware%20Abstraction%20Layer%20and%20Multi-Architecture%20Executables" name="%60hal%60%3A%20Hardware%20Abstraction%20Layer%20and%20Multi-Architecture%20Executables"></a>
 
 As the IREE HAL is designed almost 1:1 with a compute-only Vulkan API many of
 the techniques classically used in real-time graphics apply.
 The benefit we have
@@ -404,7 +406,7 @@
 ### Allow Targets to Specify `hal.interface`s
-<a id="markdown-allow-targets-to-specify-halinterfaces" name="allow-targets-to-specify-halinterfaces"></a>
+<a id="markdown-Allow%20Targets%20to%20Specify%20%60hal.interface%60s" name="Allow%20Targets%20to%20Specify%20%60hal.interface%60s"></a>
 
 The `hal.interface` op specifies the ABI between the scheduler and the device
 containing the buffer bindings and additional non-buffer data (parameters,
@@ -426,7 +428,7 @@
 ### Target-specific Scheduling Specialization
-<a id="markdown-target-specific-scheduling-specialization" name="target-specific-scheduling-specialization"></a>
+<a id="markdown-Target-specific%20Scheduling%20Specialization" name="Target-specific%20Scheduling%20Specialization"></a>
 
 Though the `flow` dialect attempts to fuse as many ops as possible into dispatch
 regions, it's not always possible for all target backends to schedule a region
@@ -457,7 +459,7 @@
 ### Buffer Usage Tracking
-<a id="markdown-buffer-usage-tracking" name="buffer-usage-tracking"></a>
+<a id="markdown-Buffer%20Usage%20Tracking" name="Buffer%20Usage%20Tracking"></a>
 
 Many explicit hardware APIs require knowing how buffers are used alongside with
 where they should be located.
 For example this additional information determines
@@ -489,7 +491,7 @@
 ### Batched Executable Caching and Precompilation
-<a id="markdown-batched-executable-caching-and-precompilation" name="batched-executable-caching-and-precompilation"></a>
+<a id="markdown-Batched%20Executable%20Caching%20and%20Precompilation" name="Batched%20Executable%20Caching%20and%20Precompilation"></a>
 
 For targets that may require runtime preprocessing of their executables prior to
 dispatch, such as SPIR-V or MSL, the IREE HAL provides a caching and batch
@@ -523,7 +525,7 @@
 ### Target-aware Executable Compression
-<a id="markdown-target-aware-executable-compression" name="target-aware-executable-compression"></a>
+<a id="markdown-Target-aware%20Executable%20Compression" name="Target-aware%20Executable%20Compression"></a>
 
 An advantage of representing executable binaries in IR after translation is that
 we can apply various post-compilation compression and minification techniques
@@ -548,7 +550,7 @@
 ### Target-aware Constant Compression
-<a id="markdown-target-aware-constant-compression" name="target-aware-constant-compression"></a>
+<a id="markdown-Target-aware%20Constant%20Compression" name="Target-aware%20Constant%20Compression"></a>
 
 It's still an area that needs more research but one goal of the IREE design was
 to enable efficient target- and context-aware compression of large constants
@@ -564,7 +566,7 @@
 ### Command Buffer Stateful Deduplication
-<a id="markdown-command-buffer-stateful-deduplication" name="command-buffer-stateful-deduplication"></a>
+<a id="markdown-Command%20Buffer%20Stateful%20Deduplication" name="Command%20Buffer%20Stateful%20Deduplication"></a>
 
 The IREE HAL - much like Vulkan it is based on - eschews much of the state that
 traditional APIs have in favor of (mostly) immutable state objects (pipeline
@@ -580,7 +582,7 @@
 ### Resource Timeline
-<a id="markdown-resource-timeline" name="resource-timeline"></a>
+<a id="markdown-Resource%20Timeline" name="Resource%20Timeline"></a>
 
 A core concept of the IREE scheduler that allows for overlapping in-flight
 invocations is that of the resource timeline. This identifies module state that
@@ -617,7 +619,7 @@
 ### Transient Tensor Ringbuffer
-<a id="markdown-transient-tensor-ringbuffer" name="transient-tensor-ringbuffer"></a>
+<a id="markdown-Transient%20Tensor%20Ringbuffer" name="Transient%20Tensor%20Ringbuffer"></a>
 
 (When properly implemented) almost all buffers required during execution never
 escape the command buffers they are used in or a single VM invocation. We can
@@ -660,7 +662,7 @@
 ### Timeline Semaphores on the Module ABI
-<a id="markdown-timeline-semaphores-on-the-module-abi" name="timeline-semaphores-on-the-module-abi"></a>
+<a id="markdown-Timeline%20Semaphores%20on%20the%20Module%20ABI" name="Timeline%20Semaphores%20on%20the%20Module%20ABI"></a>
 
 Functions calls made across modules (either from C++ into the VM, VM->VM, or
 VM->C++) should be able to define timeline semaphores used to wait and signal on
@@ -681,7 +683,7 @@
 ### GPU-like CPU Scheduling
-<a id="markdown-gpu-like-cpu-scheduling" name="gpu-like-cpu-scheduling"></a>
+<a id="markdown-GPU-like%20CPU%20Scheduling" name="GPU-like%20CPU%20Scheduling"></a>
 
 One approach to using multiple cores on a CPU is to perform interior
 parallelization of operations such as OpenMP or library-call-based custom thread
@@ -727,7 +729,7 @@
 ## `vm`: Lightweight Virtual Machine
-<a id="markdown-vm-lightweight-virtual-machine" name="vm-lightweight-virtual-machine"></a>
+<a id="markdown-%60vm%60%3A%20Lightweight%20Virtual%20Machine" name="%60vm%60%3A%20Lightweight%20Virtual%20Machine"></a>
 
 The VM is designed as a dynamic linkage ABI, stable bytecode representation, and
 intermediate lowering IR.
 Many of the optimizations we can perform on it will
@@ -737,7 +739,7 @@
 ### Coroutines for Batching and Cooperative Scheduling
-<a id="markdown-coroutines-for-batching-and-cooperative-scheduling" name="coroutines-for-batching-and-cooperative-scheduling"></a>
+<a id="markdown-Coroutines%20for%20Batching%20and%20Cooperative%20Scheduling" name="Coroutines%20for%20Batching%20and%20Cooperative%20Scheduling"></a>
 
 One of the largest features currently missing from the VM is coroutines (aka
 user-mode fiber scheduling). Coroutines are what will allow us to have multiple
@@ -799,7 +801,7 @@
 #### Cellular Batching
-<a id="markdown-cellular-batching" name="cellular-batching"></a>
+<a id="markdown-Cellular%20Batching" name="Cellular%20Batching"></a>
 
 Though coroutines help throughput there is a way we've found to reduce latency
 that's been documented as
@@ -850,7 +852,7 @@
 ### Lowering to LLVM IR
-<a id="markdown-lowering-to-llvm-ir" name="lowering-to-llvm-ir"></a>
+<a id="markdown-Lowering%20to%20LLVM%20IR" name="Lowering%20to%20LLVM%20IR"></a>
 
 For scenarios where dynamic module loading is not required and entire modules
 can be compiled into applications we can lower the VM IR to LLVM IR within
@@ -874,7 +876,7 @@
 ### Improved Type Support
-<a id="markdown-improved-type-support" name="improved-type-support"></a>
+<a id="markdown-Improved%20Type%20Support" name="Improved%20Type%20Support"></a>
 
 Currently the VM only supports two types: `i32` and `vm.ref<T>`.
 This is an
 intentional limitation such that we can determine what is really needed to
@@ -892,7 +894,7 @@
 ### Indirect Command Buffer/On-Accelerator Execution
-<a id="markdown-indirect-command-bufferon-accelerator-execution" name="indirect-command-bufferon-accelerator-execution"></a>
+<a id="markdown-Indirect%20Command%20Buffer%2FOn-Accelerator%20Execution" name="Indirect%20Command%20Buffer%2FOn-Accelerator%20Execution"></a>
 
 Though IREE will use many different tricks such as
 [predication](#predication-of-flowdispatch) to build deep pipelines there is
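Every anchor rewrite in this diff follows the same pattern: the new `id`/`name` is `markdown-` followed by the percent-encoded heading text (space as `%20`, backtick as `%60`, colon as `%3A`, slash as `%2F`, with apostrophes left unescaped, as in `Rematerializing%20CSE'd%20Expressions`). A minimal sketch of that encoding, assuming Python's `urllib.parse.quote`; the `toc_anchor` helper name is hypothetical, not part of the TOC tool:

```python
from urllib.parse import quote

def toc_anchor(heading: str) -> str:
    # Hypothetical helper mirroring the anchor pattern in this diff:
    # percent-encode the raw heading text (space -> %20, ` -> %60,
    # : -> %3A, / -> %2F), keep apostrophes unescaped via safe="'",
    # and prefix "markdown-".
    return "markdown-" + quote(heading, safe="'")

# e.g. toc_anchor("`flow`: Data- and Execution-Flow Modeling")
#   -> "markdown-%60flow%60%3A%20Data-%20and%20Execution-Flow%20Modeling"
```

Note that the in-document links quoted above (e.g. `#threading-flowstream-through-the-cfg`) still use the old lowercase-slug form, so they resolve against the `id` attributes only as long as both naming schemes are present or the links are regenerated alongside the anchors.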