# Dynamic Shapes

NOTE: Effort is being made to make this facility generic so that it can
eventually be upstreamed to MLIR in some fashion. However, because MLIR lacks
a set of frontend ops and does not currently have any frontend-oriented
infrastructure, it is being prototyped within IREE in order to find a working
set of ops and algorithms.

## Levels of dynamism

In general, there are three levels of shape information that can be present
in the input IR (or trivially derived by applying some form of shape
inference algorithm). Each level imposes more work on the compiler and
runtime than the previous one, so generally, the implementation progresses by
addressing each level only once the previous one is well established:

1. Fully static shapes: No tensors have dynamic dimensions. All tensors are
   ranked.
2. Ranked dynamism: All tensors are ranked, but some dimensions may be
   unspecified.
3. Unranked dynamism: Some tensors have indeterminate ranks.

At this stage, *Dynamic Shapes* in IREE refers to supporting ranked dynamic
tensors, where some dimensions are left unspecified at public function
boundaries. It is expected that once this is solid, some support can be
considered for unranked dynamism, and it is further expected that this will
entail new ops, algorithms and runtime support, apart from what is needed for
ranked dynamism.

Within the category of ranked dynamism, it is well known that some dynamic
dimensions are easier to deal with than others: in common DNN use, dynamic
outer dimensions are both more common and easier to handle with respect to
code generation and kernel fan-out than dynamic inner dimensions.

While the shape handling machinery is relatively generic, we expect that real
backends will be limited in how fully they support all combinations of
dynamic dimensions. Eventually, IREE intends to solve this by having a
relatively robust CPU fallback for fully dynamic cases and actionable
warnings that pinpoint where more specificity could increase performance.

## Compiler Frontend

In general, the IREE compiler frontend should accept modules containing
functions with operands/results that have dynamic dimensions. Such functions
may also have runtime-dependent shapes in the form of `GetShape`-style ops,
which get a shape from an arbitrary tensor, perform some arithmetic on it,
and use the results elsewhere.

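For example (a minimal sketch; `frontend.matmul` is a placeholder op name,
not a real dialect op), such a function might look like:

```mlir
// The leading (batch) dimension is dynamic, denoted by `?` in the type.
func @predict(%input: tensor<?x128xf32>, %weights: tensor<128x10xf32>)
    -> tensor<?x10xf32> {
  // Placeholder frontend op; any shape-transferring op behaves the same.
  %0 = "frontend.matmul"(%input, %weights)
      : (tensor<?x128xf32>, tensor<128x10xf32>) -> tensor<?x10xf32>
  return %0 : tensor<?x10xf32>
}
```
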
### Shape dialect and lowering

IREE is introducing a `shape` dialect with a handful of ops and
transformations that are useful for materializing dynamic shape computations
in terms of high level ops on tensors.

#### Types:

* `ranked_shape`: This value type represents the dynamic dimensions of a
  partially known, ranked shape. It is used early in the compiler anywhere
  that dynamic dimensions need to be passed (e.g. function args/results). At
  lower levels of the compiler, it will generally be disaggregated into loose
  SSA values. The type also carries the datatype used to represent the
  dimensions; this is currently fixed to i32 but may eventually be leveraged
  to use smaller integer types when they are known to be legal. A sketch of
  the type follows this list.

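As a sketch (the printed form shown here is purely illustrative and subject
to change), the `ranked_shape` for a `tensor<?x4xf32>` might appear as:

```mlir
// Illustrative syntax only: rank 2, dynamic dim 0, static dim 1 of size 4,
// with dimensions stored as i32 values.
!shape.ranked_shape<[?,4],i32>
```
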
#### Ops:

* `get_ranked_shape`: Takes a tensor SSA value and returns a corresponding
  `ranked_shape`. Early in the compilation flow, anything that needs a ranked
  shape should add such ops so that the compiler can later determine which
  shape computations to materialize. Getting the `ranked_shape` of a static
  tensor should yield a constant.
* `tie_shape`: Takes a tensor and a `ranked_shape` SSA value and returns the
  tensor. It is used as a junction point by the shape materialization passes
  to know precisely what the shape is at various points. (Both ops are
  sketched after this list.)
* ... TODO: need `get_shape_dim` and conversions to/from 1D tensors and loose
  SSA values.

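A sketch of these ops in use (generic op syntax; the exact assembly forms are
still in flux and shown here only for illustration):

```mlir
// %t arrives with a companion shape %rs (e.g. from the signature expansion
// described below).
func @f(%t: tensor<?x4xf32>, %rs: !shape.ranked_shape<[?,4],i32>)
    -> tensor<?x4xf32> {
  // Junction point: associates the shape with the tensor.
  %tied = "shape.tie_shape"(%t, %rs)
      : (tensor<?x4xf32>, !shape.ranked_shape<[?,4],i32>) -> tensor<?x4xf32>
  // Consumers query the shape here; canonicalization folds this to %rs.
  %s = "shape.get_ranked_shape"(%tied)
      : (tensor<?x4xf32>) -> !shape.ranked_shape<[?,4],i32>
  return %tied : tensor<?x4xf32>
}
```
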
### Materialization

#### Function signature expansion

Early in the process, all functions should have their arguments and results
expanded so that any dynamic tensors in their signature gain a new
argument/result for the corresponding `ranked_shape`. This is done by
expanding the signatures and, for arguments, inserting placeholder
`tie_shape` ops which preserve the association for later materialization. For
results, `get_ranked_shape` ops are inserted.

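As a sketch (illustrative syntax), a function taking and returning a
`tensor<?x4xf32>` would be expanded along these lines:

```mlir
// After expansion: each dynamic tensor in the signature has a companion
// ranked_shape argument/result.
func @f(%t: tensor<?x4xf32>, %t_shape: !shape.ranked_shape<[?,4],i32>)
    -> (tensor<?x4xf32>, !shape.ranked_shape<[?,4],i32>) {
  // Placeholder tie preserving the argument/shape association.
  %tied = "shape.tie_shape"(%t, %t_shape)
      : (tensor<?x4xf32>, !shape.ranked_shape<[?,4],i32>) -> tensor<?x4xf32>
  // ... body operates on %tied ...
  // The result shape is recovered with get_ranked_shape for returning.
  %rs = "shape.get_ranked_shape"(%tied)
      : (tensor<?x4xf32>) -> !shape.ranked_shape<[?,4],i32>
  return %tied, %rs : tensor<?x4xf32>, !shape.ranked_shape<[?,4],i32>
}
```
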
This is carried out by the `iree-shape-expand-function-dynamic-dims` pass,
which uses the conversion framework under the hood to perform type expansion.

This pass is typically run early in the compilation flow.

#### Shape-dependent codegen

A lot of scheduling logic will need to access shapes (e.g. allocation,
workgroup size calculation). In general, this should all be done based on a
`get_ranked_shape` op and corresponding `get_shape_dim` ops. For fully static
cases, these should reduce down to constants. For dynamic dimensions, the
`get_ranked_shape` ops serve as anchors where later parts of the compiler
know they need to materialize shape values.

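For example (a sketch only; note that `get_shape_dim` is still TODO above),
allocation or workgroup-size logic would be written against the anchor ops
like so:

```mlir
// %t is some dynamically shaped tensor in scope.
%rs = "shape.get_ranked_shape"(%t)
    : (tensor<?x4xf32>) -> !shape.ranked_shape<[?,4],i32>
// Extract the dynamic dimension to drive allocation / workgroup sizing.
%d0 = "shape.get_shape_dim"(%rs) {index = 0 : i32}
    : (!shape.ranked_shape<[?,4],i32>) -> i32
// For a fully static shape, %d0 would canonicalize to a constant.
```
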
#### Materializing shape computations

TODO: We have a sketch of this but are still proving it out.

Generally, it should be possible, for any `get_ranked_shape` op, to trace up
the use-def chain and materialize shape manipulation arithmetic. Once
materialized, a `tie_shape` op should be inserted to memorialize the
junction. Eventually, every `get_ranked_shape` op should follow a `tie_shape`
op, at which point the canonicalization rules will elide the
`get_ranked_shape`. There is complexity around blocks, control flow, etc.,
but this basic algorithm should be workable.

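As a before/after sketch for an elementwise op, whose result shape simply
transfers from its operand (op names and assembly forms are illustrative):

```mlir
// Before: an un-anchored query on the result of an elementwise op.
%r  = "frontend.exp"(%t) : (tensor<?x4xf32>) -> tensor<?x4xf32>
%rs = "shape.get_ranked_shape"(%r)
    : (tensor<?x4xf32>) -> !shape.ranked_shape<[?,4],i32>

// After: the operand's shape is transferred to the result and memorialized
// with tie_shape; canonicalization then folds the get_ranked_shape above
// into the operand's shape %ts.
%ts = "shape.get_ranked_shape"(%t)
    : (tensor<?x4xf32>) -> !shape.ranked_shape<[?,4],i32>
%r2 = "shape.tie_shape"(%r, %ts)
    : (tensor<?x4xf32>, !shape.ranked_shape<[?,4],i32>) -> tensor<?x4xf32>
```
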
Work is ongoing upstream to provide a facility to register shape functions
with ops, which would provide a dynamic, dialect-independent way to know what
arithmetic to materialize. However, in most cases this is not necessary. The
built-in traits around types and sizes will allow most propagation to happen
without shape functions. We intend to start with a static set of cases for
the rest in order to prove the concept.

#### Scalarization

TODO: We have a sketch of this but are still proving it out.

It is quite common in real-world DNN usage to get the 1D tensor representing
a shape and perform arbitrary tensor ops on it (usually basic arithmetic,
slicing, concatenation, tiling, etc.). While this is perfectly acceptable
from a correctness standpoint, it is usually not performant: shapes are
typically very small one-dimensional vectors, and computations on them are
usually trivial to reduce to small sequences of scalar machine code of a form
that CPUs are very good at executing. Further, we often want to run these
calculations eagerly when dispatching functions (e.g. to pre-allocate
buffers), and having them isolated (versus treating them as arbitrary dense
computations) can be quite valuable.

We expect that the type bracketing that happens with `ranked_shape` and the
corresponding ops will make it easy to write some simple DRR patterns that
identify such shape manipulation sequences and lower them directly to regions
of `vm` ops operating on scalars. Such regions can be retained and directly
emitted when lowering to the `vm` dialect and/or CPU code generation, and
would run with low overhead along with any other scheduling code.

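As a sketch (placeholder frontend op names; the scalarized form assumes
`get_shape_dim` and `vm` scalar arithmetic, shown for illustration only):

```mlir
// Before scalarization: the shape flows as a small 1-D tensor and its
// element product is computed with tensor ops (placeholder op names).
%dims  = "frontend.shape_of"(%t) : (tensor<?x?xf32>) -> tensor<2xi32>
%numel = "frontend.reduce_prod"(%dims) : (tensor<2xi32>) -> tensor<i32>

// After scalarization: the same computation as two scalar dim reads and a
// multiply, cheap to evaluate eagerly with other scheduling code.
// (%rs is the materialized ranked_shape of %t.)
%d0 = "shape.get_shape_dim"(%rs) {index = 0 : i32}
    : (!shape.ranked_shape<[?,?],i32>) -> i32
%d1 = "shape.get_shape_dim"(%rs) {index = 1 : i32}
    : (!shape.ranked_shape<[?,?],i32>) -> i32
%n  = vm.mul.i32 %d0, %d1 : i32
```
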
While this is only an optimization, we suspect it is an important one.

### Shape inference

TODO: This is mostly placeholder.

There is work happening upstream to implement MLIR-integrated shape
inference. In the meantime, IREE expects that the input to the compiler has
already had some shape inference performed on it. In practice, for
TensorFlow, there is a pass which applies TensorFlow's pre-MLIR shape
inference mechanisms to derive such things. This has limitations but is a
reasonable starting point.

## Compiler Backends

TODO: This is mostly placeholder.

Much of the newer structured-ops based codegen is capable of working (within
bounds) with ranked dynamic shapes without much additional work. Given the
lack of an e2e story, much of this has been done "by way of code review" and
there are certainly issues to be resolved.

In addition, there are several ABI issues and negotiations with the backend
that still need to be fleshed out.