Stella Laurenzo | aab85b9 | 2019-12-12 20:28:57 -0800 | [diff] [blame] | 1 | # Function signatures |
| 2 | |
| 3 | A key job of the IREE compiler and runtime is capturing function call semantics |
| 4 | from the originating system and providing mechanisms so that invocations can be |
| 5 | performed in as similar a way as possible in various target languages. In general, |
| 6 | this requires additional metadata on top of the raw characteristics of a |
| 7 | function. Where possible, this is done by attaching attributes to a function. |
| 8 | |
brett koonce | 4f5a1b1 | 2019-12-26 14:58:39 -0800 | [diff] [blame] | 9 | * `abi` : string indicating the ABI/calling convention in use |
Stella Laurenzo | aab85b9 | 2019-12-12 20:28:57 -0800 | [diff] [blame] | 10 | * `abiv` : numeric version of the `abi` |
| 11 | |
| 12 | Each ABI can require additional attributes as needed. |
| 13 | |
| 14 | ## Generic Signature Mangling |
| 15 | |
| 16 | Where possible, ABI metadata is encoded into a plain-text signature in a way |
| 17 | that is easily transported across component boundaries and can be efficiently |
| 18 | implemented without additional dependencies (i.e. just string manipulation). |
| 19 | |
| 20 | The suggested format is produced and consumed via the `SignatureBuilder` and |
| 21 | `SignatureParser` classes of the C++ reference implementation (see |
| 22 | `iree/base/signature_mangle.h`). See the documentation and code for those classes |
| 23 | for more details. |
| 24 | |
| 25 | ## ABIs |
| 26 | |
Stella Laurenzo | e0746b1 | 2019-12-13 15:09:20 -0800 | [diff] [blame] | 27 | ### Raw Function ABI |
| 28 | |
| 29 | All exported functions implement the raw function ABI, which defines the |
| 30 | metadata and calling convention for marshalling inputs and results to their |
| 31 | underlying implementations. |
| 32 | |
| 33 | *Attributes:* |
| 34 | |
| 35 | * `fv` = 1 (current version of the raw function ABI) |
| 36 | * `f` = encoded raw function signature (see below) |
| 37 | * `fbr` = result buffer allocation function name (optional) |
| 38 | |
| 39 | The reflection metadata documented here augments the underlying type system such |
| 40 | that host language bindings can interop as needed. This additional metadata is |
| 41 | needed in most dynamic cases because the compiled assets operate on fundamental |
| 42 | types with most characteristics type-erased (think: `void*`-level things vs |
| 43 | high-level `ShapedBuffer`-level things). |
| 44 | |
| 45 | #### Grammar |
| 46 | |
| 47 | The signature is implemented in terms of the `SignatureBuilder`, using tagged |
| 48 | Integers and Spans. |
| 49 | |
| 50 | ```text |
| 51 | signature ::= 'I' length-prefixed(type-sequence) |
| 52 | 'R' length-prefixed(type-sequence) |
| 53 | |
| 54 | type-sequence ::= (arg-result-type)* |
Stella Laurenzo | e34013c | 2019-12-19 18:11:22 -0800 | [diff] [blame] | 55 | arg-result-type ::= buffer-type | ref-object-type | unrecognized-type |
Stella Laurenzo | e0746b1 | 2019-12-13 15:09:20 -0800 | [diff] [blame] | 56 | buffer-type ::= 'B' length-prefixed(scalar-type? dim*) |
| 57 | scalar-type ::= 't' ( |
| 58 | '0' # IEEE float32 (default if not specified) |
| 59 | | '1' # IEEE float16 |
| 60 | | '2' # IEEE float64 |
| 61 | | '3' # Google bfloat16 |
| 62 | | '4' # Signed int8 |
| 63 | | '5' # Signed int16 |
| 64 | | '6' # Signed int32 |
| 65 | | '7' # Signed int64 |
| 66 | | '8' # Unsigned int8 |
| 67 | | '9' # Unsigned int16 |
| 68 | | '10' # Unsigned int32 |
| 69 | | '11' # Unsigned int64 |
| 70 | ) |
| 71 | dim ::= 'd' integer # -1 indicates a dynamic dim |
| 72 | ref-object-type ::= 'O' length-prefixed() # Details TBD |
Stella Laurenzo | e34013c | 2019-12-19 18:11:22 -0800 | [diff] [blame] | 73 | unrecognized-type ::= 'U' length-prefixed() |
Stella Laurenzo | e0746b1 | 2019-12-13 15:09:20 -0800 | [diff] [blame] | 74 | |
| 75 | # Lexical primitives |
| 76 | integer ::= -?[0-9]+ |
| 77 | length ::= [0-9]+ |
| 78 | # The `length` encodes the length in bytes of `production`, plus 1 for the '!'. |
| 79 | length-prefixed(production) ::= length '!' production |
| 80 | any-byte-sequence ::= <any byte sequence> |
| 81 | ``` |
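As a rough illustration of the grammar above, the following sketch encodes a raw function signature using only string manipulation. The names `BufferType`, `LengthPrefixed`, `EncodeBuffer`, and `EncodeRawSignature` are hypothetical and are not the API of `SignatureBuilder`. The sketch handles only `buffer-type` entries and always emits the scalar type explicitly, even though the grammar allows omitting it for float32.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type for this sketch; not part of IREE.
struct BufferType {
  int scalar_code;        // 0..11, per the scalar-type table in the grammar.
  std::vector<int> dims;  // -1 marks a dynamic dim.
};

// length-prefixed(production) ::= length '!' production
// The length counts the bytes of `production`, plus 1 for the '!'.
std::string LengthPrefixed(const std::string& production) {
  return std::to_string(production.size() + 1) + "!" + production;
}

// buffer-type ::= 'B' length-prefixed(scalar-type? dim*)
std::string EncodeBuffer(const BufferType& b) {
  assert(b.scalar_code >= 0 && b.scalar_code <= 11);
  std::string body = "t" + std::to_string(b.scalar_code);
  for (int d : b.dims) body += "d" + std::to_string(d);
  return "B" + LengthPrefixed(body);
}

// type-sequence ::= (arg-result-type)*  (buffers only in this sketch)
std::string EncodeTypeSequence(const std::vector<BufferType>& types) {
  std::string out;
  for (const BufferType& b : types) out += EncodeBuffer(b);
  return out;
}

// signature ::= 'I' length-prefixed(type-sequence)
//               'R' length-prefixed(type-sequence)
std::string EncodeRawSignature(const std::vector<BufferType>& inputs,
                               const std::vector<BufferType>& results) {
  return "I" + LengthPrefixed(EncodeTypeSequence(inputs)) +
         "R" + LengthPrefixed(EncodeTypeSequence(results));
}
```

Under these assumptions, a function taking one `int32` buffer of shape `[-1, 4]` and returning one `float32` buffer of shape `[4]` would be encoded as `I11!B8!t6d-1d4R8!B5!t0d4`.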
| 82 | |
| 83 | #### Interpretation and Rationale |
| 84 | |
| 85 | ##### Memory layout |
| 86 | |
| 87 | The astute reader will note that the above metadata is insufficient to determine |
| 88 | the memory layout of a buffer. The reason is that more specific details |
| 89 | (contiguity, strides, alignment, etc.) can only be known once the |
| 90 | actual compute devices have been enumerated, and the resulting matrix of |
| 91 | conversions is more dynamic than can be expressed in something as static as a |
| 92 | function signature. The above formulation is an input to an additional runtime |
| 93 | oracle which produces appropriate full buffer descriptions. |
| 94 | |
| 95 | While the exact implementation is host-language specific, consider the following |
| 96 | more detailed set of declarations that may exist in such a binding layer: |
| 97 | |
| 98 | ```c++ |
| 99 | // Inspired heavily by the Py_buffer type. |
| 100 | // See: https://docs.python.org/3/c-api/buffer.html |
| 101 | struct BufferDescription { |
| 102 | ScalarType element_type; |
| 103 | // For contiguous arrays, this is the length of the underlying memory. |
| 104 | // For non-contiguous, this is the size of the buffer if it were copied |
| 105 | // to a contiguous representation. |
| 106 | size_t len; |
| 107 | // Number of dims and strides. |
| 108 | size_t ndim; |
| 109 | int* shape; |
| 110 | int* strides; |
| 111 | }; |
| 112 | |
| 113 | // Mirrors the 'buffer-type' production in the above grammar. |
| 114 | struct SignatureBufferType; |
| 115 | |
| 116 | // Oracle which combines signature metadata with a user-provided, materialized |
| 117 | // BufferDescription to derive a BufferDescription that is compatible for |
| 118 | // invocation. Returns an updated buffer description if the original is |
| 119 | // not compatible or fully specified. |
| 120 | // This can be used in a couple of ways: |
| 121 | // a) On function invocation to determine whether a provided buffer can be |
| 122 | // used as-is or needs to be converted (copied). |
| 123 | // b) To provide a factory function to the host language to create a |
| 124 | // compatible buffer. |
| 125 | optional<BufferDescription> BufferDescriptionOracle( |
| 126 | DeviceContext*, SignatureBufferType, BufferDescription) |
| 127 | throws UnsupportedBufferException; |
| 128 | ``` |
| 129 | |
| 130 | The above scheme should allow host-language and device coordination with respect |
| 131 | to buffer layout. For the moment, the responsibility to convert the buffer to a |
| 132 | compatible memory layout is on the host-language binding. However, it is often |
| 133 | most efficient to schedule this conversion for execution on a device. In the future, it |
| 134 | is anticipated that there will be a built-in pathway for scheduling such a |
brett koonce | 4f5a1b1 | 2019-12-26 14:58:39 -0800 | [diff] [blame] | 135 | conversion (which would allow pipelining and offload of buffer conversions). |
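To make the oracle's contract more concrete, here is a minimal sketch of one possible policy. It assumes a hypothetical device that accepts only dense row-major buffers, and it ignores the `DeviceContext` and `SignatureBufferType` inputs. The `BufferDescription` here is a simplified stand-in (vectors instead of raw pointers, an element size instead of `ScalarType`), and `DenseRowMajorOracle` is an illustrative name, not an IREE API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Simplified stand-in for the BufferDescription declared above.
struct BufferDescription {
  size_t element_size;       // Bytes per element (stands in for ScalarType).
  std::vector<int> shape;
  std::vector<int> strides;  // In bytes, as in Py_buffer.
};

// Computes dense row-major (C-order) byte strides for a shape.
std::vector<int> RowMajorStrides(const std::vector<int>& shape,
                                 size_t element_size) {
  std::vector<int> strides(shape.size());
  int stride = static_cast<int>(element_size);
  for (size_t i = shape.size(); i-- > 0;) {
    strides[i] = stride;
    stride *= shape[i];
  }
  return strides;
}

// Assumed policy: only dense row-major buffers can be used as-is.
// Returns nullopt when the provided buffer is already compatible, otherwise
// the description that the buffer should be converted (copied) into.
std::optional<BufferDescription> DenseRowMajorOracle(
    const BufferDescription& provided) {
  assert(provided.shape.size() == provided.strides.size());
  std::vector<int> wanted =
      RowMajorStrides(provided.shape, provided.element_size);
  if (wanted == provided.strides) return std::nullopt;
  BufferDescription updated = provided;
  updated.strides = wanted;
  return updated;
}
```

With this policy, a 2x3 `float32` buffer with byte strides `{12, 4}` is accepted unchanged, while a column-major buffer with strides `{4, 8}` yields an updated description with strides `{12, 4}`. The host binding would then perform the copy.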
Stella Laurenzo | e0746b1 | 2019-12-13 15:09:20 -0800 | [diff] [blame] | 136 | |
| 137 | ##### Deferred result allocation |
| 138 | |
| 139 | In general, exported functions accept pre-allocated results that should be |
| 140 | mutated. For the simplest cases, such results can be `null` and retrieved upon |
| 141 | completion of the function. This, however, puts severe limitations on the |
| 142 | ability to pipeline. For fully specified signatures (no dynamic shapes), the |
| 143 | `BufferDescriptionOracle` and the signature are sufficient to pre-allocate |
| 144 | appropriate results, which allows chains of result-producing invocations to be |
| 145 | pipelined. |
| 146 | |
| 147 | If, however, a `buffer-type` is not fully specified, the compiler may emit a |
| 148 | special *result allocator* function, which will be referenced in the `fbr` |
| 149 | attribute. Such a function would have a signature like this: |
| 150 | |
| 151 | ```c++ |
| 152 | tuple<buffer> __allocate_results(tuple<int> dynamic_dims); |
| 153 | ``` |
| 154 | |
| 155 | Such a function takes a tuple of all dynamic buffer dims in the function input |
| 156 | signature and returns a tuple of allocated buffers for each dynamic result. Note |
| 157 | that it may not be possible to fully allocate results in this fashion (e.g. if |
| 158 | the result layout is data dependent), in which case a null buffer is returned |
| 159 | for that slot (and the host library would need to await the invocation to get |
| 160 | the fully populated result). |
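The host-side bookkeeping for this can be sketched as follows. The helper names `IsFullySpecified` and `CollectDynamicDims` are hypothetical. The ordering of the collected dims (signature order, left to right) is an assumption, since the text above does not pin it down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<int>;

// True when no dim in the signature shape is dynamic (-1), in which case
// results can be pre-allocated without calling a result allocator.
bool IsFullySpecified(const Shape& signature_shape) {
  for (int d : signature_shape) {
    if (d == -1) return false;
  }
  return true;
}

// Gathers the runtime value of every dynamic input dim, in signature order
// (assumed), to form the `dynamic_dims` tuple for the result allocator.
std::vector<int> CollectDynamicDims(const std::vector<Shape>& signature_shapes,
                                    const std::vector<Shape>& runtime_shapes) {
  assert(signature_shapes.size() == runtime_shapes.size());
  std::vector<int> dynamic_dims;
  for (size_t i = 0; i < signature_shapes.size(); ++i) {
    assert(signature_shapes[i].size() == runtime_shapes[i].size());
    for (size_t j = 0; j < signature_shapes[i].size(); ++j) {
      if (signature_shapes[i][j] == -1) {
        dynamic_dims.push_back(runtime_shapes[i][j]);
      }
    }
  }
  return dynamic_dims;
}
```

For input signature shapes `[-1, 4]` and `[3, -1]` invoked with buffers of shape `[8, 4]` and `[3, 5]`, this collects `{8, 5}`.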
| 161 | |
| 162 | A similar mechanism will need to be created at some future point for |
| 163 | under-specified results of other (non-buffer) types. |
| 164 | |
| 165 | ##### Contiguity hinting |
| 166 | |
| 167 | In some kinds of dataflows, the compiler needs to be free to internally |
| 168 | toggle buffer contiguity (e.g. C/row-major, Fortran/col-major, etc). In many |
| 169 | cases, such toggling does not naturally escape through the exported function |
| 170 | boundaries, in which case, there is no ABI impact. However, it is anticipated |
| 171 | that there is benefit to letting the toggle propagate through the exported ABI |
| 172 | boundary, in which case, the `buffer-type` will likely be extended with a |
| 173 | contiguity hint indicating the preference. When combined with the buffer |
| 174 | description oracle and in-pipeline conversion features described above, this |
| 175 | could yield a powerful mechanism for dynamically and efficiently managing such |
| 176 | transitions. |
| 177 | |
| 178 | Such an enhancement would almost certainly necessitate a major version bump in |
| 179 | the ABI and would be logical to implement once the advanced features above are |
| 180 | functional. |
| 181 | |
Stella Laurenzo | aab85b9 | 2019-12-12 20:28:57 -0800 | [diff] [blame] | 182 | ### Structured Index Path ABI |
| 183 | |
Stella Laurenzo | e0746b1 | 2019-12-13 15:09:20 -0800 | [diff] [blame] | 184 | Functions may support the SIP ABI if their input and result tuples logically map |
| 185 | onto "structures" (nested sequences/dicts). |
| 186 | |
| 187 | *Attributes:* |
| 188 | |
| 189 | * `sipv` = 1 (current SIP ABI version) |
| 190 | * `sip` = encoded SIP signature (see below) |
Stella Laurenzo | aab85b9 | 2019-12-12 20:28:57 -0800 | [diff] [blame] | 191 | |
| 192 | This ABI maps a raw, linear sequence of inputs and results onto an input and |
| 193 | result "structure" -- which in this context refers to a nested assembly of |
| 194 | sequences (with integer keys) and dictionaries (with string keys). Such a |
| 195 | facility is useful for encoding input/result mappings in a way that is common in |
| 196 | dynamic languages (such as Python). |
| 197 | |
| 198 | In practice, this ABI supports the calling convention for TensorFlow, which |
| 199 | allows functions that accept and produce nestings via the |
| 200 | [`tf.nest`](https://www.tensorflow.org/api_docs/python/tf/nest) facility. In |
| 201 | implementing it, however, care has been taken to allow the calling convention to |
| 202 | generalize to other similar cases. |
| 203 | |
| 204 | #### Grammar |
| 205 | |
| 206 | The signature is implemented in terms of the `SignatureBuilder`, using tagged |
| 207 | Integers and Spans. |
| 208 | |
| 209 | ```text |
| 210 | # Defines the structured value for the inputs ('I') and results ('R') |
| 211 | # of the function. |
| 212 | signature ::= 'I' length-prefixed(structured-value) |
| 213 | 'R' length-prefixed(structured-value) |
| 214 | |
| 215 | structured-value ::= raw-fn-index | sequence | dict |
| 216 | raw-fn-index ::= '_' integer |
| 217 | sequence ::= 'S' length-prefixed( (integer-key structured-value)* ) |
| 218 | integer-key ::= 'k' integer |
| 219 | dict ::= 'D' length-prefixed( (string-key structured-value)* ) |
| 220 | string-key ::= 'K' length-prefixed( any-byte-sequence ) |
| 221 | |
| 222 | # Low-level lexical primitives: |
| 223 | integer ::= -?[0-9]+ |
| 224 | length ::= [0-9]+ |
| 225 | # The `length` encodes the length in bytes of `production`, plus 1 for the '!'. |
| 226 | length-prefixed(production) ::= length '!' production |
| 227 | any-byte-sequence ::= <any byte sequence> |
| 228 | ``` |
| 229 | |
| 230 | Structured values define a tree of recursive dicts/lists, with `raw-fn-index` at |
| 231 | the leaves. The interpretation is that a raw-fn-index that has been reached by |
| 232 | traversing N expansions of the structured-value production is assigned an "index |
| 233 | path" which is a list of the N keys that were traversed to reach it. For |
| 234 | example, for N=0, the index path is empty. For N=1, if an integer-key with |
| 235 | numerical value 0 was traversed to reach the raw-fn-index, then the index path |
| 236 | is [0]. |
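The index path assignment can be illustrated with a small recursive-descent walk over an encoded `structured-value`. This is only a sketch written directly from the grammar above. The function names are hypothetical, paths are rendered as Python-esque subscript strings for readability, and `SipSignatureParser` remains the canonical implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Parses a decimal integer (optional leading '-') at s[pos], advancing pos.
int ParseInteger(const std::string& s, size_t& pos) {
  size_t start = pos;
  if (pos < s.size() && s[pos] == '-') ++pos;
  size_t digits = pos;
  while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') ++pos;
  assert(pos > digits);  // At least one digit is required.
  return std::stoi(s.substr(start, pos - start));
}

// Parses length-prefixed(production) at s[pos]; returns the production bytes.
std::string ParseSpan(const std::string& s, size_t& pos) {
  size_t length = static_cast<size_t>(ParseInteger(s, pos));  // Includes '!'.
  assert(length >= 1 && pos < s.size() && s[pos] == '!');
  assert(pos + length <= s.size());
  std::string body = s.substr(pos + 1, length - 1);
  pos += length;
  return body;
}

// Walks one structured-value, recording raw-fn-index -> index path.
void WalkValue(const std::string& s, size_t& pos, const std::string& path,
               std::map<int, std::string>& out) {
  assert(pos < s.size());
  char tag = s[pos++];
  if (tag == '_') {  // raw-fn-index: a leaf, reached via `path`.
    out[ParseInteger(s, pos)] = path;
    return;
  }
  assert(tag == 'S' || tag == 'D');
  std::string body = ParseSpan(s, pos);
  size_t p = 0;
  while (p < body.size()) {
    char key_tag = body[p++];
    std::string key;
    if (tag == 'S') {  // sequence: integer keys.
      assert(key_tag == 'k');
      key = "[" + std::to_string(ParseInteger(body, p)) + "]";
    } else {           // dict: length-prefixed string keys.
      assert(key_tag == 'K');
      key = "[\"" + ParseSpan(body, p) + "\"]";
    }
    WalkValue(body, p, path + key, out);
  }
}

// Returns the index path of every raw-fn-index in one structured-value.
std::map<int, std::string> IndexPaths(const std::string& structured_value) {
  std::map<int, std::string> out;
  size_t pos = 0;
  WalkValue(structured_value, pos, "", out);
  return out;
}
```

For the structure `{"a": [_0, _1], "b": _2}`, which this sketch expects to be encoded as `D22!K2!aS9!k0_0k1_1K2!b_2`, raw index 0 gets the path `["a"][0]`, raw index 1 gets `["a"][1]`, and raw index 2 gets `["b"]`. A bare `_0` gets the empty path, and `S5!k0_0` gives raw index 0 the path `[0]`.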
| 237 | |
| 238 | TODO: Give a few more examples, writing out various nested dicts/lists in |
| 239 | Python-esque notation to clarify this concept. |
| 240 | |
| 241 | See the `SipSignatureParser::ToStringVisitor` for a canonical example of how to |
| 242 | interpret the signature. |
| 243 | |
| 244 | #### Implementations |
| 245 | |
| 246 | * C++ |
| 247 | * `SipSignatureMangler`: Produces a function signature given individual |
| 248 | input and result assignments of physical indices to nested index paths in |
| 249 | the structure tree. |
| 250 | * `SipSignatureParser`: Parses signatures and dispatches calls to a |
| 251 | visitor. |