| # Custom CPU Dispatch Functions for Dynamically-linked Plugins |
| |
| See the [custom_dispatch README](/samples/custom_dispatch/README.md) for an |
| overview of this approach. This sample is derived from the |
| [custom_dispatch/cpu/embedded/](/samples/custom_dispatch/cpu/embedded/) sample |
| and information about the calling conventions can be found there. |
| |
| This sample demonstrates how to define external device functions that can be |
| dispatched from within IREE programs via simple function calls. Here the |
| functions are declared in the MLIR executables, called as normal calls, and |
| then defined in a .c file that is either compiled for the system platform |
| (ELF, DLL, DyLib, etc) or cross-compiled into a platform-independent embedded |
| ELF. The compiler merely emits the imports and leaves it to the user to specify |
| which plugins to load at runtime such that the imports can be resolved. |
| |
| Note that dynamically-linked plugins are discouraged unless absolutely required. |
| Prefer compiler-embedded imports instead: they allow hermetic deployable |
| artifacts that don't require redeploying runtimes, and they produce the |
| minimal amount of code because only the imports required by the compiled |
| program are pulled in and can be optimized with LTO to propagate ranges/constants. |
| Dynamically-linked plugins should generally only be used for extreme mechanisms |
| like JITs, though even those are better done as ahead-of-time code generation in |
| the compiler. IREE supports dynamically-linked imports for completeness and they |
| should be used with careful consideration. |
| |
| ## Workflow for System Dynamic Libraries |
| |
| ```text |
| +----------+      +---------------+      +--------------+ |
| | plugin.c | ->   | plugin.so/dll |-+    | example.mlir | |
| +----------+      +---------------+ |    +--------------+ |
|                                     |           v |
|                                     |      iree-compile |
|                                     |           v |
|                                     |    +--------------+ |
|                                     |    | example.vmfb | (non-hermetic) |
|                                     |    +--------------+ |
|                                     |           | |
|                                     +-----+-----+ |
|                                           v |
|                                  +-----------------+ |
|                                  | iree-run-module | |
|                                  +-----------------+ |
| ``` |
| |
| When plugins need to rely on platform-specific functionality (syscalls, TLS, |
| etc) they can be built as normal system libraries of the type that can be loaded |
| with dlopen/LoadLibrary/etc. Users will need to handle deployment themselves and |
| the IREE runtime will load the plugin library using the platform APIs. There are |
| still restrictions with this approach as the imports provided by the plugin will |
| be called from arbitrary threads where syscalls, TLS, and other features are |
| quite complicated to get right. An advantage of system libraries is that most |
| tooling (perf, debuggers, etc) will work with no additional configuration. As |
| such, it's recommended that users who deploy portable ELF libraries still |
| preserve a path where the plugins can be compiled as system libraries. |
| |
| 1. The user authors their functions in whatever language they want with whatever |
| system dependencies they want (with caveats/YMMV) and exposes them via the |
| IREE C [executable_plugin.h](/runtime/src/iree/hal/local/executable_plugin.h) |
| API. These functions can cover entire workgroups (and a dispatch can |
| be a single workgroup so effectively just function calls) or be utilities |
| used by the function for localized work (microkernels, data type conversion, |
| etc). It's important to remember that parallelism scheduling is done |
| _outside_ of the function via the workgroup count and multiple threads may be |
| executing the function at any time. |
| |
| In addition to the import function (see |
| [custom_dispatch/cpu/embedded/](/samples/custom_dispatch/cpu/embedded/) |
| for the structure of the imports) the plugin must provide a query function |
| that is used to provide the plugin information to the runtime: |
| |
| ```c |
| IREE_HAL_EXECUTABLE_PLUGIN_EXPORT const iree_hal_executable_plugin_header_t** |
| iree_hal_executable_plugin_query( |
| iree_hal_executable_plugin_version_t max_version, void* reserved) { |
| // Return a plugin header populated with metadata and function pointers. |
| } |
| ``` |
| |
| 2. Source files are compiled to platform dynamic libraries via normal build |
| system goo. Each platform and architecture the user is targeting will need |
| its own libraries. Note that only the header file is required to be included |
| and no IREE runtime libraries need to be linked into the plugin. |
| |
| ```cmake |
| add_library(my_plugin SHARED my_plugin.c) |
| target_include_directories(...) |
| ``` |
| |
| 3. The user (or compiler transforms) adds calls to their functions by declaring |
| them. Each of the two inputs and the one output is passed as a |
| `<baseptr, offset>` pair that locates its data. Implementations of these |
| functions must manually compute `baseptr + offset` before reading or writing |
| the data: the `memref` semantics in MLIR only guarantee that |
| `baseptr + offset` is the valid position to access. Also note that the |
| `offset` here is in number of elements (i.e. number of floats), not bytes. |
| |
| ```mlir |
| func.call @simple_mul_workgroup( |
| %memref0_baseptr, %memref0_offset, |
| %memref1_baseptr, %memref1_offset, |
| %memref2_baseptr, %memref2_offset, |
| %dim, %tid) |
| : (memref<f32>, index, memref<f32>, index, memref<f32>, index, index, index) -> () |
| ``` |
| |
| 4. The user either programmatically registers the plugins via the plugin manager |
| or when using IREE tools passes them using the `--executable_plugin=` flag. |
| Note that imports are resolved in reverse registration order such that |
| fallbacks can be supported; a reference plugin can be registered first |
| followed by more specialized plugins that may only handle a subset of |
| imports. |
| |
| ```bash |
| iree-run-module \ |
| --device=local-sync \ |
| --executable_plugin=my_plugins.so \ |
| --executable_plugin=other_plugins.so \ |
| --function=mixed_invocation \ |
| --input=8xf32=2 \ |
| --input=8xf32=4 |
| ``` |
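
To make the `baseptr + offset` rule from step 3 concrete, here is a minimal C sketch of a workgroup function. It is not the sample's actual import: the `simple_mul_params_t` struct, its field names, and the `TILE_SIZE` of 64 are assumptions made for illustration, and the real import structure is described in the embedded sample. The only point is that each binding pointer is advanced by its element offset before any load or store.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical parameter block mirroring the call arguments in step 3:
// one <baseptr, offset> pair per binding, followed by %dim and %tid.
typedef struct {
  const float* binding0;
  size_t binding0_offset;  // in elements (floats), not bytes
  const float* binding1;
  size_t binding1_offset;
  float* binding2;
  size_t binding2_offset;
  size_t dim;  // total number of elements
  size_t tid;  // first element handled by this workgroup
} simple_mul_params_t;

// Assumed tile size: number of elements processed per workgroup.
enum { TILE_SIZE = 64 };

// Multiplies one tile of the inputs into the output.
static int simple_mul_workgroup(const simple_mul_params_t* params) {
  assert(params != NULL);
  // Apply the element offsets before touching any data.
  const float* lhs = params->binding0 + params->binding0_offset;
  const float* rhs = params->binding1 + params->binding1_offset;
  float* out = params->binding2 + params->binding2_offset;
  // Clamp the tile so the last workgroup does not run past dim.
  size_t end = params->tid + TILE_SIZE;
  if (end > params->dim) end = params->dim;
  for (size_t i = params->tid; i < end; ++i) {
    out[i] = lhs[i] * rhs[i];
  }
  return 0;
}
```

Because the function writes only the tile selected by `tid` and keeps no shared mutable state, it should be safe to run concurrently across workgroups that cover non-overlapping tiles, which matches the scheduling model described in step 1.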
| |
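The reverse-order resolution in step 4 can be pictured with a small C sketch. This is not the IREE plugin manager: `plugin_t`, `resolve_import`, and the stub functions are hypothetical names used only to show why registering a reference plugin first and a specialized plugin second gives fallback behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*import_fn_t)(void* params);

// Hypothetical plugin record: parallel arrays of symbol names and functions.
typedef struct {
  const char* const* names;
  const import_fn_t* fns;
  size_t count;
} plugin_t;

// Stub imports standing in for real kernels; return codes are arbitrary.
static int reference_mul(void* params) { (void)params; return 1; }
static int reference_add(void* params) { (void)params; return 2; }
static int fast_mul(void* params) { (void)params; return 3; }

// Returns the function a single plugin provides for name, or NULL.
static import_fn_t plugin_lookup(const plugin_t* plugin, const char* name) {
  for (size_t i = 0; i < plugin->count; ++i) {
    if (strcmp(plugin->names[i], name) == 0) return plugin->fns[i];
  }
  return NULL;
}

// Walks plugins from last registered to first; the first hit wins, so
// later (more specialized) plugins shadow earlier (reference) ones.
static import_fn_t resolve_import(const plugin_t* plugins, size_t plugin_count,
                                  const char* name) {
  assert(name != NULL);
  for (size_t i = plugin_count; i > 0; --i) {
    import_fn_t fn = plugin_lookup(&plugins[i - 1], name);
    if (fn != NULL) return fn;
  }
  return NULL;
}
```

With a reference plugin providing `mul` and `add` registered first and a specialized plugin providing only `mul` registered second, `mul` resolves to the specialized version while `add` falls back to the reference one.
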
| ## Workflow for Embedded ELF Libraries |
| |
| ```text |
| +----------+      +-------------------+      +--------------+ |
| | plugin.c | -+-> | plugin_aarch64.so | -+   | example.mlir | |
| +----------+  |   +-------------------+  |   +--------------+ |
|               |   +-------------------+  |          v |
|               +-> | plugin_x86_64.so  | -+     iree-compile |
|                   +-------------------+  |          v |
|                       +------------+     |   +--------------+ |
|                       | plugin.sos |  <--+   | example.vmfb | (non-hermetic) |
|                       +------------+         +--------------+ |
|                             |                       | |
|                             +----------+------------+ |
|                                        v |
|                               +-----------------+ |
|                               | iree-run-module | |
|                               +-----------------+ |
| ``` |
| |
| The workflow is similar to the system library version except that the plugin |
| code needs to be written in a bare-metal flavor (no TLS, no threads, no malloc, |
| etc). Most kernel libraries not performing JITing can be authored like this and |
| take advantage of the multi-targeting and cross-platform support provided by the |
| plugin loader. A plugin can be compiled for multiple architectures (aarch64, |
| x86_64, etc) and then loaded and run on all platforms (Windows, macOS, Linux, |
| and bare-metal). |
| |
| See the sample `CMakeLists.txt` for how the standalone plugins can be compiled |
| using the appropriate clang flags. Other compilers can be used if care is taken |
| to ensure they produce compatible platform-agnostic ELF files. After building |
| each architecture-specific ELF they can be combined into a FatELF using the |
| `iree-fatelf` tool; this single `.sos` file can contain multiple architectures |
| and the required one will be loaded at runtime. |
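
As a rough illustration of how a multi-architecture bundle lets one file serve several targets, the C sketch below picks the entry matching a requested architecture. It does not reflect the FatELF file format or the IREE loader: `elf_slice_t`, `host_machine`, and `select_slice` are hypothetical, and only the `e_machine` values (62 for x86-64, 183 for AArch64) come from the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// ELF e_machine identifiers for the two architectures named above.
enum { ELF_MACHINE_X86_64 = 62, ELF_MACHINE_AARCH64 = 183 };

// Hypothetical record describing one architecture-specific ELF in a bundle.
typedef struct {
  uint16_t machine;  // e_machine of the contained ELF
  const void* data;  // start of the ELF image
  size_t size;       // size of the ELF image in bytes
} elf_slice_t;

// Returns the e_machine value for the architecture this code was
// compiled for, or 0 when the architecture is not one of the two above.
static uint16_t host_machine(void) {
#if defined(__x86_64__) || defined(_M_X64)
  return ELF_MACHINE_X86_64;
#elif defined(__aarch64__) || defined(_M_ARM64)
  return ELF_MACHINE_AARCH64;
#else
  return 0;
#endif
}

// Returns the first slice built for machine, or NULL if none matches.
static const elf_slice_t* select_slice(const elf_slice_t* slices, size_t count,
                                       uint16_t machine) {
  assert(slices != NULL || count == 0);
  for (size_t i = 0; i < count; ++i) {
    if (slices[i].machine == machine) return &slices[i];
  }
  return NULL;
}
```

A loader following this idea would call `select_slice(slices, count, host_machine())` and report an error when the result is `NULL`, meaning the bundle was not built for the running architecture.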
| |
| ## Instructions |
| |
| This presumes that `iree-compile` and `iree-run-module` have been installed or |
| built. [See here](https://iree.dev/building-from-source/getting-started/) |
| for instructions for CMake setup and building from source. |
| |
| 1. Build the `iree-sample-deps` CMake target to compile |
| [standalone_plugin.c](./standalone_plugin.c) into embedded ELF plugins for |
| aarch64 and x86_64 and [system_plugin.c](./system_plugin.c) into a system |
| library for the current target system: |
| |
| ```bash |
| cmake --build ../iree-build/ --target iree-sample-deps |
| ``` |
| |
| In a user application this would be replaced with whatever build |
| infrastructure the user has for compiling code to plugin libraries. No IREE |
| compiler or runtime changes are required and the normal compiler install can |
| be used. Note that specific flags are required when producing the embedded |
| ELF plugins; see the sample `CMakeLists.txt`. |
| |
| 2. Compile the [example module](./standalone_example.mlir) to a .vmfb file: |
| |
| ```bash |
| iree-compile \ |
| --iree-hal-target-backends=llvm-cpu \ |
| samples/custom_dispatch/cpu/plugin/standalone_example.mlir \ |
| -o=/tmp/example.vmfb |
| ``` |
| |
| 3. Run the example program with either the platform-independent embedded ELF |
| plugin or the system library plugin: |
| |
| ```bash |
| iree-run-module \ |
| --device=local-sync \ |
| --executable_plugin=../iree-build/samples/custom_dispatch/cpu/plugin/standalone_plugin.sos \ |
| --function=mixed_invocation \ |
| --input=8xf32=2 \ |
| --input=8xf32=4 \ |
| --module=/tmp/example.vmfb |
| ``` |
| |
| ```bash |
| iree-run-module \ |
| --device=local-sync \ |
| --executable_plugin=../iree-build/samples/custom_dispatch/cpu/plugin/system_plugin.so \ |
| --function=mixed_invocation \ |
| --input=8xf32=2 \ |
| --input=8xf32=4 \ |
| --module=/tmp/example.vmfb |
| ``` |