# Custom CPU Dispatch Functions for Dynamically-linked Plugins
See the [custom_dispatch README](/samples/custom_dispatch/README.md) for an
overview of this approach. This sample is derived from the
[custom_dispatch/cpu/embedded/](/samples/custom_dispatch/cpu/embedded/) sample
and information about the calling conventions can be found there.

This sample demonstrates how to define external device functions that can be
dispatched from within IREE programs via simple function calls. Here the
functions are declared in the MLIR executables, called as normal calls, and
then defined in a .c file that is either compiled for the system platform
(ELF, DLL, DyLib, etc) or cross-compiled into a platform-independent embedded
ELF. The compiler merely emits the imports and leaves it to the user to specify
which plugins to load at runtime such that the imports can be resolved.

Note that dynamically-linked plugins are discouraged unless absolutely required.
Prefer compiler-embedded imports instead: they produce hermetic deployable
artifacts that don't require re-deploying runtimes, and they keep code size
minimal because only the imports required by the compiled program are pulled in
and LTO can optimize them by propagating ranges/constants. Dynamically-linked
plugins should generally be reserved for extreme mechanisms like JITs, though
even those are better done as ahead-of-time code generation in the compiler.
IREE supports dynamically-linked imports for completeness; use them only after
careful consideration.
## Workflow for System Dynamic Libraries
```text
+----------+    +---------------+      +--------------+
| plugin.c | -> | plugin.so/dll |-+    | example.mlir |
+----------+    +---------------+ |    +--------------+
                                  |           v
                                  |      iree-compile
                                  |           v
                                  |    +--------------+
                                  |    | example.vmfb | (non-hermetic)
                                  |    +--------------+
                                  |           |
                                  +-----+-----+
                                        v
                               +-----------------+
                               | iree-run-module |
                               +-----------------+
```
When plugins need to rely on platform-specific functionality (syscalls, TLS,
etc) they can be built as normal system libraries of the type that can be loaded
with dlopen/LoadLibrary/etc. Users will need to handle deployment themselves and
the IREE runtime will load the plugin library using the platform APIs. There are
still restrictions with this approach as the imports provided by the plugin will
be called from arbitrary threads where syscalls, TLS, and other features are
quite complicated to get right. An advantage of system libraries is that most
tooling (perf, debuggers, etc) will work with no additional configuration. As
such, it's recommended that even if portable ELF libraries are used for
deployment, users still preserve a path where they can be compiled as system
libraries.
1. The user authors their functions in whatever language they want with whatever
system dependencies they want (with caveats/YMMV) and exposes them via the
IREE C [executable_plugin.h](/runtime/src/iree/hal/local/executable_plugin.h)
API. These functions can cover entire workgroups (and a dispatch can
be a single workgroup so effectively just function calls) or be utilities
used by the function for localized work (microkernels, data type conversion,
etc). It's important to remember that parallelism scheduling is done
_outside_ of the function via the workgroup count and multiple threads may be
executing the function at any time.
In addition to the import function (see
[custom_dispatch/cpu/embedded/](/samples/custom_dispatch/cpu/embedded/)
for the structure of the imports) the plugin must provide a query function
that is used to provide the plugin information to the runtime:
```c
IREE_HAL_EXECUTABLE_PLUGIN_EXPORT const iree_hal_executable_plugin_header_t**
iree_hal_executable_plugin_query(
iree_hal_executable_plugin_version_t max_version, void* reserved) {
// Return a plugin header populated with metadata and function pointers.
}
```
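The point above about scheduling happening _outside_ of the function can be sketched in plain C. This is a conceptual model only, not IREE's scheduler or plugin API: `workgroup_fn_t`, `dispatch_1d`, `double_workgroup`, and the tile size of 64 are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 64 };  // Assumed elements per workgroup (hypothetical).

// Hypothetical 1-D workgroup function type: processes the elements in
// [tid, min(tid + TILE, dim)) and touches nothing else.
typedef void (*workgroup_fn_t)(const float* in, float* out, size_t dim,
                               size_t tid);

// Example workgroup function: doubles its tile. It keeps no global or static
// state, so concurrent calls on different tiles cannot interfere.
static void double_workgroup(const float* in, float* out, size_t dim,
                             size_t tid) {
  size_t end = tid + TILE < dim ? tid + TILE : dim;
  for (size_t i = tid; i < end; ++i) out[i] = in[i] * 2.0f;
}

// Conceptual stand-in for the scheduler: computes the workgroup count and
// issues one call per tile. A real runtime may issue these calls from many
// threads and in any order; this serial loop only shows that the split into
// workgroups is decided out here, not inside the function.
static size_t dispatch_1d(workgroup_fn_t fn, const float* in, float* out,
                          size_t dim) {
  assert(fn != NULL);
  size_t workgroup_count = (dim + TILE - 1) / TILE;  // ceil(dim / TILE)
  for (size_t wg = 0; wg < workgroup_count; ++wg) {
    fn(in, out, dim, wg * TILE);
  }
  return workgroup_count;
}
```

Because each call depends only on its arguments, the same function stays correct whether the tiles are run one after another as here or concurrently on several threads.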
2. Source files are compiled to platform dynamic libraries via normal build
system goo. Each platform and architecture the user is targeting will need
its own libraries. Note that only the header file is required to be included
and no IREE runtime libraries need to be linked into the plugin.
```cmake
add_library(my_plugin SHARED my_plugin.c)
target_include_directories(...)
```
3. The user (or compiler transforms) declares their functions and adds calls to
them. Each of the two inputs and the one output is passed as a
`<baseptr, offset>` pair that locates its data. Implementations of these
functions must compute `baseptr + offset` themselves before accessing the
data: the `memref` semantics in MLIR only guarantee that `baseptr + offset`
is the valid position to access, not `baseptr` alone. Also note that the
`offset` here is in number of elements (i.e. number of floats), not bytes.
```mlir
func.call @simple_mul_workgroup(
%memref0_baseptr, %memref0_offset,
%memref1_baseptr, %memref1_offset,
%memref2_baseptr, %memref2_offset,
%dim, %tid)
: (memref<f32>, index, memref<f32>, index, memref<f32>, index, index, index) -> ()
```
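On the C side, the `baseptr + offset` handling for the call above might look like the following minimal sketch. The struct layout (operands packed in call order), the direct struct-pointer signature, and the tile size of 64 are assumptions made for illustration; see the embedded sample referenced earlier for the actual calling convention.

```c
#include <stddef.h>

enum { SIMPLE_MUL_TILE = 64 };  // Assumed elements per workgroup.

// Assumed packing of the call operands, in the order they appear in the
// func.call. Offsets are in elements (floats), not bytes.
typedef struct {
  const float* binding0;
  size_t binding0_offset;
  const float* binding1;
  size_t binding1_offset;
  float* binding2;
  size_t binding2_offset;
  size_t dim;  // Total number of elements.
  size_t tid;  // First element handled by this workgroup.
} simple_mul_params_t;

// Multiplies one tile: out[i] = lhs[i] * rhs[i] for i in
// [tid, min(tid + SIMPLE_MUL_TILE, dim)).
static int simple_mul_workgroup(const simple_mul_params_t* params) {
  // Apply each offset once, up front. Pointer arithmetic on float* advances
  // in whole elements, which matches the element-based offsets.
  const float* lhs = params->binding0 + params->binding0_offset;
  const float* rhs = params->binding1 + params->binding1_offset;
  float* out = params->binding2 + params->binding2_offset;

  size_t end = params->tid + SIMPLE_MUL_TILE;
  if (end > params->dim) end = params->dim;
  for (size_t i = params->tid; i < end; ++i) {
    out[i] = lhs[i] * rhs[i];
  }
  return 0;  // 0 is used here to signal success (an assumption).
}
```

Resolving the three pointers before the loop keeps the offset arithmetic in one place, so forgetting it for one binding is easier to spot.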
4. The user either programmatically registers the plugins via the plugin manager
or when using IREE tools passes them using the `--executable_plugin=` flag.
Note that imports are resolved in reverse registration order such that
fallbacks can be supported; a reference plugin can be registered first
followed by more specialized plugins that may only handle a subset of
imports.
```bash
iree-run-module \
--device=local-sync \
--executable_plugin=my_plugins.so \
--executable_plugin=other_plugins.so \
--function=mixed_invocation \
--input=8xf32=2 \
--input=8xf32=4
```
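The reverse-registration-order rule in step 4 can be modeled with a short C sketch. None of these types are IREE's: `provider_t`, `plugin_manager_t`, `manager_register`, and `manager_resolve` are hypothetical stand-ins, and `simple_add_workgroup` is an invented second import used only to show the fallback.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical import signature; the real one is defined by the plugin API.
typedef int (*import_fn_t)(void* params);

// Hypothetical provider: resolves a symbol name or returns NULL if unknown.
typedef struct {
  const char* name;
  import_fn_t (*resolve)(const char* symbol);
} provider_t;

enum { MAX_PROVIDERS = 8 };
typedef struct {
  provider_t providers[MAX_PROVIDERS];
  size_t count;
} plugin_manager_t;

// Appends a provider; later registrations take priority during resolution.
static int manager_register(plugin_manager_t* m, provider_t p) {
  if (m->count >= MAX_PROVIDERS) return -1;
  m->providers[m->count++] = p;
  return 0;
}

// Walks providers from most recently registered to first, so a specialized
// plugin wins where it can and earlier plugins act as fallbacks.
static import_fn_t manager_resolve(const plugin_manager_t* m,
                                   const char* symbol) {
  for (size_t i = m->count; i > 0; --i) {
    import_fn_t fn = m->providers[i - 1].resolve(symbol);
    if (fn != NULL) return fn;
  }
  return NULL;
}

// Stub imports that return distinct values so callers can tell them apart.
static int mul_reference(void* params) { (void)params; return 10; }
static int add_reference(void* params) { (void)params; return 11; }
static int mul_specialized(void* params) { (void)params; return 20; }

// Reference plugin: handles every import.
static import_fn_t reference_resolve(const char* symbol) {
  if (strcmp(symbol, "simple_mul_workgroup") == 0) return mul_reference;
  if (strcmp(symbol, "simple_add_workgroup") == 0) return add_reference;
  return NULL;
}

// Specialized plugin: handles only a subset of imports.
static import_fn_t specialized_resolve(const char* symbol) {
  if (strcmp(symbol, "simple_mul_workgroup") == 0) return mul_specialized;
  return NULL;
}
```

Registering the reference provider first and the specialized one second means `simple_mul_workgroup` resolves to the specialized implementation while `simple_add_workgroup` falls back to the reference one.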
## Workflow for Embedded ELF Libraries
```text
+----------+      +-------------------+      +--------------+
| plugin.c | -+-> | plugin_aarch64.so | -+   | example.mlir |
+----------+  |   +-------------------+  |   +--------------+
              |   +-------------------+  |          v
              +-> | plugin_x86_64.so  | -+     iree-compile
                  +-------------------+  |          v
                      +------------+     |   +--------------+
                      | plugin.sos |  <--+   | example.vmfb | (non-hermetic)
                      +------------+         +--------------+
                            |                       |
                            +----------+------------+
                                       v
                              +-----------------+
                              | iree-run-module |
                              +-----------------+
```
The workflow is similar to the system library version except that the plugin
code needs to be written in a bare-metal flavor (no TLS, no threads, no malloc,
etc). Most kernel libraries not performing JITing can be authored like this and
take advantage of the multi-targeting and cross-platform support provided by the
plugin loader. A plugin can be compiled for multiple architectures (aarch64,
x86_64, etc) and then loaded and run on all platforms (Windows, macOS, Linux,
and bare-metal).
See the sample `CMakeLists.txt` for how the standalone plugins can be compiled
using the appropriate clang flags. Other compilers can be used if care is taken
to ensure they produce compatible platform-agnostic ELF files. After building each
architecture-specific ELF they can be combined into a FatELF using the
`iree-fatelf` tool; this single `.sos` file can contain multiple architectures
and the required one will be loaded at runtime.
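
The idea of picking the required architecture at load time can be illustrated with a small C sketch. It does not reproduce the FatELF file layout or the IREE loader: `arch_record_t`, `select_record`, and `sketch_host_machine` are hypothetical, and only the two `e_machine` constants (62 for x86-64, 183 for AArch64) come from the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

// ELF e_machine values for the two architectures named in this sample.
enum { SKETCH_EM_X86_64 = 62, SKETCH_EM_AARCH64 = 183 };

// Hypothetical in-memory record describing one architecture-specific ELF
// inside a multi-architecture container.
typedef struct {
  uint16_t machine;  // ELF e_machine of the contained binary.
  const void* data;  // Start of that binary's bytes.
  size_t size;       // Length of that binary in bytes.
} arch_record_t;

// Returns the first record built for `machine`, or NULL if none matches.
static const arch_record_t* select_record(const arch_record_t* records,
                                          size_t count, uint16_t machine) {
  for (size_t i = 0; i < count; ++i) {
    if (records[i].machine == machine) return &records[i];
  }
  return NULL;
}

// Maps common compiler-defined macros to an e_machine value; returns 0
// (EM_NONE) on architectures this sketch does not know about.
static uint16_t sketch_host_machine(void) {
#if defined(__aarch64__) || defined(_M_ARM64)
  return SKETCH_EM_AARCH64;
#elif defined(__x86_64__) || defined(_M_X64)
  return SKETCH_EM_X86_64;
#else
  return 0;
#endif
}
```

A loader built along these lines would call `select_record(records, count, sketch_host_machine())` and report an error when it returns NULL, which corresponds to a plugin that was not built for the running architecture.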
## Instructions
This presumes that `iree-compile` and `iree-run-module` have been installed or
built. [See here](https://iree.dev/building-from-source/getting-started/)
for instructions for CMake setup and building from source.
1. Build the `iree-sample-deps` CMake target to compile the
[standalone_plugin.c](./standalone_plugin.c) and
[system_plugin.c](./system_plugin.c) sources to plugin libraries for aarch64
and x86_64 or the current target system:
```bash
cmake --build ../iree-build/ --target iree-sample-deps
```
In a user application this would be replaced with whatever build
infrastructure the user has for compiling code to plugin libraries. No IREE
compiler or runtime changes are required and the normal compiler install can
be used. Note that specific flags are required when producing the standalone
plugin ELFs.
2. Compile the [example module](./standalone_example.mlir) to a .vmfb file:
```bash
iree-compile \
--iree-hal-target-backends=llvm-cpu \
samples/custom_dispatch/cpu/plugin/standalone_example.mlir \
-o=/tmp/example.vmfb
```
3. Run the example program using the plugins for either platform-independent
embedded ELF files or the system libraries:
```bash
iree-run-module \
--device=local-sync \
--executable_plugin=../iree-build/samples/custom_dispatch/cpu/plugin/standalone_plugin.sos \
--function=mixed_invocation \
--input=8xf32=2 \
--input=8xf32=4 \
--module=/tmp/example.vmfb
```
```bash
iree-run-module \
--device=local-sync \
--executable_plugin=../iree-build/samples/custom_dispatch/cpu/plugin/system_plugin.so \
--function=mixed_invocation \
--input=8xf32=2 \
--input=8xf32=4 \
--module=/tmp/example.vmfb
```