Dynamic Library CPU HAL Driver

IREE supports efficient model execution on CPU. IREE uses LLVM to compile dense computation in the model into highly optimized CPU native instruction streams, which are embedded in IREE's deployable format as dynamic libraries (dylibs). IREE uses its own low-overhead minimal dynamic library loader to load them and then schedule them with concrete workloads onto various CPU cores.

!!! todo

    Add IREE's CPU support matrix: what architectures are supported; what
    architectures are well optimized; etc.

Get runtime and compiler

Get IREE runtime with dylib HAL driver

You will need to get an IREE runtime that supports the dylib HAL driver so it can execute the model on CPU via dynamic libraries containing native CPU instructions.

Build runtime from source

Please make sure you have followed the Getting started page to build IREE for your host platform, and the Android cross-compilation page if you are cross-compiling for Android. The dylib HAL driver is compiled in by default on all platforms.

Ensure that the IREE_HAL_DRIVER_DYLIB CMake option is ON when configuring for the target.
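If you are configuring from scratch, setting the option explicitly on the command line is one way to guarantee this. A minimal sketch follows; the ../iree-build/ build directory and the Ninja generator are assumptions, so adjust them to your setup:

```shell
# Configure from the IREE source checkout with the dylib HAL driver forced on,
# then build. ../iree-build/ and Ninja are placeholder choices.
cmake -G Ninja -B ../iree-build/ -DIREE_HAL_DRIVER_DYLIB=ON .
cmake --build ../iree-build/
```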

Get compiler for CPU native instructions

Download as Python package

Python packages for various IREE functionalities are regularly published to PyPI. See the Python Bindings page for more details. The core iree-compiler package includes the LLVM-based CPU compiler:

```shell
python -m pip install iree-compiler
```

!!! tip

    iree-compile is installed as /path/to/python/site-packages/iree/tools/core/iree-compile.
    You can find out the full path to the site-packages directory via the python -m site command.
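To locate and sanity-check the installed tool, you can do something like the following (the site-packages path is a placeholder; substitute the directory reported by python -m site):

```shell
# Print the site-packages directories so you can find the installed IREE tools.
python -m site

# Invoke the compiler from the reported path to confirm it works
# (replace /path/to/python/site-packages with your actual directory).
/path/to/python/site-packages/iree/tools/core/iree-compile --help
```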

Build compiler from source

Please make sure you have followed the Getting started page to build IREE for your host platform, and the Android cross-compilation page if you are cross-compiling for Android. The dylib compiler backend is compiled in by default on all platforms.

Ensure that the IREE_TARGET_BACKEND_DYLIB_LLVM_AOT CMake option is ON when configuring for the host.
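If you already have a configured build tree, you can check the cached value and flip it on if needed. A sketch, assuming the build tree lives at ../iree-build/:

```shell
# Inspect the cached option in an existing build tree.
grep IREE_TARGET_BACKEND_DYLIB_LLVM_AOT ../iree-build/CMakeCache.txt

# If it is OFF, re-run the configure step with the option forced on and rebuild.
cmake ../iree-build/ -DIREE_TARGET_BACKEND_DYLIB_LLVM_AOT=ON
cmake --build ../iree-build/
```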

Compile and run the model

With the compiler and runtime for dynamic libraries, we can now compile a model and run it on the CPU.

Compile the model

IREE compilers transform a model into its final deployable format through a series of sequential steps. A model authored in Python with an ML framework should first be converted, using the corresponding framework's import tool, into a format (i.e., MLIR) that the main IREE compilers expect.

Using MobileNet v2 as an example, you can download the SavedModel with trained weights from TensorFlow Hub and convert it using IREE's TensorFlow importer; the imported MLIR is then compiled as shown in the next section.
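With the iree-tools-tf Python package installed, the import step might look roughly like this. This is a sketch only: the savedmodel_v2 import type, the predict exported name, and the SavedModel path are assumptions for this particular model, so check the importer's --help for the exact flags of your version:

```shell
# Import the downloaded SavedModel into MLIR that iree-compile accepts.
# /path/to/mobilenet_v2_savedmodel is a placeholder for the extracted model.
iree-import-tf \
    --tf-import-type=savedmodel_v2 \
    --tf-savedmodel-exported-names=predict \
    /path/to/mobilenet_v2_savedmodel \
    -o iree_input.mlir
```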

Compile using the command-line

In the build directory, run the following command:

```shell
iree/tools/iree-compile \
    -iree-mlir-to-vm-bytecode-module \
    -iree-hal-target-backends=dylib-llvm-aot \
    iree_input.mlir -o mobilenet-dylib.vmfb
```

!!! todo

    Choose the suitable target triple for the current CPU.

where iree_input.mlir is the model's initial MLIR representation generated by IREE's TensorFlow importer.
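When targeting a particular CPU, you would additionally tell the LLVM AOT backend which target triple to compile for. For example (the -iree-llvm-target-triple flag and the triple shown are illustrative and may vary across compiler versions and targets):

```shell
# Same compile command, but with an explicit target triple for a 64-bit
# Arm Android device (illustrative value; pick the triple for your CPU).
iree/tools/iree-compile \
    -iree-mlir-to-vm-bytecode-module \
    -iree-hal-target-backends=dylib-llvm-aot \
    -iree-llvm-target-triple=aarch64-linux-android \
    iree_input.mlir -o mobilenet-dylib.vmfb
```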

Run the model

Run using the command-line

In the build directory, run the following command:

```shell
iree/tools/iree-run-module \
    --driver=dylib \
    --module_file=mobilenet-dylib.vmfb \
    --entry_function=predict \
    --function_input="1x224x224x3xf32=0"
```

The above assumes the exported function in the model is named predict and that it expects one 224x224 RGB image. We are feeding in an image with all 0 values here for brevity; see iree-run-module --help for the format used to specify concrete values.
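For instance, to feed a constant non-zero image instead of all zeros, the same splat syntax accepts other values:

```shell
# Identical invocation, but feeding an image filled with 0.5 instead of 0.
iree/tools/iree-run-module \
    --driver=dylib \
    --module_file=mobilenet-dylib.vmfb \
    --entry_function=predict \
    --function_input="1x224x224x3xf32=0.5"
```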