This page contains a list of best practices for getting the most out of IREE, spanning model authoring, ahead-of-time compilation, and runtime use. Treat these as a collection of ideas to consider or areas to start benchmarking when working on your own applications.
Common themes include:
If your model is stateful prefer to store that state directly within your program rather than externalizing it through arguments and return values. By keeping state inside your program the compiler is better able to reason about it and function calls will have lower overhead.
If you do externalize state, try to pack that state into a limited number of arguments.
See the variables and state sample for further guidance on tracking and using state.
While IREE aims to support general dynamic shapes use, it is better able to optimize parts of programs where shapes are static. Slow varying dimensions like batch index or timestamp are safer uses of dynamic shapes than faster varying dimensions like the x/y/channel dimensions of images.
See the dynamic shapes sample for further guidance on using dynamic shapes.
TODO: mention parameters to tune
TODO: which compiler targets to use (try both CUDA and Vulkan?)
TODO: use the most specific LLVM target triple you can?
Try to front-load queries, particularly queries using strings that look up into maps like iree_runtime_session_call_by_name
, so that hot sections of code are doing the minimum amount of work: routing inputs through buffers, scheduling runtime calls, and routing outputs through other buffers.
TODO: sample code, profile numbers