IREE benchmarking gives us an overall view of the performance at a specific level of granularity. To understand performance details of the IREE system, one would need to profile IREE, in order to know how components at different levels function and interact.
IREE uses Tracy as the main tool to perform whole-system profiling. Tracy is a real-time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler. It can profile CPU, GPU, memory, locks, context switches, and more.
To use tracing in IREE, you need to build IREE with the following settings:
Set IREE_ENABLE_RUNTIME_TRACING to ON.
Add -DNDEBUG to IREE_DEFAULT_COPTS.
For example:
export IREE_DEFAULT_COPTS='-DNDEBUG'
cmake -B build/ \
  -DIREE_ENABLE_RUNTIME_TRACING=ON \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo
The above compiles IREE with Tracy APIs so that IREE will stream profiling data back to Tracy when running. To collect and analyze this data, you can use either GUI or CLI tools. The Tracy profiler is the GUI tool. You can find the Tracy manual on its releases page for more details on Tracy itself.
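Before connecting a profiler, it can help to confirm that the build directory was actually configured with tracing enabled. The following is a minimal sketch, assuming the build directory is build/ as in the example above. The helper name check_tracing_enabled is hypothetical and not part of IREE; it relies only on the fact that CMake records cache variables in CMakeCache.txt as NAME:TYPE=VALUE.

```shell
# Hypothetical helper (not part of IREE): succeeds if the CMake cache in the
# given build directory has IREE_ENABLE_RUNTIME_TRACING set to ON.
check_tracing_enabled() {
  build_dir="${1:-build}"
  # CMake stores cache entries as NAME:TYPE=VALUE in CMakeCache.txt.
  grep -Eq '^IREE_ENABLE_RUNTIME_TRACING:[A-Z]+=ON$' \
    "$build_dir/CMakeCache.txt" 2>/dev/null
}

if check_tracing_enabled build; then
  echo "tracing is enabled in build/"
else
  echo "tracing is not enabled in build/ (or build/ is not configured yet)"
fi
```

If the check fails, re-running the cmake command above with -DIREE_ENABLE_RUNTIME_TRACING=ON updates the cache in place.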
To build the profiler on Linux, you will need to install some external libraries. Some Linux distributions will require you to add a lib prefix and a -dev or -devel postfix to library names. For example, you might see the error:
Package glfw3 was not found in the pkg-config search path.
and then you could try to install libglfw3-dev.
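One way to find out ahead of time which libraries are missing is to query pkg-config directly. This is a minimal sketch: glfw3 is the only package name taken from the error above, and the helper name report_missing_pkgs is hypothetical. Any other packages the profiler needs would have to be added to the argument list.

```shell
# Hypothetical helper: prints each named pkg-config package that cannot be found.
report_missing_pkgs() {
  for pkg in "$@"; do
    # `pkg-config --exists` exits non-zero when the package is not found.
    if ! pkg-config --exists "$pkg" 2>/dev/null; then
      echo "missing: $pkg"
    fi
  done
}

# glfw3 comes from the error message above; extend the list as needed.
report_missing_pkgs glfw3
```

Each reported name can then be mapped to the distribution's package name using the lib prefix and -dev or -devel postfix described above.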
Instructions to build Tracy profiler:
cd third_party/tracy/profiler/build/unix
make release
Launch the profiler UI and click connect. The server will then wait for a connection. Now you can launch the IREE binary you want to trace; it should connect automatically and stream data. For example:
Prepare the module to profile:
build/iree/tools/iree-benchmark-module \
  --module_file=/tmp/module.fb \
  --driver=vmla \
  --entry_function=abs \
  --function_inputs="i32=-2"
Run the module:
build/iree/tools/iree-run-module \
  --module_file=/tmp/module.fb \
  --driver=vmla \
  --entry_function=abs \
  --function_inputs="i32=-2"
Note that IREE binaries typically finish running the module and exit before Tracy can even connect. For such cases, you can set TRACY_NO_EXIT=1 in the environment to keep the IREE binary alive until Tracy connects to it.
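For instance, the run command above could be launched with the variable set for just that one process. This is a sketch that reuses the binary path and flags from the iree-run-module example above; the guard around the binary is only there so the snippet is harmless when IREE has not been built yet.

```shell
# Path and flags are taken from the iree-run-module example above.
IREE_RUN_MODULE=build/iree/tools/iree-run-module

if [ -x "$IREE_RUN_MODULE" ]; then
  # Setting the variable on the command line scopes it to this one process.
  TRACY_NO_EXIT=1 "$IREE_RUN_MODULE" \
    --module_file=/tmp/module.fb \
    --driver=vmla \
    --entry_function=abs \
    --function_inputs="i32=-2"
else
  echo "iree-run-module not found at $IREE_RUN_MODULE; build IREE first" >&2
fi
```

Scoping the variable to a single command avoids leaving it set in the shell, where it would also keep later, unrelated runs waiting for a profiler connection.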
Tracy gives us great insight into CPU/GPU interactions and Vulkan API usage details. However, information at a finer granularity, especially inside a particular shader dispatch, is missing. To fill that gap, one typically needs other third-party or vendor-specific tools.
(TODO: add some pictures for each tool)
There are multiple GPU vendors for the Android platform. One can use tools provided by the GPU vendor. Android GPU Inspector (AGI) provides a cross-vendor solution. See the documentation for more details.
In order to perform capture and analysis with AGI, one will need a full Android app. In IREE we have a simple Android native app wrapper to help package IREE core libraries together with a specific VM bytecode invocation into an Android app. The wrapper and its documentation are placed at iree/tools/android/run_module_app/.
For example, to package a module compiled from the following mhlo-dot.mlir as an Android app:
func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32>
    attributes { iree.module.export } {
  %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
  return %0 : tensor<2x2xf32>
}
# First translate into VM bytecode module
/path/to/iree/build/iree/tools/iree-translate -- \
  -iree-mlir-to-vm-bytecode-module \
  --iree-hal-target-backends=vulkan \
  /path/to/mhlo-dot.mlir \
  -o /tmp/mhlo-dot.vmfb

# Then package the Android app, passing the translated bytecode module
/path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
  ./build-apk \
  --module_file /tmp/mhlo-dot.vmfb \
  --entry_function dot \
  --function_inputs_file /path/to/inputs/file \
  --driver vulkan
Where /path/to/inputs/file is a file containing inputs to dot, for example:
2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
The above will build an iree-run-module.apk under the ./build-apk/ directory. One can then install it via adb install.
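The two manual steps around the packaging command, writing the inputs file and installing the result, can be sketched as follows. This is a minimal sketch under stated assumptions: the path /tmp/mhlo-dot-inputs.txt is an arbitrary choice that would be passed to --function_inputs_file, the file is assumed to hold one input per line in argument order, and the install step is guarded so that nothing happens when adb or the APK is absent.

```shell
# Arbitrary illustrative path; pass it to build_apk.sh via --function_inputs_file.
INPUTS_FILE=/tmp/mhlo-dot-inputs.txt

# Assumed layout: one line per argument of dot, the 2x4 lhs then the 4x2 rhs.
cat > "$INPUTS_FILE" <<'EOF'
2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
EOF

# Output location described above for the packaged app.
APK=./build-apk/iree-run-module.apk

if command -v adb >/dev/null 2>&1 && [ -f "$APK" ]; then
  # Installs onto the connected device or emulator.
  adb install "$APK"
else
  echo "skipping install: need adb on PATH and $APK to exist" >&2
fi
```

The inputs file has to exist before build_apk.sh runs, while the install step only makes sense after it has produced the APK.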
build_apk.sh needs the Android SDK and NDK internally. An easy way to manage them is to install Android Studio. After installation, you will need to set up a few environment variables, which are printed at the beginning of the build_apk.sh invocation.
You can follow AGI's get started page to learn how to use it. In general the steps are:
android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity.
The generated trace is in Perfetto format. Viewing the trace does not require the device anymore. It can be viewed directly with AGI and also online in a browser at https://ui.perfetto.dev/.
Vulkan is traditionally used for graphics rendering, so Vulkan profiling tools at the moment typically assume a rendering workload and require a rendering boundary marked by framebuffer presentation. This means additional steps for IREE and other Vulkan applications that rely solely on headless compute. We need to wrap the core IREE logic inside a dummy rendering loop in order to give tools the markers they need to perform capture and analysis.
IREE provides an iree-run-module-vulkan-gui binary that can invoke a specific bytecode module within a proper GUI application. The graphics side uses Dear ImGui; it invokes IREE core synchronously while rendering each frame and prints the bytecode invocation result to the screen.
To build iree-run-module-vulkan-gui:
# Using Bazel
bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui

# Using CMake
cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
The generated binary should be invoked in a console environment, and it takes the same command-line options as the main iree-run-module, except the --driver option. You can use --help to learn them all. The binary will open a GUI window so that Vulkan tools can be used on it.
For AMD GPUs, Radeon GPU Profiler (RGP) is the tool to understand fine details of how IREE performs on the GPU. See the documentation for details. In general the steps to get started are:
Build iree-run-module-vulkan-gui as described above.
Launch iree-run-module-vulkan-gui from the console with a proper VM bytecode module invocation.
Afterwards you can analyze the profile with RGP. Viewing the profile does not require the GPU anymore; it can be opened by an RGP application installed anywhere.
For NVIDIA GPUs, NVIDIA Nsight Graphics is the tool to understand fine details of how IREE performs on the GPU. See the documentation for details. In general the steps to get started are:
Build iree-run-module-vulkan-gui as described above.
Configure Nsight Graphics to launch iree-run-module-vulkan-gui with a specific VM bytecode module and its invocation information.