blob: cffdb962c78db42d3dea01fddec7d73fb5ecb3dc [file] [log] [blame] [view]
# Vulkan GPU Profiling
[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU
interactions and Vulkan API usage
details. However, information at a finer granularity, especially inside a
particular shader dispatch, is missing. To supplement general purpose tools
like Tracy, vendor-specific tools can be used.
(TODO: add some pictures for each tool)
## RenderDoc
Support for [RenderDoc](https://github.com/baldurk/renderdoc) can be enabled by
configuring cmake with `-DIREE_ENABLE_RENDERDOC_PROFILING=ON`. When built in to
IREE the profiling functionality is available for programmatic use via the
`iree_hal_device_profiling_begin` and `iree_hal_device_profiling_end` APIs.
When using one of the standard IREE tools (`iree-run-module`,
`iree-benchmark-module`, etc) the `--device_profiling_mode=queue` flag can be
passed to enable capture around the entire invocation (be careful when
benchmarking as the recordings can be quite large!). The default capture file
name can be specified with `--device_profiling_file=foo.rdc`.
Capturing in the RenderDoc UI can be done by specifying the IREE tool or
embedding application (`iree-run-module`, etc) as the launch executable and
adding all arguments as normal.
Capturing from the command line can be done using `renderdoccmd` with the
specified file appearing (by default) in the executable directory:
```shell
$ renderdoccmd capture tools/iree-run-module --device_profiling_mode=queue --device_profiling_file=foo.rdc ...
$ stat tools/foo.rdc
$ renderdoccmd capture tools/iree-run-module --device_profiling_mode=queue --device_profiling_file=/some/path/foo.rdc ...
$ stat /some/path/foo.rdc
```
## Android GPUs
There are multiple GPU vendors for the Android platforms, each offering their
own tools. [Android GPU Inspector](https://gpuinspector.dev/)
(AGI) provides a cross-vendor solution. See the
[documentation](https://gpuinspector.dev/docs/) for more details.
### Build Android app to run IREE
In order to perform capture and analysis with AGI, you will need a full Android
app. In IREE we have a simple Android native app wrapper to help package
IREE core libraries together with a specific VM bytecode invocation into an
Android app. The wrapper and its documentation are placed at
[`tools/android/run_module_app/`](https://github.com/openxla/iree/tree/main/tools/android/run_module_app).
For example, to package a module compiled from the following `mhlo-dot.mlir` as
an Android app:
```mlir
func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> {
%0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
return %0 : tensor<2x2xf32>
}
```
```shell
# First compile into a VM bytecode module
$ /path/to/iree/build/tools/iree-compile -- \
--iree-input-type=mhlo \
--iree-hal-target-backends=vulkan-spirv \
/path/to/mhlo-dot.mlir \
-o /tmp/mhlo-dot.vmfb
# Then package the Android app
$ /path/to/iree/source/tools/android/run_module_app/build_apk.sh \
./build-apk \
--device vulkan \
--module /tmp/mhlo-dot.vmfb \
--function dot \
--input=...
```
Where `/path/to/input/file` is a file containing inputs to `dot`, for example:
```
2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
```
The above will build an `iree-run-module.apk` under the `./build-apk/`
directory, which you can then install via `adb install`.
`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage
them is by installing [Android Studio](https://developer.android.com/studio).
After installation, you will need to set up a few environment variables, which
are printed at the beginning of `build_apk.sh` invocation.
### Capture and analyze with AGI
You can follow AGI's
[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn
how to use it. In general the steps are:
* Install the latest AGI from https://github.com/google/agi/releases and launch.
* Fill in the "Application" field by searching the app. The line should read
like `android.intent.action.MAIN:dev.iree.run_module/android.app.NativeActivity`.
* Select start at beginning and choose a proper duration.
* Configure system profile to include all GPU counters.
* Start capture.
Generated traces are in the [perfetto](https://perfetto.dev/) format. They can
be viewed directly within AGI and also online in a browser at
https://ui.perfetto.dev/, without needing an Android device.
## Desktop GPUs
Vulkan supports both graphics and compute, but most tools in the Vulkan
ecosystem focus on graphics. As a result, some Vulkan profiling tools expect
commands to correspond to a sequence of frames presented to displays via
framebuffers. This means additional steps for IREE and other Vulkan
applications that solely rely on headless compute. For graphics-focused tools,
we need to wrap IREE's logic inside a dummy rendering loop in order to provide
the necessary markers for these tools to perform capture and analysis.
### AMD
For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool
to understand fine details of how IREE GPU performs. See the
[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for
details.
### NVIDIA
For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics)
is the tool to understand fine details of how IREE GPU performs. See the
[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html)
for details.