| # Vulkan GPU Profiling |
| |
| [Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU |
| interactions and Vulkan API usage |
| details. However, information at a finer granularity, especially inside a |
| particular shader dispatch, is missing. To supplement general purpose tools |
| like Tracy, vendor-specific tools can be used. |
| |
| (TODO: add some pictures for each tool) |
| |
| ## Android GPUs |
| |
| There are multiple GPU vendors for the Android platforms, each offering their |
| own tools. [Android GPU Inspector](https://gpuinspector.dev/) |
| (AGI) provides a cross-vendor solution. See the |
| [documentation](https://gpuinspector.dev/docs/) for more details. |
| |
| ### Build Android app to run IREE |
| |
| In order to perform capture and analysis with AGI, you will need a full Android |
| app. In IREE we have a simple Android native app wrapper to help package |
| IREE core libraries together with a specific VM bytecode invocation into an |
| Android app. The wrapper and its documentation are placed at |
| [`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app). |
| |
| For example, to package a module compiled from the following `mhlo-dot.mlir` as |
| an Android app: |
| |
| ```mlir |
| func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> { |
| %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32> |
| return %0 : tensor<2x2xf32> |
| } |
| ``` |
| |
| ```shell |
| # First translate into a VM bytecode module |
| $ /path/to/iree/build/iree/tools/iree-translate -- \ |
| -iree-input-type=mhlo \ |
| -iree-mlir-to-vm-bytecode-module \ |
| -iree-hal-target-backends=vulkan-spirv \ |
| /path/to/mhlo-dot.mlir \ |
| -o /tmp/mhlo-dot.vmfb |
| |
| # Then package the Android app |
| $ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \ |
| ./build-apk \ |
| --driver vulkan \ |
| --module_file /tmp/mhlo-dot.vmfb \ |
| --entry_function dot \ |
| --function_input=... |
| ``` |
| |
| Where `/path/to/input/file` is a file containing inputs to `dot`, for example: |
| |
| ``` |
| 2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]] |
| 4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]] |
| ``` |
| |
| The above will build an `iree-run-module.apk` under the `./build-apk/` |
| directory, which you can then install via `adb install`. |
| |
| `build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage |
| them is by installing [Android Studio](https://developer.android.com/studio). |
| After installation, you will need to set up a few environment variables, which |
| are printed at the beginning of `build_apk.sh` invocation. |
| |
| ### Capture and analyze with AGI |
| |
| You can follow AGI's |
| [Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn |
| how to use it. In general the steps are: |
| |
| * Install the latest AGI from https://github.com/google/agi/releases and launch. |
| * Fill in the "Application" field by searching the app. The line should read |
| like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`. |
| * Select start at beginning and choose a proper duration. |
| * Configure system profile to include all GPU counters. |
| * Start capture. |
| |
| Generated traces are in the [perfetto](https://perfetto.dev/) format. They can |
| be viewed directly within AGI and also online in a browser at |
| https://ui.perfetto.dev/, without needing an Android device. |
| |
| ## Desktop GPUs |
| |
| Vulkan supports both graphics and compute, but most tools in the Vulkan |
| ecosystem focus on graphics. As a result, some Vulkan profiling tools expect |
| commands to correspond to a sequence of frames presented to displays via |
| framebuffers. This means additional steps for IREE and other Vulkan |
| applications that solely rely on headless compute. For graphics-focused tools, |
| we need to wrap IREE's logic inside a dummy rendering loop in order to provide |
| the necessary markers for these tools to perform capture and analysis. |
| |
| IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific |
| bytecode module within a proper GUI application. The graphics side is leveraging |
| [Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE |
| synchronously during rendering each frame and prints the bytecode invocation |
| results to the screen. |
| |
| To build `iree-run-module-vulkan-gui`: |
| |
| ```shell |
| # Using Bazel |
| $ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui |
| |
| # Using CMake |
| $ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui |
| ``` |
| |
| The generated binary should be invoked in a console environment and it takes |
| the same command-line options as the main |
| [`iree-run-module`](./developer-overview.md#iree-run-module), except the |
| `--driver` option. You can use `--help` to learn them all. The binary will |
| launch a GUI window for use with Vulkan tools. |
| |
| ### AMD |
| |
| For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool |
| to understand fine details of how IREE GPU performs. See the |
| [documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for |
| details. In general the steps to get started are: |
| |
| * Download and install AMD RGP from https://gpuopen.com/rgp/. |
| * Compile `iree-run-module-vulkan-gui` as said in the above. |
| * Open "Radeon Developer Panel" and connect to the local |
| "Radeon Developer Service". |
| * Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module |
| invocation. |
| * You should see it in the "Applications" panel of "Radeon Developer Panel". |
| Click "Capture profile" to capture. |
| |
| Afterwards you can analyze the profile with RGP. Viewing the profile does not |
| need the GPU anymore; it can be opened by a RGP application installed anywhere. |
| |
| ### NVIDIA |
| |
| For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics) |
| is the tool to understand fine details of how IREE GPU performs. See the |
| [documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html) |
| for details. In general the steps to get started are: |
| |
| * Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics. |
| * Compile `iree-run-module-vulkan-gui` as said in the above. |
| * Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page. |
| * Fill out the "Application Executable" and "Command Line Arguments" to point |
| to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its |
| invocation information. |
| * Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly |
| interesting) and launch. |
| * Capture any frame to perform analysis. |