# Vulkan GPU Profiling

[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU
interactions and Vulkan API usage details. However, it does not expose
finer-grained information, especially about what happens inside a particular
shader dispatch. To supplement general-purpose tools like Tracy,
vendor-specific tools can be used.

(TODO: add some pictures for each tool)

## Android GPUs

There are multiple GPU vendors for the Android platform, each offering their
own tools. [Android GPU Inspector](https://gpuinspector.dev/)
(AGI) provides a cross-vendor solution. See the
[documentation](https://gpuinspector.dev/docs/) for more details.

### Build Android app to run IREE

To capture and analyze with AGI, you will need a full Android app. IREE has a
simple Android native app wrapper that packages the IREE core libraries
together with a specific VM bytecode invocation into an Android app. The
wrapper and its documentation live at
[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app).

For example, to package a module compiled from the following `mhlo-dot.mlir` as
an Android app:

```mlir
func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> {
  %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
  return %0 : tensor<2x2xf32>
}
```

```shell
# First translate into a VM bytecode module
$ /path/to/iree/build/iree/tools/iree-translate \
    -iree-input-type=mhlo \
    -iree-mlir-to-vm-bytecode-module \
    -iree-hal-target-backends=vulkan-spirv \
    /path/to/mhlo-dot.mlir \
    -o /tmp/mhlo-dot.vmfb

# Then package the Android app
$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
    ./build-apk \
    --driver vulkan \
    --module_file /tmp/mhlo-dot.vmfb \
    --entry_function dot \
    --function_input=...
```

Here `--function_input` (elided as `...` above) supplies the inputs to `dot`,
for example:

```
2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
```

The above will build an `iree-run-module.apk` under the `./build-apk/`
directory, which you can then install via `adb install`.
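
The install step can be sketched as follows. This is a minimal example that
assumes `adb` is on your `PATH` and a device is connected; the variable names
are illustrative, and the package name is taken from the AGI application string
`com.google.iree.run_module/android.app.NativeActivity` used in this document.

```shell
# Paths and names below are illustrative; adjust them to your setup.
APK=./build-apk/iree-run-module.apk
# Package name as it appears in the AGI "Application" field (assumed to match
# the package installed by the APK).
PACKAGE=com.google.iree.run_module

if command -v adb >/dev/null 2>&1 && [ -f "$APK" ]; then
  # -r replaces an existing installation of the same package.
  adb install -r "$APK" || echo "install failed; is a device connected?"
  # List installed packages and check that the app is present.
  adb shell pm list packages | grep "$PACKAGE" || echo "$PACKAGE is not installed"
else
  echo "adb or $APK not found; skipping install"
fi
```

If the package shows up in the list, it should also be searchable from AGI when
filling in the application to profile.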

`build_apk.sh` needs the Android SDK and NDK. An easy way to manage them is to
install [Android Studio](https://developer.android.com/studio). After
installation, you will need to set a few environment variables, which are
printed at the beginning of each `build_apk.sh` invocation.

### Capture and analyze with AGI

You can follow AGI's
[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn
how to use it. In general the steps are:

* Install the latest AGI from https://github.com/google/agi/releases and launch
  it.
* Fill in the "Application" field by searching for the app. The line should
  read like
  `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`.
* Select start at beginning and choose a suitable capture duration.
* Configure the system profile to include all GPU counters.
* Start the capture.

Generated traces are in the [perfetto](https://perfetto.dev/) format. They can
be viewed directly within AGI or online in a browser at
https://ui.perfetto.dev/, without needing an Android device.

## Desktop GPUs

Vulkan supports both graphics and compute, but most tools in the Vulkan
ecosystem focus on graphics. As a result, some Vulkan profiling tools expect
commands to correspond to a sequence of frames presented to displays via
framebuffers. This means additional steps for IREE and other Vulkan
applications that rely solely on headless compute. For graphics-focused tools,
we need to wrap IREE's logic inside a dummy rendering loop to provide the
frame markers these tools need for capture and analysis.

IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific
bytecode module within a proper GUI application. The graphics side uses
[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE
synchronously while rendering each frame and prints the bytecode invocation
results to the screen.

To build `iree-run-module-vulkan-gui`:

```shell
# Using Bazel
$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui

# Using CMake
$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
```

The generated binary should be invoked from a console. It takes the same
command-line options as the main
[`iree-run-module`](./developer-overview.md#iree-run-module), except for the
`--driver` option. Use `--help` to list them all. The binary will launch a GUI
window for use with Vulkan tools.
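
As a rough sketch, an invocation for the `dot` module compiled in the Android
section might look like the following. The binary location is an assumption for
a CMake build, and the flags mirror the ones passed to `build_apk.sh` earlier.
Passing one `--function_input` per input is also an assumption, so check
`--help` for the exact syntax in your build.

```shell
# Binary location is an assumption for a CMake build; adjust as needed.
GUI_BIN=/path/to/build/dir/iree/testing/vulkan/iree-run-module-vulkan-gui
# Module compiled with -iree-hal-target-backends=vulkan-spirv.
MODULE=/tmp/mhlo-dot.vmfb

if [ -x "$GUI_BIN" ] && [ -f "$MODULE" ]; then
  # No --driver flag here: the GUI binary does not take one.
  "$GUI_BIN" \
    --module_file="$MODULE" \
    --entry_function=dot \
    --function_input="2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]" \
    --function_input="4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]"
else
  echo "build iree-run-module-vulkan-gui and compile $MODULE first"
fi
```

The same executable path and argument list can be reused when a vendor tool
asks for an application and its command-line arguments.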

### AMD

For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool
for understanding the fine details of how IREE performs on the GPU. See the
[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for
details. In general the steps to get started are:

* Download and install AMD RGP from https://gpuopen.com/rgp/.
* Compile `iree-run-module-vulkan-gui` as described above.
* Open "Radeon Developer Panel" and connect to the local
  "Radeon Developer Service".
* Start `iree-run-module-vulkan-gui` from a console with the proper VM bytecode
  module invocation.
* You should see it in the "Applications" panel of "Radeon Developer Panel".
  Click "Capture profile" to capture.

Afterwards you can analyze the profile with RGP. Viewing the profile no longer
requires the GPU; it can be opened by an RGP application installed anywhere.

### NVIDIA

For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics)
is the tool for understanding the fine details of how IREE performs on the GPU.
See the
[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html)
for details. In general the steps to get started are:

* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics.
* Compile `iree-run-module-vulkan-gui` as described above.
* Open NVIDIA Nsight Graphics and select "Quick Launch" on the welcome page.
* Fill out "Application Executable" and "Command Line Arguments" to point
  to `iree-run-module-vulkan-gui` and to a specific VM bytecode module and its
  invocation information.
* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly
  interesting) and launch.
* Capture any frame to perform analysis.