start an overhaul of profiling docs (#4060) * start an overhaul of profiling docs. profiling.md becomes just a table of contents pointing to more specialized pages, so each remains manageable. specialized pages are named profiling_*.md so they're easily discoverable from an alphabetical directory listing. Tracy docs mostly rebooted. Includes Android/Linux details. Placeholders for other use cases (Mac, remote over network) to be filled by others! Moved the Vulkan profiling docs to a separate doc, unchanged. Added a stub of perf/simpleperf CPU cache events doc. Deleted `android_tracy_tutorial.sh`, which was incomplete, rotting, and no longer useful.
diff --git a/docs/developing_iree/profiling.md b/docs/developing_iree/profiling.md index f1864d6..c73e532 100644 --- a/docs/developing_iree/profiling.md +++ b/docs/developing_iree/profiling.md
@@ -1,261 +1,25 @@ # Profiling IREE [benchmarking](./benchmarking.md) gives us an accurate and reproducible -view of program performance at specific levels of granularity. To analyze -system behavior in more depth, there are various ways to -[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) -IREE. +view of program performance at specific levels of granularity. To analyze system +behavior in more depth, there are various ways to +[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) IREE. -## Whole-system Profiling with Tracy +## Tracy -IREE uses Tracy as the main tool to perform whole-system profiling. -[Tracy](https://github.com/wolfpld/tracy) is a real-time, nanosecond resolution, -remote telemetry, hybrid frame and sampling profiler. Tracy can profile CPU, -GPU, memory, locks, context switches, and much more. - -### Building Tracy - -To use tracing in IREE, you need to build IREE with following requirements: - -* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`. -* Use Release/RelWithDebInfo build. - -For example: - -```shell -$ export IREE_DEFAULT_COPTS='-DNDEBUG' -$ cmake -B build/ \ - -DIREE_ENABLE_RUNTIME_TRACING=ON \ - -DCMAKE_BUILD_TYPE=RelWithDebInfo -``` - -The above compiles IREE with Tracy APIs so that IREE will stream profiling data -back to Tracy when running. To collect and analyze these data, you can either -use GUI or CLI tools. Tracy profiler is the GUI tool. You can find the -Tracy manual on its [releases page](https://github.com/wolfpld/tracy/releases) -for more details on Tracy itself. - -#### Building on Linux - -To build the profiler on Linux, you may need to install some external -libraries. Some Linux distributions will require you to add a `lib` prefix and a -`-dev`, or `-devel` postfix to library names. For example, you might see the -error: - -``` -Package glfw3 was not found in the pkg-config search path. -``` - -and then you could try to install `libglfw3-dev`. 
- -Instructions to build Tracy profiler: - -```shell -$ cd third_party/tracy/profiler/build/unix -$ make release -``` - -### Using Tracy - -Launch the profiler UI and click connect to start waiting for a traced program -to running. Now you can launch the IREE binary you want to trace and Tracy -should connect automatically and stream data. For example: - -Compile a .mlir file using `iree-translate`: - -```shell -$ build/iree/tools/iree-translate \ - -iree-mlir-to-vm-bytecode-module \ - -iree-hal-target-backends=vmla \ - $PWD/iree/tools/test/simple.mlir \ - -o /tmp/simple.vmfb -``` - -Run a compiled module once: - -```shell -$ build/iree/tools/iree-run-module \ - --module_file=/tmp/simple.vmfb \ - --driver=vmla \ - --entry_function=abs \ - --function_inputs="i32=-2" -``` - -Benchmark a compiled module, running it many times: - -```shell -$ build/iree/tools/iree-benchmark-module \ - --module_file=/tmp/simple.vmfb \ - --driver=vmla \ - --entry_function=abs \ - --function_inputs="i32=-2" -``` - -> Note:<br> -> IREE binaries may finish running before even -> connecting to Tracy. For such cases, you can set `TRACY_NO_EXIT=1` in the -> environment to keep the IREE binary alive until Tracy connects to it. - -### Configuring Tracy - -Set IREE's `IREE_TRACING_MODE` value (defined in -[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h)) -to adjust which tracing features, such as allocation tracking and callstacks, -are enabled. - -In order for Tracy to record detailed statistics via sampling, the program -collecting data must be run using elevated permissions (Administrator on Windows, -root on Linux, rooted Android device). See Tracy's user manual for more -information. +Tracy is a profiler that's been used for a wide range of profiling tasks on +IREE. Refer to [profiling_with_tracy.md](./profiling_with_tracy.md). ## Vulkan GPU Profiling -Tracy offers great insights into CPU/GPU interactions and Vulkan API usage -details. 
However, information at a finer granularity, especially inside a -particular shader dispatch, is missing. To supplement general purpose tools -like Tracy, vendor-specific tools can be used. +[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU +interactions and Vulkan API usage details. However, information at a finer +granularity, especially inside a particular shader dispatch, is missing. To +supplement general purpose tools like Tracy, vendor-specific tools can be used. +Refer to [profiling_vulkan_gpu.md](./profiling_vulkan_gpu.md). -(TODO: add some pictures for each tool) +## CPU cache and other CPU event profiling -### Android GPUs - -There are multiple GPU vendors for the Android platforms, each offering their -own tools. [Android GPU Inspector](https://gpuinspector.dev/) -(AGI) provides a cross-vendor solution. See the -[documentation](https://gpuinspector.dev/docs/) for more details. - -#### Build Android app to run IREE - -In order to perform capture and analysis with AGI, you will need a full Android -app. In IREE we have a simple Android native app wrapper to help package -IREE core libraries together with a specific VM bytecode invocation into an -Android app. The wrapper and its documentation are placed at -[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app). 
- -For example, to package a module compiled from the following `mhlo-dot.mlir` as -an Android app: - -```mlir -func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> - attributes { iree.vmfb.export } { - %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32> - return %0 : tensor<2x2xf32> -} -``` - -```shell -# First translate into a VM bytecode module -$ /path/to/iree/build/iree/tools/iree-translate -- \ - -iree-mlir-to-vm-bytecode-module \ - --iree-hal-target-backends=vulkan \ - /path/to/mhlo-dot.mlir \ - -o /tmp/mhlo-dot.vmfb - -# Then package the Android app -$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \ - ./build-apk \ - --module_file /tmp/mhlo-dot.vmfb \ - --entry_function dot \ - --function_inputs_file /path/to/inputs/file \ - --driver vulkan -``` - -Where `/path/to/input/file` is a file containing inputs to `dot`, for example: - -``` -2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]] -4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]] -``` - -The above will build an `iree-run-module.apk` under the `./build-apk/` -directory, which you can then install via `adb install`. - -`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage -them is by installing [Android Studio](https://developer.android.com/studio). -After installation, you will need to set up a few environment variables, which -are printed at the beginning of `build_apk.sh` invocation. - -#### Capture and analyze with AGI - -You can follow AGI's -[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn -how to use it. In general the steps are: - -* Install the latest AGI from https://github.com/google/agi/releases and launch. -* Fill in the "Application" field by searching the app. The line should read - like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`. -* Select start at beginning and choose a proper duration. 
-* Configure system profile to include all GPU counters. -* Start capture. - -Generated traces are in the [perfetto](https://perfetto.dev/) format. They can -be viewed directly within AGI and also online in a browser at -https://ui.perfetto.dev/, without needing an Android device. - -### Desktop GPUs - -Vulkan supports both graphics and compute, but most tools in the Vulkan -ecosystem focus on graphics. As a result, some Vulkan profiling tools expect -commands to correspond to a sequence of frames presented to displays via -framebuffers. This means additional steps for IREE and other Vulkan -applications that solely rely on headless compute. For graphics-focused tools, -we need to wrap IREE's logic inside a dummy rendering loop in order to provide -the necessary markers for these tools to perform capture and analysis. - -IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific -bytecode module within a proper GUI application. The graphics side is leveraging -[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE -synchronously during rendering each frame and prints the bytecode invocation -results to the screen. - -To build `iree-run-module-vulkan-gui`: - -```shell -# Using Bazel -$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui - -# Using CMake -$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui -``` - -The generated binary should be invoked in a console environment and it takes -the same command-line options as the main -[`iree-run-module`](./developer-overview.md#iree-run-module), except the -`--driver` option. You can use `--help` to learn them all. The binary will -launch a GUI window for use with Vulkan tools. - -#### AMD - -For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool -to understand fine details of how IREE GPU performs. See the -[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for -details. 
In general the steps to get started are: - -* Download and install AMD RGP from https://gpuopen.com/rgp/. -* Compile `iree-run-module-vulkan-gui` as said in the above. -* Open "Radeon Developer Panel" and connect to the local - "Radeon Developer Service". -* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module - invocation. -* You should see it in the "Applications" panel of "Radeon Developer Panel". - Click "Capture profile" to capture. - -Afterwards you can analyze the profile with RGP. Viewing the profile does not -need the GPU anymore; it can be opened by a RGP application installed anywhere. - -#### NVIDIA - -For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics) -is the tool to understand fine details of how IREE GPU performs. See the -[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html) -for details. In general the steps to get started are: - -* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics. -* Compile `iree-run-module-vulkan-gui` as said in the above. -* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page. -* Fill out the "Application Executable" and "Command Line Arguments" to point - to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its - invocation information. -* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly - interesting) and launch. -* Capture any frame to perform analysis. +For some advanced CPU profiling needs such as querying CPU cache and other +events, one may need to use some OS-specific profilers. See +[profiling_cpu_events.md](./profiling_cpu_events.md). \ No newline at end of file
diff --git a/docs/developing_iree/profiling_cpu_events.md b/docs/developing_iree/profiling_cpu_events.md new file mode 100644 index 0000000..b6beb99 --- /dev/null +++ b/docs/developing_iree/profiling_cpu_events.md
@@ -0,0 +1,13 @@ +# CPU cache and other CPU event profiling + +## Android + +On Android we can use [`simpleperf`](https://developer.android.com/ndk/guides/simpleperf). It's preinstalled on current Android `userdebug` images, and part of the Android NDK. + +TODO write me + +## Linux + +On desktop Linux we can use [`perf`](https://perf.wiki.kernel.org/index.php/Main_Page). + +TODO write me
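As a possible starting point for the Linux TODO above, here is a minimal sketch of counting cache events with `perf stat`. The wrapper name `cache_stat` is hypothetical and not part of IREE. `cache-references` and `cache-misses` are generic `perf` event names whose availability depends on the CPU and kernel, and this is not a description of an established IREE workflow.

```shell
# Hypothetical helper (not part of IREE): run a command under `perf stat`
# and count generic cache events for it.
cache_stat() {
  # Bail out early if `perf` is not installed on this machine.
  if ! command -v perf >/dev/null 2>&1; then
    echo "perf not found; install it from your distribution's packages" >&2
    return 127
  fi
  # Event availability is an assumption; `perf list` shows what is supported.
  perf stat -e cache-references,cache-misses -- "$@"
}

# Example (placeholder command):
#   cache_stat <your benchmark command and its usual flags>
```

Depending on the `perf_event_paranoid` setting, `perf` may need elevated permissions to report these counters.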
diff --git a/docs/developing_iree/profiling_vulkan_gpu.md b/docs/developing_iree/profiling_vulkan_gpu.md new file mode 100644 index 0000000..c57d573 --- /dev/null +++ b/docs/developing_iree/profiling_vulkan_gpu.md
@@ -0,0 +1,152 @@ +# Vulkan GPU Profiling + +[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU +interactions and Vulkan API usage +details. However, information at a finer granularity, especially inside a +particular shader dispatch, is missing. To supplement general purpose tools +like Tracy, vendor-specific tools can be used. + +(TODO: add some pictures for each tool) + +## Android GPUs + +There are multiple GPU vendors for the Android platforms, each offering their +own tools. [Android GPU Inspector](https://gpuinspector.dev/) +(AGI) provides a cross-vendor solution. See the +[documentation](https://gpuinspector.dev/docs/) for more details. + +### Build Android app to run IREE + +In order to perform capture and analysis with AGI, you will need a full Android +app. In IREE we have a simple Android native app wrapper to help package +IREE core libraries together with a specific VM bytecode invocation into an +Android app. The wrapper and its documentation are placed at +[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app). 
+ +For example, to package a module compiled from the following `mhlo-dot.mlir` as +an Android app: + +```mlir +func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> + attributes { iree.vmfb.export } { + %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32> + return %0 : tensor<2x2xf32> +} +``` + +```shell +# First translate into a VM bytecode module +$ /path/to/iree/build/iree/tools/iree-translate -- \ + -iree-mlir-to-vm-bytecode-module \ + --iree-hal-target-backends=vulkan \ + /path/to/mhlo-dot.mlir \ + -o /tmp/mhlo-dot.vmfb + +# Then package the Android app +$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \ + ./build-apk \ + --module_file /tmp/mhlo-dot.vmfb \ + --entry_function dot \ + --function_inputs_file /path/to/inputs/file \ + --driver vulkan +``` + +Where `/path/to/input/file` is a file containing inputs to `dot`, for example: + +``` +2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]] +4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]] +``` + +The above will build an `iree-run-module.apk` under the `./build-apk/` +directory, which you can then install via `adb install`. + +`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage +them is by installing [Android Studio](https://developer.android.com/studio). +After installation, you will need to set up a few environment variables, which +are printed at the beginning of `build_apk.sh` invocation. + +### Capture and analyze with AGI + +You can follow AGI's +[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn +how to use it. In general the steps are: + +* Install the latest AGI from https://github.com/google/agi/releases and launch. +* Fill in the "Application" field by searching the app. The line should read + like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`. +* Select start at beginning and choose a proper duration. 
+* Configure system profile to include all GPU counters. +* Start capture. + +Generated traces are in the [perfetto](https://perfetto.dev/) format. They can +be viewed directly within AGI and also online in a browser at +https://ui.perfetto.dev/, without needing an Android device. + +## Desktop GPUs + +Vulkan supports both graphics and compute, but most tools in the Vulkan +ecosystem focus on graphics. As a result, some Vulkan profiling tools expect +commands to correspond to a sequence of frames presented to displays via +framebuffers. This means additional steps for IREE and other Vulkan +applications that solely rely on headless compute. For graphics-focused tools, +we need to wrap IREE's logic inside a dummy rendering loop in order to provide +the necessary markers for these tools to perform capture and analysis. + +IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific +bytecode module within a proper GUI application. The graphics side is leveraging +[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE +synchronously during rendering each frame and prints the bytecode invocation +results to the screen. + +To build `iree-run-module-vulkan-gui`: + +```shell +# Using Bazel +$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui + +# Using CMake +$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui +``` + +The generated binary should be invoked in a console environment and it takes +the same command-line options as the main +[`iree-run-module`](./developer-overview.md#iree-run-module), except the +`--driver` option. You can use `--help` to learn them all. The binary will +launch a GUI window for use with Vulkan tools. + +### AMD + +For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool +to understand fine details of how IREE GPU performs. See the +[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for +details. 
In general the steps to get started are: + +* Download and install AMD RGP from https://gpuopen.com/rgp/. +* Compile `iree-run-module-vulkan-gui` as said in the above. +* Open "Radeon Developer Panel" and connect to the local + "Radeon Developer Service". +* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module + invocation. +* You should see it in the "Applications" panel of "Radeon Developer Panel". + Click "Capture profile" to capture. + +Afterwards you can analyze the profile with RGP. Viewing the profile does not +need the GPU anymore; it can be opened by a RGP application installed anywhere. + +### NVIDIA + +For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics) +is the tool to understand fine details of how IREE GPU performs. See the +[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html) +for details. In general the steps to get started are: + +* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics. +* Compile `iree-run-module-vulkan-gui` as said in the above. +* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page. +* Fill out the "Application Executable" and "Command Line Arguments" to point + to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its + invocation information. +* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly + interesting) and launch. +* Capture any frame to perform analysis.
diff --git a/docs/developing_iree/profiling_with_tracy.md b/docs/developing_iree/profiling_with_tracy.md new file mode 100644 index 0000000..80eb9b0 --- /dev/null +++ b/docs/developing_iree/profiling_with_tracy.md
@@ -0,0 +1,207 @@ +# Profiling with Tracy + +[Tracy](https://github.com/wolfpld/tracy) is a profiler that puts together in a +single view: +* Both CPU and GPU profiling. +* Both sampling and instrumentation. +* Both specifics of our own process, and whole-system profiling a la "systrace". + +Since Tracy relies on instrumentation, it requires IREE binaries to be built +with a special flag to enable it. + +There are two components to Tracy. They communicate over a TCP socket. +* The "client" is the program being profiled. +* The "server" is the Tracy profiler UI. + +## The Tracy manual + +The primary source of Tracy documentation, including build instructions, is +a PDF manual that's part of each numbered release. To find the latest one, +navigate [here](https://github.com/wolfpld/tracy/releases) and search for +`tracy.pdf`. + +## Building the Tracy UI (the "server") + +This is explained in section 2.3 of the manual for Windows and Linux. Here we +give some more detailed instructions for some systems. + +The IREE repository contains its own clone of the Tracy repository in +`third_party/tracy`, so there is no need to make a separate clone of it. You can +use one if you want, but be aware that the Tracy client/server protocol gets +updated sometimes. Building both sides from the same +`iree/third_party/tracy` lowers the risk of running into a protocol version +mismatch. + +### Linux + +Install dependencies (Debian-based distributions): +``` +sudo apt install libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev +``` + +Build (from your `iree/` clone root directory): +``` +make -C third_party/tracy/profiler/build/unix -j12 release +``` + +### Mac + +TODO write this (Kojo?) + +## Building IREE with Tracy instrumentation (the "client") + +IREE needs to be built with Tracy instrumentation enabled. This enables both the +collection of data, and its streaming to the Tracy server over a socket. 
+ +This is only supported in the CMake build system of IREE, not in Bazel. + +In the initial CMake configuration command: +* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`. +* Use the `RelWithDebInfo` build type. + +For example: + +```shell +$ cmake \ + -DIREE_ENABLE_RUNTIME_TRACING=ON \ + -DCMAKE_BUILD_TYPE=RelWithDebInfo \ + ... # other cmake arguments as usual +``` +## Permissions issues + +The profiled application (i.e. the Tracy client) needs to have appropriate +permissions to perform the special I/O required to collect the profile +information. This is OS-specific. + +### Desktop Linux + +On desktop Linux, the Tracy client must be run as root, e.g. with `sudo`. +Otherwise, profile data will lack important components. + +### Android + +On Android it is not necessary to run as root and in fact, Android graphical +applications never run as root, so it's advisable to run all programs as +non-root for consistency. + +The Android device must be prepared as follows to enable Tracy profiling. +* The device must be rooted. + * That means that in `adb shell`, the command `su` must succeed. + * That does NOT mean doing `adb root`. The effect of `adb root` is to have the + `adbd` daemon itself run as root, which causes `adb shell` to give you a + root shell by default. If you are in that case, consider doing `adb unroot` + to restart the `adbd` server as non-root. Not mandatory, but again, running + anything as root on Android is a deviation from normal user conditions. +* Execute the following commands in a root shell on the device (i.e. `adb + shell`, then `su`, then the following commands). These are from the Tracy PDF + manual, but hard to find there, and copy-pasting from PDF introduces unwanted + whitespace. These settings normally persist until the next reboot of the + device. 
+ * `setenforce 0` + * `mount -o remount,hidepid=0 /proc` + * `echo 0 > /proc/sys/kernel/perf_event_paranoid` + +## Port forwarding + +The Tracy client and server communicate by default over port `8086`. When they +run on different machines, e.g. with embedded/Android profiling or remote +profiling, port forwarding must be set up. + +### Between a computer and a local Android device connected to it by USB + +Run this command. You might need to run it again a little more frequently than +you reboot the device. When experiencing connection issues, try that first. + +```shell +adb forward tcp:8086 tcp:8086 +``` + +### Between two computers over the network + +TODO write this (`ssh` stuff...) + +## Running the profiled program + +Run your IREE workload as you normally would: now that it's been built with +Tracy instrumentation enabled, it should do all the right things automatically. + +The only change that you are likely to need in your command line is to set the +`TRACY_NO_EXIT=1` environment variable. This ensures that your program does not +exit until the Tracy server (the UI) has connected to it and finished uploading +the profile data. + +Typically, `TRACY_NO_EXIT=1` is needed when profiling `iree-benchmark-module`. +It wouldn't be needed when profiling a real user-facing application. + +Example: + +```shell +TRACY_NO_EXIT=1 /data/local/tmp/iree-benchmark-module \ + --driver=dylib \ + --function_inputs='1x384xi32,1x384xi32,1x384xi32' \ + --module_file=/data/local/tmp/android_module.vmfb \ + --entry_function=serving_default +``` + +## Running the Tracy profiler UI, connecting and visualizing + +While the profiled program is still running (possibly thanks to +`TRACY_NO_EXIT=1`), start the Tracy profiler UI which we had built above. +From the IREE root directory: +```shell +./third_party/tracy/profiler/build/unix/Tracy-release +``` + +It should show a dialog offering to connect to a client i.e. a profiled program. 
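The `TRACY_NO_EXIT=1` setting from the section on running the profiled program can be wrapped in a small helper so that it is not forgotten. This is only a sketch: the function name `tracy_run` is hypothetical and not part of IREE or Tracy. It does nothing more than set the environment variable for a single command.

```shell
# Hypothetical helper (not part of IREE or Tracy): run one command with
# TRACY_NO_EXIT=1 in its environment, so that an instrumented program
# waits for the Tracy UI instead of exiting right away.
tracy_run() {
  if [ "$#" -eq 0 ]; then
    echo "usage: tracy_run <command> [args...]" >&2
    return 2
  fi
  # `env` scopes the variable to this one command only.
  env TRACY_NO_EXIT=1 "$@"
}
```

It would be used on the machine where the profiled program runs, for example `tracy_run ./iree-benchmark-module` followed by the usual flags.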
+ +If connecting doesn't work: +* If the profiled program is on a separate machine, make sure you've correctly + set up port forwarding. + * On Android, the `adb forward` command may need to be run again. +* Make sure that the profiled program is still running. Do you need + `TRACY_NO_EXIT=1`? +* Kill the profiled program and restart it. + +You should then start seeing a profile. + +Before going further, check that your profile has sampling data recorded, not +just instrumentation data. Click the 'Statistics' button at the top. The window +that it opens should show a 'Sampling' radio button next to 'Instrumentation'. +Back to the main view, look for the 'Main thread' label on the far left. Next to +it should be a small ghost icon. If either of these things is missing, you need +to fix that first, because many of Tracy's interesting features rely on +sampling. Most likely this is a permissions issue: make sure that you have +performed the steps outlined in the above section on permissions. As an +experiment, retry with the profiled program running as root. Look for any +interesting `stderr` message (in the profiled program's terminal). Try +`strace`'ing it. + +Besides sampling, let's check that you also have "systrace" working. In the +initial main view, on the far left, you should see a list of CPU cores, +typically labelled `CPU0`, `CPU1`, etc. To the right, a timeline view should +show system-wide activity on all these CPUs - so for instance you can see +context switches and you can see if anything else happened on your device that +could have interfered with the benchmark's performance. If this "systrace" +functionality isn't working, then again that's likely a permissions issue; +perform the same troubleshooting as above about sampling. + +Finally, look for the part of the timeline that is of interest to you. Your area +of interest might not be on the Main thread. In fact, it might be on a thread +that's not visible in the initial view at all. 
Indeed, the initial view tends to +be zoomed-in a lot. Either use the mouse directly to zoom out and navigate, or +look for the 'Frame' control at the top of the Tracy window. Use the 'next +frame' arrow button until more interesting threads appear. Typically, IREE +generated code tends to run on a thread named `cpu0`, which is actually a thread +name and unrelated to `CPU0` from the systrace view. + +Once you click a zone, you should see a line-by-line or +instruction-by-instruction view annotated with the percentage of time spent. If +you don't see the percentages, then again that would point to sampling not +working; refer to the above steps. + +## Configuring Tracy instrumentation + +Set IREE's `IREE_TRACING_MODE` value (defined in +[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h)) +to adjust which tracing features, such as allocation tracking and callstacks, +are enabled.
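For the connection troubleshooting steps earlier in this document, one quick check is whether anything accepts connections on Tracy's default port `8086`. This is a minimal sketch, assuming bash: it relies on bash's `/dev/tcp` redirection, which some builds disable. The function name `tracy_port_open` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
tracy_port_open() {
  local host="${1:-127.0.0.1}"
  local port="${2:-8086}"  # Tracy's default port, as stated in the doc.
  # Open the connection in a subshell; it is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if tracy_port_open 127.0.0.1 8086; then
  echo "Something is accepting connections on port 8086."
else
  echo "Nothing reachable on port 8086; check port forwarding and TRACY_NO_EXIT."
fi
```

With `adb forward` in place, a successful connection may only show that the forward exists, not that the profiled program is still running on the device, so this check complements the other steps and does not replace them.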
diff --git a/scripts/android_tracy_tutorial.sh b/scripts/android_tracy_tutorial.sh deleted file mode 100755 index 812c09e..0000000 --- a/scripts/android_tracy_tutorial.sh +++ /dev/null
@@ -1,261 +0,0 @@ -#!/bin/bash - -# Copyright 2020 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# This script assumes Debian Linux. It could easily be modified to run on any Unix, -# as the Debian assumption is only to automatically install some packages, -# (Tracy dependencies). With a bit more work it could be made to support other OSes -# such as Windows, but that's not at all handled yet. - -function print_status { - echo -e "\e[96m$@\e[39m" -} - -# Environment variables. They can be set manually, or will use the following defaults. -# IREE_ROOT and ANDROID_NDK are empty by default, must be defined by the user. -# -# To make this script very easy to play with, this script will also default to -# benchmarking a specific NN model (currently MobileBert encoder), and will take -# care of generating its input MLIR form and of setting the right --function_inputs -# and --entry_function for it. 
To use this script on any other specific NN model,
-# define the following environment variables:
-# IREE_INPUT_MLIR
-# FUNCTION_INPUTS
-# ENTRY_FUNCTION
-
-DEFAULT_IREE_INPUT_MLIR="/tmp/iree/modules/MobileBertSquad/iree_input.mlir"
-
-: ${IREE_ROOT:=""}
-: ${IREE_BUILD_ANDROID:="$HOME/iree-build-android"}
-: ${TRACY_ROOT:="$HOME/tracy"}
-: ${PYTHON_BIN:=python3}
-: ${CC:=clang}
-: ${CXX:=clang++}
-: ${IREE_INPUT_MLIR:="${DEFAULT_IREE_INPUT_MLIR}"}
-: ${ANDROID_NDK:=""}
-: ${IREE_LLVMAOT_LINKER_PATH:="$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android30-clang++ -static-libstdc++ -O3"}
-: ${FUNCTION_INPUTS:=""}
-: ${ENTRY_FUNCTION:=""}
-# Validation of the environment variables.
-
-if [ -z "${IREE_ROOT}" ]
-then
-  print_status "Please define IREE_ROOT to point to your IREE git clone."
-  print_status "Example:"
-  print_status " IREE_ROOT=\$HOME/iree $0"
-  exit 1
-fi
-
-if [ ! -f "${IREE_ROOT}/iree/base/CMakeLists.txt" ]
-then
-  print_status "FATAL: bad IREE_ROOT (${IREE_ROOT})."
-  exit 1
-fi
-
-if [ -z "${ANDROID_NDK}" ]
-then
-  print_status "Please define ANDROID_NDK to point to your Android NDK."
-  print_status "Example:"
-  print_status " ANDROID_NDK=\$HOME/android-ndk-r21d $0"
-  exit 1
-fi
-
-if [ ! -d "${ANDROID_NDK}/toolchains/llvm" ]
-then
-  print_status "FATAL: bad ANDROID_NDK (${ANDROID_NDK})."
-  exit 1
-fi
-
-if ! $CC --version 1>/dev/null 2>/dev/null
-then
-  print_status "FATAL: Install $CC and set CC to it."
-  exit 1
-fi
-
-if ! $CXX --version 1>/dev/null 2>/dev/null
-then
-  print_status "FATAL: Install $CXX and set CXX to it."
-  exit 1
-fi
-
-if ! $IREE_LLVMAOT_LINKER_PATH --version 1>/dev/null 2>/dev/null
-then
-  print_status "Bad IREE_LLVMAOT_LINKER_PATH (${IREE_LLVMAOT_LINKER_PATH}). Rerun with it correctly set."
-  exit 1
-fi
-
-# If we're playing with the default input MLIR, preset some reasonable
-# FUNCTION_INPUTS and ENTRY_FUNCTION if they haven't been set.
-# Otherwise, they must be provided.
-if [[ -z "${FUNCTION_INPUTS}" || -z "${ENTRY_FUNCTION}" ]]
-then
-  if [[ "${IREE_INPUT_MLIR}" == "${DEFAULT_IREE_INPUT_MLIR}" ]]
-  then
-    FUNCTION_INPUTS="1x384xi32,1x384xi32,1x384xi32"
-    ENTRY_FUNCTION="serving_default"
-  else
-    print_status "Please specify FUNCTION_INPUTS and ENTRY_FUNCTION appropriately for your IREE_INPUT_MLIR (${IREE_INPUT_MLIR})"
-  fi
-fi
-
-print_status "Running with the following environment variables:"
-print_status "ANDROID_NDK=${ANDROID_NDK}"
-print_status "IREE_LLVMAOT_LINKER_PATH=${IREE_LLVMAOT_LINKER_PATH}"
-print_status "IREE_ROOT=${IREE_ROOT}"
-print_status "IREE_BUILD_ANDROID=${IREE_BUILD_ANDROID}"
-print_status "TRACY_ROOT=${TRACY_ROOT}"
-print_status "PYTHON_BIN=${PYTHON_BIN}"
-print_status "CC=${CC}"
-print_status "CXX=${CXX}"
-print_status "IREE_INPUT_MLIR=${IREE_INPUT_MLIR}"
-print_status "FUNCTION_INPUTS=${FUNCTION_INPUTS}"
-print_status "ENTRY_FUNCTION=${ENTRY_FUNCTION}"
-
-echo
-
-print_status "Ensuring that we have Tracy source code..."
-if [ ! -d ${TRACY_ROOT}/profiler/build/unix/ ]
-then
-  print_status "Tracy not found at ${TRACY_ROOT}. Either set TRACY_ROOT, or we're going to clone the git repository now."
-  read -p "Press the return key..."
-  git clone https://github.com/wolfpld/tracy "${TRACY_ROOT}"
-fi
-
-echo
-
-print_status "Ensuring that the Tracy profiler is built..."
-if [ ! -x "${TRACY_ROOT}/profiler/build/unix/Tracy-release" ]
-then
-  print_status "Checking Tracy dependencies - assuming Debian."
-  TRACY_DEPS="libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev"
-  TRACY_DEPS_COUNT=5
-  TRACY_DEPS_INSTALLED="$(apt list $TRACY_DEPS) 2>/dev/null | grep installed | wc -l"
-  if [ $TRACY_DEPS_INSTALLED != $TRACY_DEPS_COUNT ]
-  then
-    print_status "Installing dependencies now - assuming Debian."
-    sudo apt install $TRACY_DEPS
-  fi
-  make -C "${TRACY_ROOT}/profiler/build/unix" -j12 release
-fi
-
-echo
-
-print_status "Ensuring that we have the input MLIR file..."
-if [ ! -f "${IREE_INPUT_MLIR}" ]
-then
-  print_status "Set IREE_INPUT_MLIR to point to some iree input MLIR file. Not found at current value ${IREE_INPUT_MLIR}."
-  if [ "${IREE_INPUT_MLIR}" == "${DEFAULT_IREE_INPUT_MLIR}" ]
-  then
-    print_status "Okay, we actually know how to generate that file, ${IREE_INPUT_MLIR}, but it will take a while."
-    print_status "Press the return key to continue..."
-    pushd "${IREE_ROOT}"
-    scripts/get_e2e_artifacts.py --test_suites=mobile_bert_squad_tests
-    popd
-  fi
-fi
-
-if [ ! -f "${IREE_INPUT_MLIR}" ]
-then
-  print_status "FATAL: we should have ${IREE_INPUT_MLIR} by that point."
-  exit 1
-fi
-
-echo
-
-print_status "Building IREE for Android in ${IREE_BUILD_ANDROID}..."
-
-mkdir -p "${IREE_BUILD_ANDROID}"
-pushd "${IREE_BUILD_ANDROID}"
-
-cmake -G Ninja ../iree \
-  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-  -DANDROID_ABI="arm64-v8a" \
-  -DANDROID_PLATFORM=android-30 \
-  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-  -DIREE_BUILD_COMPILER=OFF \
-  -DIREE_BUILD_SAMPLES=OFF \
-  -DIREE_HOST_C_COMPILER=`which "$CC"` \
-  -DIREE_HOST_CXX_COMPILER=`which "$CXX"` \
-  -DIREE_ENABLE_RUNTIME_TRACING=ON
-
-cmake --build . --target \
-  iree_tools_iree-translate \
-  iree_tools_iree-benchmark-module
-
-popd
-echo
-
-print_status "Compiling the input MLIR file into a IREE module..."
-
-IREE_COMPILED_MODULE=/tmp/android_module.vmfb
-IREE_LOG=/tmp/iree-translate.log
-
-rm -rf "${IREE_COMPILED_MODULE}"
-
-IREE_LLVMAOT_LINKER_PATH="${IREE_LLVMAOT_LINKER_PATH}" \
-  "${IREE_BUILD_ANDROID}/host/bin/iree-translate" \
-  --iree-hal-target-backends=dylib-llvm-aot \
-  --iree-mlir-to-vm-bytecode-module \
-  --iree-llvm-target-triple=aarch64-linux-android \
-  /tmp/iree/modules/MobileBertSquad/iree_input.mlir \
-  -o "${IREE_COMPILED_MODULE}" \
-  2>"${IREE_LOG}"
-
-if [ ! -f "${IREE_COMPILED_MODULE}" ]
-then
-  print_status "iree-translate failed to produce ${IREE_COMPILED_MODULE}. Log saved in ${IREE_LOG}. First few lines:"
-  # The whole log might be enormous if it contains a big MLIR dump.
-  head -n10 "${IREE_LOG}"
-  print_status "tip: check if IREE_LLVMAOT_LINKER_PATH was correctly set."
-  exit 1
-fi
-
-echo
-
-DEVICE_IREE_COMPILED_MODULE=/data/local/tmp/android_module.vmfb
-
-print_status "Pushing the compiled module to the device..."
-adb push "${IREE_COMPILED_MODULE}" "${DEVICE_IREE_COMPILED_MODULE}"
-echo
-
-print_status "Pushing the IREE benchmarking program to the device..."
-adb push "${IREE_BUILD_ANDROID}"/iree/tools/iree-benchmark-module /data/local/tmp
-echo
-
-print_status "Setting up TCP port forwarding to let Tracy connect with the benchmark running on the device..."
-adb forward tcp:8086 tcp:8086
-echo
-
-print_status "Now you can launch the Tracy UI in another shell and hit \"Connect\", while we run the benchmark on the device in this shell..."
-print_status "Run this command in another shell:"
-print_status " ${TRACY_ROOT}/profiler/build/unix/Tracy-release"
-echo
-
-print_status "Running the benchmark... hit Ctrl-C to terminate it after Tracy is done with it."
-# `TRACY_NO_EXIT=1` is to prevent it from exiting at the end: that's needed for profiling
-# short-running tasks.
-# `taskset 80` selects which CPU core to run on. On Pixel4, `taskset 80` gives the biggest
-# core, which can get some reproducibility, as long as we don't run into thermal issues.
-# `taskset 0f` would give the little cores, avoiding thermal issues but running slower and
-# requiring different optimization work to be efficient on.
-# `--driver=dylib` is to use the LLVM AOT generated code backend.
-adb shell \
-  TRACY_NO_EXIT=1 \
-  taskset 80 \
-  data/local/tmp/iree-benchmark-module \
-  --driver=dylib \
-  --module_file="${DEVICE_IREE_COMPILED_MODULE}" \
-  --function_inputs="${FUNCTION_INPUTS}" \
-  --entry_function="${ENTRY_FUNCTION}"
diff --git a/scripts/prepare_doc_publication.py b/scripts/prepare_doc_publication.py
index 606c11c..20ad5a8 100755
--- a/scripts/prepare_doc_publication.py
+++ b/scripts/prepare_doc_publication.py
@@ -62,6 +62,9 @@
     'objectives.md': 'Short-term Objectives',
     'tensorflow_integrations.md': 'TensorFlow Integrations',
     'e2e_benchmarking.md': 'Benchmarking TensorFlow with IREE and TFLite',
+    'profiling_cpu_events.md': 'Profiling CPU events',
+    'profiling_with_tracy.md': 'Profiling with Tracy',
+    'profiling_vulkan_gpu.md': 'Profiling Vulkan GPU',
 }
 
 # A dictionary containing source file to permanent link mappings.