start an overhaul of profiling docs (#4060) * start an overhaul of profiling docs. profiling.md becomes just a table of contents pointing to more specialized pages, so each remains manageable. specialized pages are named profiling_*.md so they're easily discoverable from an alphabetical directory listing. Tracy docs mostly rebooted. Includes Android/Linux details. Placeholders for other use cases (Mac, remote over network) to be filled by others! Moved the Vulkan profiling docs to separate doc, unchanged. Added a stub of perf/simpleperf CPU cache events doc. Deleted `android_tracy_tutorial.sh`: it was incomplete/rotting/no longer useful.
diff --git a/docs/developing_iree/profiling.md b/docs/developing_iree/profiling.md index f1864d6..c73e532 100644 --- a/docs/developing_iree/profiling.md +++ b/docs/developing_iree/profiling.md
@@ -1,261 +1,25 @@ # Profiling IREE [benchmarking](./benchmarking.md) gives us an accurate and reproducible -view of program performance at specific levels of granularity. To analyze -system behavior in more depth, there are various ways to -[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) -IREE. +view of program performance at specific levels of granularity. To analyze system +behavior in more depth, there are various ways to +[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) IREE. -## Whole-system Profiling with Tracy +## Tracy -IREE uses Tracy as the main tool to perform whole-system profiling. -[Tracy](https://github.com/wolfpld/tracy) is a real-time, nanosecond resolution, -remote telemetry, hybrid frame and sampling profiler. Tracy can profile CPU, -GPU, memory, locks, context switches, and much more. - -### Building Tracy - -To use tracing in IREE, you need to build IREE with following requirements: - -* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`. -* Use Release/RelWithDebInfo build. - -For example: - -```shell -$ export IREE_DEFAULT_COPTS='-DNDEBUG' -$ cmake -B build/ \ - -DIREE_ENABLE_RUNTIME_TRACING=ON \ - -DCMAKE_BUILD_TYPE=RelWithDebInfo -``` - -The above compiles IREE with Tracy APIs so that IREE will stream profiling data -back to Tracy when running. To collect and analyze these data, you can either -use GUI or CLI tools. Tracy profiler is the GUI tool. You can find the -Tracy manual on its [releases page](https://github.com/wolfpld/tracy/releases) -for more details on Tracy itself. - -#### Building on Linux - -To build the profiler on Linux, you may need to install some external -libraries. Some Linux distributions will require you to add a `lib` prefix and a -`-dev`, or `-devel` postfix to library names. For example, you might see the -error: - -``` -Package glfw3 was not found in the pkg-config search path. -``` - -and then you could try to install `libglfw3-dev`. 
- -Instructions to build Tracy profiler: - -```shell -$ cd third_party/tracy/profiler/build/unix -$ make release -``` - -### Using Tracy - -Launch the profiler UI and click connect to start waiting for a traced program -to running. Now you can launch the IREE binary you want to trace and Tracy -should connect automatically and stream data. For example: - -Compile a .mlir file using `iree-translate`: - -```shell -$ build/iree/tools/iree-translate \ - -iree-mlir-to-vm-bytecode-module \ - -iree-hal-target-backends=vmla \ - $PWD/iree/tools/test/simple.mlir \ - -o /tmp/simple.vmfb -``` - -Run a compiled module once: - -```shell -$ build/iree/tools/iree-run-module \ - --module_file=/tmp/simple.vmfb \ - --driver=vmla \ - --entry_function=abs \ - --function_inputs="i32=-2" -``` - -Benchmark a compiled module, running it many times: - -```shell -$ build/iree/tools/iree-benchmark-module \ - --module_file=/tmp/simple.vmfb \ - --driver=vmla \ - --entry_function=abs \ - --function_inputs="i32=-2" -``` - -> Note:<br> -> IREE binaries may finish running before even -> connecting to Tracy. For such cases, you can set `TRACY_NO_EXIT=1` in the -> environment to keep the IREE binary alive until Tracy connects to it. - -### Configuring Tracy - -Set IREE's `IREE_TRACING_MODE` value (defined in -[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h)) -to adjust which tracing features, such as allocation tracking and callstacks, -are enabled. - -In order for Tracy to record detailed statistics via sampling, the program -collecting data must be run using elevated permissions (Administrator on Windows, -root on Linux, rooted Android device). See Tracy's user manual for more -information. +Tracy is a profiler that's been used for a wide range of profiling tasks on +IREE. Refer to [profiling_with_tracy.md](./profiling_with_tracy.md). ## Vulkan GPU Profiling -Tracy offers great insights into CPU/GPU interactions and Vulkan API usage -details. 
However, information at a finer granularity, especially inside a -particular shader dispatch, is missing. To supplement general purpose tools -like Tracy, vendor-specific tools can be used. +[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU +interactions and Vulkan API usage details. However, information at a finer +granularity, especially inside a particular shader dispatch, is missing. To +supplement general purpose tools like Tracy, vendor-specific tools can be used. +Refer to [profiling_vulkan_gpu.md](./profiling_vulkan_gpu.md). -(TODO: add some pictures for each tool) +## CPU cache and other CPU event profiling -### Android GPUs - -There are multiple GPU vendors for the Android platforms, each offering their -own tools. [Android GPU Inspector](https://gpuinspector.dev/) -(AGI) provides a cross-vendor solution. See the -[documentation](https://gpuinspector.dev/docs/) for more details. - -#### Build Android app to run IREE - -In order to perform capture and analysis with AGI, you will need a full Android -app. In IREE we have a simple Android native app wrapper to help package -IREE core libraries together with a specific VM bytecode invocation into an -Android app. The wrapper and its documentation are placed at -[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app). 
- -For example, to package a module compiled from the following `mhlo-dot.mlir` as -an Android app: - -```mlir -func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> - attributes { iree.vmfb.export } { - %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32> - return %0 : tensor<2x2xf32> -} -``` - -```shell -# First translate into a VM bytecode module -$ /path/to/iree/build/iree/tools/iree-translate -- \ - -iree-mlir-to-vm-bytecode-module \ - --iree-hal-target-backends=vulkan \ - /path/to/mhlo-dot.mlir \ - -o /tmp/mhlo-dot.vmfb - -# Then package the Android app -$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \ - ./build-apk \ - --module_file /tmp/mhlo-dot.vmfb \ - --entry_function dot \ - --function_inputs_file /path/to/inputs/file \ - --driver vulkan -``` - -Where `/path/to/input/file` is a file containing inputs to `dot`, for example: - -``` -2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]] -4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]] -``` - -The above will build an `iree-run-module.apk` under the `./build-apk/` -directory, which you can then install via `adb install`. - -`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage -them is by installing [Android Studio](https://developer.android.com/studio). -After installation, you will need to set up a few environment variables, which -are printed at the beginning of `build_apk.sh` invocation. - -#### Capture and analyze with AGI - -You can follow AGI's -[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn -how to use it. In general the steps are: - -* Install the latest AGI from https://github.com/google/agi/releases and launch. -* Fill in the "Application" field by searching the app. The line should read - like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`. -* Select start at beginning and choose a proper duration. 
-* Configure system profile to include all GPU counters. -* Start capture. - -Generated traces are in the [perfetto](https://perfetto.dev/) format. They can -be viewed directly within AGI and also online in a browser at -https://ui.perfetto.dev/, without needing an Android device. - -### Desktop GPUs - -Vulkan supports both graphics and compute, but most tools in the Vulkan -ecosystem focus on graphics. As a result, some Vulkan profiling tools expect -commands to correspond to a sequence of frames presented to displays via -framebuffers. This means additional steps for IREE and other Vulkan -applications that solely rely on headless compute. For graphics-focused tools, -we need to wrap IREE's logic inside a dummy rendering loop in order to provide -the necessary markers for these tools to perform capture and analysis. - -IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific -bytecode module within a proper GUI application. The graphics side is leveraging -[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE -synchronously during rendering each frame and prints the bytecode invocation -results to the screen. - -To build `iree-run-module-vulkan-gui`: - -```shell -# Using Bazel -$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui - -# Using CMake -$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui -``` - -The generated binary should be invoked in a console environment and it takes -the same command-line options as the main -[`iree-run-module`](./developer-overview.md#iree-run-module), except the -`--driver` option. You can use `--help` to learn them all. The binary will -launch a GUI window for use with Vulkan tools. - -#### AMD - -For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool -to understand fine details of how IREE GPU performs. See the -[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for -details. 
In general the steps to get started are: - -* Download and install AMD RGP from https://gpuopen.com/rgp/. -* Compile `iree-run-module-vulkan-gui` as said in the above. -* Open "Radeon Developer Panel" and connect to the local - "Radeon Developer Service". -* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module - invocation. -* You should see it in the "Applications" panel of "Radeon Developer Panel". - Click "Capture profile" to capture. - -Afterwards you can analyze the profile with RGP. Viewing the profile does not -need the GPU anymore; it can be opened by a RGP application installed anywhere. - -#### NVIDIA - -For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics) -is the tool to understand fine details of how IREE GPU performs. See the -[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html) -for details. In general the steps to get started are: - -* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics. -* Compile `iree-run-module-vulkan-gui` as said in the above. -* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page. -* Fill out the "Application Executable" and "Command Line Arguments" to point - to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its - invocation information. -* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly - interesting) and launch. -* Capture any frame to perform analysis. +For some advanced CPU profiling needs such as querying CPU cache and other +events, one may need to use some OS-specific profilers. See +[profiling_cpu_events.md](./profiling_cpu_events.md). \ No newline at end of file
diff --git a/docs/developing_iree/profiling_cpu_events.md b/docs/developing_iree/profiling_cpu_events.md new file mode 100644 index 0000000..b6beb99 --- /dev/null +++ b/docs/developing_iree/profiling_cpu_events.md
@@ -0,0 +1,13 @@ +# CPU cache and other CPU event profiling + +## Android + +On Android we can use [`simpleperf`](https://developer.android.com/ndk/guides/simpleperf). It's preinstalled on current Android `userdebug` images, and part of the Android NDK. + +TODO write me + +## Linux + +On desktop Linux we can use [`perf`](https://perf.wiki.kernel.org/index.php/Main_Page). + +TODO write me
diff --git a/docs/developing_iree/profiling_vulkan_gpu.md b/docs/developing_iree/profiling_vulkan_gpu.md new file mode 100644 index 0000000..c57d573 --- /dev/null +++ b/docs/developing_iree/profiling_vulkan_gpu.md
@@ -0,0 +1,152 @@ +# Vulkan GPU Profiling + +[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU +interactions and Vulkan API usage +details. However, information at a finer granularity, especially inside a +particular shader dispatch, is missing. To supplement general purpose tools +like Tracy, vendor-specific tools can be used. + +(TODO: add some pictures for each tool) + +## Android GPUs + +There are multiple GPU vendors for the Android platforms, each offering their +own tools. [Android GPU Inspector](https://gpuinspector.dev/) +(AGI) provides a cross-vendor solution. See the +[documentation](https://gpuinspector.dev/docs/) for more details. + +### Build Android app to run IREE + +In order to perform capture and analysis with AGI, you will need a full Android +app. In IREE we have a simple Android native app wrapper to help package +IREE core libraries together with a specific VM bytecode invocation into an +Android app. The wrapper and its documentation are placed at +[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app). 
+ +For example, to package a module compiled from the following `mhlo-dot.mlir` as +an Android app: + +```mlir +func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32> + attributes { iree.vmfb.export } { + %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32> + return %0 : tensor<2x2xf32> +} +``` + +```shell +# First translate into a VM bytecode module +$ /path/to/iree/build/iree/tools/iree-translate -- \ + -iree-mlir-to-vm-bytecode-module \ + --iree-hal-target-backends=vulkan \ + /path/to/mhlo-dot.mlir \ + -o /tmp/mhlo-dot.vmfb + +# Then package the Android app +$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \ + ./build-apk \ + --module_file /tmp/mhlo-dot.vmfb \ + --entry_function dot \ + --function_inputs_file /path/to/inputs/file \ + --driver vulkan +``` + +Where `/path/to/input/file` is a file containing inputs to `dot`, for example: + +``` +2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]] +4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]] +``` + +The above will build an `iree-run-module.apk` under the `./build-apk/` +directory, which you can then install via `adb install`. + +`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage +them is by installing [Android Studio](https://developer.android.com/studio). +After installation, you will need to set up a few environment variables, which +are printed at the beginning of `build_apk.sh` invocation. + +### Capture and analyze with AGI + +You can follow AGI's +[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn +how to use it. In general the steps are: + +* Install the latest AGI from https://github.com/google/agi/releases and launch. +* Fill in the "Application" field by searching the app. The line should read + like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`. +* Select start at beginning and choose a proper duration. 
+* Configure system profile to include all GPU counters. +* Start capture. + +Generated traces are in the [perfetto](https://perfetto.dev/) format. They can +be viewed directly within AGI and also online in a browser at +https://ui.perfetto.dev/, without needing an Android device. + +## Desktop GPUs + +Vulkan supports both graphics and compute, but most tools in the Vulkan +ecosystem focus on graphics. As a result, some Vulkan profiling tools expect +commands to correspond to a sequence of frames presented to displays via +framebuffers. This means additional steps for IREE and other Vulkan +applications that solely rely on headless compute. For graphics-focused tools, +we need to wrap IREE's logic inside a dummy rendering loop in order to provide +the necessary markers for these tools to perform capture and analysis. + +IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific +bytecode module within a proper GUI application. The graphics side is leveraging +[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE +synchronously during rendering each frame and prints the bytecode invocation +results to the screen. + +To build `iree-run-module-vulkan-gui`: + +```shell +# Using Bazel +$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui + +# Using CMake +$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui +``` + +The generated binary should be invoked in a console environment and it takes +the same command-line options as the main +[`iree-run-module`](./developer-overview.md#iree-run-module), except the +`--driver` option. You can use `--help` to learn them all. The binary will +launch a GUI window for use with Vulkan tools. + +### AMD + +For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool +to understand fine details of how IREE GPU performs. See the +[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for +details. 
In general the steps to get started are: + +* Download and install AMD RGP from https://gpuopen.com/rgp/. +* Compile `iree-run-module-vulkan-gui` as said in the above. +* Open "Radeon Developer Panel" and connect to the local + "Radeon Developer Service". +* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module + invocation. +* You should see it in the "Applications" panel of "Radeon Developer Panel". + Click "Capture profile" to capture. + +Afterwards you can analyze the profile with RGP. Viewing the profile does not +need the GPU anymore; it can be opened by a RGP application installed anywhere. + +### NVIDIA + +For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics) +is the tool to understand fine details of how IREE GPU performs. See the +[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html) +for details. In general the steps to get started are: + +* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics. +* Compile `iree-run-module-vulkan-gui` as said in the above. +* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page. +* Fill out the "Application Executable" and "Command Line Arguments" to point + to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its + invocation information. +* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly + interesting) and launch. +* Capture any frame to perform analysis.
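The `build_apk.sh` example in the Android section of `profiling_vulkan_gpu.md` passes an inputs file via `--function_inputs_file`. As a minimal sketch, that file could be created from the shell as follows. The file location is a hypothetical choice made for illustration; only the two input lines come from the doc itself.

```shell
# Hypothetical location for the inputs file; any writable path works.
inputs_file="${TMPDIR:-/tmp}/mhlo-dot-inputs.txt"

# One line per operand of `dot`, copied from the example in the doc.
cat > "$inputs_file" <<'EOF'
2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
EOF

# Sanity check: `dot` takes two operands, so two lines are expected.
# The arithmetic expansion strips any padding that `wc` may emit.
line_count=$(( $(wc -l < "$inputs_file") ))
echo "wrote $line_count input lines to $inputs_file"
```

The resulting path is what would be given to `--function_inputs_file` in place of `/path/to/inputs/file`.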
diff --git a/docs/developing_iree/profiling_with_tracy.md b/docs/developing_iree/profiling_with_tracy.md new file mode 100644 index 0000000..80eb9b0 --- /dev/null +++ b/docs/developing_iree/profiling_with_tracy.md
@@ -0,0 +1,207 @@ +# Profiling with Tracy + +[Tracy](https://github.com/wolfpld/tracy) is a profiler that puts together in a +single view: +* Both CPU and GPU profiling. +* Both sampling and instrumentation. +* Both specifics of our own process, and whole-system profiling a la "systrace". + +Since Tracy relies on instrumentation, it requires IREE binaries to be built +with a special flag to enable it. + +There are two components to Tracy. They communicate over a TCP socket. +* The "client" is the program being profiled. +* The "server" is the Tracy profiler UI. + +## The Tracy manual + +The primary source of Tracy documentation, including build instructions, is +a PDF manual that's part of each numbered release. To find the latest one, +navigate [here](https://github.com/wolfpld/tracy/releases) and search for +`tracy.pdf`. + +## Building the Tracy UI (the "server") + +This is explained in section 2.3 of the manual for Windows and Linux. Here we +give some more detailed instructions for some systems. + +The IREE repository contains its own clone of the Tracy repository in +`third_party/tracy`, so there is no need to make a separate clone of it. You can +use one if you want, but be aware that the Tracy client/server protocol gets +updated sometimes. Building both sides from the same +`iree/third_party/tracy` lowers the risk of running into a protocol version +mismatch. + +### Linux + +Install dependencies (Debian-based distributions): +``` +sudo apt install libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev +``` + +Build (from your `iree/` clone root directory): +``` +make -C third_party/tracy/profiler/build/unix -j12 release +``` + +### Mac + +TODO write this (Kojo?) + +## Building IREE with Tracy instrumentation (the "client") + +IREE needs to be built with Tracy instrumentation enabled. This enables both the +collection of data, and its streaming to the Tracy server over a socket. 
+ +This is only supported in the CMake build system of IREE, not in Bazel. + +In the initial CMake configuration command: +* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`. +* Use the `RelWithDebInfo` build type. + +For example: + +```shell +$ cmake \ + -DIREE_ENABLE_RUNTIME_TRACING=ON \ + -DCMAKE_BUILD_TYPE=RelWithDebInfo \ + ... # other cmake arguments as usual +``` +## Permissions issues + +The profiled application (i.e. the Tracy client) needs to have appropriate +permissions to perform the special I/O required to collect the profile +information. This is OS-specific. + +### Desktop Linux + +On desktop Linux, the Tracy client must be run as root, e.g. with `sudo`. +Otherwise, profile data will lack important components. + +### Android + +On Android it is not necessary to run as root and in fact, Android graphical +applications never run as root, so it's advisable to run all programs as +non-root for consistency. + +The Android device must be prepared as follows to enable Tracy profiling. +* The device must be rooted. + * That means that in `adb shell`, the command `su` must succeed. + * That does NOT mean doing `adb root`. The effect of `adb root` is to have the + `adbd` daemon itself run as root, which causes `adb shell` to give you a + root shell by default. If you are in that case, consider doing `adb unroot` + to restart the `adbd` server as non-root. Not mandatory, but again, running + anything as root on Android is a deviation from normal user conditions. +* Execute the following commands in a root shell on the device (i.e. `adb + shell`, then `su`, then the following commands). These are from the Tracy PDF + manual, but hard to find there, and copy-pasting from PDF introduces unwanted + whitespace. These settings normally persist until the next reboot of the + device. 
+ * `setenforce 0` + * `mount -o remount,hidepid=0 /proc` + * `echo 0 > /proc/sys/kernel/perf_event_paranoid` + +## Port forwarding + +The Tracy client and server communicate by default over port `8086`. When they +run on different machines, e.g. with embedded/Android profiling or remote +profiling, port forwarding must be set up. + +### Between a computer and a local Android device connected to it by USB + +Run this command. You might need to run it again a little more frequently than +you reboot the device. When experiencing connection issues, try that first. + +```shell +adb forward tcp:8086 tcp:8086 +``` + +### Between two computers over the network + +TODO write this (`ssh` stuff...) + +## Running the profiled program + +Run your IREE workload as you normally would: now that it's been built with +Tracy instrumentation enabled, it should do all the right things automatically. + +The only change that you are likely to need in your command line is to set the +`TRACY_NO_EXIT=1` environment variable. This ensures that your program does not +exit until the Tracy server (the UI) has connected to it and finished uploading +the profile data. + +Typically, `TRACY_NO_EXIT=1` is needed when profiling `iree-benchmark-module`. +It wouldn't be needed when profiling a real user-facing application. + +Example: + +```shell +TRACY_NO_EXIT=1 /data/local/tmp/iree-benchmark-module \ + --driver=dylib \ + --function_inputs='1x384xi32,1x384xi32,1x384xi32' \ + --module_file=/data/local/tmp/android_module.fbvm \ + --entry_function=serving_default +``` + +## Running the Tracy profiler UI, connecting and visualizing + +While the profiled program is still running (possibly thanks to +`TRACY_NO_EXIT=1`), start the Tracy profiler UI which we had built above. +From the IREE root directory: +```shell +./third_party/tracy/profiler/build/unix/Tracy-release +``` + +It should show a dialog offering to connect to a client i.e. a profiled program. 
+ +If connecting doesn't work: +* If the profiled program is on a separate machine, make sure you've correctly + set up port forwarding. + * On Android, the `adb forward` may need to be run again. +* Make sure that the profiled program is still running. Do you need + `TRACY_NO_EXIT=1`? +* Kill the profiled program and restart it. + +You should then start seeing a profile. + +Before going further, check that your profile has sampling data recorded, not +just instrumentation data. Click the 'Statistics' button at the top. The window +that it opens should show a 'Sampling' radio button next to 'Instrumentation'. +Back in the main view, look for the 'Main thread' label on the far left. Next to +it should be a small ghost icon. If either of these things is missing, you need +to fix that first, because many of Tracy's interesting features rely on +sampling. Most likely this is a permissions issue: make sure that you have +performed the steps outlined in the above section on permissions. As an +experiment, retry with the profiled program running as root. Look for any +interesting `stderr` message (in the profiled program's terminal). Try +`strace`'ing it. + +Besides sampling, let's check that you also have "systrace" working. In the +initial main view, on the far left, you should see a list of CPU cores, +typically labelled `CPU0`, `CPU1`, etc. To the right, a timeline view should +show system-wide activity on all these CPUs - so for instance you can see +context switches and you can see if anything else happened on your device that +could have interfered with the benchmark's performance. If this "systrace" +functionality isn't working, then again that's likely a permissions issue; +perform the same troubleshooting as above about sampling. + +Finally, look for the part of the timeline that is of interest to you. Your area +of interest might not be on the Main thread. In fact, it might be on a thread +that's not visible in the initial view at all. 
Indeed, the initial view tends to +be zoomed-in a lot. Either use the mouse directly to zoom out and navigate, or +look for the 'Frame' control at the top of the Tracy window. Use the 'next +frame' arrow button until more interesting threads appear. Typically, IREE +generated code tends to run on a thread named `cpu0`, which is actually a thread +name and unrelated to `CPU0` from the systrace view. + +Once you click a zone, you should see a line-by-line or +instruction-by-instruction view annotated with the percentage of time spent. If +you don't see the percentages, then again that would point to sampling not +working; refer to the above steps. + +## Configuring Tracy instrumentation + +Set IREE's `IREE_TRACING_MODE` value (defined in +[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h)) +to adjust which tracing features, such as allocation tracking and callstacks, +are enabled.
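The Android preparation steps in the "Permissions issues" section of `profiling_with_tracy.md` set `/proc/sys/kernel/perf_event_paranoid` to `0`, and the troubleshooting text points at permissions whenever sampling data is missing. As a rough sketch, a small shell helper could report whether that setting is still in place, for example after a reboot. The function name `tracy_check_paranoid` is hypothetical and is not part of IREE or Tracy. Treating any value at or below `0` as acceptable is an assumption based on the value the doc writes.

```shell
# Hypothetical helper, not part of IREE or Tracy.
# Reports whether perf_event_paranoid still holds the value that the Android
# preparation steps write (0). Values at or below 0 are treated as acceptable,
# which is an assumption. The file path is a parameter so that the function
# can also be exercised against a copy of the file.
tracy_check_paranoid() {
  local file="${1:-/proc/sys/kernel/perf_event_paranoid}"
  local value
  if ! value="$(cat "$file" 2>/dev/null)"; then
    echo "unreadable"
    return 2
  fi
  if [ "$value" -le 0 ] 2>/dev/null; then
    echo "ok"
  else
    echo "restricted"
    return 1
  fi
}
```

Run without arguments in the shell where the profiled program will be started, it prints `ok`, `restricted`, or `unreadable`. A result other than `ok` suggests re-running the preparation commands before profiling.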