start an overhaul of profiling docs (#4060)
* start an overhaul of profiling docs.
profiling.md becomes just a table of contents pointing to more
specialized pages, so each remains manageable.
specialized pages are named profiling_*.md so they're easily
discoverable from an alphabetical directory listing.
Tracy docs mostly rebooted. Includes Android/Linux details. Placeholders
for other use cases (Mac, remote over network) to be filled by others!
Moved the Vulkan profiling docs to a separate doc, unchanged.
Added a stub of perf/simpleperf CPU cache events doc.
Deleted `android_tracy_tutorial.sh`. It was incomplete, rotting, and no longer
useful.
diff --git a/docs/developing_iree/profiling.md b/docs/developing_iree/profiling.md
index f1864d6..c73e532 100644
--- a/docs/developing_iree/profiling.md
+++ b/docs/developing_iree/profiling.md
@@ -1,261 +1,25 @@
# Profiling
IREE [benchmarking](./benchmarking.md) gives us an accurate and reproducible
-view of program performance at specific levels of granularity. To analyze
-system behavior in more depth, there are various ways to
-[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming))
-IREE.
+view of program performance at specific levels of granularity. To analyze system
+behavior in more depth, there are various ways to
+[profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) IREE.
-## Whole-system Profiling with Tracy
+## Tracy
-IREE uses Tracy as the main tool to perform whole-system profiling.
-[Tracy](https://github.com/wolfpld/tracy) is a real-time, nanosecond resolution,
-remote telemetry, hybrid frame and sampling profiler. Tracy can profile CPU,
-GPU, memory, locks, context switches, and much more.
-
-### Building Tracy
-
-To use tracing in IREE, you need to build IREE with following requirements:
-
-* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`.
-* Use Release/RelWithDebInfo build.
-
-For example:
-
-```shell
-$ export IREE_DEFAULT_COPTS='-DNDEBUG'
-$ cmake -B build/ \
- -DIREE_ENABLE_RUNTIME_TRACING=ON \
- -DCMAKE_BUILD_TYPE=RelWithDebInfo
-```
-
-The above compiles IREE with Tracy APIs so that IREE will stream profiling data
-back to Tracy when running. To collect and analyze these data, you can either
-use GUI or CLI tools. Tracy profiler is the GUI tool. You can find the
-Tracy manual on its [releases page](https://github.com/wolfpld/tracy/releases)
-for more details on Tracy itself.
-
-#### Building on Linux
-
-To build the profiler on Linux, you may need to install some external
-libraries. Some Linux distributions will require you to add a `lib` prefix and a
-`-dev`, or `-devel` postfix to library names. For example, you might see the
-error:
-
-```
-Package glfw3 was not found in the pkg-config search path.
-```
-
-and then you could try to install `libglfw3-dev`.
-
-Instructions to build Tracy profiler:
-
-```shell
-$ cd third_party/tracy/profiler/build/unix
-$ make release
-```
-
-### Using Tracy
-
-Launch the profiler UI and click connect to start waiting for a traced program
-to running. Now you can launch the IREE binary you want to trace and Tracy
-should connect automatically and stream data. For example:
-
-Compile a .mlir file using `iree-translate`:
-
-```shell
-$ build/iree/tools/iree-translate \
- -iree-mlir-to-vm-bytecode-module \
- -iree-hal-target-backends=vmla \
- $PWD/iree/tools/test/simple.mlir \
- -o /tmp/simple.vmfb
-```
-
-Run a compiled module once:
-
-```shell
-$ build/iree/tools/iree-run-module \
- --module_file=/tmp/simple.vmfb \
- --driver=vmla \
- --entry_function=abs \
- --function_inputs="i32=-2"
-```
-
-Benchmark a compiled module, running it many times:
-
-```shell
-$ build/iree/tools/iree-benchmark-module \
- --module_file=/tmp/simple.vmfb \
- --driver=vmla \
- --entry_function=abs \
- --function_inputs="i32=-2"
-```
-
-> Note:<br>
-> IREE binaries may finish running before even
-> connecting to Tracy. For such cases, you can set `TRACY_NO_EXIT=1` in the
-> environment to keep the IREE binary alive until Tracy connects to it.
-
-### Configuring Tracy
-
-Set IREE's `IREE_TRACING_MODE` value (defined in
-[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h))
-to adjust which tracing features, such as allocation tracking and callstacks,
-are enabled.
-
-In order for Tracy to record detailed statistics via sampling, the program
-collecting data must be run using elevated permissions (Administrator on Windows,
-root on Linux, rooted Android device). See Tracy's user manual for more
-information.
+Tracy is a profiler that's been used for a wide range of profiling tasks on
+IREE. Refer to [profiling_with_tracy.md](./profiling_with_tracy.md).
## Vulkan GPU Profiling
-Tracy offers great insights into CPU/GPU interactions and Vulkan API usage
-details. However, information at a finer granularity, especially inside a
-particular shader dispatch, is missing. To supplement general purpose tools
-like Tracy, vendor-specific tools can be used.
+[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU
+interactions and Vulkan API usage details. However, information at a finer
+granularity, especially inside a particular shader dispatch, is missing. To
+supplement general purpose tools like Tracy, vendor-specific tools can be used.
+Refer to [profiling_vulkan_gpu.md](./profiling_vulkan_gpu.md).
-(TODO: add some pictures for each tool)
+## CPU cache and other CPU event profiling
-### Android GPUs
-
-There are multiple GPU vendors for the Android platforms, each offering their
-own tools. [Android GPU Inspector](https://gpuinspector.dev/)
-(AGI) provides a cross-vendor solution. See the
-[documentation](https://gpuinspector.dev/docs/) for more details.
-
-#### Build Android app to run IREE
-
-In order to perform capture and analysis with AGI, you will need a full Android
-app. In IREE we have a simple Android native app wrapper to help package
-IREE core libraries together with a specific VM bytecode invocation into an
-Android app. The wrapper and its documentation are placed at
-[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app).
-
-For example, to package a module compiled from the following `mhlo-dot.mlir` as
-an Android app:
-
-```mlir
-func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32>
- attributes { iree.vmfb.export } {
- %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
- return %0 : tensor<2x2xf32>
-}
-```
-
-```shell
-# First translate into a VM bytecode module
-$ /path/to/iree/build/iree/tools/iree-translate -- \
- -iree-mlir-to-vm-bytecode-module \
- --iree-hal-target-backends=vulkan \
- /path/to/mhlo-dot.mlir \
- -o /tmp/mhlo-dot.vmfb
-
-# Then package the Android app
-$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
- ./build-apk \
- --module_file /tmp/mhlo-dot.vmfb \
- --entry_function dot \
- --function_inputs_file /path/to/inputs/file \
- --driver vulkan
-```
-
-Where `/path/to/input/file` is a file containing inputs to `dot`, for example:
-
-```
-2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
-4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
-```
-
-The above will build an `iree-run-module.apk` under the `./build-apk/`
-directory, which you can then install via `adb install`.
-
-`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage
-them is by installing [Android Studio](https://developer.android.com/studio).
-After installation, you will need to set up a few environment variables, which
-are printed at the beginning of `build_apk.sh` invocation.
-
-#### Capture and analyze with AGI
-
-You can follow AGI's
-[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn
-how to use it. In general the steps are:
-
-* Install the latest AGI from https://github.com/google/agi/releases and launch.
-* Fill in the "Application" field by searching the app. The line should read
- like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`.
-* Select start at beginning and choose a proper duration.
-* Configure system profile to include all GPU counters.
-* Start capture.
-
-Generated traces are in the [perfetto](https://perfetto.dev/) format. They can
-be viewed directly within AGI and also online in a browser at
-https://ui.perfetto.dev/, without needing an Android device.
-
-### Desktop GPUs
-
-Vulkan supports both graphics and compute, but most tools in the Vulkan
-ecosystem focus on graphics. As a result, some Vulkan profiling tools expect
-commands to correspond to a sequence of frames presented to displays via
-framebuffers. This means additional steps for IREE and other Vulkan
-applications that solely rely on headless compute. For graphics-focused tools,
-we need to wrap IREE's logic inside a dummy rendering loop in order to provide
-the necessary markers for these tools to perform capture and analysis.
-
-IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific
-bytecode module within a proper GUI application. The graphics side is leveraging
-[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE
-synchronously during rendering each frame and prints the bytecode invocation
-results to the screen.
-
-To build `iree-run-module-vulkan-gui`:
-
-```shell
-# Using Bazel
-$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui
-
-# Using CMake
-$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
-```
-
-The generated binary should be invoked in a console environment and it takes
-the same command-line options as the main
-[`iree-run-module`](./developer-overview.md#iree-run-module), except the
-`--driver` option. You can use `--help` to learn them all. The binary will
-launch a GUI window for use with Vulkan tools.
-
-#### AMD
-
-For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool
-to understand fine details of how IREE GPU performs. See the
-[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for
-details. In general the steps to get started are:
-
-* Download and install AMD RGP from https://gpuopen.com/rgp/.
-* Compile `iree-run-module-vulkan-gui` as said in the above.
-* Open "Radeon Developer Panel" and connect to the local
- "Radeon Developer Service".
-* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module
- invocation.
-* You should see it in the "Applications" panel of "Radeon Developer Panel".
- Click "Capture profile" to capture.
-
-Afterwards you can analyze the profile with RGP. Viewing the profile does not
-need the GPU anymore; it can be opened by a RGP application installed anywhere.
-
-#### NVIDIA
-
-For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics)
-is the tool to understand fine details of how IREE GPU performs. See the
-[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html)
-for details. In general the steps to get started are:
-
-* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics.
-* Compile `iree-run-module-vulkan-gui` as said in the above.
-* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page.
-* Fill out the "Application Executable" and "Command Line Arguments" to point
- to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its
- invocation information.
-* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly
- interesting) and launch.
-* Capture any frame to perform analysis.
+For some advanced CPU profiling needs such as querying CPU cache and other
+events, one may need to use some OS-specific profilers. See
+[profiling_cpu_events.md](./profiling_cpu_events.md).
\ No newline at end of file
diff --git a/docs/developing_iree/profiling_cpu_events.md b/docs/developing_iree/profiling_cpu_events.md
new file mode 100644
index 0000000..b6beb99
--- /dev/null
+++ b/docs/developing_iree/profiling_cpu_events.md
@@ -0,0 +1,13 @@
+# CPU cache and other CPU event profiling
+
+## Android
+
+On Android we can use [`simpleperf`](https://developer.android.com/ndk/guides/simpleperf). It's preinstalled on current Android `userdebug` images, and part of the Android NDK.
+
+TODO write me
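+
+As a starting point, and only as a sketch, `simpleperf stat` can count cache
+events over a whole run from an `adb shell`. Event availability depends on the
+device (check `simpleperf list`), and the module path below is hypothetical:
+
+```shell
+# Run on the device, e.g. inside `adb shell`.
+simpleperf stat -e cache-references,cache-misses \
+  /data/local/tmp/iree-benchmark-module \
+    --driver=dylib \
+    --module_file=/data/local/tmp/module.vmfb \
+    --entry_function=serving_default \
+    --function_inputs='1x384xi32,1x384xi32,1x384xi32'
+```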
+
+## Linux
+
+On desktop Linux we can use [`perf`](https://perf.wiki.kernel.org/index.php/Main_Page).
+
+TODO write me
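+
+As a starting point, and only as a sketch, `perf stat` can count cache events
+over a whole run. The event names below are generic `perf` aliases whose
+availability depends on the CPU and kernel (check `perf list`), and the binary
+and module paths are hypothetical:
+
+```shell
+# Count cache references and misses for one benchmark run.
+perf stat -e cache-references,cache-misses \
+  /path/to/iree-benchmark-module \
+    --driver=dylib \
+    --module_file=/path/to/module.vmfb \
+    --entry_function=serving_default \
+    --function_inputs='1x384xi32,1x384xi32,1x384xi32'
+```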
diff --git a/docs/developing_iree/profiling_vulkan_gpu.md b/docs/developing_iree/profiling_vulkan_gpu.md
new file mode 100644
index 0000000..c57d573
--- /dev/null
+++ b/docs/developing_iree/profiling_vulkan_gpu.md
@@ -0,0 +1,152 @@
+# Vulkan GPU Profiling
+
+[Tracy](./profiling_with_tracy.md) offers great insights into CPU/GPU
+interactions and Vulkan API usage
+details. However, information at a finer granularity, especially inside a
+particular shader dispatch, is missing. To supplement general purpose tools
+like Tracy, vendor-specific tools can be used.
+
+(TODO: add some pictures for each tool)
+
+## Android GPUs
+
+There are multiple GPU vendors for the Android platforms, each offering their
+own tools. [Android GPU Inspector](https://gpuinspector.dev/)
+(AGI) provides a cross-vendor solution. See the
+[documentation](https://gpuinspector.dev/docs/) for more details.
+
+### Build Android app to run IREE
+
+In order to perform capture and analysis with AGI, you will need a full Android
+app. In IREE we have a simple Android native app wrapper to help package
+IREE core libraries together with a specific VM bytecode invocation into an
+Android app. The wrapper and its documentation are placed at
+[`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app).
+
+For example, to package a module compiled from the following `mhlo-dot.mlir` as
+an Android app:
+
+```mlir
+func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32>
+ attributes { iree.vmfb.export } {
+ %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
+ return %0 : tensor<2x2xf32>
+}
+```
+
+```shell
+# First translate into a VM bytecode module
+$ /path/to/iree/build/iree/tools/iree-translate -- \
+ -iree-mlir-to-vm-bytecode-module \
+ --iree-hal-target-backends=vulkan \
+ /path/to/mhlo-dot.mlir \
+ -o /tmp/mhlo-dot.vmfb
+
+# Then package the Android app
+$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
+ ./build-apk \
+ --module_file /tmp/mhlo-dot.vmfb \
+ --entry_function dot \
+ --function_inputs_file /path/to/inputs/file \
+ --driver vulkan
+```
+
+Where `/path/to/inputs/file` is a file containing inputs to `dot`, for example:
+
+```
+2x4xf32=[[1.0 2.0 3.0 4.0][5.0 6.0 7.0 8.0]]
+4x2xf32=[[9.0 10.0][11.0 12.0][13.0 14.0][15.0 16.0]]
+```
+
+The above will build an `iree-run-module.apk` under the `./build-apk/`
+directory, which you can then install via `adb install`.
+
+`build_apk.sh` needs the Android SDK and NDK internally; an easy way to manage
+them is by installing [Android Studio](https://developer.android.com/studio).
+After installation, you will need to set up a few environment variables, which
+are printed at the beginning of `build_apk.sh` invocation.
+
+### Capture and analyze with AGI
+
+You can follow AGI's
+[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn
+how to use it. In general the steps are:
+
+* Install the latest AGI from https://github.com/google/agi/releases and launch.
+* Fill in the "Application" field by searching the app. The line should read
+ like `android.intent.action.MAIN:com.google.iree.run_module/android.app.NativeActivity`.
+* Select start at beginning and choose a proper duration.
+* Configure system profile to include all GPU counters.
+* Start capture.
+
+Generated traces are in the [perfetto](https://perfetto.dev/) format. They can
+be viewed directly within AGI and also online in a browser at
+https://ui.perfetto.dev/, without needing an Android device.
+
+## Desktop GPUs
+
+Vulkan supports both graphics and compute, but most tools in the Vulkan
+ecosystem focus on graphics. As a result, some Vulkan profiling tools expect
+commands to correspond to a sequence of frames presented to displays via
+framebuffers. This means additional steps for IREE and other Vulkan
+applications that solely rely on headless compute. For graphics-focused tools,
+we need to wrap IREE's logic inside a dummy rendering loop in order to provide
+the necessary markers for these tools to perform capture and analysis.
+
+IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific
+bytecode module within a proper GUI application. The graphics side is leveraging
+[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE
+synchronously during rendering each frame and prints the bytecode invocation
+results to the screen.
+
+To build `iree-run-module-vulkan-gui`:
+
+```shell
+# Using Bazel
+$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui
+
+# Using CMake
+$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
+```
+
+The generated binary should be invoked in a console environment and it takes
+the same command-line options as the main
+[`iree-run-module`](./developer-overview.md#iree-run-module), except the
+`--driver` option. You can use `--help` to learn them all. The binary will
+launch a GUI window for use with Vulkan tools.
+
+### AMD
+
+For AMD GPUs, [Radeon GPU Profiler](https://gpuopen.com/rgp/) (RGP) is the tool
+to understand fine details of how IREE GPU performs. See the
+[documentation](https://radeon-gpuprofiler.readthedocs.io/en/latest/) for
+details. In general the steps to get started are:
+
+* Download and install AMD RGP from https://gpuopen.com/rgp/.
+* Compile `iree-run-module-vulkan-gui` as described above.
+* Open "Radeon Developer Panel" and connect to the local
+ "Radeon Developer Service".
+* Start `iree-run-module-vulkan-gui` from console with proper VM bytecode module
+ invocation.
+* You should see it in the "Applications" panel of "Radeon Developer Panel".
+ Click "Capture profile" to capture.
+
+Afterwards you can analyze the profile with RGP. Viewing the profile does not
+need the GPU anymore; it can be opened by an RGP application installed anywhere.
+
+### NVIDIA
+
+For NVIDIA GPUs, [NVIDIA Nsight Graphics](https://developer.nvidia.com/nsight-graphics)
+is the tool to understand fine details of how IREE GPU performs. See the
+[documentation](https://docs.nvidia.com/nsight-graphics/UserGuide/index.html)
+for details. In general the steps to get started are:
+
+* Download and install NVIDIA Nsight Graphics from https://developer.nvidia.com/nsight-graphics.
+* Compile `iree-run-module-vulkan-gui` as described above.
+* Open NVIDIA Nsight Graphics, select "Quick Launch" on the welcome page.
+* Fill out the "Application Executable" and "Command Line Arguments" to point
+ to `iree-run-module-vulkan-gui` and a specific VM bytecode module and its
+ invocation information.
+* Select an "Activity" ("Frame Profiler" and "GPU Trace" are particularly
+ interesting) and launch.
+* Capture any frame to perform analysis.
diff --git a/docs/developing_iree/profiling_with_tracy.md b/docs/developing_iree/profiling_with_tracy.md
new file mode 100644
index 0000000..80eb9b0
--- /dev/null
+++ b/docs/developing_iree/profiling_with_tracy.md
@@ -0,0 +1,207 @@
+# Profiling with Tracy
+
+[Tracy](https://github.com/wolfpld/tracy) is a profiler that puts together in a
+single view:
+* Both CPU and GPU profiling.
+* Both sampling and instrumentation.
+* Both specifics of our own process, and whole-system profiling a la "systrace".
+
+Since Tracy relies on instrumentation, it requires IREE binaries to be built
+with a special flag to enable it.
+
+There are two components to Tracy. They communicate over a TCP socket.
+* The "client" is the program being profiled.
+* The "server" is the Tracy profiler UI.
+
+## The Tracy manual
+
+The primary source of Tracy documentation, including build instructions, is
+a PDF manual that's part of each numbered release. To find the latest one,
+navigate [here](https://github.com/wolfpld/tracy/releases) and search for
+`tracy.pdf`.
+
+## Building the Tracy UI (the "server")
+
+This is explained in section 2.3 of the manual for Windows and Linux. Here we
+give some more detailed instructions for some systems.
+
+The IREE repository contains its own clone of the Tracy repository in
+`third_party/tracy`, so there is no need to make a separate clone of it. You can
+use one if you want, but be aware that the Tracy client/server protocol gets
+updated sometimes. Building both sides from the same
+`iree/third_party/tracy` lowers the risk of running into a protocol version
+mismatch.
+
+### Linux
+
+Install dependencies (Debian-based distributions):
+```
+sudo apt install libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev
+```
+
+Build (from your `iree/` clone root directory):
+```
+make -C third_party/tracy/profiler/build/unix -j12 release
+```
+
+### Mac
+
+TODO write this (Kojo?)
+
+## Building IREE with Tracy instrumentation (the "client")
+
+IREE needs to be built with Tracy instrumentation enabled. This enables both the
+collection of data, and its streaming to the Tracy server over a socket.
+
+This is only supported in the CMake build system of IREE, not in Bazel.
+
+In the initial CMake configuration command:
+* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`.
+* Use the `RelWithDebInfo` build type.
+
+For example:
+
+```shell
+$ cmake \
+ -DIREE_ENABLE_RUNTIME_TRACING=ON \
+ -DCMAKE_BUILD_TYPE=RelWithDebInfo \
+ ... # other cmake arguments as usual
+```
+
+## Permissions issues
+
+The profiled application (i.e. the Tracy client) needs to have appropriate
+permissions to perform the special I/O required to collect the profile
+information. This is OS-specific.
+
+### Desktop Linux
+
+On desktop Linux, the Tracy client must be run as root, e.g. with `sudo`.
+Otherwise, profile data will lack important components.
+
+### Android
+
+On Android it is not necessary to run as root and in fact, Android graphical
+applications never run as root, so it's advisable to run all programs as
+non-root for consistency.
+
+The Android device must be prepared as follows to enable Tracy profiling.
+* The device must be rooted.
+ * That means that in `adb shell`, the command `su` must succeed.
+ * That does NOT mean doing `adb root`. The effect of `adb root` is to have the
+ `adbd` daemon itself run as root, which causes `adb shell` to give you a
+ root shell by default. If you are in that case, consider doing `adb unroot`
+ to restart the `adbd` server as non-root. Not mandatory, but again, running
+ anything as root on Android is a deviation from normal user conditions.
+* Execute the following commands in a root shell on the device (i.e. `adb
+ shell`, then `su`, then the following commands). These are from the Tracy PDF
+ manual, but hard to find there, and copy-pasting from PDF introduces unwanted
+ whitespace. These settings normally persist until the next reboot of the
+ device.
+ * `setenforce 0`
+ * `mount -o remount,hidepid=0 /proc`
+ * `echo 0 > /proc/sys/kernel/perf_event_paranoid`
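+
+To confirm that these settings took effect, the values can be read back from
+the same root shell. This is a sketch rather than something taken from the
+Tracy manual; it assumes the standard Android `getenforce` tool is available:
+
+```shell
+# SELinux mode; prints "Permissive" after `setenforce 0`.
+getenforce
+# Kernel perf restriction level; prints 0 after the `echo` above.
+cat /proc/sys/kernel/perf_event_paranoid
+```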
+
+## Port forwarding
+
+The Tracy client and server communicate by default over port `8086`. When they
+run on different machines, e.g. with embedded/Android profiling or remote
+profiling, port forwarding must be set up.
+
+### Between a computer and a local Android device connected to it by USB
+
+Run this command. You might need to run it again a little more frequently than
+you reboot the device. When experiencing connection issues, try that first.
+
+```shell
+adb forward tcp:8086 tcp:8086
+```
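+
+To see whether a forwarding rule is still in place before re-running the
+command, `adb` can list the active forwards. This is a minimal sketch that
+assumes a single device is attached and `adb` is on your `PATH`:
+
+```shell
+# Lists active forwards; a line showing tcp:8086 for both the local and
+# the device port means the rule above is still set.
+adb forward --list
+```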
+
+### Between two computers over the network
+
+TODO write this (`ssh` stuff...)
+
+## Running the profiled program
+
+Run your IREE workload as you normally would: now that it's been built with
+Tracy instrumentation enabled, it should do all the right things automatically.
+
+The only change that you are likely to need in your command line is to set the
+`TRACY_NO_EXIT=1` environment variable. This ensures that your program does not
+exit until the Tracy server (the UI) has connected to it and finished uploading
+the profile data.
+
+Typically, `TRACY_NO_EXIT=1` is needed when profiling `iree-benchmark-module`.
+It wouldn't be needed when profiling a real user-facing application.
+
+Example:
+
+```shell
+TRACY_NO_EXIT=1 /data/local/tmp/iree-benchmark-module \
+ --driver=dylib \
+ --function_inputs='1x384xi32,1x384xi32,1x384xi32' \
+ --module_file=/data/local/tmp/android_module.fbvm \
+ --entry_function=serving_default
+```
+
+## Running the Tracy profiler UI, connecting and visualizing
+
+While the profiled program is still running (possibly thanks to
+`TRACY_NO_EXIT=1`), start the Tracy profiler UI that we built above.
+From the IREE root directory:
+```shell
+./third_party/tracy/profiler/build/unix/Tracy-release
+```
+
+It should show a dialog offering to connect to a client i.e. a profiled program.
+
+If connecting doesn't work:
+* If the profiled program is on a separate machine, make sure you've correctly
+ set up port forwarding.
+ * On Android, the `adb forward` command may need to be run again.
+* Make sure that the profiled program is still running. Do you need
+ `TRACY_NO_EXIT=1`?
+* Kill the profiled program and restart it.
+
+You should then start seeing a profile.
+
+Before going further, check that your profile has sampling data recorded, not
+just instrumentation data. Click the 'Statistics' button at the top. The window
+that it opens should show a 'Sampling' radio button next to 'Instrumentation'.
+Back in the main view, look for the 'Main thread' label on the far left. Next to
+it should be a small ghost icon. If either of these things is missing, you need
+to fix that first, because many of Tracy's interesting features rely on
+sampling. Most likely this is a permissions issue: make sure that you have
+performed the steps outlined in the above section on permissions. As an
+experiment, retry with the profiled program running as root. Look for any
+interesting `stderr` message (in the profiled program's terminal). Try
+`strace`'ing it.
+
+Besides sampling, let's check that you also have "systrace" working. In the
+initial main view, on the far left, you should see a list of CPU cores,
+typically labelled `CPU0`, `CPU1`, etc. To the right, a timeline view should
+show system-wide activity on all these CPUs - so for instance you can see
+context switches and you can see if anything else happened on your device that
+could have interfered with the benchmark's performance. If this "systrace"
+functionality isn't working, then again that's likely a permissions issue:
+perform the same troubleshooting as above for sampling.
+
+Finally, look for the part of the timeline that is of interest to you. Your area
+of interest might not be on the Main thread. In fact, it might be on a thread
+that's not visible in the initial view at all. Indeed, the initial view tends to
+be zoomed in a lot. Either use the mouse directly to zoom out and navigate, or
+look for the 'Frame' control at the top of the Tracy window. Use the 'next
+frame' arrow button until more interesting threads appear. Typically, IREE
+generated code tends to run on a thread named `cpu0`, which is actually a thread
+name and unrelated to `CPU0` from the systrace view.
+
+Once you click a zone, you should see a line-by-line or
+instruction-by-instruction view annotated with the percentage of time spent. If
+you don't see the percentages, then again that would point to sampling not
+working; refer to the above steps.
+
+## Configuring Tracy instrumentation
+
+Set IREE's `IREE_TRACING_MODE` value (defined in
+[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h))
+to adjust which tracing features, such as allocation tracking and callstacks,
+are enabled.
diff --git a/scripts/android_tracy_tutorial.sh b/scripts/android_tracy_tutorial.sh
deleted file mode 100755
index 812c09e..0000000
--- a/scripts/android_tracy_tutorial.sh
+++ /dev/null
@@ -1,261 +0,0 @@
-#!/bin/bash
-
-# Copyright 2020 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# This script assumes Debian Linux. It could easily be modified to run on any Unix,
-# as the Debian assumption is only to automatically install some packages,
-# (Tracy dependencies). With a bit more work it could be made to support other OSes
-# such as Windows, but that's not at all handled yet.
-
-function print_status {
- echo -e "\e[96m$@\e[39m"
-}
-
-# Environment variables. They can be set manually, or will use the following defaults.
-# IREE_ROOT and ANDROID_NDK are empty by default, must be defined by the user.
-#
-# To make this script very easy to play with, this script will also default to
-# benchmarking a specific NN model (currently MobileBert encoder), and will take
-# care of generating its input MLIR form and of setting the right --function_inputs
-# and --entry_function for it. To use this script on any other specific NN model,
-# define the following environment variables:
-# IREE_INPUT_MLIR
-# FUNCTION_INPUTS
-# ENTRY_FUNCTION
-
-DEFAULT_IREE_INPUT_MLIR="/tmp/iree/modules/MobileBertSquad/iree_input.mlir"
-
-: ${IREE_ROOT:=""}
-: ${IREE_BUILD_ANDROID:="$HOME/iree-build-android"}
-: ${TRACY_ROOT:="$HOME/tracy"}
-: ${PYTHON_BIN:=python3}
-: ${CC:=clang}
-: ${CXX:=clang++}
-: ${IREE_INPUT_MLIR:="${DEFAULT_IREE_INPUT_MLIR}"}
-: ${ANDROID_NDK:=""}
-: ${IREE_LLVMAOT_LINKER_PATH:="$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android30-clang++ -static-libstdc++ -O3"}
-: ${FUNCTION_INPUTS:=""}
-: ${ENTRY_FUNCTION:=""}
-# Validation of the environment variables.
-
-if [ -z "${IREE_ROOT}" ]
-then
- print_status "Please define IREE_ROOT to point to your IREE git clone."
- print_status "Example:"
- print_status " IREE_ROOT=\$HOME/iree $0"
- exit 1
-fi
-
-if [ ! -f "${IREE_ROOT}/iree/base/CMakeLists.txt" ]
-then
- print_status "FATAL: bad IREE_ROOT (${IREE_ROOT})."
- exit 1
-fi
-
-if [ -z "${ANDROID_NDK}" ]
-then
- print_status "Please define ANDROID_NDK to point to your Android NDK."
- print_status "Example:"
- print_status " ANDROID_NDK=\$HOME/android-ndk-r21d $0"
- exit 1
-fi
-
-if [ ! -d "${ANDROID_NDK}/toolchains/llvm" ]
-then
- print_status "FATAL: bad ANDROID_NDK (${ANDROID_NDK})."
- exit 1
-fi
-
-if ! $CC --version 1>/dev/null 2>/dev/null
-then
- print_status "FATAL: Install $CC and set CC to it."
- exit 1
-fi
-
-if ! $CXX --version 1>/dev/null 2>/dev/null
-then
- print_status "FATAL: Install $CXX and set CXX to it."
- exit 1
-fi
-
-if ! $IREE_LLVMAOT_LINKER_PATH --version 1>/dev/null 2>/dev/null
-then
- print_status "Bad IREE_LLVMAOT_LINKER_PATH (${IREE_LLVMAOT_LINKER_PATH}). Rerun with it correctly set."
- exit 1
-fi
-
-# If we're playing with the default input MLIR, preset some reasonable
-# FUNCTION_INPUTS and ENTRY_FUNCTION if they haven't been set.
-# Otherwise, they must be provided.
-if [[ -z "${FUNCTION_INPUTS}" || -z "${ENTRY_FUNCTION}" ]]
-then
- if [[ "${IREE_INPUT_MLIR}" == "${DEFAULT_IREE_INPUT_MLIR}" ]]
- then
- FUNCTION_INPUTS="1x384xi32,1x384xi32,1x384xi32"
- ENTRY_FUNCTION="serving_default"
- else
- print_status "Please specify FUNCTION_INPUTS and ENTRY_FUNCTION appropriately for your IREE_INPUT_MLIR (${IREE_INPUT_MLIR})"
- fi
-fi
-
-print_status "Running with the following environment variables:"
-print_status "ANDROID_NDK=${ANDROID_NDK}"
-print_status "IREE_LLVMAOT_LINKER_PATH=${IREE_LLVMAOT_LINKER_PATH}"
-print_status "IREE_ROOT=${IREE_ROOT}"
-print_status "IREE_BUILD_ANDROID=${IREE_BUILD_ANDROID}"
-print_status "TRACY_ROOT=${TRACY_ROOT}"
-print_status "PYTHON_BIN=${PYTHON_BIN}"
-print_status "CC=${CC}"
-print_status "CXX=${CXX}"
-print_status "IREE_INPUT_MLIR=${IREE_INPUT_MLIR}"
-print_status "FUNCTION_INPUTS=${FUNCTION_INPUTS}"
-print_status "ENTRY_FUNCTION=${ENTRY_FUNCTION}"
-
-echo
-
-print_status "Ensuring that we have Tracy source code..."
-if [ ! -d ${TRACY_ROOT}/profiler/build/unix/ ]
-then
- print_status "Tracy not found at ${TRACY_ROOT}. Either set TRACY_ROOT, or we're going to clone the git repository now."
- read -p "Press the return key..."
- git clone https://github.com/wolfpld/tracy "${TRACY_ROOT}"
-fi
-
-echo
-
-print_status "Ensuring that the Tracy profiler is built..."
-if [ ! -x "${TRACY_ROOT}/profiler/build/unix/Tracy-release" ]
-then
- print_status "Checking Tracy dependencies - assuming Debian."
- TRACY_DEPS="libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev"
- TRACY_DEPS_COUNT=5
- TRACY_DEPS_INSTALLED="$(apt list $TRACY_DEPS) 2>/dev/null | grep installed | wc -l"
- if [ $TRACY_DEPS_INSTALLED != $TRACY_DEPS_COUNT ]
- then
- print_status "Installing dependencies now - assuming Debian."
- sudo apt install $TRACY_DEPS
- fi
- make -C "${TRACY_ROOT}/profiler/build/unix" -j12 release
-fi
-
-echo
-
-print_status "Ensuring that we have the input MLIR file..."
-if [ ! -f "${IREE_INPUT_MLIR}" ]
-then
- print_status "Set IREE_INPUT_MLIR to point to some iree input MLIR file. Not found at current value ${IREE_INPUT_MLIR}."
- if [ "${IREE_INPUT_MLIR}" == "${DEFAULT_IREE_INPUT_MLIR}" ]
- then
- print_status "Okay, we actually know how to generate that file, ${IREE_INPUT_MLIR}, but it will take a while."
- print_status "Press the return key to continue..."
- pushd "${IREE_ROOT}"
- scripts/get_e2e_artifacts.py --test_suites=mobile_bert_squad_tests
- popd
- fi
-fi
-
-if [ ! -f "${IREE_INPUT_MLIR}" ]
-then
- print_status "FATAL: we should have ${IREE_INPUT_MLIR} by that point."
- exit 1
-fi
-
-echo
-
-print_status "Building IREE for Android in ${IREE_BUILD_ANDROID}..."
-
-mkdir -p "${IREE_BUILD_ANDROID}"
-pushd "${IREE_BUILD_ANDROID}"
-
-cmake -G Ninja ../iree \
- -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
- -DANDROID_ABI="arm64-v8a" \
- -DANDROID_PLATFORM=android-30 \
- -DCMAKE_BUILD_TYPE=RelWithDebInfo \
- -DIREE_BUILD_COMPILER=OFF \
- -DIREE_BUILD_SAMPLES=OFF \
- -DIREE_HOST_C_COMPILER=`which "$CC"` \
- -DIREE_HOST_CXX_COMPILER=`which "$CXX"` \
- -DIREE_ENABLE_RUNTIME_TRACING=ON
-
-cmake --build . --target \
- iree_tools_iree-translate \
- iree_tools_iree-benchmark-module
-
-popd
-echo
-
-print_status "Compiling the input MLIR file into a IREE module..."
-
-IREE_COMPILED_MODULE=/tmp/android_module.vmfb
-IREE_LOG=/tmp/iree-translate.log
-
-rm -rf "${IREE_COMPILED_MODULE}"
-
-IREE_LLVMAOT_LINKER_PATH="${IREE_LLVMAOT_LINKER_PATH}" \
- "${IREE_BUILD_ANDROID}/host/bin/iree-translate" \
- --iree-hal-target-backends=dylib-llvm-aot \
- --iree-mlir-to-vm-bytecode-module \
- --iree-llvm-target-triple=aarch64-linux-android \
- /tmp/iree/modules/MobileBertSquad/iree_input.mlir \
- -o "${IREE_COMPILED_MODULE}" \
- 2>"${IREE_LOG}"
-
-if [ ! -f "${IREE_COMPILED_MODULE}" ]
-then
- print_status "iree-translate failed to produce ${IREE_COMPILED_MODULE}. Log saved in ${IREE_LOG}. First few lines:"
- # The whole log might be enormous if it contains a big MLIR dump.
- head -n10 "${IREE_LOG}"
- print_status "tip: check if IREE_LLVMAOT_LINKER_PATH was correctly set."
- exit 1
-fi
-
-echo
-
-DEVICE_IREE_COMPILED_MODULE=/data/local/tmp/android_module.vmfb
-
-print_status "Pushing the compiled module to the device..."
-adb push "${IREE_COMPILED_MODULE}" "${DEVICE_IREE_COMPILED_MODULE}"
-echo
-
-print_status "Pushing the IREE benchmarking program to the device..."
-adb push "${IREE_BUILD_ANDROID}"/iree/tools/iree-benchmark-module /data/local/tmp
-echo
-
-print_status "Setting up TCP port forwarding to let Tracy connect with the benchmark running on the device..."
-adb forward tcp:8086 tcp:8086
-echo
-
-print_status "Now you can launch the Tracy UI in another shell and hit \"Connect\", while we run the benchmark on the device in this shell..."
-print_status "Run this command in another shell:"
-print_status " ${TRACY_ROOT}/profiler/build/unix/Tracy-release"
-echo
-
-print_status "Running the benchmark... hit Ctrl-C to terminate it after Tracy is done with it."
-# `TRACY_NO_EXIT=1` is to prevent it from exiting at the end: that's needed for profiling
-# short-running tasks.
-# `taskset 80` selects which CPU core to run on. On Pixel4, `taskset 80` gives the biggest
-# core, which can get some reproducibility, as long as we don't run into thermal issues.
-# `taskset 0f` would give the little cores, avoiding thermal issues but running slower and
-# requiring different optimization work to be efficient on.
-# `--driver=dylib` is to use the LLVM AOT generated code backend.
-adb shell \
- TRACY_NO_EXIT=1 \
- taskset 80 \
- data/local/tmp/iree-benchmark-module \
- --driver=dylib \
- --module_file="${DEVICE_IREE_COMPILED_MODULE}" \
- --function_inputs="${FUNCTION_INPUTS}" \
- --entry_function="${ENTRY_FUNCTION}"
diff --git a/scripts/prepare_doc_publication.py b/scripts/prepare_doc_publication.py
index 606c11c..20ad5a8 100755
--- a/scripts/prepare_doc_publication.py
+++ b/scripts/prepare_doc_publication.py
@@ -62,6 +62,9 @@
'objectives.md': 'Short-term Objectives',
'tensorflow_integrations.md': 'TensorFlow Integrations',
'e2e_benchmarking.md': 'Benchmarking TensorFlow with IREE and TFLite',
+ 'profiling_cpu_events.md': 'Profiling CPU events',
+ 'profiling_with_tracy.md': 'Profiling with Tracy',
+ 'profiling_vulkan_gpu.md': 'Profiling Vulkan GPU',
}
# A dictionary containing source file to permanent link mappings.