# Profiling with Tracy
[Tracy](https://github.com/wolfpld/tracy) is a profiler that puts together in a
single view:
* Both CPU and GPU profiling.
* Both sampling and instrumentation.
* Both specifics of our own process, and whole-system profiling a la "systrace".
Since Tracy relies on instrumentation, it requires IREE binaries to be built
with a special flag to enable it.
There are two components to Tracy. They communicate over a TCP socket.
* The "client" is the program being profiled.
* The "server" is the Tracy profiler UI.
## The Tracy manual
The primary source of Tracy documentation, including build instructions, is
a PDF manual that's part of each numbered release. To find the latest one,
navigate [here](https://github.com/wolfpld/tracy/releases) and search for
`tracy.pdf`.
## Building the Tracy UI (the "server")
This is explained in section 2.3 of the manual for Windows and Linux. Here we
give some more detailed instructions for some systems.
The IREE repository contains its own clone of the Tracy repository in
`third_party/tracy`, so there is no need to make a separate clone of it. You can
use one if you want, but be aware that the Tracy client/server protocol gets
updated sometimes. Building both sides from the same
`iree/third_party/tracy` lowers the risk of running into a protocol version
mismatch.
### Linux
Install dependencies (Debian-based distributions):
```shell
sudo apt install libcapstone-dev libtbb-dev libglfw3-dev libfreetype6-dev libgtk-3-dev
```
Build (from your `iree/` clone root directory):
```shell
make -C third_party/tracy/profiler/build/unix -j12 release
```
### Mac
TODO write this (Kojo?)
## Building IREE with Tracy instrumentation (the "client")
IREE needs to be built with Tracy instrumentation enabled. This enables both the
collection of data, and its streaming to the Tracy server over a socket.
This is only supported in the CMake build system of IREE, not in Bazel.
In the initial CMake configuration command:
* Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`.
* Use the `RelWithDebInfo` build type.
For example:
```shell
$ cmake \
    -DIREE_ENABLE_RUNTIME_TRACING=ON \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    ... # other cmake arguments as usual
```
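To double-check that an existing build directory was configured with tracing enabled, you can inspect its `CMakeCache.txt`, where CMake records cache variables as `NAME:TYPE=VALUE`. The following is a minimal sketch, not part of IREE: the function name `iree_tracing_enabled` is hypothetical, and it assumes the option was set with the literal value `ON` as in the example above.

```shell
# Hypothetical helper (not part of IREE): succeeds if the given build
# directory was configured with IREE_ENABLE_RUNTIME_TRACING=ON.
iree_tracing_enabled() {
  # CMake stores cache entries as NAME:TYPE=VALUE in CMakeCache.txt.
  grep -Eq '^IREE_ENABLE_RUNTIME_TRACING:[A-Z]+=ON$' "$1/CMakeCache.txt"
}
```

For example, `iree_tracing_enabled path/to/build && echo "tracing is enabled"`, where `path/to/build` is a placeholder for your own build directory.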
## Permissions issues
The profiled application (i.e. the Tracy client) needs to have appropriate
permissions to perform the special I/O required to collect the profile
information. This is OS-specific.
### Desktop Linux
On desktop Linux, the Tracy client must be run as root, e.g. with `sudo`.
Otherwise, profile data will lack important components.
### Android
On Android, it is not necessary to run as root. In fact, Android graphical
applications never run as root, so it's advisable to run all programs as
non-root for consistency.
The Android device must be prepared as follows to enable Tracy profiling.
* The device must be rooted.
  * That means that in `adb shell`, the command `su` must succeed.
  * That does NOT mean doing `adb root`. The effect of `adb root` is to have the
    `adbd` daemon itself run as root, which causes `adb shell` to give you a
    root shell by default. If that is your situation, consider doing
    `adb unroot` to restart the `adbd` server as non-root. This is not
    mandatory, but again, running anything as root on Android is a deviation
    from normal user conditions.
* Execute the following commands in a root shell on the device (i.e. `adb
  shell`, then `su`, then the following commands). These are from the Tracy PDF
  manual, but they are hard to find there, and copy-pasting from a PDF
  introduces unwanted whitespace. These settings normally persist until the
  next reboot of the device.
  * `setenforce 0`
  * `mount -o remount,hidepid=0 /proc`
  * `echo 0 > /proc/sys/kernel/perf_event_paranoid`
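Since these settings are lost on reboot, it can be convenient to keep the commands in a small helper on the host. The sketch below only prints the three commands listed above; the function name `tracy_android_prep_commands` is hypothetical. Piping its output into `adb shell su` is an assumption about how `su` behaves on your device (that it starts a shell reading commands from standard input), so verify it on your own setup.

```shell
# Hypothetical helper: print the device-preparation commands from the list
# above, one per line, so they can be fed to a root shell on the device.
tracy_android_prep_commands() {
  cat <<'EOF'
setenforce 0
mount -o remount,hidepid=0 /proc
echo 0 > /proc/sys/kernel/perf_event_paranoid
EOF
}

# Possible use (assumption: `su` on the device reads commands from stdin):
#   tracy_android_prep_commands | adb shell su
```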
## Port forwarding
The Tracy client and server communicate by default over port `8086`. When they
run on different machines, e.g. with embedded/Android profiling or remote
profiling, port forwarding must be set up.
### Between a computer and a local Android device connected to it by USB
Run this command. You might need to run it again a little more frequently than
you reboot the device. When experiencing connection issues, try that first.
```shell
adb forward tcp:8086 tcp:8086
```
### Between two computers over the network
TODO write this (`ssh` stuff...)
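One common way to do this is SSH local port forwarding, run on the machine where the Tracy UI will run, so that connections to local port `8086` reach port `8086` on the machine running the profiled program. The following is only a sketch under that assumption: the function name `tracy_ssh_forward`, the `TRACY_DRY_RUN` variable, and the remote address are hypothetical, and your network setup may require something different.

```shell
# Hypothetical helper: forward a local port to the same port on the remote
# machine that runs the profiled program. Run it on the Tracy UI machine.
tracy_ssh_forward() {
  remote="$1"        # placeholder, e.g. user@hostname
  port="${2:-8086}"  # Tracy's default port
  # -N: do not run a remote command; -L: local port forwarding.
  # If TRACY_DRY_RUN is set, only print the command instead of running it.
  ${TRACY_DRY_RUN:+echo} ssh -N -L "${port}:localhost:${port}" "$remote"
}
```

With the tunnel open, the Tracy UI would connect to `localhost` as if the profiled program were running locally.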
## Running the profiled program
Run your IREE workload as you normally would: now that it's been built with
Tracy instrumentation enabled, it should do all the right things automatically.
The only change that you are likely to need in your command line is to set the
`TRACY_NO_EXIT=1` environment variable. This ensures that your program does not
exit until the Tracy server (the UI) has connected to it and finished uploading
the profile data.
Typically, `TRACY_NO_EXIT=1` is needed when profiling `iree-benchmark-module`.
It wouldn't be needed when profiling a real user-facing application.
Example:
```shell
TRACY_NO_EXIT=1 /data/local/tmp/iree-benchmark-module \
  --driver=dylib \
  --function_inputs='1x384xi32,1x384xi32,1x384xi32' \
  --module_file=/data/local/tmp/android_module.fbvm \
  --entry_function=serving_default
```
## Running the Tracy profiler UI, connecting and visualizing
While the profiled program is still running (possibly thanks to
`TRACY_NO_EXIT=1`), start the Tracy profiler UI that we built above.
From the IREE root directory:
```shell
./third_party/tracy/profiler/build/unix/Tracy-release
```
It should show a dialog offering to connect to a client, i.e. a profiled
program.
If connecting doesn't work:
* If the profiled program is on a separate machine, make sure you've correctly
  set up port forwarding.
  * On Android, the `adb forward` command may need to be run again.
* Make sure that the profiled program is still running. Do you need
`TRACY_NO_EXIT=1`?
* Kill the profiled program and restart it.
You should then start seeing a profile.
Before going further, check that your profile has sampling data recorded, not
just instrumentation data. Click the 'Statistics' button at the top. The window
that it opens should show a 'Sampling' radio button next to 'Instrumentation'.
Back in the main view, look for the 'Main thread' label on the far left. Next to
it should be a small ghost icon. If either of these is missing, you need
to fix that first, because many of Tracy's interesting features rely on
sampling. Most likely this is a permissions issue: make sure that you have
performed the steps outlined in the above section on permissions. As an
experiment, retry with the profiled program running as root. Look for any
interesting `stderr` message (in the profiled program's terminal). Try
`strace`'ing it.
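When checking permissions, one quick thing to look at is the current value of `perf_event_paranoid`, which the Android preparation steps set to `0`. The sketch below assumes that a value of `0` or lower is what is wanted; the function name `perf_paranoid_ok` is hypothetical, and this check alone does not prove that sampling will work.

```shell
# Hypothetical helper: succeed if the given perf_event_paranoid value is an
# integer no greater than 0 (the value set in the Android steps).
perf_paranoid_ok() {
  # The test fails, with its error message suppressed, on non-numeric input.
  [ "$1" -le 0 ] 2>/dev/null
}

# Possible uses:
#   perf_paranoid_ok "$(cat /proc/sys/kernel/perf_event_paranoid)"
#   perf_paranoid_ok "$(adb shell cat /proc/sys/kernel/perf_event_paranoid | tr -d '\r')"
```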
Besides sampling, let's check that you also have "systrace" working. In the
initial main view, on the far left, you should see a list of CPU cores,
typically labelled `CPU0`, `CPU1`, etc. To the right, a timeline view should
show system-wide activity on all these CPUs - so for instance you can see
context switches and you can see if anything else happened on your device that
could have interfered with the benchmark's performance. If this "systrace"
functionality isn't working, then again that's likely a permissions issue, so
perform the same troubleshooting as above for sampling.
Finally, look for the part of the timeline that is of interest to you. Your area
of interest might not be on the Main thread. In fact, it might be on a thread
that's not visible in the initial view at all. Indeed, the initial view tends to
be zoomed-in a lot. Either use the mouse directly to zoom out and navigate, or
look for the 'Frame' control at the top of the Tracy window. Use the 'next
frame' arrow button until more interesting threads appear. Typically,
IREE-generated code tends to run on a thread named `cpu0`, which is a thread
name and is unrelated to `CPU0` from the systrace view.
Once you click a zone, you should see a line-by-line or
instruction-by-instruction view annotated with the percentage of time spent. If
you don't see the percentages, then again that would point to sampling not
working; refer to the above steps.
## Configuring Tracy instrumentation
Set IREE's `IREE_TRACING_MODE` value (defined in
[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h))
to adjust which tracing features, such as allocation tracking and callstacks,
are enabled.