Revise profiling.md, tweaking language and adding more Tracy details. (#3595)

diff --git a/docs/developing_iree/profiling.md b/docs/developing_iree/profiling.md
index ac2b6e9..e8a5ce4 100644
--- a/docs/developing_iree/profiling.md
+++ b/docs/developing_iree/profiling.md
@@ -1,43 +1,43 @@
 # Profiling
 
-IREE [benchmarking](./benchmarking.md) gives us an overall view of the
-performance at a specific level of granularity. To understand performance
-details of the IREE system, one would need to
+IREE [benchmarking](./benchmarking.md) gives us an accurate and reproducible
+view of program performance at specific levels of granularity. To analyze
+system behavior in more depth, there are various ways to
 [profile](https://en.wikipedia.org/wiki/Profiling_(computer_programming))
-IREE, in order to know how components at different levels function and interact.
+IREE.
 
 ## Whole-system Profiling with Tracy
 
 IREE uses Tracy as the main tool to perform whole-system profiling.
 [Tracy](https://github.com/wolfpld/tracy) is a real-time, nanosecond resolution,
-remote telemetry, hybrid frame and sampling profiler. It can profile CPU, GPU,
-memory, locks, context switches, and many more.
+remote telemetry, hybrid frame and sampling profiler. Tracy can profile CPU,
+GPU, memory, locks, context switches, and much more.
 
 ### Building Tracy
 
 To use tracing in IREE, you need to build IREE with following requirements:
 
 *   Set `IREE_ENABLE_RUNTIME_TRACING` to `ON`.
-*   Add `-DNDEBUG` to `IREE_DEFAULT_COPTS`.
 *   Use Release/RelWithDebInfo build.
 
 For example:
 
 ```shell
-export IREE_DEFAULT_COPTS='-DNDEBUG'
-cmake -B build/ \
-      -DIREE_ENABLE_RUNTIME_TRACING=ON \
-      -DCMAKE_BUILD_TYPE=RelWithDebInfo
+$ export IREE_DEFAULT_COPTS='-DNDEBUG'
+$ cmake -B build/ \
+  -DIREE_ENABLE_RUNTIME_TRACING=ON \
+  -DCMAKE_BUILD_TYPE=RelWithDebInfo
 ```
 
 The above compiles IREE with Tracy APIs so that IREE will stream profiling data
-back to Tracy when running. To be able to collect and analyze these data, you
-can either use GUI or CLI tools. Tracy profiler is the GUI tool. You can find
-the
+back to Tracy when running. To collect and analyze these data, you can either
+use GUI or CLI tools. Tracy profiler is the GUI tool. You can find the
 Tracy manual on its [releases page](https://github.com/wolfpld/tracy/releases)
 for more details on Tracy itself.
 
-To build the profiler on Linux, you will need to install some external
+#### Building on Linux
+
+To build the profiler on Linux, you may need to install some external
 libraries. Some Linux distributions will require you to add a `lib` prefix and a
 `-dev`, or `-devel` postfix to library names. For example, you might see the
 error:
@@ -51,69 +51,91 @@
 Instructions to build Tracy profiler:
 
 ```shell
-cd third_party/tracy/profiler/build/unix
-make release
+$ cd third_party/tracy/profiler/build/unix
+$ make release
 ```
 
 ### Using Tracy
 
-Launch the profiler UI, and click connect. Then the server will wait for the
-connection. Now you can launch the IREE binary you want to trace, it should
-connect automatically and stream data. For example:
+Launch the profiler UI and click connect to start waiting for a traced program
+to running. Now you can launch the IREE binary you want to trace and Tracy
+should connect automatically and stream data. For example:
 
-Prepare the module to profile:
+Compile a .mlir file using `iree-translate`:
 
 ```shell
-build/iree/tools/iree-benchmark-module \
-  --module_file=/tmp/module.fb \
+$ build/iree/tools/iree-translate \
+  -iree-mlir-to-vm-bytecode-module \
+  -iree-hal-target-backends=vmla \
+  $PWD/iree/tools/test/simple.mlir \
+  -o /tmp/simple.module
+```
+
+Run a compiled module once:
+
+```shell
+$ build/iree/tools/iree-run-module \
+  --module_file=/tmp/simple.module \
   --driver=vmla \
   --entry_function=abs \
   --function_inputs="i32=-2"
 ```
 
-Run the module:
+Benchmark a compiled module, running it many times:
 
 ```shell
-build/iree/tools/iree-run-module \
-  --module_file=/tmp/module.fb \
+$ build/iree/tools/iree-benchmark-module \
+  --module_file=/tmp/simple.module \
   --driver=vmla \
   --entry_function=abs \
   --function_inputs="i32=-2"
 ```
 
-Note that typically IREE binaries complete running the module and exit very
-quickly before even connecting to Tracy. For such cases, you can set
-`TRACY_NO_EXIT=1` in the environment to keep the IREE binary alive until
-Tracy connects to it.
+> Note:<br>
+> &nbsp;&nbsp;&nbsp;&nbsp;IREE binaries may finish running before even
+> connecting to Tracy. For such cases, you can set `TRACY_NO_EXIT=1` in the
+> environment to keep the IREE binary alive until Tracy connects to it.
+
+### Configuring Tracy
+
+Set IREE's `IREE_TRACING_MODE` value (defined in
+[iree/base/tracing.h](https://github.com/google/iree/blob/main/iree/base/tracing.h))
+to adjust which tracing features, such as allocation tracking and callstacks,
+are enabled.
+
+In order for Tracy to record detailed statistics via sampling, the program
+collecting data must be run using elevated permissions (Administrator on Windows,
+root on Linux, rooted Android device). See Tracy's user manual for more
+information.
 
 ## Vulkan GPU Profiling
 
-Tracy gives us great insights over CPU/GPU interactions and Vulkan API usage
+Tracy offers great insights into CPU/GPU interactions and Vulkan API usage
 details. However, information at a finer granularity, especially inside a
-particular shader dispatch, is missing. To supplement, one would typically need
-to use other third-party or vendor-specific tools.
+particular shader dispatch, is missing. To supplement general purpose tools
+like Tracy, vendor-specific tools can be used.
 
 (TODO: add some pictures for each tool)
 
 ### Android GPUs
 
-There are multiple GPU vendors for the Android platforms. One can use tools
-provided by the GPU vendor. [Android GPU Inspector](https://gpuinspector.dev/)
+There are multiple GPU vendors for the Android platforms, each offering their
+own tools. [Android GPU Inspector](https://gpuinspector.dev/)
 (AGI) provides a cross-vendor solution. See the
 [documentation](https://gpuinspector.dev/docs/) for more details.
 
 #### Build Android app to run IREE
 
-In order to perform capture and analysis with AGI, one will need a full Android
+In order to perform capture and analysis with AGI, you will need a full Android
 app. In IREE we have a simple Android native app wrapper to help package
 IREE core libraries together with a specific VM bytecode invocation into an
-Android app. The wrapper and its documentation is placed at
+Android app. The wrapper and its documentation are placed at
 [`iree/tools/android/run_module_app/`](https://github.com/google/iree/tree/main/iree/tools/android/run_module_app).
 
 For example, to package a module compiled from the following `mhlo-dot.mlir` as
 an Android app:
 
-```
+```mlir
 func @dot(%lhs: tensor<2x4xf32>, %rhs: tensor<4x2xf32>) -> tensor<2x2xf32>
   attributes { iree.module.export } {
   %0 = "mhlo.dot"(%lhs, %rhs) : (tensor<2x4xf32>, tensor<4x2xf32>) -> tensor<2x2xf32>
@@ -122,17 +144,17 @@
 ```
 
 ```shell
-# First translate into VM bytecode module
-/path/to/iree/build/iree/tools/iree-translate -- \
+# First translate into a VM bytecode module
+$ /path/to/iree/build/iree/tools/iree-translate -- \
   -iree-mlir-to-vm-bytecode-module \
   --iree-hal-target-backends=vulkan \
   /path/to/mhlo-dot.mlir \
   -o /tmp/mhlo-dot.vmfb
 
 # Then package the Android app
-/path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
+$ /path/to/iree/source/iree/tools/android/run_module_app/build_apk.sh \
   ./build-apk \
-  --module_file /tmp/mhlo-dot.mlir \
+  --module_file /tmp/mhlo-dot.vmfb \
   --entry_function dot \
   --function_inputs_file /path/to/inputs/file \
   --driver vulkan
@@ -146,17 +168,18 @@
 ```
 
 The above will build an `iree-run-module.apk` under the `./build-apk/`
-directory. One can then install via `adb install`.
+directory, which you can then install via `adb install`.
 
-`build_apk.sh` needs Android SDK and NDK internally. And easy way to manage
-them is by installing the [Android Studio](https://developer.android.com/studio).
+`build_apk.sh` needs the Android SDK and NDK internally, an easy way to manage
+them is by installing [Android Studio](https://developer.android.com/studio).
 After installation, you will need to set up a few environment variables, which
 are printed at the beginning of `build_apk.sh` invocation.
 
 #### Capture and analyze with AGI
 
-You can follow AGI's [get started](https://gpuinspector.dev/docs/getting-started)
-page to learn how to use it. In general the steps are:
+You can follow AGI's
+[Getting Started](https://gpuinspector.dev/docs/getting-started) page to learn
+how to use it. In general the steps are:
 
 * Install the latest AGI from https://github.com/google/agi/releases and launch.
 * Fill in the "Application" field by searching the app. The line should read
@@ -165,40 +188,41 @@
 * Configure system profile to include all GPU counters.
 * Start capture.
 
-The generated trace is in perfetto format. To view the trace does not need the
-device anymore. It can be viewed directly with AGI and also online in a browswer
-at https://ui.perfetto.dev/.
+Generated traces are in the [perfetto](https://perfetto.dev/) format. They can
+be viewed directly within AGI and also online in a browser at
+https://ui.perfetto.dev/, without needing an Android device.
 
 ### Desktop GPUs
 
-Vulkan is traditionally used for graphics rendering. So Vulkan profiling tools
-at the moment typically have such assumption and require a rendering boundary
-marked by framebuffer presentation. This means additional steps for IREE and
-other Vulkan applications that solely rely on headless compute. We need to wrap
-the core IREE logic inside a dummy rendering loop in order to provide tools the
-necessary markers to perform capture and analysis.
+Vulkan supports both graphics and compute, but most tools in the Vulkan
+ecosystem focus on graphics. As a result, some Vulkan profiling tools expect
+commands to correspond to a sequence of frames presented to displays via
+framebuffers. This means additional steps for IREE and other Vulkan
+applications that solely rely on headless compute. For graphics-focused tools,
+we need to wrap IREE's logic inside a dummy rendering loop in order to provide
+the necessary markers for these tools to perform capture and analysis.
 
 IREE provides an `iree-run-module-vulkan-gui` binary that can invoke a specific
 bytecode module within a proper GUI application. The graphics side is leveraging
-[Dear ImGui](https://github.com/ocornut/imgui); it invokes IREE core
-synchronously during rendering each frame and prints the bytecode invoation
-result to the screen.
+[Dear ImGui](https://github.com/ocornut/imgui); it calls into IREE
+synchronously during rendering each frame and prints the bytecode invocation
+results to the screen.
 
 To build `iree-run-module-vulkan-gui`:
 
 ```shell
 # Using Bazel
-bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui
+$ bazel build //iree/testing/vulkan:iree-run-module-vulkan-gui
 
 # Using CMake
-cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
+$ cmake --build /path/to/build/dir --target iree-run-module-vulkan-gui
 ```
 
 The generated binary should be invoked in a console environment and it takes
 the same command-line options as the main
 [`iree-run-module`](./developer-overview.md#iree-run-module), except the
 `--driver` option. You can use `--help` to learn them all. The binary will
-invoke a GUI window to let one to use Vulkan tools.
+launch a GUI window for use with Vulkan tools.
 
 #### AMD