Discuss compilation and runtime settings in best practices doc. (#7110)
Progress on https://github.com/google/iree/issues/7095
diff --git a/docs/developers/best_practices.md b/docs/developers/best_practices.md
index d3d9c91..8745b8a 100644
--- a/docs/developers/best_practices.md
+++ b/docs/developers/best_practices.md
@@ -45,19 +45,42 @@
## Practices for compilation settings
-TODO: mention parameters to tune
-
TODO: which compiler targets to use (try both CUDA and Vulkan?)
TODO: use the most specific LLVM target triple you can?
+### Tuning compilation heuristics
+
+IREE runs its own suite of benchmarks continuously using the definitions at
+https://github.com/google/iree/tree/main/benchmarks. The flags set for these
+benchmarks represent the latest manually tuned values for workloads we track
+closely, and referencing them may help with your own search for peak performance.
+You can use these flags in your own explorations, but note that as compiler
+performance matures, the existing flags will gradually be replaced with
+attributes for autotuning or command line options for experimental features.
+
## Practices for runtime use
+TODO: sample code, profile numbers
+
+### Tuning runtime settings
+
+When running on the CPU, the task system flags specified in
+[iree/task/api.c](https://github.com/google/iree/blob/main/iree/task/api.c)
+give control over how worker threads will be created. For example, the
+`--task_topology_group_count=3` flag can be set to explicitly run on three
+workers rather than relying on heuristic selection, which defaults to one
+worker per detected physical core.
+
+If running on a single thread or on a system with no threading support, the
+`dylib-sync` HAL driver can be used instead of the more generic `dylib` HAL
+driver. The synchronous driver performs execution inline rather than through
+IREE's task scheduling system.
+
### Do the minimum amount of work: cache queries and reuse buffers
-Try to front-load queries, particularly queries using strings that look up into
-maps like `iree_runtime_session_call_by_name`, so that hot sections of code are
-doing the minimum amount of work: routing inputs through buffers, scheduling
-runtime calls, and routing outputs through other buffers.
-
-TODO: sample code, profile numbers
+When using IREE's runtime libraries, try to front-load queries, particularly
+queries using strings that look up into maps like
+`iree_runtime_session_call_by_name`, so that hot sections of code are doing the
+minimum amount of work: routing inputs through buffers, scheduling runtime
+calls, and routing outputs through other buffers.