Add a doc to describe the benchmark dashboard (#6947) This commit adds a doc to explain the benchmark philosophy and the model benchmark specification used in the benchmark suite and dashboard.

commit: 664005aa366d81235d8886e3de24773932e086ed
author: Lei Zhang <antiagainst@google.com> Fri Sep 10 13:38:53 2021 -0400
committer: GitHub <noreply@github.com> Fri Sep 10 10:38:53 2021 -0700
tree: 54abe52626e60b2f1a3240e801dcfe9970dbc191
parent: 2ea4fe0bc3f373bdd1c8d6df3b757262ee30077a
diff --git a/benchmarks/dashboard.md b/benchmarks/dashboard.md
new file mode 100644
index 0000000..4b096a0
--- /dev/null
+++ b/benchmarks/dashboard.md

@@ -0,0 +1,145 @@
+# IREE Performance Dashboard
+
+This document explains IREE's performance dashboard (https://perf.iree.dev).
+A [Buildkite pipeline](https://buildkite.com/iree/iree-benchmark) runs on each
+commit to the `main` branch and posts those results to the dashboard.
+
+## Benchmarking philosophy
+
+Benchmarking and interpreting results properly is a delicate task. We can record
+metrics from various parts of a system, but depending on what we are trying to
+evaluate, those numbers may or may not be relevant. For example, for somebody
+working solely on better kernel code generation, the end-to-end model inference
+latency is unlikely to be meaningful given it also includes runtime overhead. The
+environment could also vary per benchmark run in uncontrollable ways, causing
+instability in the results. This is especially true for mobile and embedded
+systems, where a tight compromise between performance and thermal/battery limits
+is made. Too many aspects can affect the benchmarking results. So before going
+into details, it's worth noting the general guidelines for IREE benchmarking as
+context.
+
+The overarching goal for benchmarking here is to track IREE's performance
+progress and guard against regression. So the benchmarks are meant to understand
+the performance of IREE _itself_, not the absolute capability of the exercised
+hardware. In order to fulfill the above goal, we have the following guidelines
+for benchmarking:
+
+* We choose representative real-world models with varying characteristics.
+* We cover different IREE backends and different modes for each backend so that
+  folks working on different components can find the metrics they need.
+
+## Model benchmark specification
+
+Each benchmark in IREE has a unique identifier with the following format:
+
+```
+<model-name> `[` <model-tag>.. `]` `(` <model-source> `)` <benchmark-mode>..
+`with` <iree-driver>
+`@` <device-name> `(` <target-architecture> `)`
+```
+
+The following subsections explain possible choices in each field.
+
+### Model source
+
+This field specifies the original model source:
+
+```
+├── TensorFlow
+│     * models authored in TensorFlow and imported with `iree-import-tf`
+└── TFLite
+      * models converted to TensorFlow Lite and imported with `iree-import-tflite`
+```
+
+### Model name
+
+This field specifies the input model:
+
+* `DeepLabV3` [[source](https://tfhub.dev/tensorflow/lite-model/deeplabv3/1/default/1)]:
+  Vision model for semantic image segmentation.
+  Characteristics: convolution, feedforward NN.
+* `MobileBERT` [[source](https://github.com/google-research/google-research/tree/master/mobilebert)]:
+  NLP for Q&A.
+  Characteristics: matmul, attention, feedforward NN.
+* `MobileNetV2` [[source](https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV2)]:
+  Vision model for image classification.
+  Characteristics: convolution, feedforward NN.
+* `MobileNetV3Small` [[source](https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV3Small)]:
+  Vision model for image classification.
+  Characteristics: convolution, feedforward NN.
+* `MobileSSD` [[source](https://www.tensorflow.org/lite/performance/gpu#demo_app_tutorials)]:
+  Vision model for object detection.
+  Characteristics: convolution, feedforward NN.
+* `PoseNet` [[source](https://tfhub.dev/tensorflow/lite-model/posenet/mobilenet/float/075/1/default/1)]:
+  Vision model for pose estimation.
+  Characteristics: convolution, feedforward NN.
+
+### Model tag
+
+This field specifies the model variant. It depends on the model, but here are
+some examples:
+
+* `f32`: the model operates on 32-bit float types.
+* `imagenet`: the model takes ImageNet-sized inputs (224x224x3).
+
+### IREE driver
+
+This field specifies the IREE HAL driver:
+
+* [`Dylib`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
+  For CPU via dynamic library. Kernels contain CPU native instructions AOT
+  compiled using LLVM. This driver issues workloads to the CPU asynchronously
+  and supports multithreading.
+* [`Dylib-Sync`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
+  For CPU via dynamic library. Kernels contain CPU native instructions
+  AOT compiled using LLVM. This driver issues workloads to the CPU
+  synchronously.
+* [`VMVX`](https://github.com/google/iree/issues/5123):
+  For CPU via dynamic library. Kernels contain vector-level intrinsics that
+  are backed by fast implementations ([WIP](https://github.com/google/iree/issues/5819)).
+  This driver issues workloads to the CPU asynchronously and supports
+  multithreading.
+* [`Vulkan`](https://google.github.io/iree/deployment-configurations/gpu-vulkan/):
+  For GPU via Vulkan. Kernels contain SPIR-V. This driver issues workloads to
+  the GPU via the Vulkan API.
+
+### Device name and target architecture
+
+These two fields are tightly coupled. They specify the device and hardware
+target for executing the benchmark.
+
+Right now there are two Android devices:
+
+* `Pixel-4`: Google Pixel 4 running Android 11. The SoC is
+  [Snapdragon 855](https://www.qualcomm.com/products/snapdragon-855-plus-and-860-mobile-platform),
+  with 1+3+4 ARMv8.2 CPU cores and Adreno 640 GPU.
+* `SM-G980F`: Samsung Galaxy S20 running Android 11. The SoC is
+  [Exynos 990](https://www.samsung.com/semiconductor/minisite/exynos/products/mobileprocessor/exynos-990/),
+  with 2+2+4 ARMv8.2 CPU cores and Mali G77 MP11 GPU.
+
+Therefore the target architectures are:
+
+* `CPU-ARMv8.2-A`: can benchmark all CPU-based IREE backends and drivers.
+* `GPU-Adreno-640`: can benchmark IREE Vulkan with Adreno target triples.
+* `GPU-Mali-G77`: can benchmark IREE Vulkan with Mali target triples.
+
+### Benchmark mode
+
+This field further specifies the benchmark variant, given the same input
+model and target architecture. It controls important aspects like:
+
+* `*-core`: specifies the core flavor for CPU.
+* `*-thread`: specifies the number of threads for CPU.
+* `full-inference`: measures the latency for one full inference. Note that this
+  does not include the IREE system initialization time.
+* `kernel-execution`: measures only kernel execution latency for GPU. Note that
+  this is only possible for feedforward NN models that can be put into one
+  command buffer.
+
+`*-core` and `*-thread` together determine the `taskset` mask used for
+benchmarking IREE backends and drivers on CPU. For example,
+
+* `1-thread,big-core` would mean `taskset 80`.
+* `1-thread,little-core` would mean `taskset 08`.
+* `3-thread,big-core` would mean `taskset f0`.
+* `3-thread,little-core` would mean `taskset 0f`.
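
Reviewer note (not part of the committed file): the four `taskset` examples above can be reproduced with a small sketch. It rests on assumptions inferred from those examples rather than stated in the doc: an 8-core SoC where CPUs 0-3 are the little cores and CPUs 4-7 are the big cores, a single thread pinned to the highest-numbered core of its cluster, and multi-threaded runs given the whole cluster. The names `CLUSTERS`, `taskset_mask`, and `mask_for_mode` are hypothetical and do not come from the IREE benchmark scripts.

```python
# Assumed core layout (inferred from the masks in the doc, not confirmed):
# CPUs 0-3 are little cores, CPUs 4-7 are big cores.
CLUSTERS = {"little-core": range(0, 4), "big-core": range(4, 8)}


def taskset_mask(threads: int, core: str) -> str:
    """Return a hex CPU affinity mask for the given thread count and core flavor."""
    cpus = CLUSTERS[core]
    if threads == 1:
        # Assumption: a single thread is pinned to the highest-numbered core.
        bits = 1 << cpus[-1]
    else:
        # Assumption: multi-threaded runs may use every core in the cluster.
        bits = sum(1 << cpu for cpu in cpus)
    return f"{bits:02x}"


def mask_for_mode(mode: str) -> str:
    """Parse a benchmark mode such as '1-thread,big-core' into a taskset mask."""
    fields = mode.split(",")
    threads = next(int(f.split("-")[0]) for f in fields if f.endswith("-thread"))
    core = next(f for f in fields if f.endswith("-core"))
    return taskset_mask(threads, core)


if __name__ == "__main__":
    for mode in ("1-thread,big-core", "1-thread,little-core",
                 "3-thread,big-core", "3-thread,little-core"):
        print(f"{mode}: taskset {mask_for_mode(mode)}")
```

Under these assumptions, `3-thread,big-core` maps to `f0` because the mask only restricts which cores the three threads may run on; it does not reserve one core per thread.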

diff --git a/build_tools/android/upload_benchmarks_to_dashboard.py b/build_tools/android/upload_benchmarks_to_dashboard.py
index 6f6fc7d..093ff02 100755
--- a/build_tools/android/upload_benchmarks_to_dashboard.py
+++ b/build_tools/android/upload_benchmarks_to_dashboard.py

@@ -37,6 +37,10 @@
 <br>
 For the graph, the x axis is the Git commit index, and the y axis is the
 measured latency in milliseconds.
+<br>
+See <a href="https://github.com/google/iree/tree/main/benchmarks/dashboard.md">
+https://github.com/google/iree/tree/main/benchmarks/dashboard.md
+</a> for benchmark philosophy, specification, and definitions.
 """
 
 # A non-exhaustive list of models and their source URLs.