| # IREE Performance Dashboard |
| |
| This documentation explains IREE's performance dashboard (https://perf.iree.dev). |
A [Buildkite pipeline](https://buildkite.com/iree/iree-benchmark) runs benchmarks
on each commit to the `main` branch and posts the results to the dashboard.
| |
| ## Benchmarking philosophy |
| |
| Benchmarking and interpreting results properly is a delicate task. We can record |
| metrics from various parts of a system, but depending on what we are trying to |
evaluate, those numbers may or may not be relevant. For example, for somebody
working solely on better kernel code generation, the end-to-end model inference
latency is unlikely to be meaningful, given that it also includes runtime
overhead. The environment can also vary between benchmark runs in
uncontrollable ways, causing instability in the results. This is especially
true for mobile and embedded systems, where a tight compromise is made between
performance and thermal/battery limits. Because so many aspects can affect the
results, it's worth noting the general guidelines for IREE benchmarking as
context before going into details.
| |
The overarching goal of benchmarking here is to track IREE's performance
progress and guard against regressions. The benchmarks are therefore meant to
measure the performance of IREE _itself_, not the absolute capability of the
exercised hardware. To fulfill this goal, we follow these guidelines:
| |
| * We choose representative real-world models with varying characteristics. |
| * We cover different IREE backends and different modes for each backend so that |
| folks working on different components can find the metrics they need. |
| |
| ## Model benchmark specification |
| |
| Each benchmark in IREE has a unique identifier with the following format: |
| |
| ``` |
| <model-name> `[` <model-tag>.. `]` `(` <model-source> `)` <benchmark-mode>.. |
| `with` <iree-driver> |
| `@` <device-name> `(` <target-architecture> `)` |
| ``` |
| |
| The following subsections explain possible choices in each field. |
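
For example, composing these fields yields an identifier like the following
(a hypothetical composition for illustration; the exact spelling on the
dashboard may differ slightly):

```
MobileNetV2 [f32,imagenet] (TFLite) 3-thread,big-core,full-inference
with Dylib
@ Pixel-4 (CPU-ARMv8.2-A)
```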
| |
| ### Model source |
| |
This field specifies the original model source; a sample import command is
sketched after the listing:
| |
| ``` |
| ├── TensorFlow |
| │ * models authored in TensorFlow and imported with `iree-import-tf` |
| └── TFLite |
| * models converted to TensorFlow Lite and imported with `iree-import-tflite` |
| ``` |
| |
| ### Model name |
| |
| This field specifies the input model: |
| |
| * `DeepLabV3` [[source](https://tfhub.dev/tensorflow/lite-model/deeplabv3/1/default/1)]: |
| Vision model for semantic image segmentation. |
| Characteristics: convolution, feedforward NN. |
| * `MobileBERT` [[source](https://github.com/google-research/google-research/tree/master/mobilebert)]: |
  NLP model for question answering.
| Characteristics: matmul, attention, feedforward NN. |
| * `MobileNetV2` [[source](https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV2)]: |
| Vision model for image classification. |
  Characteristics: convolution, feedforward NN.
| * `MobileNetV3Small` [[source](https://www.tensorflow.org/api_docs/python/tf/keras/applications/MobileNetV3Small)]: |
| Vision model for image classification. |
| Characteristics: convolution, feedforward NN. |
| * `MobileSSD` [[source](https://www.tensorflow.org/lite/performance/gpu#demo_app_tutorials)]: |
| Vision model for object detection. |
| Characteristics: convolution, feedforward NN. |
| * `PoseNet` [[source](https://tfhub.dev/tensorflow/lite-model/posenet/mobilenet/float/075/1/default/1)]: |
| Vision model for pose estimation. |
| Characteristics: convolution, feedforward NN. |
| |
| ### Model tag |
| |
| This field specifies the model variant. It depends on the model, but here are |
| some examples: |
| |
* `f32`: the model operates on 32-bit floating-point types.
| * `imagenet`: the model takes ImageNet-sized inputs (224x224x3). |
| |
| ### IREE driver |
| |
This field specifies the IREE HAL driver; a sample benchmark invocation is
sketched after the list:
| |
* [`Dylib`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
  For CPU via dynamic library. Kernels contain CPU native instructions AOT
  compiled using LLVM. This driver issues workloads to the CPU asynchronously
  and supports multithreading.
* [`Dylib-Sync`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
  For CPU via dynamic library. Kernels contain CPU native instructions AOT
  compiled using LLVM. This driver issues workloads to the CPU synchronously.
| * [`VMVX`](https://github.com/google/iree/issues/5123): |
| For CPU via dynamic library. Kernels contain vector-level intrinsics that |
| are backed by fast implementations ([WIP](https://github.com/google/iree/issues/5819)). |
  This driver issues workloads to the CPU asynchronously and supports
  multithreading.
| * [`Vulkan`](https://google.github.io/iree/deployment-configurations/gpu-vulkan/): |
  For GPU via Vulkan. Kernels contain SPIR-V. This driver issues workloads to
  the GPU via the Vulkan API.
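
The driver is chosen at runtime when invoking the benchmark tool. Below is a
minimal sketch using `iree-benchmark-module`; the module path, entry function
name, and input shape are placeholders, and the flag spellings reflect the
tool at the time of writing and may change:

```
# Benchmark a compiled module through the CPU dynamic-library driver.
iree-benchmark-module \
  --driver=dylib \
  --module_file=module.vmfb \
  --entry_function=main \
  --function_input=1x224x224x3xf32
```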
| |
| ### Device name and target architecture |
| |
| These two fields are tightly coupled. They specify the device and hardware |
| target for executing the benchmark. |
| |
| Right now there are two Android devices: |
| |
| * `Pixel-4`: Google Pixel 4 running Android 11. The SoC is |
| [Snapdragon 855](https://www.qualcomm.com/products/snapdragon-855-plus-and-860-mobile-platform), |
| with 1+3+4 ARMv8.2 CPU cores and Adreno 640 GPU. |
| * `SM-G980F`: Samsung Galaxy S20 running Android 11. The SoC is |
| [Exynos 990](https://www.samsung.com/semiconductor/minisite/exynos/products/mobileprocessor/exynos-990/), |
| with 2+2+4 ARMv8.2 CPU cores and Mali G77 MP11 GPU. |
| |
Therefore the target architectures are (a compilation sketch follows the list):
| |
* `CPU-ARMv8.2-A`: can benchmark all CPU-based IREE backends and drivers.
| * `GPU-Adreno-640`: can benchmark IREE Vulkan with Adreno target triples. |
| * `GPU-Mali-G77`: can benchmark IREE Vulkan with Mali target triples. |
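
These architectures correspond to compiler flags used when building the
benchmark artifacts. A hedged sketch with `iree-translate` (the target-triple
values shown are representative assumptions, not an exhaustive list):

```
# CPU: AOT-compile kernels to native ARMv8.2-A code for Android.
iree-translate -iree-mlir-to-vm-bytecode-module \
  -iree-hal-target-backends=dylib-llvm-aot \
  -iree-llvm-target-triple=aarch64-none-linux-android29 \
  model.mlir -o module.vmfb

# GPU: compile SPIR-V kernels tuned for a specific Vulkan target.
iree-translate -iree-mlir-to-vm-bytecode-module \
  -iree-hal-target-backends=vulkan-spirv \
  -iree-vulkan-target-triple=adreno-unknown-android11 \
  model.mlir -o module.vmfb
```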
| |
| ### Benchmark mode |
| |
This field further specifies the benchmark variant for the same input model
and target architecture. It controls important aspects like:
| |
* `*-core`: specifies the CPU core flavor (big or little cores).
| * `*-thread`: specifies the number of threads for CPU. |
| * `full-inference`: measures the latency for one full inference. Note that this |
| does not include the IREE system initialization time. |
| * `kernel-execution`: measures only kernel execution latency for GPU. Note that |
| this is only possible for feedforward NN models that can be put into one |
| command buffer. |
| |
`*-core` and `*-thread` together determine the `taskset` affinity mask used
when benchmarking IREE backends and drivers on CPU, as illustrated after the
list. For example,
| |
| * `1-thread,big-core` would mean `taskset 80`. |
| * `1-thread,little-core` would mean `taskset 08`. |
| * `3-thread,big-core` would mean `taskset f0`. |
| * `3-thread,little-core` would mean `taskset 0f`. |
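
Each mask is a hexadecimal CPU-affinity bitmask: bit N selects core N. On
these SoCs the little cores occupy the low bits and the big cores the high
bits, so `80` (binary `10000000`) pins execution to core 7 only, while `f0`
(binary `11110000`) allows any of cores 4-7. A sketch of applying such a mask
on a device (paths and flag spellings are placeholders):

```
# Pin the benchmark to the single big core (mask 80 = core 7 only).
adb shell taskset 80 /data/local/tmp/iree-benchmark-module \
  --driver=dylib \
  --module_file=/data/local/tmp/module.vmfb \
  --entry_function=main
```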