Benchmark IREE and TFLite

We use our end-to-end TensorFlow integration tests to test compilation and numerical accuracy, and to generate compilation and benchmarking artifacts. This allows us to validate that our benchmarks are behaving as we expect them to, and to run them using valid inputs for each model.

This guide assumes that you can run the TensorFlow integration tests. See this doc for more information. That doc also covers writing new tests, which you'll need to do if you'd like to benchmark a new TensorFlow model.

1. Run IREE's E2E TensorFlow tests to generate the benchmarking artifacts

This command will compile and test all of our passing, non-manual targets.

bazel test //integrations/tensorflow/e2e/...

Running the above command populates a directory /tmp/iree/modules/ with the compilation artifacts needed to benchmark each TensorFlow model in our tests. Each test/module has a folder with the following artifacts (filtered to only include those relevant for benchmarking):

# Example for a generic module `ModuleName`:
/tmp/iree/modules/ModuleName
  ├── iree_vmla  # Or any other IREE backend.
  │   ├── compiled.vmfb
  │   │      # A flatbuffer containing IREE's compiled code.
  │   └── traces
  │        # Directory with a trace for each unittest in vision_model_test.py.
  │       ├── traced_function_1
  │       │      # Directory storing logs and serialization for a specific trace.
  │       │   └── flagfile
  │       │        # An Abseil flagfile containing arguments
  │       │        # iree-benchmark-module needs to benchmark this trace.
  │       ├── traced_function_2
  │       └── ...
  └── tflite
      ├── module_method_1.tflite
      │      # A method on ModuleName compiled to bytes with TFLite, which can
      │      # be used by TFLite's benchmark_model binary.
      ├── module_method_2.tflite
      ├── ...
      └── traces
          ├── traced_function_1
          │   └── graph_path
          │         # In general, a trace's name does not have to match the name
          │         # of the method(s) on the tf.Module that it calls. This file
          │         # points to the correct module_method_*.tflite graph file
          │         # for TFLite's benchmark_model to use.
          ├── traced_function_2
          └── ...

# Example for MatrixOpsStaticModule:
/tmp/iree/modules/MatrixOpsStaticModule
  ├── iree_llvmjit
  │   ├── compiled.vmfb
  │   └── traces
  │       ├── basic_matmul
  │       │   └── flagfile
  │       ├── matmul_broadcast_singleton_dimension
  │       │   └── flagfile
  │       ├── matmul_lhs_batch
  │       │   └── flagfile
  │       └── matmul_rhs_batch
  │           └── flagfile
  ├── iree_vmla
  │   ├── compiled.vmfb
  │   └── traces  # ...same as iree_llvmjit/traces above.
  ├── iree_vulkan
  │   ├── compiled.vmfb
  │   └── traces  # ...same as iree_llvmjit/traces above.
  └── tflite
      ├── basic_matmul.tflite
      ├── matmul_broadcast_singleton_dimension.tflite
      ├── matmul_lhs_batch.tflite
      ├── matmul_rhs_batch.tflite
      └── traces
          ├── basic_matmul
          │   └── graph_path
          ├── matmul_broadcast_singleton_dimension
          │   └── graph_path
          ├── matmul_lhs_batch
          │   └── graph_path
          └── matmul_rhs_batch
              └── graph_path

Optional: Compile the Keras Applications Vision tests

The vision tests take a while to run, so we exclude them from the CI and wildcard expansion. They can be run by invoking the following test suite:

bazel test //integrations/tensorflow/e2e/keras:vision_external_tests

The previous command compiles MobileNet, MobileNetV2, and ResNet50 with cifar10 and imagenet weights on all backends. The artifacts generated by this test suite differ slightly from those above in that they are organized by /tmp/iree/modules/ModelName/Dataset/backends instead of just by /tmp/iree/modules/ModelName/backends.
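For example, ResNet50's artifacts end up in per-dataset subdirectories. A sketch of the resulting layout (each backend folder has the same contents as in the generic example above):

# Sketch of the vision test suite's layout:
/tmp/iree/modules/ResNet50
  ├── cifar10
  │   ├── iree_vmla  # Same contents as in the generic example above.
  │   ├── ...
  │   └── tflite
  └── imagenet
      └── ...  # Same structure as cifar10 above.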

2. Benchmarking IREE on desktop

2.1 Optional: Build the iree-benchmark-module

This step is optional, but it allows you to run the benchmarks without having bazel running at the same time.

bazel build -c opt //iree/tools:iree-benchmark-module

This creates bazel-bin/iree/tools/iree-benchmark-module. The rest of the guide will use this binary, but you could also use bazel run iree/tools:iree-benchmark-module in its place if you prefer.
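If you use bazel run, remember that flags intended for the binary itself go after a -- separator (standard bazel behavior):

bazel run iree/tools:iree-benchmark-module -- \
  --flagfile="/tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile"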

2.2 Benchmark the model on IREE

The E2E tests generate a flagfile with all of the information that iree-benchmark-module needs to benchmark each trace. Namely, it provides the following flags:

Flag               Description
--input_file       Absolute path to the IREE compiled VM flatbuffer
--inputs           A comma-delimited string of input tensors
--driver           The backend driver to use for the benchmark
--entry_function   The method on the TensorFlow module to benchmark

You can find the flagfile to benchmark a specific TensorFlow module on a specific IREE backend and trace at the following path:

/tmp/iree/modules/ModuleName/backend/traces/trace_name/flagfile
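You can cat a flagfile to see exactly what the benchmark will be invoked with. The contents below are illustrative only; the real values (in particular the --inputs string) are generated by the tests:

cat /tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile
# Illustrative contents:
# --input_file=/tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/compiled.vmfb
# --driver=vmla
# --entry_function=matmul_lhs_batch
# --inputs=...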

For example, if we wanted to benchmark a static left-hand-side batched matmul using MatrixOpsStaticModule on VMLA we would run the following command:

./bazel-bin/iree/tools/iree-benchmark-module \
  --flagfile="/tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile"

If you ran the Keras Applications vision test suite, then you'll be able to benchmark ResNet50, MobileNet or MobileNetV2 with cifar10 or imagenet weights. For example:

./bazel-bin/iree/tools/iree-benchmark-module \
  --flagfile="/tmp/iree/modules/ResNet50/cifar10/iree_vmla/traces/predict/flagfile"

3. Benchmarking TFLite on desktop

3.1 Build TFLite's benchmark_model binary

# Enter the TensorFlow Bazel workspace.
cd third_party/tensorflow/

# Build the benchmark_model binary without RUY...
bazel build --copt=-mavx2 -c opt \
  //tensorflow/lite/tools/benchmark:benchmark_model

# ...or build the benchmark_model binary with RUY. This will overwrite the
# previous binary unless you move it.
bazel build --copt=-mavx2 -c opt \
  --define=tflite_with_ruy=true \
  //tensorflow/lite/tools/benchmark:benchmark_model

# The binary can now be found in the following directory:
ls bazel-bin/tensorflow/lite/tools/benchmark/

3.2 Benchmark the model on TFLite

TFLite doesn't support flagfiles, so we need to manually pass the path to the .tflite graph file, which we read out of the generated graph_path file via cat. TFLite will generate fake inputs for the model.
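You can inspect a trace's graph_path file directly to see which .tflite graph it points to (the output below is illustrative):

cat "/tmp/iree/modules/MatrixOpsStaticModule/tflite/traces/matmul_lhs_batch/graph_path"
# Illustrative output:
# /tmp/iree/modules/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite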

Using MatrixOpsStaticModule's left-hand-side batched matmul again as an example we can run the benchmark as follows:

# Run within `third_party/tensorflow/`.
./bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  --graph=$(cat "/tmp/iree/modules/MatrixOpsStaticModule/tflite/traces/matmul_lhs_batch/graph_path") \
  --warmup_runs=1 \
  --num_threads=1 \
  --num_runs=100 \
  --enable_op_profiling=true

4. Benchmarking IREE on Android

4.1 Prepare the benchmarking tools

IREE only supports compiling to Android with CMake. Documentation on setting up your environment to cross-compile to Android can be found here.
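As a rough, assumed sketch of that configuration (the linked doc is authoritative; IREE's Android build needs additional flags beyond the standard NDK toolchain arguments shown here):

# Minimal sketch of an Android cross-compile configuration; not a substitute
# for the linked documentation.
cmake -G Ninja -S . -B build-android \
  -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK?}/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-29
cmake --build build-android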

# After following the instructions above up to 'Build all targets', the
# iree-benchmark-module binary should be in the following directory:
ls build-android/iree/tools/

# Copy the benchmarking binary to phone.
adb push build-android/iree/tools/iree-benchmark-module /data/local/tmp

# Allow executing benchmarking file as a program.
adb shell chmod +x /data/local/tmp/iree-benchmark-module
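As a quick sanity check that the binary runs on the device (this assumes the binary exposes the standard Abseil --help flag):

adb shell /data/local/tmp/iree-benchmark-module --help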

4.2 Push IREE's compilation / benchmarking artifacts to the device

In this example we'll only copy over the files we need to benchmark a single module on a single backend, but you can easily copy all of the modules over as well.

Using MatrixOpsStaticModule's left-hand-side batched matmul again as an example:

# Make a directory on the device for the module/backend pair we want to benchmark.
adb shell mkdir -p /data/local/tmp/MatrixOpsStaticModule/iree_vmla/

# Transfer the files.
adb push /tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/* \
  /data/local/tmp/MatrixOpsStaticModule/iree_vmla/
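If you'd rather copy all of the modules over at once (assuming the device has enough free space):

adb push /tmp/iree/modules /data/local/tmp/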

4.3 Benchmark the module

adb shell /data/local/tmp/iree-benchmark-module \
  --flagfile="/data/local/tmp/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile" \
  --input_file="/data/local/tmp/MatrixOpsStaticModule/iree_vmla/compiled.vmfb"

Note: Because the flagfile uses absolute paths, the --input_file flag must be specified manually if the location of the compiled flatbuffer (compiled.vmfb) changes; the manual flag overrides the stale path in the flagfile. The flagfile still takes care of specifying the input data, driver, and entry function, however.
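Alternatively, you can rewrite the absolute paths inside the host's copy of the flagfile, so no manual override is needed (a sed sketch; adjust the paths to match your layout):

# Rewrite /tmp/iree/modules -> /data/local/tmp in the flagfile, then re-push it.
sed -i 's|/tmp/iree/modules|/data/local/tmp|g' \
  /tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile
adb push /tmp/iree/modules/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile \
  /data/local/tmp/MatrixOpsStaticModule/iree_vmla/traces/matmul_lhs_batch/flagfile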

5. Benchmark the model on Android with TFLite

5.1 Prepare the benchmarking tools

There are three options for getting TFLite's benchmark_model binary for Android.

The first two are to build it directly, either in a Docker container or in your own environment. Assuming you can build TensorFlow for Android, you can configure the TFLite benchmark_model binary in the following ways:

# Build the benchmark_model binary without any add-ons.
bazel build -c opt \
  --config=android_arm64 \
  --cxxopt='--std=c++17' \
  //tensorflow/lite/tools/benchmark:benchmark_model

# Copy the benchmarking binary to phone and allow execution.
adb push bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model

# Build the benchmark_model binary with RUY. (Note: no --copt=-mavx2 here, as
# AVX2 is an x86 flag and does not apply to an arm64 cross-compile.)
bazel build -c opt \
  --config=android_arm64 \
  --cxxopt='--std=c++17' \
  --define=tflite_with_ruy=true \
  --copt=-DTFLITE_WITH_RUY_GEMV \
  //tensorflow/lite/tools/benchmark:benchmark_model

# Rename the binary for comparison with the standard benchmark_model.
mv bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model \
  bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_ruy
adb push bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_ruy \
  /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model_plus_ruy

# Build the benchmark_model binary with Flex.
bazel build -c opt \
  --config=android_arm64 \
  --cxxopt='--std=c++17' \
  //tensorflow/lite/tools/benchmark:benchmark_model_plus_flex

# Copy the benchmarking binary to phone and allow execution.
adb push bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model_plus_flex

Alternatively, you can download and install the Android Benchmark App. If you choose to install the app then you'll have to modify the benchmarking commands below slightly, as shown in this example.

5.2 Run the benchmark

# Copy the data over to the phone.
adb shell mkdir -p /data/local/tmp/MatrixOpsStaticModule/tflite
adb push /tmp/iree/modules/MatrixOpsStaticModule/tflite/* \
  /data/local/tmp/MatrixOpsStaticModule/tflite/

# Benchmark with TFLite.
adb shell taskset f0 /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite \
  --warmup_runs=1 \
  --num_threads=1 \
  --num_runs=10 \
  --enable_op_profiling=true

# Benchmark with TFLite + RUY.
adb shell taskset f0 /data/local/tmp/benchmark_model_plus_ruy \
  --graph=/data/local/tmp/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite \
  --warmup_runs=1 \
  --num_threads=1 \
  --num_runs=10 \
  --enable_op_profiling=true

# Benchmark with TFLite + Flex.
adb shell taskset f0 /data/local/tmp/benchmark_model_plus_flex \
  --graph=/data/local/tmp/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite \
  --warmup_runs=1 \
  --num_threads=1 \
  --num_runs=10 \
  --enable_op_profiling=true

# Benchmark with TFLite running on GPU.
adb shell taskset f0 /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite \
  --warmup_runs=1 \
  --num_threads=1 \
  --num_runs=10 \
  --enable_op_profiling=true \
  --use_gpu=true

Running the benchmark on the GPU won't give op profiling. To get detailed profiling information for the GPU, you can run the following script:

# Op profiling on GPU using OpenCL backend.
sh tensorflow/lite/delegates/gpu/cl/testing/run_performance_profiling.sh \
  -m /data/local/tmp/MatrixOpsStaticModule/tflite/matmul_lhs_batch.tflite

Note: You will have to manually specify the TFLite graph that you want to benchmark, as the graph_path file assumes that the graph has not moved. The name of the .tflite graph that you need to benchmark may be different from the name of the trace that you want to benchmark, but you can use cat on the graph_path file to verify the correct .tflite filename if you're unsure.