# Benchmarking

IREE uses benchmarks to inspect performance at varying levels of granularity.
Benchmarking is implemented using the
[Google Benchmark library](https://github.com/google/benchmark). To understand
performance details and guide optimization, please refer to the
IREE [profiling](./profiling.md) documentation.

## Module Benchmarks

`iree-benchmark-module` is a program accepting (almost) the same inputs as
`iree-run-module` that will benchmark the invocation of a single entry function.
It measures timing for the whole process of invoking a function through the VM,
including allocating and freeing output buffers. This is a high-level benchmark
of an entire invocation flow. It provides a big-picture view, but depends on
many different variables, like an integration test. For finer-grained
measurements more akin to unit tests, see [Executable Benchmarks](#executable-benchmarks).

To use `iree-benchmark-module`, generate an IREE module for the target backend:

```shell
$ bazel run //iree/tools:iree-compile -- \
  -iree-mlir-to-vm-bytecode-module \
  -iree-hal-target-backends=vmvx \
  $PWD/samples/models/simple_abs.mlir \
  -o /tmp/module.fb
```

and then benchmark an exported function in that module:

```shell
$ bazel run //iree/tools:iree-benchmark-module -- \
  --module_file=/tmp/module.fb \
  --driver=vmvx \
  --entry_function=abs \
  --function_input=f32=-2
```

You'll see output like:

```shell
Run on (12 X 4500 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 1024K (x6)
  L3 Unified 8448K (x1)
Load Average: 2.21, 1.93, 3.34
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may
 be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_RunModule/process_time/real_time       0.22 ms         0.23 ms         3356
```

Notice that there are a few warnings in there (you may not see all of these).
The benchmark library helpfully warns about some common issues that will affect
benchmark timing. When trying to obtain real benchmark numbers, you should
generally use an optimized build (`-c opt` in Bazel) and
[disable CPU scaling](#cpu-configuration).

```shell
$ bazel build -c opt //iree/tools:iree-benchmark-module
```

Another thing to consider is that depending on where you are running the
benchmark you might want to avoid additional programs running at the same time.
Bazel itself runs a server even when it's not being actively invoked that can be
quite a memory hog, so we'll instead invoke the binary directly. Use your
favorite process manager (e.g. [htop](https://hisham.hm/htop/) or
[pkill](https://en.wikipedia.org/wiki/Pkill) on Linux) to kill heavy-weight
programs such as Chrome and Bazel.

Now we'll actually invoke the binary:

```shell
$ ./bazel-bin/iree/tools/iree-benchmark-module \
  --module_file=/tmp/module.fb \
  --driver=vmvx \
  --entry_function=abs \
  --function_input=f32=-2
```

```shell
Run on (12 X 4500 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 1024K (x6)
  L3 Unified 8448K (x1)
Load Average: 1.49, 3.42, 3.49
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
BM_RunModule/process_time/real_time      0.011 ms        0.014 ms        61654
```

Remember to [restore CPU scaling](#cpu-configuration) when you're done.

## Executable Benchmarks

We also benchmark the performance of individual parts of the IREE system in
isolation. IREE breaks a model down to dispatch functions. To benchmark all the
dispatch functions, generate an IREE module with the
`-iree-flow-export-benchmark-funcs` flag set:

```shell
$ build/iree/tools/iree-compile \
  -iree-input-type=mhlo \
  -iree-mlir-to-vm-bytecode-module \
  -iree-flow-export-benchmark-funcs \
  -iree-hal-target-backends=vmvx \
  iree/test/e2e/models/fullyconnected.mlir \
  -o /tmp/fullyconnected.vmfb
```

and then benchmark all exported dispatch functions (and all exported functions)
in that module:

```shell
$ build/iree/tools/iree-benchmark-module \
  --module_file=/tmp/fullyconnected.vmfb \
  --driver=vmvx
```

If no `entry_function` is specified, `iree-benchmark-module` will register a
benchmark for each exported function that takes no inputs.

You will see output like:

```shell
Run on (72 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x36)
  L1 Instruction 32 KiB (x36)
  L2 Unified 1024 KiB (x36)
  L3 Unified 25344 KiB (x2)
Load Average: 4.39, 5.72, 6.76
-------------------------------------------------------------------------------------------------
Benchmark                                                       Time             CPU   Iterations
-------------------------------------------------------------------------------------------------
BM_main_ex_dispatch_0_benchmark/process_time/real_time      0.030 ms        0.037 ms        34065
BM_main_ex_dispatch_1_benchmark/process_time/real_time      0.034 ms        0.042 ms        20567
BM_main_ex_dispatch_2_benchmark/process_time/real_time      0.043 ms        0.051 ms        18576
BM_main_ex_dispatch_3_benchmark/process_time/real_time      0.029 ms        0.036 ms        21345
BM_main_ex_dispatch_4_benchmark/process_time/real_time      0.042 ms        0.051 ms        15880
BM_main_ex_dispatch_5_benchmark/process_time/real_time      0.030 ms        0.037 ms        17854
BM_main_ex_dispatch_6_benchmark/process_time/real_time      0.043 ms        0.052 ms        14919
BM_main_benchmark/process_time/real_time                    0.099 ms        0.107 ms         5892
```
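
With many dispatch functions it can help to rank the result rows by time. The
following is a minimal sketch and not part of IREE: `benchmark_times` is a
hypothetical helper name, and it assumes the console layout shown above, with
every row reported in the same time unit.

```shell
# Hypothetical helper (not part of IREE): reads Google Benchmark console
# output on stdin and prints "<name> <real time> <unit>", slowest first.
# Assumes every result row uses the same time unit, as in the output above.
benchmark_times() {
  awk '/^BM_/ { print $1, $2, $3 }' | LC_ALL=C sort -k2,2nr
}
```

To use it, pipe the output of the `iree-benchmark-module` command above into
`benchmark_times`. Rows that do not start with `BM_` (the header, separators,
and warnings) are skipped.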

### Bytecode Module Benchmarks

Normally, the IREE VM is expected to be integrated into applications and to
drive model execution, so its performance is crucial. We strive to introduce as
little overhead as possible and have several benchmark binaries dedicated to
evaluating the VM's performance. These benchmark binaries are named
`*_benchmark` in the
[`iree/vm/`](https://github.com/google/iree/tree/main/iree/vm) directory. Like
the tools above, they use the Google Benchmark library.

## CPU Configuration

When benchmarking, it's important to consider the configuration of your CPUs.
Most notably, CPU scaling can give variable results, so you'll usually want to
disable it. This can get pretty complex, but the most basic thing to do is to
run all CPUs at maximum frequency. The other thing to consider is what CPU(s)
your program is running on. Both of these get more complicated on mobile and in
multithreaded workloads.

### Linux

Google Benchmark provides some
[instructions](https://github.com/google/benchmark#disabling-cpu-frequency-scaling).
Note that the library will print "CPU scaling is enabled" warnings for any
configuration that
[doesn't have the scaling governor set to performance](https://github.com/google/benchmark/blob/3d1c2677686718d906f28c1d4da001c42666e6d2/src/sysinfo.cc#L228).
Similarly the CPU frequency it reports is the
[maximum frequency of cpu0](https://github.com/google/benchmark/blob/3d1c2677686718d906f28c1d4da001c42666e6d2/src/sysinfo.cc#L533),
not the frequency of the processor it's actually running on. This means that
more advanced configurations should ignore these messages.

Turn off CPU scaling before benchmarking:

```shell
$ sudo cpupower frequency-set --governor performance
```

Restore CPU scaling after benchmarking:

```shell
$ sudo cpupower frequency-set --governor powersave
```
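
Since it is easy to forget the restore step, the two commands can be wrapped
around the benchmark run. This is a minimal sketch, not an IREE tool:
`run_with_performance_governor` and the `RESTORE_GOVERNOR` variable are
hypothetical names, and `powersave` is assumed as the governor to return to only
because it is the one used above. Check what your system was using beforehand.

```shell
# Hypothetical wrapper (not part of IREE): switch to the "performance"
# governor, run the given command, then switch back.
# RESTORE_GOVERNOR is an assumed variable; it defaults to "powersave" to
# mirror the restore command above, which may not match your system.
run_with_performance_governor() {
  restore_governor="${RESTORE_GOVERNOR:-powersave}"
  sudo cpupower frequency-set --governor performance || return 1
  "$@"
  status=$?
  sudo cpupower frequency-set --governor "$restore_governor"
  # Report the wrapped command's exit status, not cpupower's.
  return "$status"
}
```

For example, `run_with_performance_governor ./bazel-bin/iree/tools/iree-benchmark-module --module_file=/tmp/module.fb --driver=vmvx --entry_function=abs --function_input=f32=-2`.
Note that the sketch does not restore the governor if the shell itself is
interrupted partway through.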

To learn more about different scaling governor settings, see
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt. To restrict
which CPUs you run on, use the `taskset` command, which takes a hexadecimal mask.

To only run on the lowest-numbered CPU you can run:

```shell
$ taskset 1 sleep 20 &
```

You can confirm that the process is running on the given CPU:

```shell
$ ps -o psr $!
```

Note that `$!` indicates the process ID of the last executed background command,
so you can only use this shorthand if you didn't start another background
command after the sleep. For more info on taskset, see
https://linux.die.net/man/1/taskset.

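Each bit of the `taskset` mask selects one CPU, with bit 0 being the
lowest-numbered CPU. As a small sketch of how such a mask can be derived,
the hypothetical helper `cpu_mask` below (not part of any tool mentioned here)
ORs together one bit per CPU index using POSIX shell arithmetic. It assumes the
indices are small enough to fit in the shell's integer type.

```shell
# Hypothetical helper: print the hexadecimal taskset mask for the given
# zero-based CPU indices by setting one bit per index.
cpu_mask() {
  mask=0
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '%x\n' "$mask"
}
```

With this, `cpu_mask 0` prints `1` and `cpu_mask 0 1 2 3` prints `f`, so
`taskset "$(cpu_mask 0)" sleep 20 &` is equivalent to the `taskset 1` command
above.
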
### Android

Read and understand the [Linux](#linux) instructions first.

Android doesn't give us quite as nice tooling, but the principle is basically
the same. One important difference is that thermal throttling is a much bigger
concern on mobile. Without a cooling plate, it is likely that high clock speeds
will overheat the device and engage thermal throttling, which protects the
device by overriding whatever clock speeds you may have set. Therefore the
naive approach of running all CPUs at maximum frequency is likely not a good
idea.

You will likely need to be root (use `su` or `adb root`). The commands will
depend on your exact phone and number of cores. First play around and make sure
you understand what everything means. Note that each CPU has its own files which
are used to control its behavior, but changes to a single CPU will sometimes
affect others (see `/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus`).

Some useful files:

```shell
/proc/cpuinfo
/sys/devices/system/cpu/possible
/sys/devices/system/cpu/present
/sys/devices/system/cpu/cpu0/online
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
```

See the clock speed of each CPU (current, minimum, and maximum):

```shell
$ for i in `cat /sys/devices/system/cpu/present | tr '-' ' ' | xargs seq`; do \
    paste \
      "/sys/devices/system/cpu/cpu${i?}/cpufreq/cpuinfo_cur_freq" \
      "/sys/devices/system/cpu/cpu${i?}/cpufreq/cpuinfo_min_freq" \
      "/sys/devices/system/cpu/cpu${i?}/cpufreq/cpuinfo_max_freq"; \
  done
```
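
The loop header above turns a single range such as `0-7` into `seq 0 7`. The
kernel's CPU list format can also be a lone index (`0`) or a comma-separated
list of ranges (`0-3,5-7`), which that one-liner does not handle. The sketch
below covers those cases too; `expand_cpu_list` is a hypothetical helper name,
not something provided by Android or IREE.

```shell
# Hypothetical helper: expand a kernel CPU list such as "0-3,5-7" into one
# index per line. A lone index like "0" expands to itself.
expand_cpu_list() {
  echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
    seq "$first" "${last:-$first}"
  done
}
```

It could replace the backquoted pipeline in the loop header, for example
`for i in $(expand_cpu_list "$(cat /sys/devices/system/cpu/present)"); do ...; done`.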

Before changing things, check the current scaling governor settings so you can
put them back when you're done.

```shell
$ for i in `cat /sys/devices/system/cpu/present | tr '-' ' ' | xargs seq`; do \
    cat "/sys/devices/system/cpu/cpu${i?}/cpufreq/scaling_governor"; \
  done
```
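
Rather than noting the governors by hand, they can be written to a file and
replayed later. This is a minimal sketch under stated assumptions:
`save_governors`, `restore_governors`, and the `CPU_ROOT` variable are
hypothetical names introduced here, and the state-file path is up to you. On a
real device, writing the governor files requires root.

```shell
# Hypothetical helpers (not part of IREE or Android).
# CPU_ROOT is an assumed variable so the sysfs location can be overridden.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Write one "<cpu index> <governor>" line per CPU to the file named by $1.
save_governors() {
  for f in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue
    cpu="${f#"$CPU_ROOT"/cpu}"   # strip the leading path up to "cpu"
    cpu="${cpu%%/*}"             # keep only the index
    echo "$cpu $(cat "$f")"
  done > "$1"
}

# Read the file named by $1 and write each governor back to its CPU.
restore_governors() {
  while read -r cpu governor; do
    echo "$governor" > "$CPU_ROOT/cpu$cpu/cpufreq/scaling_governor"
  done < "$1"
}
```

A typical flow would be `save_governors /data/local/tmp/governors.txt` before
changing anything and `restore_governors /data/local/tmp/governors.txt` when
finished; the path is only an example.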

#### Single-Core Example

Here's an example to run IREE in a single-threaded context on CPU 7 at its
lowest clock speed.

First we'll take control of the clock speed by setting the governor to
"userspace".

```shell
$ for i in `cat /sys/devices/system/cpu/present | tr '-' ' ' | xargs seq`; do \
    echo userspace > \
      "/sys/devices/system/cpu/cpu${i?}/cpufreq/scaling_governor"; \
  done
```

We can now set individual clock speeds. We'll pin cpu7 to its minimum frequency.
We choose the minimum instead of the maximum here to mitigate thermal throttling
concerns:

```shell
$ cat /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_min_freq > \
  /sys/devices/system/cpu/cpu7/cpufreq/scaling_setspeed
```

We can confirm the frequencies of all the CPUs by running the same command as
above. Now to run a command specifically on cpu7, use `taskset 80`
(hexadecimal 80 is binary 10000000, so only bit 7 is set):

```shell
$ taskset 80 sleep 20 &
$ ps -o psr $!
```

Remember to clean up when you're done! Here we'll set the scaling governors back
to `schedutil` because that's what they were on the particular device this was
tested on, but that governor may not exist on all devices.

```shell
$ for i in `cat /sys/devices/system/cpu/present | tr '-' ' ' | xargs seq`; do \
    echo schedutil > \
      "/sys/devices/system/cpu/cpu${i?}/cpufreq/scaling_governor"; \
  done
```

TODO(scotttodd): Windows instructions