| commit    | 6a37a3c0fd3d4286af3ea471859345505f47d1dd |                                |
|-----------|------------------------------------------|--------------------------------|
| author    | Han-Chung Wang <hanchung@google.com>     | Mon Nov 22 13:31:02 2021 -0800 |
| committer | GitHub <noreply@github.com>              | Mon Nov 22 13:31:02 2021 -0800 |
| tree      | a079a8c51d439a928cdf797defd61a1e68e4cbb7 |                                |
| parent    | 06ba88af3354c7a6cb177661bb0ce9e849516f55 |                                |
Hook vector unrolling in LLVMTileFuseAndVectorizePass. (#7682)

This PR enables tiling the reduction loop in L1 tiling and unrolls the resulting loops to native vector sizes.

Vector unrolling can introduce register pressure issues. To prevent regressions at this point, we explicitly set the default L1 tile sizes to 16,16,16 for x86; with those sizes, vector unrolling is a no-op. I tested with different tile sizes (e.g., 16,16,32) and observed differences, so, as mentioned, we keep 16,16,16 as the default to avoid big regressions.

The next step is to figure out the configuration differences between the sandbox and IREE. We might have to set up different vector.contract lowering strategies and tune with different tile sizes.

Before unrolling:

```
----------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations
----------------------------------------------------------------------------------------
BM_dot_384x384x512/process_time/real_time       2.82 ms         2.83 ms          252
BM_dot_384x128x128/process_time/real_time      0.152 ms        0.152 ms         4595
BM_dot_384x128x512/process_time/real_time      0.631 ms        0.633 ms         1153
BM_dot_384x512x128/process_time/real_time      0.653 ms        0.654 ms          858
BM_dot_384x512x2/process_time/real_time        0.535 ms        0.537 ms         1328
BM_dot_384x384x32/process_time/real_time       0.151 ms        0.151 ms         4666
BM_dot_384x384x512_exp/process_time/real_time   2.89 ms         2.90 ms          246
BM_dot_384x128x128_exp/process_time/real_time  0.167 ms        0.168 ms         4222
BM_dot_384x128x512_exp/process_time/real_time  0.666 ms        0.668 ms         1082
BM_dot_384x512x128_exp/process_time/real_time  0.650 ms        0.652 ms         1078
BM_dot_384x512x2_exp/process_time/real_time    0.550 ms        0.551 ms         1276
BM_dot_384x384x32_exp/process_time/real_time   0.157 ms        0.157 ms         4462
```

After unrolling:

```
----------------------------------------------------------------------------------------
Benchmark                                      Time             CPU   Iterations
----------------------------------------------------------------------------------------
BM_dot_384x384x512/process_time/real_time       2.78 ms         2.79 ms          249
BM_dot_384x128x128/process_time/real_time      0.153 ms        0.153 ms         4561
BM_dot_384x128x512/process_time/real_time      0.596 ms        0.598 ms         1183
BM_dot_384x512x128/process_time/real_time      0.707 ms        0.708 ms         1096
BM_dot_384x512x2/process_time/real_time        0.528 ms        0.529 ms         1309
BM_dot_384x384x32/process_time/real_time       0.151 ms        0.151 ms         4533
BM_dot_384x384x512_exp/process_time/real_time   2.91 ms         2.92 ms          242
BM_dot_384x128x128_exp/process_time/real_time  0.166 ms        0.166 ms         4225
BM_dot_384x128x512_exp/process_time/real_time  0.649 ms        0.651 ms         1058
BM_dot_384x512x128_exp/process_time/real_time  0.650 ms        0.651 ms         1075
BM_dot_384x512x2_exp/process_time/real_time    0.539 ms        0.540 ms         1302
BM_dot_384x384x32_exp/process_time/real_time   0.157 ms        0.158 ms         4404
```
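For readers unfamiliar with how this kind of unrolling is typically hooked into a pass, the sketch below shows one way to populate upstream MLIR's vector unroll patterns with a fixed native shape and restrict them to vector.contract ops. It is a minimal illustration under assumptions, not the actual IREE pass code: the function name unrollContractionsToNativeShape and the 8x8x1 shape are made up for the example, and header paths and helper names vary across MLIR revisions.

```cpp
// Illustrative sketch only: hook MLIR's vector unrolling patterns with a
// fixed "native" vector shape. Not the IREE pass itself; header locations
// and option names differ across MLIR versions.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Unroll vector.contract ops inside `funcOp` to a fixed native vector shape
// so that each resulting piece maps onto machine-width registers.
static LogicalResult unrollContractionsToNativeShape(func::FuncOp funcOp) {
  vector::UnrollVectorOptions options;
  // Placeholder native shape; the real values depend on the target ISA.
  SmallVector<int64_t> nativeShape = {8, 8, 1};
  options.setNativeShape(nativeShape);
  // Only unroll contraction ops; leave other vector ops untouched.
  options.setFilterConstraint([](Operation *op) {
    return success(isa<vector::ContractionOp>(op));
  });

  RewritePatternSet patterns(funcOp.getContext());
  vector::populateVectorUnrollPatterns(patterns, options);
  return applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
}
```

With the default 16,16,16 L1 tile sizes mentioned above, tiles already match the unroll target, so the rewrite leaves the IR unchanged; with larger tiles the contraction is split into native-size pieces before lowering.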
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. That said, we welcome any kind of feedback through any of our communication channels!
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.