[CPU][SVE] Update default tile sizes for matmul ops (#15650)

The default tile sizes for matmuls on SVE, [8, 32, 16], are basically a
copy of one of the existing configurations. In the case of SVE, that
configuration leads to overly aggressive unrolling and poor performance
due to register spilling. Hence the need to update.

This patch updates the default tile sizes for SVE to [8, 16, 1]. The
middle dimension corresponds to vector sizes after vectorisation (that's
also the dimension that's configured to be scalable). As the base vector
register size for SVE is 128 bits, there will be (depending on the
element size):

  * (16 x vscale) elements per vector register for i8,
  * ((16 / 2) x vscale) elements per vector register for i16,
  * ((16 / 4) x vscale) elements per vector register for i32,
  * (...)

So, effectively, 16 is the lowest number that avoids underutilisation
of vector registers (i.e. a lower number might be fine for wider
elements, but not for i8).
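The arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the lane counting, not code from the patch; the
helper names `lanes_per_register` and `register_utilisation` are
hypothetical, and the model assumes a scalable tile of `tile_n x vscale`
elements packed into whole vector registers.

```python
# Illustrative only: helper names are hypothetical, not part of IREE.
SVE_BASE_VECTOR_BITS = 128  # base register size; the real width is 128 * vscale


def lanes_per_register(element_bits, vscale=1):
    """Number of elements of the given width that fit in one SVE register."""
    return (SVE_BASE_VECTOR_BITS // element_bits) * vscale


def register_utilisation(tile_n, element_bits):
    """Fraction of lanes used when a (tile_n x vscale) row is packed into
    whole registers. Both the tile and the register scale with vscale, so
    the ratio does not depend on it."""
    lanes = lanes_per_register(element_bits)
    registers = -(-tile_n // lanes)  # ceiling division
    return tile_n / (registers * lanes)


if __name__ == "__main__":
    for name, bits in (("i8", 8), ("i16", 16), ("i32", 32)):
        print(name, lanes_per_register(bits), register_utilisation(16, bits))
```

Under this simplified model, a tile size of 16 fills whole registers for
i8, i16 and i32, whereas `register_utilisation(8, 8)` evaluates to 0.5,
i.e. a tile size of 8 would leave half of each register unused for i8.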

As for the remaining tile sizes, those were determined experimentally by
benchmarking `linalg.matmul` for:

  * square matrices (1020x1020, 1021x1021, 1024x1024),
  * both i8 and f32 element types,
  * input tensors with static and dynamic shapes.

In all of the above cases, the new configuration improves
performance.
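For readers unfamiliar with what the three tile sizes refer to, the
loop nest below is a rough sketch of a matmul tiled by [M, N, K] tile
sizes. It assumes fixed (non-scalable) tiles and plain Python lists, so
it says nothing about how IREE actually vectorises or unrolls the
generated code; `tiled_matmul` and `naive_matmul` are hypothetical names
used only for illustration.

```python
# Illustrative only: a scalar model of tiling, not IREE's code generation.
def naive_matmul(a, b):
    """Reference matmul on lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def tiled_matmul(a, b, tile_sizes=(8, 16, 1)):
    """Matmul with the iteration space split into (M, N, K) tiles."""
    tm, tn, tk = tile_sizes
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tm):          # tiles along M
        for j0 in range(0, n, tn):      # tiles along N (the vectorised dim)
            for k0 in range(0, k, tk):  # tiles along K (the reduction dim)
                # min() clamps partial tiles when a dimension is not a
                # multiple of the tile size (e.g. 1021 with a tile of 16).
                for i in range(i0, min(i0 + tm, m)):
                    for j in range(j0, min(j0 + tn, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tk, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Calling `tiled_matmul(a, b, (6, 16, 1))` gives the same result as the
default; in this model the tile sizes only change the order in which the
iteration space is visited, not the values computed.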

It's probably worth pointing out that during compilation, IREE will
reduce the leading tile size from 8 to 6. So, effectively, the tile
sizes will be [6, 16, 1].
2 files changed
tree: 137f82246303797e5921bb6cba4253ce6bf8cdcd
README.md

IREE: Intermediate Representation Execution Environment

IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

See our website for project details, user guides, and instructions on building from source.

CI Status

Project Status

IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any of our communication channels!

Communication Channels

Related Project Channels

  • MLIR topic within LLVM Discourse: IREE is enabled by and heavily relies on MLIR, and is sometimes referred to in MLIR discussions. Useful if you are also interested in the evolution of MLIR.

Architecture Overview

[Figure: IREE Architecture]

See our website for more information.

Presentations and Talks

License

IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.