Move GPU ukernel selection to KernelConfig (#19440)

This moves the logic deciding whether an op should be a ukernel out of
the GPULowerToUKernels pass, into KernelConfig.

So KernelConfig decides whether the op should be a ukernel, and encodes
that decision in the resulting `lowering_config` as a new parameter
holding a new attribute, `UKernelSpecAttr`. That attribute is directly modeled after
the equivalent C++ data structure that we have had in LowerToUKernels
passes, `FnNameAndDefAttrs`, which it replaces. If the attribute is
present, it means that the op was selected for ukernel lowering, with
the fields telling the ukernel name and some function definition
attributes (to import any dependencies, such as the `rocm` module for
runtime support symbols).

All the details about supplying the ukernel bitcode in a
`hal.executable.object` are also moved there, becoming a side effect of
`KernelConfig`.

The GPULowerToUKernels pass becomes much simpler, since all the
decision-making has already been done for it. It just looks for a
`UKernelSpecAttr` in the op's `LoweringConfigAttr` and, if one is
present, performs the requested lowering.
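
The split described above can be sketched as a toy model. This is an
illustration only, not IREE code: every name below (`UKernelSpec`,
`LoweringConfig`, `select_ukernel`, `set_kernel_config`,
`lower_to_ukernel`, the `toy_ukernel_` prefix and the `import_module`
key) is hypothetical, and strings stand in for real ops and attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UKernelSpec:
    """Toy stand-in for a ukernel spec: a name plus definition attributes."""
    name: str
    def_attrs: tuple = ()


@dataclass
class LoweringConfig:
    """Toy lowering config; `ukernel` is None when no ukernel was selected."""
    tile_sizes: list
    ukernel: Optional[UKernelSpec] = None


def select_ukernel(op_name: str, enabled: set) -> Optional[UKernelSpec]:
    """Selection step: the only place where the ukernel decision is made."""
    if op_name not in enabled:
        return None
    # Hypothetical definition attribute asking for the `rocm` module.
    return UKernelSpec(f"toy_ukernel_{op_name}", (("import_module", "rocm"),))


def set_kernel_config(op_name: str, enabled: set) -> LoweringConfig:
    """Configuration step: records the decision inside the config."""
    return LoweringConfig(tile_sizes=[1], ukernel=select_ukernel(op_name, enabled))


def lower_to_ukernel(op_name: str, config: LoweringConfig) -> str:
    """Lowering step: makes no decision, only checks for a recorded spec."""
    if config.ukernel is None:
        return op_name  # leave the op unchanged
    return f"call {config.ukernel.name}"


print(lower_to_ukernel("argmax", set_kernel_config("argmax", {"argmax"})))
print(lower_to_ukernel("matmul", set_kernel_config("matmul", {"argmax"})))
```

The point of the sketch is that `lower_to_ukernel` contains no matching
logic of its own; it only reads what the configuration step recorded.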

The motivation for this split is that we need to know in KernelConfig
whether it's going to be a ukernel, because ops that will get lowered to
a ukernel require a different configuration. The important example for
us is `multi_mma`, which in the ukernel case needs to avoid
reduction-dimension tiling to 1 so that the ukernel gets to see the
reduction loop.
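
To make the reduction-tiling point concrete, here is a toy model of how
the tile size could depend on the ukernel decision. It is a sketch under
stated assumptions, not the actual configuration logic: the helper names
are hypothetical, and "not tiling to 1" is modeled as leaving the whole
reduction extent in one chunk.

```python
def choose_reduction_tile(k_extent: int, use_ukernel: bool) -> int:
    """Hypothetical helper: pick the reduction-dimension tile size.

    Without a ukernel, the reduction dimension is tiled to 1. With a
    ukernel, this toy model leaves it untiled so that the ukernel sees
    the whole reduction loop.
    """
    return k_extent if use_ukernel else 1


def reduction_chunks(k_extent: int, tile: int) -> list:
    """Split [0, k_extent) into half-open (start, end) chunks of size `tile`."""
    return [(start, min(start + tile, k_extent)) for start in range(0, k_extent, tile)]


# Default path: eight chunks of one reduction step each.
print(reduction_chunks(8, choose_reduction_tile(8, use_ukernel=False)))
# Ukernel path: a single chunk covering the whole reduction.
print(reduction_chunks(8, choose_reduction_tile(8, use_ukernel=True)))
```

Because the tile size depends on `use_ukernel`, the decision has to be
known at the time the configuration is chosen, which is the motivation
stated above.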

A few simplifications already arise in the current argmax ukernel logic,
confirming that this was the right design choice. The old ukernel
matching logic checked that the distribution tile sizes matched what
the ukernel could handle. That is now turned upside down: ukernel
matching happens in a helper within KernelConfig, where we know that
we are setting the appropriate tile sizes on purpose.

Another nice improvement is that this puts just enough distance between
ukernel selection (which creates the `hal.executable.object`) and
ukernel lowering, that we are able to insert
`HoistExecutableObjectsPass` in between, simplifying the ukernel
lowering as it doesn't need to worry anymore about preserving the
`hal.executable.object`.

---------

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
22 files changed
tree: 4bca58ccf0364634adf715d26dab0373be7d4cc9
README.md

IREE: Intermediate Representation Execution Environment

IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

See our website for project details, user guides, and instructions on building from source.

Project Status

IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any communication channels.

Release status

| Package | Release status |
| --- | --- |
| GitHub release (stable) | GitHub Release |
| GitHub release (nightly) | GitHub Release |
| Python iree-base-compiler | PyPI version |
| Python iree-base-runtime | PyPI version |

Build status

| Host platform | Build status |
| --- | --- |
| Linux | CI - Linux x64 clang; CI - Linux arm64 clang |
| macOS | CI - macOS x64 clang |
| Windows | CI - Windows x64 MSVC |

For the full list of workflows see https://iree.dev/developers/general/github-actions/.

Communication Channels

Related Project Channels

  • MLIR topic within LLVM Discourse: IREE is enabled by and heavily relies on MLIR, and is sometimes referred to in MLIR discussions. Useful if you are also interested in MLIR evolution.

Architecture Overview

[Figure: IREE Architecture]

See our website for more information.

Presentations and Talks

Community meeting recordings: IREE YouTube channel

  • 2021-06-09: IREE Runtime Design Tech Talk (recording and slides)
  • 2020-08-20: IREE CodeGen: MLIR Open Design Meeting Presentation (recording and slides)
  • 2020-03-18: Interactive HAL IR Walkthrough (recording)
  • 2020-01-31: End-to-end MLIR Workflow in IREE: MLIR Open Design Meeting Presentation (recording and slides)

License

IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.