Rewriting the HAL CTS to support Bazel and scale better. (#23644)

Rewrites the HAL Conformance Test Suite (CTS) from a CMake-only
template-instantiation system to a link-time composition architecture
that works with both Bazel and CMake. The new design compiles each test
once and links it against multiple backends, replacing the old approach
of generating a separate test binary for every (driver, test) pair. This
cuts CTS build work from O(drivers x tests) to O(drivers + tests) and
enables Bazel-native CTS support for the first time.

## Motivation

The old CTS had three compounding problems:

**Build scaling.** Each test header was compiled once per driver through
CMake `configure_file()` template instantiation. With 16 test suites and
8 drivers, that's 128 separate test binaries, each independently
compiling the same test logic against the same HAL API. As we add
drivers and tests, build time grows multiplicatively.

**CMake exclusivity.** The template-based code generation was a CMake
mechanism with no Bazel equivalent. This meant CTS tests couldn't run in
Bazel-based workflows, and adding Bazel support would have required
reimplementing the entire generation system in Starlark — with the same
scaling problem.

**Invisible test logic.** The old system generated `.cc` files at CMake
configure time from a `.cc.in` template that `#include`d test headers.
The actual test source lived in `.h` files that were neither standalone
translation units nor normal headers — they required specific macros to
be defined by the template before inclusion. This made tests hard to
navigate, hard to debug (breakpoints in generated files), and hard to
understand for new contributors.

## Design: link-time composition

The new CTS uses a registration-based architecture where test logic and
backend configuration are independent concerns connected at link time.

**Tests** are ordinary `.cc` files compiled into object libraries. Each
test class inherits from `CtsTestBase` and registers itself with the CTS
registry via a static initializer macro:

```cpp
class AllocatorTest : public CtsTestBase<> { ... };
TEST_P(AllocatorTest, BufferCompatibility) { ... }
CTS_REGISTER_TEST_SUITE(AllocatorTest);
```

**Backends** register themselves the same way — a single `.cc` file per
driver that provides a device factory, capability tags, and executable
format information:

```cpp
static bool registered_ = (CtsRegistry::RegisterBackend({
    "local_task",
    {.name = "local_task", .factory = CreateLocalTaskDevice},
    {"async_queue", "events", "file_io", "indirect"},
    {{.name = "vmvx", .format = "vmvx-bytecode-fb", .data_fn = ...}},
}), true);
```

**Composition** happens at link time: the build system links a backend
`.cc` against selected test object libraries and a shared
`test_main.cc`. At program start, static initializers populate the
registry, then `main()` calls `CtsRegistry::InstantiateAll()` to create
gtest parameterized test instances for every (backend, test suite) pair
that the backend's capabilities satisfy.

The result: each test file compiles exactly once. Adding a new driver
means writing one `backends.cc` file and one short build rule — the test
objects are already compiled and waiting to be linked.

## Build system integration

### Test suite macros

Both Bazel and CMake provide an `iree_hal_cts_test_suite()` macro that
generates the complete set of CTS test binaries for a driver. A typical
driver CTS configuration is around 20 lines of build rules:

```python
iree_hal_cts_test_suite(
    backends_lib = ":backends",
    executable_formats = {
        "amdgpu": {
            "target_device": "amdgpu",
            "identifier": "iree_cts_testdata_amdgpu",
            "backend_name": "amdgpu",
            "format_string": '"amdgcn-amd-amdhsa--{ROCM_TARGET}"',
            "flags": ["--iree-rocm-target={ROCM_TARGET}", ...],
        },
    },
    flag_values = {"ROCM_TARGET": "//build_tools/bazel:rocm_test_target"},
)
```

This produces 7 test binaries per driver (5 non-executable suites + 2
executable suites), each containing all tests in its category
parameterized across the driver's backends and formats. Compare to the
old system's 16 separate binaries per driver.

### iree_hal_executable rules

The CTS dispatch tests need compiled HAL executables as test data. New
`iree_hal_executable` and `iree_hal_executables` Starlark rules handle
this compilation in both Bazel and CMake:

- Compile MLIR sources to `.bin` files using `iree-compile
--compile-mode=hal-executable`
- Embed the binaries as C data arrays via `iree_c_embed_data`
- Support template variables in compiler flags via `flag_values`,
resolved at analysis time from Bazel `string_flag` build settings or
file targets

The `flag_values` mechanism enables hardware-specific compilation
without hard-coding target architectures. For example, AMDGPU tests
compile for `gfx1100` by default, but a developer can override at build
time:

```shell
bazel test --//build_tools/bazel:rocm_test_target=gfx942 //runtime/src/iree/hal/drivers/amdgpu/cts/...
```

### Test organization

Tests are organized by HAL API area, matching the structure developers
navigate when implementing a new driver:

```
runtime/src/iree/hal/cts/
  buffer/           allocator, mapping
  command_buffer/   basic ops, fill, copy, update, dispatch variants
  core/             driver, event, semaphore, executable, executable_cache
  file/             file mapping
  queue/            host calls, semaphore submission
  testdata/         MLIR sources for dispatch tests
  util/             registry, test_base, test_main
```

## Runtime features

### Capability-based test filtering

Backends declare their capabilities as tags (`"events"`, `"indirect"`,
`"file_io"`, etc.) at registration time. Test suites declare tag
requirements. The registry automatically skips tests for backends that
lack required capabilities — no per-driver exclusion lists needed.

Command buffer tests get special treatment: each test runs in both
direct and indirect recording modes, with indirect-mode tests filtered
to backends that advertise the `"indirect"` tag.

### Test exclusions and expected failures

Backends can declare permanent exclusions (features that will never be
supported) and temporary expected failures with explanations:

```cpp
.unsupported_tests = {{"FileTest.*", "WebGPU has no file I/O support"}},
.expected_failures = {{"SemaphoreTest.WaitThenSignal",
                       "Requires async signal from host thread; "
                       "blocked on WebGPU event loop integration"}},
```

Expected failures are skipped by default. Setting
`IREE_CTS_VERIFY_XFAILS=1` runs them instead, flagging unexpected passes
(XPASS) as test failures — this catches stale xfail entries that should
be removed after fixes land.

### GPU device caching

GPU backends can't afford to create and destroy devices per test — cloud
GPU runners have reliability issues with rapid device churn. The test
base caches backend resources (driver, device group, device, allocator)
across all tests for a given backend, creating them on first access and
releasing them in the correct order at program exit. Individual tests
hold their own references for isolation while sharing the underlying
resources.

## Other improvements in this PR

**Semaphore failure propagation in local_task.** CTS tests exposed a bug
where failures during command buffer dispatch were silently swallowed,
leaving semaphores in a permanently waiting state. The fix captures
dispatch failures and propagates them to signal semaphores, converting
them to error state so waiters get a clear failure instead of hanging.

**File descriptor exhaustion diagnostics.** `eventfd()` and `pipe()`
failures from hitting fd limits now return `RESOURCE_EXHAUSTED` with
actionable diagnostics (suggesting `ulimit -n` or `sysctl` adjustments)
instead of a generic errno translation.

**Bazel build for the HIP HAL driver.** Adds BUILD.bazel files for the
HIP driver, registration module, and utility library, plus a
`hip-api-headers` third-party dependency. This is the first step toward
full Bazel support for AMD GPU workflows.

**Three new dispatch tests.** `dispatch_constants_bindings` (push
constants with buffer bindings), `dispatch_multi_entrypoint` (multiple
entry points in one executable), and `dispatch_multi_workgroup`
(multi-dimensional workgroup dispatch) increase coverage beyond what the
old CTS tested.

---------

Co-authored-by: Claude <noreply@anthropic.com>