[WebGPU] Add WebGPU HAL driver, WGSL compiler target, CTS, and sample. (#24463)

This adds the first end-to-end WebGPU target path for IREE: a compiler backend that emits WGSL executables and a JavaScript-hosted HAL driver that can submit those executables through the browser/Node WebGPU API from a freestanding wasm32 runtime. The HAL itself is largely complete; most of the remaining gaps are in infrastructure and hosting applications.

The important product boundary is that this is a WebGPU driver for the Web platform, not an Emscripten port and not a native Dawn HAL. The C runtime owns IREE's HAL object model, synchronization contracts, command recording, executable metadata, and queue ordering. JavaScript owns the ambient WebGPU objects, Promise completion delivery, and the import module that maps integer wasm handles to real GPUAdapter/GPUDevice/GPUBuffer/GPUQueue objects.

That split keeps the ABI narrow. All values crossing the wasm boundary are integers or pointers into wasm linear memory. WebGPU objects are represented as uint32 handles in a JS-side table, handle 0 is null, and async WebGPU APIs complete through the JS proactor token ring introduced by the wasm runtime commit. The C side never gets a raw JS object and the JS side does not need to understand HAL resources beyond the declared import ABI.

The driver uses an instruction-stream bridge instead of one wasm import per HAL command. HAL command buffers and one-shot queue operations compile into compact uint32 instruction blocks. JavaScript walks those blocks in one bridge call, resolves dynamic bindings from a binding table, reuses static bindings for cached recordings, batches encoder commands, and submits pending GPUCommandBuffers at explicit queue-surface boundaries. This makes the wasm/JS boundary a command-stream boundary instead of a per-command overhead cliff.

The runtime queue contract follows WebGPU's actual execution model. CPU-only operations can signal after their wait completes.
GPU-submit operations wait, encode/submit work, register queue.onSubmittedWorkDone(), and signal HAL semaphores only when WebGPU reports that submitted work is complete. Queue epochs and async frontiers preserve causal ordering for downstream waits, while submitted-provenance tracking keeps FIFO waits from adding unnecessary host-side round trips.

WebGPU does not provide every primitive that IREE's HAL exposes directly. The driver internalizes those gaps instead of pushing them onto callers: fill uses a builtin WGSL compute shader, unaligned copy/update paths fall back to a copy shader, executable loading creates compute pipelines and bind group layouts from WGSL, and command execution presents the usual HAL fill/copy/update/dispatch surface even though WebGPU splits those operations across queue, encoder, and compute-pass APIs.

The compiler side lowers through the existing SPIR-V path and translates SPIR-V to WGSL with Tint/Dawn. The serialized executable format is `webgpu-wgsl-fb`: a FlatBuffer containing WGSL shader modules plus per-export metadata such as entry point names, workgroup sizes, binding flags, constant counts, source/debug data, and the information the runtime needs to create pipelines and bind groups. The target is registered as the `webgpu` device and `webgpu-spirv` executable backend.

The initial runtime support contract is intentionally narrow. WebGPU exposes one queue per device, so the driver currently routes through a single queue while keeping queue state isolated enough for future queue[N] shaping. The JavaScript inline host can validate WGSL and run CTS-style entry points, but blocking C code cannot make JavaScript Promises settle while the same wasm thread is waiting. CTS expected failures document those blocking-completion cases instead of pretending they are implemented.
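
The signaling rule in the queue contract above (CPU-only operations signal once their wait completes, GPU-submit operations signal only after completion is reported) can be illustrated with a small model. This is a sketch under stated assumptions, not the driver's code: `QueueModel`, `completeCpuOp`, `submitGpuOp`, and `workDone` are hypothetical names, and the epoch counter is only one plausible reading of "queue epochs".

```typescript
// Hypothetical model of the signaling rule; names are illustrative and are
// not the driver's actual data structures.
type Signal = () => void;

class QueueModel {
  private nextEpoch = 1;
  private pending: { epoch: number; signal: Signal }[] = [];

  // CPU-only operation: its wait has completed, so it may signal right away.
  completeCpuOp(signal: Signal): void {
    signal();
  }

  // GPU-submit operation: defer the signal under a new, increasing epoch.
  submitGpuOp(signal: Signal): number {
    const epoch = this.nextEpoch++;
    this.pending.push({ epoch, signal });
    return epoch;
  }

  // Completion delivery: fire every deferred signal up to `epoch`, in
  // submission order, and keep later ones pending.
  workDone(epoch: number): void {
    for (;;) {
      const head = this.pending[0];
      if (head === undefined || head.epoch > epoch) break;
      this.pending.shift();
      head.signal();
    }
  }
}

const events: string[] = [];
const model = new QueueModel();
const e1 = model.submitGpuOp(() => { events.push("gpu-1"); });
model.submitGpuOp(() => { events.push("gpu-2"); });
model.completeCpuOp(() => { events.push("cpu"); });
model.workDone(e1);
// events is now ["cpu", "gpu-1"]; "gpu-2" stays pending until its epoch completes.
```

In a real host, the call to the hypothetical `workDone` would sit in the continuation of `queue.onSubmittedWorkDone()`, whose Promise resolves once the work submitted up to that point has finished.
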

This commit includes:

* A `webgpu` HAL driver with driver/device/allocator/buffer/semaphore/executable objects, executable cache, FD-backed file helpers, registration module, and public driver creation API.
* A C import ABI and JavaScript companion module for WebGPU object handles, adapter/device requests, buffer mapping, command encoding, pipeline creation, bind group creation, command-stream execution, cached recordings, and queue.onSubmittedWorkDone() completion delivery.
* A compact WebGPU command ISA and builder that records HAL commands into block-backed uint32 streams with dynamic/static binding slots and automatic encoder begin/end insertion.
* Builtin WGSL fill/copy shaders used to provide HAL semantics where WebGPU has no native command or requires stricter alignment than HAL callers expose.
* A `webgpu-spirv` compiler plugin that reuses SPIR-V codegen, prepares SPIR-V for WebGPU constraints, translates with Tint/Dawn, and packages WGSL plus executable metadata into `webgpu-wgsl-fb`.
* HAL CTS wiring for the wasm32-wasi WebGPU path, including the Node `webgpu` package loader, WASI preopen/output setup, and expected failures for blocking-completion cases that the inline host cannot yet satisfy.
* A WebGPU hello-world sample that builds a VMFB, dumps generated WGSL, and validates that WGSL through Dawn's WebGPU implementation.
* Build-system integration for Bazel and CMake, including generated CMake targets and explicit selection of the WebGPU SPIR-V compiler target.

The CTS coverage exercises the runtime side under wasm32-wasi with the JS WebGPU bridge. The passing coverage includes buffer, command buffer, core, file, and queue CTS groups, with expected failures kept to the operations that require a blocking C wait while JavaScript Promise completions are still pending on the same inline host.
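
The instruction-stream bridge described above (uint32 blocks walked by JavaScript in one call, with dynamic bindings resolved from a binding table) can be sketched as follows. The encoding here is an assumption made for illustration: the opcodes `OP_COPY` and `OP_DISPATCH`, the `[opcode, operandCount, ...operands]` word layout, and the `executeStream` function are all hypothetical, since the text does not spell out the actual command ISA. The walker records a trace instead of calling WebGPU so that it runs anywhere.

```typescript
// Hypothetical opcodes and layout; the real command ISA is not given in the text.
const OP_COPY = 1;     // operands: srcSlot, dstSlot, byteLength
const OP_DISPATCH = 2; // operands: pipelineHandle, x, y, z

// Walks a block of [opcode, operandCount, ...operands] words in a single call
// and resolves binding slots through `bindings`.
function executeStream(words: Uint32Array, bindings: readonly string[]): string[] {
  const trace: string[] = [];
  let pc = 0;
  while (pc < words.length) {
    const opcode = words[pc] ?? 0;
    const count = words[pc + 1] ?? 0;
    const end = pc + 2 + count;
    if (end > words.length) throw new Error(`truncated instruction at word ${pc}`);
    const args = Array.from(words.subarray(pc + 2, end));
    switch (opcode) {
      case OP_COPY: {
        const [srcSlot = 0, dstSlot = 0, byteLength = 0] = args;
        trace.push(`copy ${bindings[srcSlot]} -> ${bindings[dstSlot]} (${byteLength} bytes)`);
        break;
      }
      case OP_DISPATCH: {
        const [pipeline = 0, x = 1, y = 1, z = 1] = args;
        trace.push(`dispatch pipeline#${pipeline} [${x}, ${y}, ${z}]`);
        break;
      }
      default:
        throw new Error(`unknown opcode ${opcode} at word ${pc}`);
    }
    pc = end;
  }
  return trace;
}

const stream = new Uint32Array([OP_COPY, 3, 0, 1, 256, OP_DISPATCH, 4, 7, 8, 1, 1]);
const trace = executeStream(stream, ["input", "output"]);
// trace[0] is "copy input -> output (256 bytes)"
// trace[1] is "dispatch pipeline#7 [8, 1, 1]"
```

The point of the shape is that the whole block crosses the wasm/JS boundary once; the same recorded words could be replayed with a different binding table, which is roughly what the text means by dynamic binding slots.
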
Together with the wasm runtime commit below it, this establishes the first coherent WebGPU bring-up slice: IREE can generate WGSL for WebGPU, package it in a HAL executable format, create WebGPU pipelines from that executable, validate a hello-world shader end to end, and run meaningful HAL CTS coverage through the same wasm/JS bridge that applications will use. Future changes will build tooling and samples that run VMFB programs.
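
The handle convention from the ABI description above (WebGPU objects held in a JS-side table, referenced from wasm as uint32 handles, with handle 0 meaning null) might look roughly like the sketch below. `HandleTable` and its methods are hypothetical names chosen for illustration, and slot reuse through a free list is an assumption rather than something the text states.

```typescript
// Hypothetical JS-side handle table; names are illustrative, not the driver's API.
class HandleTable<T> {
  // Slot 0 is reserved so that handle 0 always means "null".
  private slots: (T | undefined)[] = [undefined];
  private freeList: number[] = [];

  // Stores an object and returns the integer handle that wasm would hold.
  insert(object: T): number {
    const handle = this.freeList.pop() ?? this.slots.length;
    this.slots[handle] = object;
    return handle;
  }

  // Resolves a handle back to its object; handle 0 resolves to null.
  get(handle: number): T | null {
    if (handle === 0) return null;
    const object = this.slots[handle];
    if (object === undefined) throw new Error(`invalid handle ${handle}`);
    return object;
  }

  // Drops the object and makes the slot available for reuse (an assumption).
  release(handle: number): void {
    if (handle === 0) return;
    if (this.slots[handle] === undefined) throw new Error(`invalid handle ${handle}`);
    this.slots[handle] = undefined;
    this.freeList.push(handle);
  }
}

const handles = new HandleTable<{ label: string }>();
const bufferHandle = handles.insert({ label: "staging buffer" });
const resolved = handles.get(bufferHandle);
```

Only `bufferHandle`, a plain integer, would cross into wasm; the object itself stays on the JavaScript side, which is what keeps the import ABI to integers and pointers.
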
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
Release notes are published on GitHub releases.
| Package | Release status |
|---|---|
| GitHub release (stable) | |
| GitHub release (nightly) | |
| iree-base-compiler | |
| iree-base-runtime | |
For more details on the release process, see https://iree.dev/developers/general/release-management/.
| Operating system | Build status |
|---|---|
| Linux | |
| macOS | |
| macOS | |
For the full list of workflows see https://iree.dev/developers/general/github-actions/.
See our website for more information.
Community meeting recordings: IREE YouTube channel
| Date | Title | Recording | Slides |
|---|---|---|---|
| 2025-06-10 | Data-Tiling in IREE: Achieving High Performance Through Compiler Design (AsiaLLVM) | recording | slides |
| 2025-05-17 | Introduction to GPU architecture and IREE's GPU CodeGen Pipeline | recording | slides |
| 2025-02-12 | The Long Tail of AI: SPIR-V in IREE and MLIR (Vulkanised) | recording | slides |
| 2024-10-01 | Unveiling the Inner Workings of IREE: An MLIR-Based Compiler for Diverse Hardware | recording | |
| 2021-06-09 | IREE Runtime Design Tech Talk | recording | slides |
| 2020-08-20 | IREE CodeGen (MLIR Open Design Meeting) | recording | slides |
| 2020-03-18 | Interactive HAL IR Walkthrough | recording | |
| 2020-01-31 | End-to-end MLIR Workflow in IREE (MLIR Open Design Meeting) | recording | slides |
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.