commit | bb1c561cdb25e411071a84cd8173ba107c61c9d3 | |
---|---|---|
author | Benoit Jacob <jacob.benoit.1@gmail.com> | Thu Jan 09 12:32:45 2025 -0500 |
committer | GitHub <noreply@github.com> | Thu Jan 09 12:32:45 2025 -0500 |
tree | 7b8472561ea7caf29bfa7d46b937d632dcb9f779 | |
parent | a7bac5d9f5fab902d906f941a0f1002a809d3f35 | |

Erase all address spaces and get inlined ukernels (#19646)

The `LLVMGPUCastAddressSpaceFunction` pass was selectively erasing the shared memory address space from pointers around Call ops to achieve inlining. This PR generalizes that to erasing all address spaces, after checking with the pass's original author that there wasn't anything intentional here: [discord](https://discord.com/channels/689900678990135345/1282818085153407038/1326577591557296272)

This has the intended effect of allowing AMDGPU ukernels to get inlined into their callers.

There is a side benefit of not having to duplicate ukernels for the various combinations of address spaces of their pointer parameters. This benefit will be partly rolled back if and when we do assembly ukernels, as these will need to know the address spaces to write different instructions, but at least for C ukernels it is nice.

It was counter-intuitive to me that erasing address spaces was possible at all. The key is that these ukernels only get compiled to LLVM IR, not to ISA, and the resulting IR gets inlined into a caller where the addrspacecast was done and where the actual address space is known. After inlining, the compiler is still able to propagate the actual address spaces all the way into the inlined ukernel code.

For the current `multi_mma` ukernel there was no immediate problem. The changes to it in this PR reap the benefits of inlining: the `unroll_*` parameters now become compile-time constants after inlining, so we can simply declare our accumulator tile as a VLA and let it get specialized to a normal fixed-size array. There is no longer any need to use an arbitrary fixed-size array and try to guard that with assertions.

For the existing `argmax` ukernels, the inlining revealed a preexisting issue: these ukernels are reductions to a single scalar and, instead of returning it by value, they write their result value to an output buffer (which happens to be LDS memory, but the address space doesn't matter). The problem was that there was no synchronization between the thread writing the value in the ukernel and the threads reading the value in the caller. This is solved by adding a `__threadfence_block()`, which compiles to almost nothing in ISA (`s_waitcnt`, which we have anyway around memory accesses) but prevents IR rewrites from removing the loads from the output buffer.

I added `__threadfence_block()` to common.h, copied from AMD device library headers, along with a few other synchronization functions which we anticipate will be useful in other ukernels. `__syncthreads` is not used in this PR.

Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>

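The VLA point in the commit message can be illustrated with a small host-side C sketch. This is not the actual `multi_mma` ukernel: the function names `tile_matmul_accumulate` and `caller_2x2`, the row-major layouts, and the `k` parameter are all hypothetical and chosen only for illustration. It assumes a compiler that supports C99 variable-length arrays (GCC and Clang do). The idea is that once the callee is inlined into a caller that passes literal `unroll_*` values, the optimizer sees constant array bounds and can treat the accumulator like an ordinary fixed-size array.

```c
#include <assert.h>

// Hypothetical stand-in for a tile-accumulating ukernel (NOT IREE's multi_mma).
// a is (unroll_m x k), b is (k x unroll_n), c is (unroll_m x unroll_n),
// all row-major. Computes c += a * b.
static inline void tile_matmul_accumulate(const float *a, const float *b,
                                          float *c, int unroll_m, int unroll_n,
                                          int k) {
  assert(unroll_m > 0 && unroll_n > 0);
  // Accumulator tile declared as a VLA: its size comes from the parameters.
  // VLAs cannot have initializers, so it is loaded explicitly from c.
  float acc[unroll_m][unroll_n];
  for (int i = 0; i < unroll_m; ++i)
    for (int j = 0; j < unroll_n; ++j)
      acc[i][j] = c[i * unroll_n + j];
  for (int kk = 0; kk < k; ++kk)
    for (int i = 0; i < unroll_m; ++i)
      for (int j = 0; j < unroll_n; ++j)
        acc[i][j] += a[i * k + kk] * b[kk * unroll_n + j];
  // Write the accumulated tile back to the output buffer.
  for (int i = 0; i < unroll_m; ++i)
    for (int j = 0; j < unroll_n; ++j)
      c[i * unroll_n + j] = acc[i][j];
}

// Hypothetical caller: the unroll factors are literals here, so after
// inlining they are compile-time constants inside the callee body.
static void caller_2x2(const float *a, const float *b, float *c, int k) {
  tile_matmul_accumulate(a, b, c, 2, 2, k);
}
```

Calling `caller_2x2` on 2x2 inputs accumulates the product into `c` in place. Whether the VLA is actually lowered to a fixed-size array depends on the call being inlined and on the optimization level, which is why the commit ties this change to ukernel inlining.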
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
Release notes are published on GitHub releases.
Package | Release status |
---|---|
GitHub release (stable) | |
GitHub release (nightly) | |
Python iree-base-compiler | |
Python iree-base-runtime | |
Operating system | Build status |
---|---|
Linux | |
macOS | |
Windows | |
For the full list of workflows see https://iree.dev/developers/general/github-actions/.
See our website for more information.
Community meeting recordings: IREE YouTube channel
Date | Title | Recording | Slides |
---|---|---|---|
2021-06-09 | IREE Runtime Design Tech Talk | recording | slides |
2020-08-20 | IREE CodeGen (MLIR Open Design Meeting) | recording | slides |
2020-03-18 | Interactive HAL IR Walkthrough | recording | |
2020-01-31 | End-to-end MLIR Workflow in IREE (MLIR Open Design Meeting) | recording | slides |
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.