commit | 1444755ad2e19707f0f5461cf9e499d8f3c093a9
---|---
author | Stanley Winata <68087699+raikonenfnu@users.noreply.github.com> | Mon Nov 04 16:38:26 2024 -0800
committer | GitHub <noreply@github.com> | Mon Nov 04 16:38:26 2024 -0800
tree | c51889cbbdbf227857f890dc7854b618de952660
parent | f71dd1269d3ec54c07e5d77eada3f4a9ab7cb66f
[LLVMGPU] Add VMFMA for FP8 to align layouts between chained F8 contractions. (#19020)

This PR introduces virtual intrinsics on F8 MFMA that break apart a single 8xF8 read into two interleaved 4xF8 reads from shared memory. The main motivation for this virtual intrinsic is to enable faster F8 attention/chained matmuls: by doing interleaved reads on the K dimension, we can match the native F8 intrinsic output layout coming from the 1st matmul to the RHS read of the 2nd matmul (with the interleaved virtual MFMA layout). Once the layouts are aligned, we just need to handle the rest with a to_layout lowering that reshapes the SIMT vector.

This PR has been tested on attention of shape [B: 1, M: 4096, K1: 64, K2: 4096, N: 64], as seen in this IR: [(link)](https://gist.githubusercontent.com/raikonenfnu/4d33b5addfa9c4ec9e76918704251e39/raw/5b20c0c359e3e2df7f8db4890d3cc0590352d18a/attention_f8_perf.mlir), using this spec to select the VMFMA for the 2nd matmul and a regular MFMA for the 1st matmul: ([link](https://gist.githubusercontent.com/raikonenfnu/4d33b5addfa9c4ec9e76918704251e39/raw/5b20c0c359e3e2df7f8db4890d3cc0590352d18a/attn_config.mlir)). With this, we measured a 1.63x speedup over a reference using the same config but MFMA_16x16x32xF16 on both contractions, with correct/matching numerics.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>
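The idea behind the virtual intrinsic can be sketched in plain Python. This is a minimal, hypothetical illustration (the thread/lane layouts here are simplified assumptions, not IREE's actual MFMA register layouts): instead of one thread issuing a single 8-element F8 read along K, the virtual intrinsic issues two 4-element reads interleaved along K, so the per-thread element assignment can line up with the output layout of a preceding F8 contraction.

```python
# Hypothetical sketch: how a single 8xF8 read per thread differs from
# two interleaved 4xF8 reads along the K dimension. Two threads share
# a 16-element K slice in this toy example.
K = 16

def contiguous_read(data, thread_id):
    # Native-style layout: thread t reads 8 contiguous elements.
    start = thread_id * 8
    return data[start:start + 8]

def interleaved_reads(data, thread_id):
    # Virtual-intrinsic-style layout: the same 8 elements arrive as
    # two 4-element reads, interleaved along K between the threads.
    first = data[thread_id * 4 : thread_id * 4 + 4]
    second = data[8 + thread_id * 4 : 8 + thread_id * 4 + 4]
    return first + second

data = list(range(K))
print(contiguous_read(data, 0))    # [0, 1, 2, 3, 4, 5, 6, 7]
print(interleaved_reads(data, 0))  # [0, 1, 2, 3, 8, 9, 10, 11]
```

Both schemes cover the same 16 elements across the two threads; only the per-thread assignment changes, which is exactly the degree of freedom the virtual intrinsic exploits to avoid a layout conversion between the chained contractions.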
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback through any of our communication channels.
Package | Release status
---|---
GitHub release (stable) |
GitHub release (nightly) |
Python iree-compiler |
Python iree-runtime |
Host platform | Build status
---|---
Linux |
macOS |
Windows |
For the full list of workflows see https://iree.dev/developers/general/github-actions/.
See our website for more information.
Community meeting recordings: IREE YouTube channel
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.