{
  "commit": "7ca09a2fc250c4c129d36fdf08c4069521371d2b",
  "tree": "8db839d47f000c249830453ab16147a00f4ec96d",
  "parents": [
    "65f88b5459ef0d37ca786893d8854ef755a3c49d"
  ],
  "author": {
    "name": "Zhewen Yu",
    "email": "zhewenyu@amd.com",
    "time": "Tue Apr 14 16:04:51 2026 +0200"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Tue Apr 14 15:04:51 2026 +0100"
  },
  "message": "[Codegen] Add XOR swizzle for BF16 matmul with DMA (#23932)\n\n## Results: BF16 square matmul (`transposed_rhs`)\n\n| Shape | Intrinsic | tile_k | DMA (no swizzle) Bank Conflicts | DMA (`xor_shuffle\u003c128,8\u003e`) Bank Conflicts | DMA (`xor_shuffle\u003c64,8\u003e`) Bank Conflicts | DMA (no swizzle) Time | DMA (`xor_shuffle\u003c128,8\u003e`) Time | DMA (`xor_shuffle\u003c64,8\u003e`) Time |\n| ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |\n| 512 | 16x16x32 | 4 | 7.00 | **0.00** | 1.00 | 0.063ms | 0.058ms | **0.055ms** |\n| 1024 | 16x16x32 | 4 | 7.00 | **0.00** | 1.00 | 0.068ms | 0.062ms | **0.057ms** |\n| 2048 | 16x16x32 | 1 | 1.00 | 1.00 | **0.00** | 0.096ms | 0.092ms | **0.086ms** |\n| 4096 | 16x16x32 | 1 | 1.00 | 1.00 | **0.00** | 0.222ms | 0.214ms | **0.210ms** |\n| 8192 | 16x16x32 | 1 | 1.00 | 1.00 | **0.00** | 1.61ms | 1.61ms | **1.47ms** |\n| 16384 | 32x32x16 | 2 | 3.00 | **0.00** | 1.00 | 10.0ms | **9.35ms** | 9.42ms |\n\n## Results: Sweep on 320 product shapes\n\nGeometric mean speedup vs. the no-DMA baseline:\n\n| Config | Geomean Speedup vs Baseline (positive is better) |\n|--------|--------------------------------------------------|\n| DMA (`xor_shuffle\u003c64,8\u003e`) | **+7.4%** |\n| DMA (`xor_shuffle\u003c128,8\u003e`) | +1.7% |\n| Oracle (best xor per shape) | +8.4% |\n| Baseline (no DMA) | +0.0% |\n| DMA (no swizzle) | -5.3% |\n\nThe oracle picks the best config per shape, showing +1.0% additional\nheadroom. However, it is difficult to distill the per-shape choice into\na simple compile-time heuristic. We default to `\u003c64,8\u003e` as it gives the\nbest single-config geomean.\n\nFixes: #23901\n\nAssisted-by: Cursor (Claude)\n\nSigned-off-by: Yu-Zhewen \u003czhewenyu@amd.com\u003e",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "665bd584ad3be9b1c54f363322bc0eab3da9296b",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp",
      "new_id": "ab6673d8e24bc22c7309e2512e8359ad421cf23c",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp"
    },
    {
      "type": "modify",
      "old_id": "dd830dd46ed51ef4387ea137eca32624484c6532",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_tile_and_fuse_gfx950.mlir",
      "new_id": "a2941f82ce412df95050d92ef52ce2a843330d85",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_tile_and_fuse_gfx950.mlir"
    },
    {
      "type": "modify",
      "old_id": "84dd7deb099da3494072da8e489397ff9b857094",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/Utils/GPUUtils.cpp",
      "new_id": "1df72308e20f46d89591132b30a5b8450179a4f7",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/Utils/GPUUtils.cpp"
    }
  ]
}
