)]}'
{
  "commit": "cdc5eee405937726cb63ebedb4dc32fc4d9a6ee0",
  "tree": "a5df137e2c4a0427107e8f76adcabb8dc28f26d8",
  "parents": [
    "68a9309688a6605151d97d369a68cb5b620802bc"
  ],
  "author": {
    "name": "Zhewen Yu",
    "email": "zhewenyu@amd.com",
    "time": "Thu Dec 04 11:23:12 2025 +0000"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Thu Dec 04 11:23:12 2025 +0000"
  },
  "message": "[GPU] Fix alignment check for scaled matmul (#22737)\n\n## Problem\n\nThe current alignment check in `GPUHeuristics.cpp` is incorrect for any\nintrinsic that has multiple M, N, and K dimensions. The root cause is\nthat the product of intrinsic sizes is passed to `GPUMMASchedule`\ninstead of passing the individual dimension sizes as a vector.\n\n## Example\n\n\nhttps://github.com/iree-org/iree/blob/b98c1b92cb630bd696992f47df591bb2f247a8d7/compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.cpp#L516-L525\n\nConsider the scaled MFMA where `intrinsic.kSizes \u003d [K, KB] \u003d [4, 32]`.\nInstead of passing the vector `[4, 32]`, the value `128` (product: 4 ×\n32) is passed to `GPUMMASchedule`.\n\n\nhttps://github.com/iree-org/iree/blob/b98c1b92cb630bd696992f47df591bb2f247a8d7/compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.cpp#L98-L108\n\nAssume tile size \u003d `[4, 1]`. The returned schedule sizes become `[4,\n128]` instead of the correct `[16, 32]`. As a result, the last dimension\n`128` always makes the alignment check fail, since the problem size of\nKB is `32` and `32 % 128 !\u003d 0`.\n\nWhen the alignment check fails, no intrinsic is selected and the\noperation falls back to complete serialization. This leads to extremely\nslow execution for workloads like Llama 405B FP4 prefill with direct\ncodegen.\n\n## Solution\n\nThis PR passes all intrinsic sizes as vectors to `GPUMMASchedule`.\n\n## Performance\n\n**Llama 405B FP4 prefill direct codegen with shark-ai:**\n- Before: 11 minutes\n- After: 234 ms\n\nCloses: #22559\n\nci-extra: test_torch\n\n---------\n\nSigned-off-by: Yu-Zhewen \u003czhewenyu@amd.com\u003e",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "630f157ada8a7f8fa0e95a743ed4bc477153981b",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.cpp",
      "new_id": "6c83c7fcb7df92ae948a51a58cae90043c59e4c2",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.cpp"
    },
    {
      "type": "modify",
      "old_id": "d1d40911e34b5d1f7de9c4df5e5e675f49c447dc",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.h",
      "new_id": "db998e5b8fee442e1fa4e4627ac8ce7ed581b8da",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/Common/GPU/GPUHeuristics.h"
    },
    {
      "type": "modify",
      "old_id": "3238d232a5428e1014678568eef345bdc90c8309",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp",
      "new_id": "545dcba71130a8edba108b9bcd13dde6f3b72373",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/ConfigUtils.cpp"
    },
    {
      "type": "modify",
      "old_id": "44259ce3ae06b86b3419589bc1840cbf19d65461",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp",
      "new_id": "6c2c89357f7cb6304f6e042f8660471ec70a1d50",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp"
    },
    {
      "type": "modify",
      "old_id": "9d50043476482982153d7ef1c9555e02ffeff30d",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_tile_and_fuse_gfx950.mlir",
      "new_id": "d98e73ffcd3f20b49c99ae35a9c59c40f7ecd633",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_tile_and_fuse_gfx950.mlir"
    },
    {
      "type": "modify",
      "old_id": "400bde2705b4adce1a365e50649f8217bcfeee9b",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/SPIRV/KernelConfig.cpp",
      "new_id": "efba8390e7134903104827ff3953db664741ffc6",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/SPIRV/KernelConfig.cpp"
    }
  ]
}
