)]}'
{
  "commit": "94550280e13a69952d3934aafae9c010ec0b9d01",
  "tree": "137f82246303797e5921bb6cba4253ce6bf8cdcd",
  "parents": [
    "aee955b92938eca4f37d3100d0ca32bc377573e9"
  ],
  "author": {
    "name": "Andrzej Warzyński",
    "email": "andrzej.warzynski@arm.com",
    "time": "Mon Nov 27 19:57:49 2023 +0000"
  },
  "committer": {
    "name": "GitHub",
    "email": "noreply@github.com",
    "time": "Mon Nov 27 11:57:49 2023 -0800"
  },
  "message": "[CPU][SVE] Update default tiles sizes for matmul ops (#15650)\n\nThe default tile sizes for SVE for matmuls, [8, 32, 16], are basically a\r\ncopy of one of the existing configurations. In the case of SVE, that\r\nconfiguration leads to overly aggressive unrolling and poor performance due\r\nto register spilling. Hence the need to update.\r\n\r\nThis patch updates the default tile sizes for SVE to [8, 16, 1]. The\r\nmiddle dimension corresponds to vector sizes after vectorisation (that\u0027s\r\nalso the dimension that\u0027s configured to be scalable). As the base vector\r\nregister size for SVE is 128 bits, there will be (depending on the\r\nelement size):\r\n\r\n  * (16 x vscale) elements per vector register for i8,\r\n  * (16 / 2 x vscale) elements per vector register for i16,\r\n  * (16 / 4 x vscale) elements per vector register for i32,\r\n  * (...)\r\n\r\nSo, effectively, 16 is the lowest number that can be used to avoid\r\nunder-utilisation of vector registers (i.e. a lower number might be fine for\r\nwider elements, but not for i8).\r\n\r\nAs for the remaining tile sizes, those were determined experimentally by\r\nbenchmarking `linalg.matmul` for:\r\n\r\n  * square matrices (1020x1020, 1021x1021, 1024x1024),\r\n  * both i8 and f32 element types,\r\n  * input tensors with static and dynamic shapes.\r\n\r\nIn all of the above cases, the new configuration improves\r\nperformance.\r\n\r\nIt\u0027s probably worth pointing out that during compilation, IREE will\r\nreduce the leading tile size from 8 to 6. So, effectively, the tile\r\nsizes will be [6, 16, 1].",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "e032dff1ef3f171e0f68f90787ebb335c0f1f9d1",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.cpp",
      "new_id": "7497c5f18efe6d5f0d1441d3441a8490ac7716c3",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.cpp"
    },
    {
      "type": "modify",
      "old_id": "2d845427283e2718f94a487814192764f6a24cbc",
      "old_mode": 33188,
      "old_path": "compiler/src/iree/compiler/Codegen/LLVMCPU/test/materialize_aarch64_launch_configuration.mlir",
      "new_id": "1ef583b42f5c9f33f6e418a3465a024af9529c98",
      "new_mode": 33188,
      "new_path": "compiler/src/iree/compiler/Codegen/LLVMCPU/test/materialize_aarch64_launch_configuration.mlir"
    }
  ]
}
