commit | 09685ee644d2d5676f9566e3275bca566efe547d | |
---|---|---|
author | bjacob &lt;benoitjacob@google.com&gt; | Sun Jul 16 23:50:11 2023 -0400 |
committer | GitHub &lt;noreply@github.com&gt; | Sun Jul 16 23:50:11 2023 -0400 |
tree | 33c3357a2b24a6c9fb1ab8c7aa4466fe28de8fee | |
parent | 49335b7ee56d3d3b58b0850a3dbba3e179524739 | |
data-tiling: introduce `upper_bound_tile_size` op to defer padding-size choice to MaterializeEncoding. (#14349)

This fixes #11632 by introducing a materializable `upper_bound_tile_size` op instead of hardcoding a fixed padding amount at Flow, and it fixes it in sufficient generality to also solve the problem for narrow matmuls. Let's explain that in more detail, as it is an important part of what this PR does.

For each combination of element types and each target, the MaterializeEncoding pass selects appropriate matmul tile shapes. Input tensors get padded to the next multiple of the tile size. That padding increases the inherent arithmetic cost of the problem at hand. When, along some dimension, the original tensor size is smaller than the tile size, the overhead can be particularly large. The extreme case, which is also a very common case, is matrix-times-vector multiplication: the "vector" right-hand side is really a matrix with one dimension size equal to 1, so if the general matmul tile shape along that dimension is 8 or 16, as is usually the case, padding can mean an 8x or 16x increase in the inherent arithmetic cost of the matmul op.

The solution is to adjust MaterializeEncoding tile shapes for narrow dimensions. We had some logic in place to deal with that, but #11632 was leaving it moot: the flow-level padding of everything to the next multiple of 16 meant that this logic never really had a chance to kick in. With #11632 being fixed, this PR was the opportunity to also fix that along the way, and to ensure that the solution to #11632 also worked in that respect. As matrix-times-vector products were the common case that suffered the most from #11632, it would have been too bad to "solve" #11632 without addressing them.

Matrix-times-vector is only the extreme case; other narrow cases matter too. When, e.g. on AVX-512, the general matmul tile size is 16, even width-8 matmuls (MxKx8) were suffering a 2x widening. So the solution in this PR makes sure to address all narrow cases, defined as whenever a tensor dimension size is less than the general tile size. (See the sketches below for the padding arithmetic and the narrow-case adjustment.)

The difficulty was that when MaterializeEncoding runs on a dispatch function, it runs on an already-padded tensor. Even though this PR introduces `upper_bound_tile_size`, that only makes it possible to select the right padding amount; there is still a `tensor.pad` op, and it still gets in the way of knowing the actual, original tensor shape for the purpose of adjusting tile shapes for narrow cases. Moreover, as MaterializeEncoding is a type-converter pass, it can't just walk from a Value up to its defining op to find the pre-padding tensor: there are no values there, only types. So the information about the pre-padding tensor shape has to be part of the tensor type that MaterializeEncoding sees, that is, the padded tensor type. The solution to that problem in this PR is to add an `original_type` field to `EncodingAttr`.

Fixes #11632. Fixes a compiler issue encountered in #14398, but not the originally reported runtime crash by itself.

This also includes the removal of a now-useless VMVX pass, which was originally split into https://github.com/openxla/iree/pull/14383.
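To make the overhead numbers above concrete, here is a minimal, self-contained sketch of the padding arithmetic (the helper name is illustrative, not an IREE API; the tile size of 16 follows the AVX-512 example in the description):

```cpp
#include <cstdint>
#include <cstdio>

// Round `dim` up to the next multiple of `tile`: the effect of the
// flow-level tensor.pad described above.
static int64_t padToNextMultiple(int64_t dim, int64_t tile) {
  return ((dim + tile - 1) / tile) * tile;
}

int main() {
  // Matrix-times-vector: the right-hand side is 1 wide. With a general
  // matmul tile of 16 along that dimension, padding inflates the inherent
  // arithmetic cost 16x.
  std::printf("matvec: 1 -> %lld (16x overhead)\n",
              (long long)padToNextMultiple(1, 16));
  // A width-8 matmul (MxKx8) still pays 2x against a tile of 16.
  std::printf("MxKx8:  8 -> %lld (2x overhead)\n",
              (long long)padToNextMultiple(8, 16));
  return 0;
}
```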
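And a hedged sketch of the narrow-case adjustment this PR enables: once MaterializeEncoding can recover the pre-padding shape via `original_type`, it can shrink the tile along any dimension that is narrower than the general tile size. The helper below is a hypothetical illustration of that policy, not the actual MaterializeEncoding code; a real implementation might, for instance, round narrow sizes up to a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: pick the tile size for one dimension, given the
// general tile size for this element-type/target combination and the
// original (pre-padding) dimension size recovered from original_type.
static int64_t chooseTileForDim(int64_t generalTile, int64_t originalDim) {
  // Narrow case: the original dimension is smaller than the general tile,
  // so tiling (and hence padding) at generalTile would inflate the cost.
  if (originalDim < generalTile)
    return originalDim;  // e.g. 16 -> 1 for matvec, 16 -> 8 for MxKx8
  return generalTile;
}
```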
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. That said, we welcome feedback on any of our communication channels!
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.