data-tiling: introduce `upper_bound_tile_size` op to defer padding-size choice to MaterializeEncoding. (#14349)

This fixes #11632 by introducing a materializable
`upper_bound_tile_size` op instead of hardcoding a fixed padding amount
at the Flow level, and it does so in sufficient generality to also
solve the problem for narrow matmuls - let's explain that in more
detail, as it is an important part of what this PR is doing.

For each combination of element types and each target, the
MaterializeEncoding pass selects appropriate matmul tile shapes. Input
tensors get padded to the next multiple of the tile size. This padding
increases the inherent arithmetic cost of the problem at hand. When,
along some dimension, the original tensor size is smaller than the tile
size, that can result in particularly large overhead. The extreme case,
which is also a very common case, is matrix-times-vector
multiplication. The "vector" right-hand side is really a matrix with
one dimension size equal to 1, so if the general matmul tile shape
along that dimension is 8 or 16, as is usually the case, that can be an
8x or 16x increase in the inherent arithmetic cost of the matmul op.
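
To make that overhead concrete, here is a back-of-the-envelope
calculation in plain Python (illustrative only; the 16x16 tile shape is
a typical AVX-512-style value, not a claim about any particular
target):

```python
# Illustrative arithmetic only (not IREE code): how padding to the
# materialized tile size inflates the inherent cost of a matmul.

def padded(dim: int, tile: int) -> int:
    """Round `dim` up to the next multiple of `tile`."""
    return ((dim + tile - 1) // tile) * tile

# Hypothetical AVX-512-style general tile shape (M, N) = (16, 16).
tile_m, tile_n = 16, 16

# Matrix-times-vector: the RHS "matrix" has N == 1.
m, k, n = 512, 512, 1
cost_original = m * k * n
cost_padded = padded(m, tile_m) * k * padded(n, tile_n)
print(cost_padded / cost_original)  # 16.0: a 16x blow-up along N.
```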

The solution to that is to adjust MaterializeEncoding tile shapes to
narrow dimensions. We had some logic in place to deal with that, but
#11632 rendered it moot: the flow-level padding of everything to the
next multiple of 16 meant that this logic never really had a chance to
kick in. With #11632 being fixed, this PR was the opportunity to also
fix that along the way, and to ensure that the solution to #11632 also
worked in that respect. As matrix-times-vector products were the common
case that suffered the most from #11632, it would have been too bad to
"solve" #11632 without addressing that. Note that matrix-times-vector
is only the extreme case; other narrow cases matter too. When the
general matmul tile size is 16, as on AVX-512, even width-8 matmuls
(MxKx8) were suffering a 2x widening. So this PR makes sure to address
all narrow cases, defined as whenever a tensor dimension size is less
than the general tile size.
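
For illustration, here is a sketch of what such a policy can look like,
in plain Python. The clamp-to-next-power-of-two rule used here is an
assumption chosen for the sketch, not a copy of the actual
MaterializeEncoding logic:

```python
# Sketch of a narrow-case tile policy: grow the tile by powers of two
# until it covers the actual dimension, capped at the general tile size.

def narrow_tile(dim_size: int, general_tile: int) -> int:
    tile = 1
    while tile < dim_size and tile < general_tile:
        tile *= 2
    return tile

assert narrow_tile(1, 16) == 1     # matrix-times-vector: no widening.
assert narrow_tile(8, 16) == 8     # width-8 matmul: no 2x widening.
assert narrow_tile(100, 16) == 16  # non-narrow dims keep the general tile.
```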

The difficulty was that when MaterializeEncoding runs on a dispatch
function, it runs on an already-padded tensor. Even as this PR
introduces `upper_bound_tile_size`, that only makes it possible to
select the right padding amount; there is still a `tensor.pad` op, and
it still gets in the way of knowing the actual, original tensor shape
for the purpose of adjusting tile shapes for narrow cases. Moreover, as
`MaterializeEncoding` is a type-converter pass, it can't just walk from
a Value up to its defining op to find the pre-padding tensor: there are
no values there, only types. So the information about the pre-padding
tensor shape has to be part of the tensor type that
`MaterializeEncoding` sees, that is, the padded tensor type.

The solution to that problem in this PR is to add an `original_type`
field to `EncodingAttr`.
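
To illustrate the idea, here is a minimal Python sketch: the
pre-padding shape rides along in the encoding, so tile selection can
detect narrow dimensions from the type alone. All names here are
hypothetical stand-ins, not IREE's actual API:

```python
from dataclasses import dataclass

# Hypothetical Python stand-ins for the C++ attributes; none of these
# names are IREE's actual API.

@dataclass
class EncodingAttr:
    role: str              # e.g. "LHS", "RHS", "RESULT"
    original_shape: tuple  # shape before flow-level padding

@dataclass
class RankedTensorType:
    shape: tuple           # already-padded shape, all the converter sees
    encoding: EncodingAttr

def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

def choose_tiles(t: RankedTensorType, general_tiles: tuple) -> tuple:
    # Adjust each tile to the *original* dimension, not the padded one:
    # the padded shape alone would hide that a dimension was narrow.
    return tuple(min(tile, next_pow2(dim))
                 for tile, dim in zip(general_tiles, t.encoding.original_shape))

# A matrix-times-vector RHS: padded to 512x16, but originally 512x1.
rhs = RankedTensorType(shape=(512, 16),
                       encoding=EncodingAttr("RHS", original_shape=(512, 1)))
print(choose_tiles(rhs, (16, 16)))  # (16, 1): the narrow dim gets tile 1.
```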

Fixes #11632.

Fixes a compiler issue encountered in #14398, but not the originally
reported runtime crash itself.

This also includes the removal of a now-useless VMVX pass, which was
originally split out into https://github.com/openxla/iree/pull/14383.