Batching parameter load operations and cleaning up gather/scatter. (#15706)

This makes loads look like gathers/scatters and moves the (relatively)
tricky concurrency scheduling logic into the runtime. A single load
operation can now return any number of parameters with unique storage
buffers (hopefully imported/zero-copy) so long as they have matching
buffer parameters (which in general they all do). The core logic for
scheduling the batched operations is now shared so that
load/gather/scatter all go down the same path; as we add new parameter
types and optimize scheduling there is only one code path to tweak.
Some minor optimizations have been made to elide batch overhead, but
many have been deferred: compared to staging even 10MB of parameters,
the current profile is in the noise. The standalone read/write methods
were removed to simplify the compiler<->runtime interface and
implementations of `iree_io_parameter_provider_t` - currently the only
overhead incurred is an additional queue join barrier that we can
optimize away in most cases in the future.

Since load/gather use the same code path now we shouldn't have
correctness issues unique to any particular path, and we can turn the
gather path back on; it has much less overhead in the compiler/vmfb,
which otherwise needs to handle independent buffers per parameter. We
can eventually optimize the load path to batch device buffer
allocations, but the compiler/vmfb still needs to treat each buffer as
independent, so we won't get savings there. The rule is that the
unified memory model should only be used when building a vmfb that
targets devices that can do zero-copy loads from memory-mapped files -
every other case should use discrete.

Progress on #15521.
Progress on #15522.
Works around several issues in #15674.
24 files changed
README.md

IREE: Intermediate Representation Execution Environment

IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

See our website for project details, user guides, and instructions on building from source.

Project Status

IREE is still in its early phase. We have settled down on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any communication channels!

Communication Channels

Related Project Channels

  • MLIR topic within LLVM Discourse: IREE is enabled by and heavily relies on MLIR. IREE is sometimes referred to in certain MLIR discussions. Useful if you are also interested in MLIR evolution.

Architecture Overview

IREE Architecture diagram

See our website for more information.

Presentations and Talks

License

IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.