Batching parameter load operations and cleaning up gather/scatter. (#15706)

This makes loads look like gathers/scatters and moves the (relatively)
tricky concurrency scheduling logic into the runtime. A single load
operation can now return any number of parameters with unique storage
buffers (hopefully imported/zero-copy) so long as they have matching
buffer parameters (which in general they all do). The core logic for
scheduling the batched operations is now shared so that
load/gather/scatter all go down the same path; as we add new parameter
types and optimize scheduling there is only one code path to tweak.
Some minor optimizations have been made to elide batch overhead, but
many have been deferred: compared to staging even 10MB of parameters,
the current profile is in the noise. The standalone read/write methods
were removed to simplify the compiler<->runtime interface and
implementations of `iree_io_parameter_provider_t` - currently the only
overhead incurred is an additional queue join barrier that we can
optimize away in most cases in the future.

Since load/gather use the same code path now we shouldn't have
correctness issues unique to any particular path, and we can turn the
gather path back on; it has much less overhead in the compiler/vmfb,
which otherwise needs to handle independent buffers per parameter. We
can eventually optimize the load path to batch device buffer
allocations, but the compiler/vmfb still needs to treat each buffer as
independent, so we won't get savings there. The rule is that the
unified memory model should only be used when building a vmfb that
targets devices that can do zero-copy loads from memory-mapped files -
every other case should use discrete.

Progress on #15521.
Progress on #15522.
Works around several issues in #15674.
24 files changed
README.md

IREE: Intermediate Representation Execution Environment

IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

See our website for project details, user guides, and instructions on building from source.

Project Status

IREE is still in its early phase. We have settled down on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any communication channels!

Communication Channels

Related Project Channels

  • MLIR topic within LLVM Discourse: IREE is enabled by and heavily relies on MLIR. IREE is sometimes referred to in certain MLIR discussions. Useful if you are also interested in MLIR evolution.

Architecture Overview

IREE Architecture diagram

See our website for more information.

Presentations and Talks

License

IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.