Adding IREE parameter archive format and tooling support. (#15670)

The new format allows us to store parameters for both inference and
training while observing the requirements of both efficient CPU and
GPU/accelerator execution. We can also support additional storage types
such as splats, allowing for stripped parameter files that work with
programs compiled assuming real parameters. The format has a provision
for referencing external file ranges, but support for reading such files
is TBD.
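To make the splat idea concrete, here is a minimal sketch (illustrative only, not the IRPA on-disk layout): a stripped parameter stores just a small repeating byte pattern plus a logical length, and is expanded on demand. The names `expand_splat`, `pattern`, and `length` are hypothetical.

```python
# Hypothetical sketch of the "splat" storage idea: instead of recording N
# real bytes for a parameter, a stripped archive records a small repeating
# pattern and a logical length, expanding it only when needed.

def expand_splat(pattern: bytes, length: int) -> bytes:
    """Materialize a splat parameter of `length` bytes from `pattern`."""
    if not pattern:
        raise ValueError("splat pattern must be non-empty")
    reps = -(-length // len(pattern))  # ceiling division
    return (pattern * reps)[:length]

# A 4 MiB all-zeros fp32 tensor costs only 4 bytes of pattern in the file:
stored = {"pattern": b"\x00\x00\x00\x00", "length": 4 * 1024 * 1024}
data = expand_splat(stored["pattern"], stored["length"])
```

The payoff is that a program compiled against real parameters can still load the stripped archive: every parameter keeps its name and size, only the bytes are synthetic.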

The format can be used like gguf/safetensors and is supported by the
tooling in the same way. Additionally, a new `iree-convert-parameters`
tool converts any format supported for loading (gguf, safetensors, and
irpa itself) into irpa files, with control over which parameters are
included, renaming of parameters, and stripping parameters by replacing
them with splat values. This should make it easy to take any
gguf/safetensors file and quickly create stripped variants for easy
reproducers/CI benchmarking without needing to ship around the original
files. The `iree-create-parameters` tool can be used to create empty
archives that are ready for initialization by a program that mutates
them, or to create parameter archives with named parameters when there
is no source gguf/safetensors file.
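The three conversion controls described above can be sketched as a single pass over the source parameters. This is a conceptual illustration only, not the `iree-convert-parameters` implementation; all names below are hypothetical.

```python
# Conceptual sketch of the conversion controls: include a subset of
# parameters, rename them, and replace some with splat values so the
# output archive is stripped of real data.

def convert(params, include=None, rename=None, splats=None):
    rename = rename or {}
    splats = splats or {}
    out = {}
    for name, value in params.items():
        if include is not None and name not in include:
            continue  # excluded from the output archive
        new_name = rename.get(name, name)
        # Stripped parameters keep their identity but carry a splat value.
        out[new_name] = splats.get(name, value)
    return out

source = {"w0": [1.0, 2.0], "w1": [3.0], "stats": [9.9]}
result = convert(
    source,
    include={"w0", "w1"},        # drop "stats"
    rename={"w0": "model.w0"},   # rename on the way through
    splats={"w1": 0.0},          # replace real data with a splat
)
# result == {"model.w0": [1.0, 2.0], "w1": 0.0}
```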

All of this still uses memory-mapped files; this limits our parameter
file sizes on 32-bit systems, but I suspect no one is going to run this
tool for large models on 32-bit systems. In the future we can make the
conversion tool use the HAL and schedule optimized file I/O. For now we
just copy parameters via a normal read/write loop and it's fast enough
(pretty much I/O bound, with less optimal reads because of memory
mapping). For me, rewriting a 25GB file takes ~1min with a cold cache
and ~25sec with a hot cache.
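The read/write loop in question is the ordinary chunked kind; a minimal sketch follows (chunk size, paths, and the `copy_range` helper are illustrative, not the tool's actual code). Throughput is dominated by the underlying I/O, not by this loop.

```python
# Plain chunked read/write copy loop, as used for moving parameter data
# today. Chunk size and helper name are illustrative.
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per read keeps syscall overhead low

def copy_range(src_path, dst_path, offset, length):
    """Copy `length` bytes starting at `offset` from src to dst."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        remaining = length
        while remaining:
            chunk = src.read(min(CHUNK, remaining))
            if not chunk:
                raise EOFError("source ended before `length` bytes were read")
            dst.write(chunk)
            remaining -= len(chunk)

# Round-trip a small buffer through temp files to exercise the loop:
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.bin")
    dst = os.path.join(d, "dst.bin")
    payload = os.urandom(3 * CHUNK + 123)
    with open(src, "wb") as f:
        f.write(b"header--" + payload)
    copy_range(src, dst, offset=8, length=len(payload))
    with open(dst, "rb") as f:
        copied = f.read()
```

A HAL-scheduled version could overlap reads and writes and use larger aligned transfers, which is the future direction mentioned above.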

This initial commit has the IRPA builder using the new
`iree_io_stream_t`; switching all format parsers to use it is deferred
to future changes.

Progress on #15521.
55 files changed