commit	fce839f0526a87987b7944e76388602e0630ac90
author	Ben Vanik <ben.vanik@gmail.com>	Tue Nov 28 17:12:58 2023 -0800
committer	GitHub <noreply@github.com>	Wed Nov 29 01:12:58 2023 +0000
tree	c7605cea225a64127b133f80048ee823910dc8e4
parent	59297e0333c25ed31e2d6fb899a760191e939302

Adding IREE parameter archive format and tooling support. (#15670)

The new format allows us to store parameters for both inference and
training while observing the requirements for efficient execution on
both CPU and GPU/accelerators. We can also support additional storage
types such as splats, allowing for stripped parameter files that work
with programs compiled assuming real parameters. The format has a
provision for referencing external file ranges, but support for reading
such files is TBD.
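
As a rough illustration of the splat idea (a sketch only: `SplatEntry`
and its fields are hypothetical names and are not the IRPA schema), a
stripped parameter can be thought of as a fill pattern plus a logical
length that is expanded to full size when the parameter is needed:

```python
from dataclasses import dataclass


@dataclass
class SplatEntry:
    """Hypothetical stand-in for a stripped parameter (not the IRPA layout)."""

    name: str
    length: int     # logical size of the parameter in bytes
    pattern: bytes  # repeating fill pattern, e.g. one 4-byte element

    def materialize(self) -> bytes:
        """Expand the pattern to the full logical length."""
        if not self.pattern:
            raise ValueError("pattern must not be empty")
        if self.length % len(self.pattern) != 0:
            raise ValueError("length must be a multiple of the pattern size")
        return self.pattern * (self.length // len(self.pattern))


# 16 bytes filled with the little-endian float32 encoding of 1.0.
entry = SplatEntry(name="weight0", length=16, pattern=b"\x00\x00\x80\x3f")
data = entry.materialize()
```

The entry itself stays a few bytes regardless of the logical size, which
is what makes stripped files cheap to ship around.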

The format can be used like gguf/safetensors and is supported by the
tooling in the same way. Additionally, a new `iree-convert-parameters`
tool is added to convert any format supported for loading (gguf,
safetensors, and irpa itself) into irpa files, with some control over
which parameters are included, renaming of parameters, and stripping
parameters and replacing them with splat values. This should make it
easy to take any gguf/safetensors file and quickly create stripped
variants for easy reproducers/CI benchmarking without needing to ship
around the original files. The `iree-create-parameters` tool can be used
to create empty archives that are ready for initialization from a
program that mutates them, or to create parameter archives with named
parameters when there is no source gguf/safetensors file.
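
The kinds of control described above (inclusion, renaming, stripping)
can be sketched over an in-memory name-to-bytes mapping. This is only a
conceptual model: `convert_index` and its arguments are hypothetical and
do not mirror the tool's actual flags or the C API.

```python
from typing import Dict, Iterable, Optional


def convert_index(
    source: Dict[str, bytes],
    include: Optional[Iterable[str]] = None,
    renames: Optional[Dict[str, str]] = None,
    strip: bool = False,
    splat_pattern: bytes = b"\x00",
) -> Dict[str, dict]:
    """Filter, rename, and optionally strip parameters (hypothetical helper)."""
    included = None if include is None else set(include)
    renames = renames or {}
    out: Dict[str, dict] = {}
    for name, data in source.items():
        if included is not None and name not in included:
            continue  # parameter not selected for the output archive
        new_name = renames.get(name, name)
        if strip:
            # Keep only the size and a fill pattern; the real data is dropped.
            out[new_name] = {
                "kind": "splat",
                "length": len(data),
                "pattern": splat_pattern,
            }
        else:
            out[new_name] = {"kind": "data", "data": data}
    return out


source = {"a.weight": b"\x01\x02\x03\x04", "a.bias": b"\x05\x06"}
stripped = convert_index(
    source, include=["a.weight"], renames={"a.weight": "w0"}, strip=True
)
```

Here `stripped` holds a single entry named `w0` that records a length of
4 and a fill pattern, with none of the original bytes.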

All of this is still using memory-mapped files; this limits our
parameter file sizes on 32-bit systems but I suspect no one is going to
run this tool for large models on 32-bit systems. In the future we can
make the conversion tool use the HAL and schedule out optimized file
I/O. For now we just copy parameters via a normal read/write loop and
it's fastish-enough (pretty much I/O bound, with less optimal reads
because of memory mapping). For me with a cold cache it takes ~1min to
rewrite a 25GB file and 25sec with a hot cache.
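
For reference, a "normal read/write loop" of the kind mentioned above
looks roughly like the following. This is a generic sketch in Python
rather than the C code in this change; `copy_range` and the chunk size
are assumptions for illustration.

```python
import io
from typing import BinaryIO


def copy_range(
    src: BinaryIO,
    dst: BinaryIO,
    offset: int,
    length: int,
    chunk_size: int = 1 << 20,
) -> int:
    """Copy `length` bytes starting at `offset` in src to dst in fixed chunks."""
    src.seek(offset)
    remaining = length
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("source ended before the requested range was copied")
        dst.write(chunk)
        remaining -= len(chunk)
    return length


# In-memory buffers stand in for the source and target parameter files.
src = io.BytesIO(b"0123456789")
dst = io.BytesIO()
copied = copy_range(src, dst, offset=2, length=5, chunk_size=2)
```

A loop like this is bounded by I/O throughput, so larger chunks mostly
reduce per-call overhead rather than changing the overall cost.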

This initial commit has the IRPA builder using the new `iree_io_stream_t`,
but switching all format parsers to use it is deferred to future
changes.

Progress on #15521.

runtime/bindings/python/CMakeLists.txt
runtime/bindings/python/io.cc
runtime/bindings/python/iree/_runtime/scripts/iree_convert_parameters/__main__.py [Added]
runtime/bindings/python/iree/_runtime/scripts/iree_create_parameters/__main__.py [Added]
runtime/bindings/python/iree/runtime/_package_test.py
runtime/bindings/python/tests/testdata/generate_parameter_gguf.py
runtime/setup.py
runtime/src/iree/base/internal/file_io.c
runtime/src/iree/base/internal/file_io.h
runtime/src/iree/io/BUILD.bazel
runtime/src/iree/io/CMakeLists.txt
runtime/src/iree/io/file_handle.c
runtime/src/iree/io/file_handle.h
runtime/src/iree/io/formats/gguf/BUILD.bazel
runtime/src/iree/io/formats/gguf/CMakeLists.txt
runtime/src/iree/io/formats/gguf/gguf_parser.c [Renamed from runtime/src/iree/io/formats/gguf/gguf_format.c]
runtime/src/iree/io/formats/gguf/gguf_parser.h [Renamed from runtime/src/iree/io/formats/gguf/gguf_format.h]
runtime/src/iree/io/formats/gguf/gguf_parser_test.cc [Renamed from runtime/src/iree/io/formats/gguf/gguf_format_test.cc]
runtime/src/iree/io/formats/irpa/BUILD.bazel [Added]
runtime/src/iree/io/formats/irpa/CMakeLists.txt [Added]
runtime/src/iree/io/formats/irpa/irpa_builder.c [Added]
runtime/src/iree/io/formats/irpa/irpa_builder.h [Added]
runtime/src/iree/io/formats/irpa/irpa_parser.c [Added]
runtime/src/iree/io/formats/irpa/irpa_parser.h [Added]
runtime/src/iree/io/formats/irpa/irpa_parser_test.cc [Added]
runtime/src/iree/io/formats/irpa/testdata/BUILD.bazel [Added]
runtime/src/iree/io/formats/irpa/testdata/CMakeLists.txt [Added]
runtime/src/iree/io/formats/irpa/testdata/empty.irpa [Added]
runtime/src/iree/io/formats/irpa/testdata/generate_irpa_files.sh [Added]
runtime/src/iree/io/formats/irpa/testdata/mixed.irpa [Added]
runtime/src/iree/io/formats/irpa/testdata/multiple.irpa [Added]
runtime/src/iree/io/formats/irpa/testdata/single.irpa [Added]
runtime/src/iree/io/formats/safetensors/BUILD.bazel
runtime/src/iree/io/formats/safetensors/CMakeLists.txt
runtime/src/iree/io/formats/safetensors/safetensors_parser.c [Renamed from runtime/src/iree/io/formats/safetensors/safetensors_format.c]
runtime/src/iree/io/formats/safetensors/safetensors_parser.h [Renamed from runtime/src/iree/io/formats/safetensors/safetensors_format.h]
runtime/src/iree/io/formats/safetensors/safetensors_parser_test.cc [Renamed from runtime/src/iree/io/formats/safetensors/safetensors_format_test.cc]
runtime/src/iree/io/parameter_index.c
runtime/src/iree/io/parameter_index.h
runtime/src/iree/io/parameter_provider.h
runtime/src/iree/io/scope_map.c
runtime/src/iree/io/scope_map.h
runtime/src/iree/io/stream.c
runtime/src/iree/schemas/BUILD.bazel
runtime/src/iree/schemas/CMakeLists.txt
runtime/src/iree/schemas/parameter_archive.h [Added]
runtime/src/iree/tooling/BUILD.bazel
runtime/src/iree/tooling/CMakeLists.txt
runtime/src/iree/tooling/parameter_util.c
runtime/src/iree/tooling/parameter_util.h
tools/BUILD.bazel
tools/CMakeLists.txt
tools/iree-convert-parameters-main.c [Added]
tools/iree-create-parameters-main.c [Added]
tools/iree-dump-parameters-main.c

55 files changed

README.md

IREE: Intermediate Representation Execution Environment

IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.

See our website for project details, user guides, and instructions on building from source.

Project Status

IREE is still in its early phase. We have settled down on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback on any communication channels!

Communication Channels

GitHub issues: Feature requests, bugs, and other work tracking
IREE Discord server: Daily development discussions with the core team and collaborators
iree-discuss email list: Announcements, general and low-priority discussion

Related Project Channels

MLIR topic within LLVM Discourse: IREE is enabled by and heavily relies on MLIR. IREE sometimes is referred to in certain MLIR discussions. Useful if you are also interested in MLIR evolution.

Architecture Overview

IREE Architecture

See our website for more information.

Presentations and Talks

Community meeting recordings: IREE YouTube channel
2021-06-09: IREE Runtime Design Tech Talk (recording and slides)
2020-08-20: IREE CodeGen: MLIR Open Design Meeting Presentation (recording and slides)
2020-03-18: Interactive HAL IR Walkthrough (recording)
2020-01-31: End-to-end MLIR Workflow in IREE: MLIR Open Design Meeting Presentation (recording and slides)

License

IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.