commit | fce839f0526a87987b7944e76388602e0630ac90 | |
---|---|---|
author | Ben Vanik <ben.vanik@gmail.com> | Tue Nov 28 17:12:58 2023 -0800 |
committer | GitHub <noreply@github.com> | Wed Nov 29 01:12:58 2023 +0000 |
tree | c7605cea225a64127b133f80048ee823910dc8e4 | |
parent | 59297e0333c25ed31e2d6fb899a760191e939302 | |
Adding IREE parameter archive format and tooling support. (#15670)

The new format allows us to store parameters for both inference and training, observing the requirements for both efficient CPU and GPU/accelerator execution. We can also support additional storage types such as splats, allowing for stripped parameter files that work with programs compiled assuming real parameters. The format has a provision for referencing external file ranges, but support for reading such files is TBD.

The format can be used like gguf/safetensors and is supported by the tooling in the same way. Additionally, a new `iree-convert-parameters` tool is added to convert any format supported for loading (gguf, safetensors, and irpa itself) into irpa files with some control over which parameters are included, renaming of parameters, and stripping parameters and replacing them with splat values. This should make it easy to take any gguf/safetensors file and quickly create stripped variants for easy reproducers/CI benchmarking without needing to ship around the original files. The `iree-create-parameters` tool can be used to create empty archives that are ready for initialization from a program that mutates them, or to create parameter archives with named parameters when there is no source gguf/safetensors file.

All of this is still using memory-mapped files; this limits our parameter file sizes on 32-bit systems, but I suspect no one is going to run this tool for large models on 32-bit systems. In the future we can make the conversion tool use the HAL and schedule out optimized file I/O. For now we just copy parameters via a normal read/write loop and it's fastish-enough (pretty much I/O bound, with less optimal reads because of memory mapping). For me it takes ~1min to rewrite a 25GB file with a cold cache and 25sec with a hot cache.

This initial commit has the IRPA builder using the new `iree_io_stream_t`, but switching all format parsers to use it is deferred to future changes.
Progress on #15521.
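The "normal read/write loop" over a memory-mapped source mentioned in the commit message can be pictured roughly as follows. This is only a sketch of the general technique in Python, not the code in this commit: the function name `copy_range`, its parameters, and the chunk size are all hypothetical and are not part of IREE.

```python
import mmap

# Hypothetical chunk size for illustration; not a value taken from IREE.
CHUNK_SIZE = 4 * 1024 * 1024


def copy_range(src_path, dst_file, offset, length, chunk_size=CHUNK_SIZE):
    """Copy `length` bytes starting at `offset` from a memory-mapped
    source file into an already-open binary destination file.

    Returns the number of bytes copied.
    """
    with open(src_path, "rb") as src:
        # Map the whole source file read-only. On a 32-bit system the
        # address space bounds how large a file can be mapped this way.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            if offset < 0 or length < 0 or offset + length > len(mapped):
                raise ValueError("requested range lies outside the source file")
            copied = 0
            while copied < length:
                n = min(chunk_size, length - copied)
                start = offset + copied
                # Slicing the mapping faults the pages in and yields bytes,
                # which are then written out with an ordinary write call.
                dst_file.write(mapped[start:start + n])
                copied += n
            return copied
```

A converter built this way would call something like `copy_range` once per parameter, using each parameter's offset and length in the source file. The loop is simple and mostly I/O bound, which is consistent with the timings quoted above.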
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback through any of our communication channels!
See our website for more information.
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.