commit c4d76f278d52485bc840e2effeaf865d98c36254
author: Ben Vanik <ben.vanik@gmail.com>  Fri Aug 18 12:06:17 2023 -0700
committer: GitHub <noreply@github.com>  Fri Aug 18 12:06:17 2023 -0700
tree: 79e6ab11af4defc4ea61c5dde3ddaf79454eb828
parent: 73ddcce258c9be653be0997b25bef1dcb3d0ddbd
Adding Vulkan sparse binding buffer support for native allocations. (#14536)

This creates one logical VkBuffer that is backed by as many aligned max-size allocations as required. There's a lot we could tweak and optimize here, but the initial proof of concept is specifically for allowing large constant/variable buffers with long lifetimes. Most implementations don't allow using these buffers with dispatches, though, due to embarrassingly and arbitrarily small limits on shader storage buffer access ranges. We'll need device pointers to actually use these, but at least we can allocate them now.

Future changes will add asynchronous binding and sparse residency as part of the HAL API so that targets supporting constrained virtual memory management (CPU, CUDA, Vulkan, etc.) can have such virtual/physical remapping exposed for use by the compiler. When that's implemented, the sparse buffer type here will be reworked as a shared utility implementation using the binding/sparse residency APIs.

In order for this to be used for large constants, host allocation importing was implemented so that the buffers can be transferred. This required a change in the HAL APIs exposed to the compiler, as what was there was a hack to approximate the proper import/mapping path but was insufficient for doing it properly. This has been tested with imports of up to 15GB (and should work beyond that, device memory allowing).

On discrete systems when the module is mmapped we can't import and must stage in chunks. If not mmapped, we can import the host pointer as a staging source and avoid the chunk allocation. On unified memory systems we can (sometimes) directly use the host buffer and avoid all allocations.

Progress on #14607. Fixes #7242.
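The core chunking scheme described above (one logical buffer backed by as many aligned max-size physical allocations as required) can be sketched as follows. This is a minimal illustration, not IREE's actual implementation; the `MAX_ALLOCATION_SIZE` and `ALIGNMENT` constants are hypothetical stand-ins for limits a real backend would query from the device (e.g. Vulkan's sparse binding granularity), and `plan_sparse_backing` is an invented helper name:

```python
# Sketch: carve a large logical buffer into aligned max-size physical
# allocations, as a sparse-binding buffer implementation might.
# Constants are illustrative assumptions, not device-queried limits.

MAX_ALLOCATION_SIZE = 4 * 1024 * 1024 * 1024  # hypothetical per-allocation cap
ALIGNMENT = 64 * 1024  # hypothetical sparse binding granularity


def plan_sparse_backing(total_size: int) -> list[tuple[int, int]]:
    """Returns (offset, size) spans of the physical allocations that
    back a logical buffer of total_size bytes."""
    # Round the chunk size down to the alignment so every binding
    # offset stays aligned; the final chunk covers the remainder.
    chunk_size = (MAX_ALLOCATION_SIZE // ALIGNMENT) * ALIGNMENT
    spans = []
    offset = 0
    while offset < total_size:
        size = min(chunk_size, total_size - offset)
        spans.append((offset, size))
        offset += size
    return spans
```

For example, a 15GiB logical buffer (matching the size tested in this change) would be planned as four physical allocations under these assumed limits: three full 4GiB chunks and one 3GiB remainder, each bound at a 64KiB-aligned offset.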
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is made available without any support at the moment. With that said, we welcome any kind of feedback through any of our communication channels!
See our website for more information.
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.