blob: 68242ceb0a73720ceadb4cfe17e0180b0f53367e [file] [log] [blame] [view]
# Metal HAL Driver
This document lists technical details regarding the Metal HAL driver. Note that
the Metal HAL driver is working in progress; this document is expected to be
updated along the way.
IREE provides a [Hardware Abstraction Layer (HAL)][iree-hal] as a common
interface to different compute accelerators. IREE HAL's design draws inspiration
from modern GPU architecture and APIs; so implementing a HAL driver using modern
GPU APIs is generally straightforward. This applies to the Metal HAL driver.
## Overall Design Choices
### Metal Versions
The Metal HAL driver expects Metal 2+. Metal 2 introduces useful features like
argument buffer, performance shaders, and others, that can improve performance
and make IREE HAL implementation simpler. Metal 2 was released late 2017 and are
supported since macOS High Sierra and iOS 11. It is already dominant
([macOS][macos-version-share], [iOS][ios-version-share]) right now.
### Programming Languages and Libraries
The Metal HAL driver lives under the [`iree/hal/metal/`][iree-metal] directory.
Header (`.h`) and implementation (`.mm`) files are put adjacent to each other.
The Metal framework only exposes Objective-C or Swift programming language APIs.
Metal HAL driver needs to inherit from common HAL abstraction classes, which are
C++. So we use [Objective-C++][objcxx] for implementing the Metal HAL driver.
The headers try to stay with pure C/C++ syntax when possible, except for
`#import <Metal/Metal.h>` and using Metal `id` types.
### Object Lifetime Management
Objective-C uses refcount for tracking object lifetime and managing memory. This
is traditionally done manually by sending `retain` and `release` messages to
Objective-C objects. Modern Objective-C allows developers to opt in to use
[Automatic Reference Counting][objc-arc] to let the compiler to automatically
deduce and insert `retain`/`release` where possible to simplify the burdern of
manual management.
We don't use ARC in the Metal HAL driver given that IREE has its own object
[refcount][iree-refptr] and lifetime management mechanism. Metal HAL GPU objects
are tracked with that to be consistent with others. Each Metal HAL GPU object
`retain`s the underlying Metal `id<MTL*>` object on construction and `release`s
on destruction.
## GPU Objects
Metal is one of the main modern GPU APIs that provide more explicit control over
the hardware. The mapping between IREE HAL classes and Metal protocols are
relatively straightforward:
IREE HAL Class | Metal Protocol
:----------------------------------------: | :------------:
[`hal::Driver`][hal-driver] | N/A
[`hal::Device`][hal-device] | [`MTLDevice`][mtl-device]
[`hal::CommandQueue`][hal-command-queue] | [`MTLCommandQueue`][mtl-command-queue]
[`hal::CommandBuffer`][hal-command-buffer] | [`MTLCommandBuffer`][mtl-command-buffer]
[`hal::Semaphore`][hal-semaphore] | [`MTLSharedEvent`][mtl-shared-event]
[`hal::Allocator`][hal-allocator] | N/A
[`hal::Buffer`][hal-buffer] | [`MTLBuffer`][mtl-buffer]
[`hal::Executable`][hal-executable] | [`MTLLibrary`][mtl-library]
[`hal::ExecutableCache`][hal-executable-cache] | N/A
In the following subsections, we go over each pair to provide more details.
### Driver
There is no native driver abstraction in Metal. IREE's Metal HAL driver still
provides a [`hal::metal::MetalDriver`][metal-driver] subclass inheriting from
common [`hal::Driver`][hal-driver] class. `hal::metal::MetalDriver` just
`retain`s all available Metal devices in the system during its lifetime to
provide similar interface as other HAL drivers.
### Device
[`hal::metal::MetalDevice`][metal-device] inherits [`hal::Device`][hal-device]
to provide the interface to Metal GPU device by wrapping a `id<MTLDevice>`. Upon
construction, `hal::metal::MetalDevice` creates and retains one queue for both
dispatch and transfer during its lifetime.
Metal requres command buffers to be created from a `MTLCommandQueue`. In IREE
HAL, command buffers are directly created from the `hal::Device`.
`hal::metal::MetalDevice` chooses the proper queue to create the command buffer
under the hood.
### Command queue
IREE HAL command queue follows Vulkan for modelling submission. Specifically,
`hal::CommandQueue::Submit()` takes a `SubmissionBatch`, which contains a list
of waiting `hal::Semaphore`s, a list of command buffers, and a list signaling
`hal::Semaphore`s. There is no direct mapping in Metal; so
[`hal::metal::MetalCommandQueue`][metal-command-queue] performs the submission
in three steps:
1. Create a new `MTLCommandBuffer` to `encodeWaitForEvent:value` for all
waiting `hal::Semaphore`s and commit this command buffer.
1. Commit all command buffers in the `SubmissionBatch`.
1. Create a new `MTLCommandBuffer` to `encodeSignalEvent:value` for all
signaling `hal::Semaphore`s and commit this command buffer.
There is also no direct `WaitIdle()` for
[`MTLCommandQueue`][mtl-command-queue]s. `hal::metal::MetalCommandQueue`
implements `WaitIdle()` by committing an empty `MTLCommandBuffer` and
registering a complete handler for it to signal a semaphore to wake the current
thread, which is put into sleep by waiting on the semaphore.
### Command buffer
In Metal, commands are recorded into a command buffer with three different kinds
of [command encoders][mtl-command-encoder]: `MTLRenderCommandEncoder`,
`MTLComputeCommandEncoder`, `MTLBlitCommandEncoder`, and
`MTLParallelRenderCommandEncoder`. Each encoder has its own create/end call.
There is no overall begin/end call for the whold command buffer. So even
[`hal::metal::MetalCommandBuffer`][metal-command-buffer] implements an overall
`Begin()`/`End()` call, under the hood it may create a new command encoder for a
specific API call.
### Timeline semaphore
[`hal::Semaphore`][hal-semaphore] allows host->device, device->host, host->host,
and device->device synchronization. It maps to Vulkan timeline semaphore. In
Metal world, the counterpart would be [`MTLSharedEvent`][mtl-shared-event]. Most
of the `hal::Semaphore` APIs are simple to implement in
[`MetalSharedEvent`][metal-shared-event], with `Wait()` as an exception. A
listener is registered on the `MTLSharedEvent` with
`notifyListener:atValue:block:` to singal a semaphore to wake the current
thread, which is put into sleep by waiting on the semaphore.
### Allocator
At the moment the Metal HAL driver just has a very simple
[`hal::Allocator`][hal-allocator] implementation. It just wraps a `MTLDevice`
and redirects all allocation requests to the `MTLDevice`. No page/pool/slab or
whatever. This is only meant to get started. In the future we should have a
better memory allocation library, probably by layering the
[Vulkan Memory Allocator][vma] on top of [`MTLHeap`][mtl-heap].
### Buffer
IREE [`hal::Buffer`][hal-buffer] maps Metal `MTLBuffer`. See
[Memory Management](#memory-management) for more details.
### Executable
IREE [`hal::Executable`][hal-executable] represents a GPU program archive with
a driver-defined format. It maps naturally to Metal [`MTLLibrary`][mtl-library].
An entry point in a `MTLLibrary` is a [`MTLFunction`][mtl-function]. We define
[`hal::metal::MetalKernelLibrary`][metal-kernel-library] to wrap around a
`MTLLibrary`, its `MTLFunction`s, and also `MTLComputePipelineState` objects
constructed from `MTLFunction`s.
### Executable cache
IREE [`hal::ExecutableCache`][hal-executable-cache] is modelling a cache of
preprared GPU executables for a particular device. At the moment the Metal
HAL driver does not peforming any cache on GPU programs; it simply reads the
program from the FlatBuffer and hands it over to Metal driver.
## Compute Pipeline
### Shader/kernel compilation
Metal has [Metal Shading Language (MSL)][msl-spec] for authoring graphics
shaders and compute kernels. MSL source code can be directly consumed by the
Metal framework at run-time; it can also be compiled first into an opaque
library using [command-line tools][msl-cl-library] at build-time.
IREE uses compilers to compile ML models expressed with high-level op semantics
down to GPU native source format. This is also the case for the Metal HAL
driver. Metal does not provide an open intermediate language; we reuse the
[SPIR-V code generation pipeline][spirv-codegen] and then cross compile the
generated SPIR-V into MSL source with [SPIRV-Cross][spirv-cross]. This is
actually a fair common practice for targeting multiple GPU APIs in graphics
programming world. For example, the Vulkan implmenation in macOS/iOs,
[MoltenVK][moltenvk], is also doing the same for shaders/kernels. The path
is actually quite robust, as demonstrated by various games on top of MoltenVK.
Therefore, in IREE, we have a [`MetalSPIRVTargetBackend`][metal-spirv-target],
which pulls in the normal MHLO to Linalg and Linalg to SPIR-V passes to form
the compilation pipeline. The difference would be to provide a suitable
SPIR-V target environment to drive the compilation, which one can derive from
the Metal GPU families to target. (Not implemented yet; TODO for the future.)
The serialization step differs from
[`VulkanSPIRVTargetBackend`][vulkan-spirv-target] too: following the normal
SPIR-V serialization step, we additionally need to invoke SPRIV-Cross to
cross compile the generated SPIR-V into MSL, and then compile and/or serialize
the MSL source/library.
IREE uses [FlatBuffer][flatbuffer] to encode the whole workload module,
including both GPU shader/kernel (called executable in IREE terminology) and
CPU scheduling logic. The GPU executables are embedded as part of the module's
FlatBuffer, which are [`mmap`][mmap]ped when IREE runs.
For the Metal HAL driver, it means we need to embed the MSL kernels inside the
module FlatBuffer. Right now we just encode the MSL source strings and compile
them at Metal run-time. In the future this should be changed to allow encoding
the library instead.
## Memory Management
### Storage type
Metal provides four [`MTLStorageMode`][mtl-storage-mode] options:
* `MTLStorageModeShared`: The resource is stored in system memory and is
accessible to both the CPU and the GPU.
* `MTLStorageModeManaged`: The CPU and GPU may maintain separate copies of the
resource, and any changes must be explicitly synchronized.
* `MTLStorageModePrivate`: The resource can be accessed only by the GPU.
* `MTLStorageMemoryless`: The resource’s contents can be accessed only by the
GPU and only exist temporarily during a render pass.
Among them, `MTLStorageModeManaged` is only available on macOS.
IREE HAL defines serveral [`MemoryType`][hal-buffer]. They need to map to the
above storage modes:
* If `kDeviceLocal` but not `kHostVisible`, `MTLStorageModePrivate` is chosen.
* If `kDeviceLocal` and `kHostVisible`:
* If macOS, `MTLStorageModeManaged` can be chosen.
* Otherwise, `MTLStorageModeShared` is chosen.
* If not `DeviceLocal` but `kDeviceVisible`, `MTLStorageModeShared` is chosen.
* If not `kDeviceLocal` and not `kDeviceVisible`, `MTLStorageModeShared` is
chosen. (TODO: We should probably use host buffer here.)
IREE HAL also allows to create buffers with `kHostCoherent` bit. This may still
be backed by `MTLStorageModeManaged` `MTLBuffer`s in macOS. To respect the
`kHostCoherent` protocol, the Metal HAL driver will perform necessary
`InValidate`/`Flush` operations automatically under the hood.
[macos-version-share]: https://gs.statcounter.com/macos-version-market-share/desktop/worldwide
[ios-version-share]: https://developer.apple.com/support/app-store/
[iree-hal]: https://github.com/google/iree/tree/main/iree/hal
[iree-metal]: https://github.com/google/iree/tree/main/iree/hal/metal
[iree-refptr]: https://github.com/google/iree/blob/main/iree/base/ref_ptr.h
[hal-allocator]: https://github.com/google/iree/blob/main/iree/hal/allocator.h
[hal-buffer]: https://github.com/google/iree/blob/main/iree/hal/buffer.h
[hal-command-queue]: https://github.com/google/iree/blob/main/iree/hal/command_queue.h
[hal-command-buffer]: https://github.com/google/iree/blob/main/iree/hal/command_buffer.h
[hal-device]: https://github.com/google/iree/blob/main/iree/hal/device.h
[hal-driver]: https://github.com/google/iree/blob/main/iree/hal/driver.h
[hal-executable]: https://github.com/google/iree/blob/main/iree/hal/executable.h
[hal-executable-cache]: https://github.com/google/iree/blob/main/iree/hal/executable_cache.h
[hal-semaphore]: https://github.com/google/iree/blob/main/iree/hal/semaphore.h
[metal-command-queue]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_command_queue.h
[metal-command-buffer]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_command_buffer.h
[metal-device]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_device.h
[metal-driver]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_driver.h
[metal-kernel-library]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_kernel_library.h
[metal-shared-event]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_shared_event.h
[metal-spirv-target]: https://github.com/google/iree/tree/hal-metal/iree/compiler/Dialect/HAL/Target/MetalSPIRV
[mtl-buffer]: https://developer.apple.com/documentation/metal/mtlbuffer?language=objc
[mtl-command-buffer]: https://developer.apple.com/documentation/metal/mtlcommandbuffer?language=objc
[mtl-command-encoder]: https://developer.apple.com/documentation/metal/mtlcommandencoder?language=objc
[mtl-command-queue]: https://developer.apple.com/documentation/metal/mtlcommandqueue?language=objc
[mtl-device]: https://developer.apple.com/documentation/metal/mtldevice?language=objc
[mtl-function]: https://developer.apple.com/documentation/metal/mtlfunction?language=objc
[mtl-heap]: https://developer.apple.com/documentation/metal/mtlheap?language=objc
[mtl-library]: https://developer.apple.com/documentation/metal/mtllibrary?language=objc
[mtl-shared-event]: https://developer.apple.com/documentation/metal/mtlsharedevent?language=objc
[mtl-storage-mode]: https://developer.apple.com/documentation/metal/mtlstoragemode?language=objc
[msl-spec]: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
[msl-cl-library]: https://developer.apple.com/documentation/metal/libraries/building_a_library_with_metal_s_command-line_tools?language=objc
[objc-arc]: https://en.wikipedia.org/wiki/Automatic_Reference_Counting
[objcxx]: https://en.wikipedia.org/wiki/Objective-C#Objective-C++
[flatbuffer]: https://google.github.io/flatbuffers/
[mmap]: https://en.wikipedia.org/wiki/Mmap
[moltenvk]: https://github.com/KhronosGroup/MoltenVK
[spirv-codegen]: https://google.github.io/iree/design-docs/codegen-passes
[spirv-cross]: https://github.com/KhronosGroup/SPIRV-Cross
[vma]: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator
[vulkan-spirv-target]: https://github.com/google/iree/tree/hal-metal/iree/compiler/Dialect/HAL/Target/VulkanSPIRV