| # Metal HAL Driver |
| |
| This document lists technical details regarding the Metal HAL driver. Note that |
| the Metal HAL driver is working in progress; this document is expected to be |
| updated along the way. |
| |
| IREE provides a [Hardware Abstraction Layer (HAL)][iree-hal] as a common |
| interface to different compute accelerators. IREE HAL's design draws inspiration |
| from modern GPU architecture and APIs; so implementing a HAL driver using modern |
| GPU APIs is generally straightforward. This applies to the Metal HAL driver. |
| |
| ## Overall Design Choices |
| |
| ### Metal Versions |
| |
| The Metal HAL driver expects Metal 2+. Metal 2 introduces useful features like |
| argument buffer, performance shaders, and others, that can improve performance |
| and make IREE HAL implementation simpler. Metal 2 was released late 2017 and are |
| supported since macOS High Sierra and iOS 11. It is already dominant |
| ([macOS][macos-version-share], [iOS][ios-version-share]) right now. |
| |
| ### Programming Languages and Libraries |
| |
| The Metal HAL driver lives under the [`iree/hal/metal/`][iree-metal] directory. |
| Header (`.h`) and implementation (`.mm`) files are put adjacent to each other. |
| |
| The Metal framework only exposes Objective-C or Swift programming language APIs. |
| Metal HAL driver needs to inherit from common HAL abstraction classes, which are |
| C++. So we use [Objective-C++][objcxx] for implementing the Metal HAL driver. |
| The headers try to stay with pure C/C++ syntax when possible, except for |
| `#import <Metal/Metal.h>` and using Metal `id` types. |
| |
| ### Object Lifetime Management |
| |
| Objective-C uses refcount for tracking object lifetime and managing memory. This |
| is traditionally done manually by sending `retain` and `release` messages to |
| Objective-C objects. Modern Objective-C allows developers to opt in to use |
| [Automatic Reference Counting][objc-arc] to let the compiler to automatically |
| deduce and insert `retain`/`release` where possible to simplify the burdern of |
| manual management. |
| |
| We don't use ARC in the Metal HAL driver given that IREE has its own object |
| [refcount][iree-refptr] and lifetime management mechanism. Metal HAL GPU objects |
| are tracked with that to be consistent with others. Each Metal HAL GPU object |
| `retain`s the underlying Metal `id<MTL*>` object on construction and `release`s |
| on destruction. |
| |
| ## GPU Objects |
| |
| Metal is one of the main modern GPU APIs that provide more explicit control over |
| the hardware. The mapping between IREE HAL classes and Metal protocols are |
| relatively straightforward: |
| |
| IREE HAL Class | Metal Protocol |
| :----------------------------------------: | :------------: |
| [`hal::Driver`][hal-driver] | N/A |
| [`hal::Device`][hal-device] | [`MTLDevice`][mtl-device] |
| [`hal::CommandQueue`][hal-command-queue] | [`MTLCommandQueue`][mtl-command-queue] |
| [`hal::CommandBuffer`][hal-command-buffer] | [`MTLCommandBuffer`][mtl-command-buffer] |
| [`hal::Semaphore`][hal-semaphore] | [`MTLSharedEvent`][mtl-shared-event] |
| [`hal::Allocator`][hal-allocator] | N/A |
| [`hal::Buffer`][hal-buffer] | [`MTLBuffer`][mtl-buffer] |
| [`hal::Executable`][hal-executable] | [`MTLLibrary`][mtl-library] |
| [`hal::ExecutableCache`][hal-executable-cache] | N/A |
| |
| In the following subsections, we go over each pair to provide more details. |
| |
| ### Driver |
| |
| There is no native driver abstraction in Metal. IREE's Metal HAL driver still |
| provides a [`hal::metal::MetalDriver`][metal-driver] subclass inheriting from |
| common [`hal::Driver`][hal-driver] class. `hal::metal::MetalDriver` just |
| `retain`s all available Metal devices in the system during its lifetime to |
| provide similar interface as other HAL drivers. |
| |
| ### Device |
| |
| [`hal::metal::MetalDevice`][metal-device] inherits [`hal::Device`][hal-device] |
| to provide the interface to Metal GPU device by wrapping a `id<MTLDevice>`. Upon |
| construction, `hal::metal::MetalDevice` creates and retains one queue for both |
| dispatch and transfer during its lifetime. |
| |
| Metal requres command buffers to be created from a `MTLCommandQueue`. In IREE |
| HAL, command buffers are directly created from the `hal::Device`. |
| `hal::metal::MetalDevice` chooses the proper queue to create the command buffer |
| under the hood. |
| |
| ### Command queue |
| |
| IREE HAL command queue follows Vulkan for modelling submission. Specifically, |
| `hal::CommandQueue::Submit()` takes a `SubmissionBatch`, which contains a list |
| of waiting `hal::Semaphore`s, a list of command buffers, and a list signaling |
| `hal::Semaphore`s. There is no direct mapping in Metal; so |
| [`hal::metal::MetalCommandQueue`][metal-command-queue] performs the submission |
| in three steps: |
| |
| 1. Create a new `MTLCommandBuffer` to `encodeWaitForEvent:value` for all |
| waiting `hal::Semaphore`s and commit this command buffer. |
| 1. Commit all command buffers in the `SubmissionBatch`. |
| 1. Create a new `MTLCommandBuffer` to `encodeSignalEvent:value` for all |
| signaling `hal::Semaphore`s and commit this command buffer. |
| |
| There is also no direct `WaitIdle()` for |
| [`MTLCommandQueue`][mtl-command-queue]s. `hal::metal::MetalCommandQueue` |
| implements `WaitIdle()` by committing an empty `MTLCommandBuffer` and |
| registering a complete handler for it to signal a semaphore to wake the current |
| thread, which is put into sleep by waiting on the semaphore. |
| |
| ### Command buffer |
| |
| In Metal, commands are recorded into a command buffer with three different kinds |
| of [command encoders][mtl-command-encoder]: `MTLRenderCommandEncoder`, |
| `MTLComputeCommandEncoder`, `MTLBlitCommandEncoder`, and |
| `MTLParallelRenderCommandEncoder`. Each encoder has its own create/end call. |
| There is no overall begin/end call for the whold command buffer. So even |
| [`hal::metal::MetalCommandBuffer`][metal-command-buffer] implements an overall |
| `Begin()`/`End()` call, under the hood it may create a new command encoder for a |
| specific API call. |
| |
| ### Timeline semaphore |
| |
| [`hal::Semaphore`][hal-semaphore] allows host->device, device->host, host->host, |
| and device->device synchronization. It maps to Vulkan timeline semaphore. In |
| Metal world, the counterpart would be [`MTLSharedEvent`][mtl-shared-event]. Most |
| of the `hal::Semaphore` APIs are simple to implement in |
| [`MetalSharedEvent`][metal-shared-event], with `Wait()` as an exception. A |
| listener is registered on the `MTLSharedEvent` with |
| `notifyListener:atValue:block:` to singal a semaphore to wake the current |
| thread, which is put into sleep by waiting on the semaphore. |
| |
| ### Allocator |
| |
| At the moment the Metal HAL driver just has a very simple |
| [`hal::Allocator`][hal-allocator] implementation. It just wraps a `MTLDevice` |
| and redirects all allocation requests to the `MTLDevice`. No page/pool/slab or |
| whatever. This is only meant to get started. In the future we should have a |
| better memory allocation library, probably by layering the |
| [Vulkan Memory Allocator][vma] on top of [`MTLHeap`][mtl-heap]. |
| |
| ### Buffer |
| |
| IREE [`hal::Buffer`][hal-buffer] maps Metal `MTLBuffer`. See |
| [Memory Management](#memory-management) for more details. |
| |
| ### Executable |
| |
| IREE [`hal::Executable`][hal-executable] represents a GPU program archive with |
| a driver-defined format. It maps naturally to Metal [`MTLLibrary`][mtl-library]. |
| An entry point in a `MTLLibrary` is a [`MTLFunction`][mtl-function]. We define |
| [`hal::metal::MetalKernelLibrary`][metal-kernel-library] to wrap around a |
| `MTLLibrary`, its `MTLFunction`s, and also `MTLComputePipelineState` objects |
| constructed from `MTLFunction`s. |
| |
| ### Executable cache |
| |
| IREE [`hal::ExecutableCache`][hal-executable-cache] is modelling a cache of |
| preprared GPU executables for a particular device. At the moment the Metal |
| HAL driver does not peforming any cache on GPU programs; it simply reads the |
| program from the FlatBuffer and hands it over to Metal driver. |
| |
| ## Compute Pipeline |
| |
| ### Shader/kernel compilation |
| |
| Metal has [Metal Shading Language (MSL)][msl-spec] for authoring graphics |
| shaders and compute kernels. MSL source code can be directly consumed by the |
| Metal framework at run-time; it can also be compiled first into an opaque |
| library using [command-line tools][msl-cl-library] at build-time. |
| |
| IREE uses compilers to compile ML models expressed with high-level op semantics |
| down to GPU native source format. This is also the case for the Metal HAL |
| driver. Metal does not provide an open intermediate language; we reuse the |
| [SPIR-V code generation pipeline][spirv-codegen] and then cross compile the |
| generated SPIR-V into MSL source with [SPIRV-Cross][spirv-cross]. This is |
| actually a fair common practice for targeting multiple GPU APIs in graphics |
| programming world. For example, the Vulkan implmenation in macOS/iOs, |
| [MoltenVK][moltenvk], is also doing the same for shaders/kernels. The path |
| is actually quite robust, as demonstrated by various games on top of MoltenVK. |
| |
| Therefore, in IREE, we have a [`MetalSPIRVTargetBackend`][metal-spirv-target], |
| which pulls in the normal MHLO to Linalg and Linalg to SPIR-V passes to form |
| the compilation pipeline. The difference would be to provide a suitable |
| SPIR-V target environment to drive the compilation, which one can derive from |
| the Metal GPU families to target. (Not implemented yet; TODO for the future.) |
| The serialization step differs from |
| [`VulkanSPIRVTargetBackend`][vulkan-spirv-target] too: following the normal |
| SPIR-V serialization step, we additionally need to invoke SPRIV-Cross to |
| cross compile the generated SPIR-V into MSL, and then compile and/or serialize |
| the MSL source/library. |
| |
| IREE uses [FlatBuffer][flatbuffer] to encode the whole workload module, |
| including both GPU shader/kernel (called executable in IREE terminology) and |
| CPU scheduling logic. The GPU executables are embedded as part of the module's |
| FlatBuffer, which are [`mmap`][mmap]ped when IREE runs. |
| |
| For the Metal HAL driver, it means we need to embed the MSL kernels inside the |
| module FlatBuffer. Right now we just encode the MSL source strings and compile |
| them at Metal run-time. In the future this should be changed to allow encoding |
| the library instead. |
| |
| ## Memory Management |
| |
| ### Storage type |
| |
| Metal provides four [`MTLStorageMode`][mtl-storage-mode] options: |
| |
| * `MTLStorageModeShared`: The resource is stored in system memory and is |
| accessible to both the CPU and the GPU. |
| * `MTLStorageModeManaged`: The CPU and GPU may maintain separate copies of the |
| resource, and any changes must be explicitly synchronized. |
| * `MTLStorageModePrivate`: The resource can be accessed only by the GPU. |
| * `MTLStorageMemoryless`: The resource’s contents can be accessed only by the |
| GPU and only exist temporarily during a render pass. |
| |
| Among them, `MTLStorageModeManaged` is only available on macOS. |
| |
| IREE HAL defines serveral [`MemoryType`][hal-buffer]. They need to map to the |
| above storage modes: |
| |
| * If `kDeviceLocal` but not `kHostVisible`, `MTLStorageModePrivate` is chosen. |
| * If `kDeviceLocal` and `kHostVisible`: |
| * If macOS, `MTLStorageModeManaged` can be chosen. |
| * Otherwise, `MTLStorageModeShared` is chosen. |
| * If not `DeviceLocal` but `kDeviceVisible`, `MTLStorageModeShared` is chosen. |
| * If not `kDeviceLocal` and not `kDeviceVisible`, `MTLStorageModeShared` is |
| chosen. (TODO: We should probably use host buffer here.) |
| |
| IREE HAL also allows to create buffers with `kHostCoherent` bit. This may still |
| be backed by `MTLStorageModeManaged` `MTLBuffer`s in macOS. To respect the |
| `kHostCoherent` protocol, the Metal HAL driver will perform necessary |
| `InValidate`/`Flush` operations automatically under the hood. |
| |
| [macos-version-share]: https://gs.statcounter.com/macos-version-market-share/desktop/worldwide |
| [ios-version-share]: https://developer.apple.com/support/app-store/ |
| [iree-hal]: https://github.com/google/iree/tree/main/iree/hal |
| [iree-metal]: https://github.com/google/iree/tree/main/iree/hal/metal |
| [iree-refptr]: https://github.com/google/iree/blob/main/iree/base/ref_ptr.h |
| [hal-allocator]: https://github.com/google/iree/blob/main/iree/hal/allocator.h |
| [hal-buffer]: https://github.com/google/iree/blob/main/iree/hal/buffer.h |
| [hal-command-queue]: https://github.com/google/iree/blob/main/iree/hal/command_queue.h |
| [hal-command-buffer]: https://github.com/google/iree/blob/main/iree/hal/command_buffer.h |
| [hal-device]: https://github.com/google/iree/blob/main/iree/hal/device.h |
| [hal-driver]: https://github.com/google/iree/blob/main/iree/hal/driver.h |
| [hal-executable]: https://github.com/google/iree/blob/main/iree/hal/executable.h |
| [hal-executable-cache]: https://github.com/google/iree/blob/main/iree/hal/executable_cache.h |
| [hal-semaphore]: https://github.com/google/iree/blob/main/iree/hal/semaphore.h |
| [metal-command-queue]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_command_queue.h |
| [metal-command-buffer]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_command_buffer.h |
| [metal-device]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_device.h |
| [metal-driver]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_driver.h |
| [metal-kernel-library]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_kernel_library.h |
| [metal-shared-event]: https://github.com/google/iree/blob/main/iree/hal/metal/metal_shared_event.h |
| [metal-spirv-target]: https://github.com/google/iree/tree/hal-metal/iree/compiler/Dialect/HAL/Target/MetalSPIRV |
| [mtl-buffer]: https://developer.apple.com/documentation/metal/mtlbuffer?language=objc |
| [mtl-command-buffer]: https://developer.apple.com/documentation/metal/mtlcommandbuffer?language=objc |
| [mtl-command-encoder]: https://developer.apple.com/documentation/metal/mtlcommandencoder?language=objc |
| [mtl-command-queue]: https://developer.apple.com/documentation/metal/mtlcommandqueue?language=objc |
| [mtl-device]: https://developer.apple.com/documentation/metal/mtldevice?language=objc |
| [mtl-function]: https://developer.apple.com/documentation/metal/mtlfunction?language=objc |
| [mtl-heap]: https://developer.apple.com/documentation/metal/mtlheap?language=objc |
| [mtl-library]: https://developer.apple.com/documentation/metal/mtllibrary?language=objc |
| [mtl-shared-event]: https://developer.apple.com/documentation/metal/mtlsharedevent?language=objc |
| [mtl-storage-mode]: https://developer.apple.com/documentation/metal/mtlstoragemode?language=objc |
| [msl-spec]: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf |
| [msl-cl-library]: https://developer.apple.com/documentation/metal/libraries/building_a_library_with_metal_s_command-line_tools?language=objc |
| [objc-arc]: https://en.wikipedia.org/wiki/Automatic_Reference_Counting |
| [objcxx]: https://en.wikipedia.org/wiki/Objective-C#Objective-C++ |
| [flatbuffer]: https://google.github.io/flatbuffers/ |
| [mmap]: https://en.wikipedia.org/wiki/Mmap |
| [moltenvk]: https://github.com/KhronosGroup/MoltenVK |
| [spirv-codegen]: https://google.github.io/iree/design-docs/codegen-passes |
| [spirv-cross]: https://github.com/KhronosGroup/SPIRV-Cross |
| [vma]: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator |
| [vulkan-spirv-target]: https://github.com/google/iree/tree/hal-metal/iree/compiler/Dialect/HAL/Target/VulkanSPIRV |