[metal] Implement a Metal HAL driver (#12713)

This pull request resurrects the Metal HAL driver, which was
previously removed because of the HAL revamp. The driver is
now implemented in Objective-C rather than Objective-C++ as
before. We also explicitly track resource lifetimes and
manage dependencies, following IREE's preference for
explicitness.

The following features are currently implemented:
- HAL driver: device enumeration and some feature printing
- HAL device: resource creation, kernel launches, and sync
- HAL allocation: a simple allocator without any caching or
suballocation (we expect a common layer for those later)
- HAL command buffer: one-shot command buffers, with both
direct and indirect dispatches
- HAL executables: compilation to both MSL and `.metallib`
- HAL executable caches: just a simple no-op cache for now
- HAL builtin executables: for filling/copying buffers
- HAL semaphore: real asynchronous queue execution, but no
stream-ordered allocation yet

The implementation currently assumes Metal 3 devices.
On an Apple M1 MacBook, it passes all existing CTS tests
as well as the in-tree end-to-end StableHLO and TOSA op
and model tests.

Fixes https://github.com/openxla/iree/issues/4370