tree 4e7cd0d6a4030b6120d1ef995c540403565cb0e7
parent 53daa9513bbc9c0dbbc1300a9337f263d151a0cb
author Ben Vanik <ben.vanik@gmail.com> 1755790216 -0700
committer GitHub <noreply@github.com> 1755790216 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsFcBAABCAAQBQJopzuICRC1aQ7uu5UhlAAAaBEQAHMSu5oYQTU+nNHd6QpIbiHB
 8iL75ox6tmQ60iRDP3E9igqY49r5GFjahk6ERCpdWSg6Y7PPx9vj1HkHeEnmTsoi
 3w143BlDoJRJTZcV1JUIvpMZQdoxH9GPMMq1q2v1iXDaiqa5CabegRKN3TScds9p
 ssApq/vo7w4V8XpTnsuSeM3hkVyavnetBdxz9e1f35hpGB58m33e5iV1DW/h2h+q
 oc5Q4PbTI+qbvIniv6isQ/kf2XATxSjrXG5RJyLvEmoVcFT0POGRKgyvG8gf9cV5
 aBi0HYqEtNtUMuxVXFUBEh1WizazSdmgjXyXrUpZ+8pXg9rLa4lkpFHbzfgurZe9
 tAUAiOF01xur31XpGNXYdQNwqMaW0ljuEMJXqhh72Q/laUlXBXN1UX9HO4vFSTWA
 uaXvU8RjLIesjmrwNHIdo3V8tZDnW8ojhfQfio2UYWpIO4RFa7ptM48yZus73GZd
 RAoMUIZuK3TJgwsvK4AAOH0eZKyc/p1eFciMib1LpCHO+s9iCPIxmkFwipOOEwMN
 AlJP62/UYbllwCY8fxAKV9bblrrCnAjA1IgkYD+P9TOC3G8+TM7J57eXS99Jt62x
 c7GlfKQOz6g9gw3ZX4f/BKyFgfKPSBK804QdiOsqG/NSGhRwJaya2w+KeqDxSJPN
 20aP9ZGbKnaQgmYbmKV1
 =hJCU
 -----END PGP SIGNATURE-----
 

Adding iree_hal_device_queue_host_call and emulation.  (#21653)

This allows for both blocking and non-blocking device->host calls.

Emulation is provided for targets that aren't yet using their native
features (CUDA/HIP/Metal) or don't have them (Vulkan), but it should
never be used once we start relying on this for programs as the
performance is terrible. The CPU sync and task implementations are done
here as the emulation is incompatible with sync semantics and it's
possible to implement it on the task system fairly easily.

I split the existing queue emulation utilities out of device.c so
high-fidelity backends can eventually not even link that code in. I
added the host call emulation in its own target so we can avoid
introducing threading dependencies into `iree::hal` for the emulation
(which also makes it clearer what's part of the API vs what's an
implementation detail).

I suspect there may be some HIP flakes; we can disable the CTS there
and file issues if they pop up. I think there are a few cases in the
HIP semaphore that don't quite work but it'd be better to improve the
semaphore tests first. HIP is using an emulated host call here, which is
pretty much just a thread and some semaphores, so it should be possible
to test independently of the host call logic.
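
For readers unfamiliar with the "thread and some semaphores" shape, here
is a rough, standalone sketch of the blocking flavor. It is not the code
in this change: every name below (`timeline_t`, `host_call_t`,
`run_demo`, ...) is hypothetical, and the pthread-based timeline is only
a stand-in for a real HAL semaphore.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

// Hypothetical stand-in for a timeline semaphore: a monotonically
// increasing payload that waiters block on until it reaches a value.
typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  uint64_t value;
} timeline_t;

void timeline_init(timeline_t* t) {
  pthread_mutex_init(&t->mutex, NULL);
  pthread_cond_init(&t->cond, NULL);
  t->value = 0;
}

void timeline_signal(timeline_t* t, uint64_t value) {
  pthread_mutex_lock(&t->mutex);
  assert(value >= t->value);  // timeline payloads never move backwards
  t->value = value;
  pthread_cond_broadcast(&t->cond);
  pthread_mutex_unlock(&t->mutex);
}

void timeline_wait(timeline_t* t, uint64_t value) {
  pthread_mutex_lock(&t->mutex);
  while (t->value < value) pthread_cond_wait(&t->cond, &t->mutex);
  pthread_mutex_unlock(&t->mutex);
}

// Hypothetical description of one emulated host call: wait on one
// timeline, run the callback on the host, then signal another timeline.
typedef struct {
  timeline_t* wait_timeline;
  uint64_t wait_value;
  timeline_t* signal_timeline;
  uint64_t signal_value;
  void (*fn)(void* user_data);
  void* user_data;
} host_call_t;

// Thread entry point: the whole emulation is wait -> call -> signal.
void* host_call_thread_main(void* arg) {
  host_call_t* call = (host_call_t*)arg;
  timeline_wait(call->wait_timeline, call->wait_value);
  call->fn(call->user_data);
  timeline_signal(call->signal_timeline, call->signal_value);
  return NULL;
}

// Example callback for the demo below.
void add_one(void* user_data) { *(int*)user_data += 1; }

// Issues one emulated host call and returns the counter after it has run.
int run_demo(void) {
  timeline_t before, after;
  timeline_init(&before);
  timeline_init(&after);
  int counter = 41;
  host_call_t call = {&before, 1, &after, 1, add_one, &counter};
  pthread_t thread;
  if (pthread_create(&thread, NULL, host_call_thread_main, &call) != 0) {
    return -1;
  }
  timeline_signal(&before, 1);  // "prior queue work" completes
  timeline_wait(&after, 1);     // later work resumes after the call
  pthread_join(thread, NULL);
  return counter;
}
```

Because the sketch depends only on a thread and two timelines, the same
wait/call/signal sequence can be exercised in a test without any device
queue, which is the kind of independent testing suggested above.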

Fixes #21631.