tree 1bae5c92a92da98f1f10ee590c00de096b451783
parent baeffa7520cccb1387561dd1763b837cb99498df
author Lei Zhang <antiagainst@gmail.com> 1709007675 -0800
committer GitHub <noreply@github.com> 1709007675 +0000
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsFcBAABCAAQBQJl3WM7CRC1aQ7uu5UhlAAA1nsQAEHjv3P/8pDzGMSwSNPOmF3U
 q2/VzwHnzunpm5sk0he/Dv5aS8bqjoqbG2qPEjcOcLlL+Dv/VqawvjgmvwkJ5mO0
 DbiwGc75PnXM7wDMJM4ZHteoUue5l9qfsKH6sQq6ClqtRqFwvPv3oOqsLcRjQmtF
 5Hw8gHRD5lzDsKqvLJ5tTNLqAAT4pYzAQOSB+abNNwEhvWC7+f1ez0YPNsIbQdPX
 pb1QzgyP+mUXorzJZntedOWSDEjR8qGw6nAD648+jXMo0JZGHiDnphJ32TleY5d/
 vqdkBsqgUoyHe5KAO8VXN32DICYje+yIjqZ2kkZIg523ty5CNBw2TumzpMZFlPVT
 GEM3u+UJm8zmTELwvAGo9k6873O/+wpWTrX6irJWobEO0m6W1ip36d8G5pO79wLG
 r1aqpq6k3yTgTonQiwNhksmEM/WxX4/NpJypl6YpRAZXLA3AJXF6wGBvK6jAxm5l
 9GK/5vZMWOtT8FCvf/paJn/mEq4N558S0ADNH5sBKe1LhnGo6QLzEr0KiRoKFCnO
 TDUOck6+Jz4djCoaw0CFBbuWA1qZGcf7kO1SARWNJi7K8dvY7QmMniM8Ka4mPRyr
 N3qJ5glILU5tL24FMEe4IQ4kP9xF/B/WcL1OhR9JTuPNmWZfLfxTHoLQU+sDG7iJ
 5vkLhEVD+GGbjwVGb9w1
 =vIau
 -----END PGP SIGNATURE-----
 

[cuda][hip] Fix launch host func and worker thread state update (#16568)

This commit fixes a few issues in the pending action queue
to resolve driver deadlock issues:

* In the host launch func, which is called from a driver
  thread, we cannot invoke any GPU API; otherwise we might
  see a deadlock. This includes cleaning up actions after
  execution, which may involve releasing/unregistering
  buffers and was the cause of the hip driver hang. This
  cleanup now happens in the worker thread instead. Each
  action gets a state field indicating whether it is alive
  or zombie. Once an action is done executing, we flip its
  state to zombie and enqueue it again so that the worker
  thread cleans it up.
* The worker thread can have five states: two normal states
  (idle waiting, workload pending) and three exit states
  (requested, committed, error). They are listed in order
  of increasing priority w.r.t. overwriting. We cannot
  overwrite a state later in the list without checking
  first. This guarantees that exit requests are properly
  respected rather than dropped on the floor, so that we
  exit cleanly.
* When the worker thread is woken to process the ready list,
  we need to immediately flip the worker state from workload
  pending to idle waiting, before any real processing. This
  makes sure we don't drop new workload enqueued while
  we are processing, and that the worker thread can be
  woken up again properly later.
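
The three points above can be sketched roughly as follows. This is
an illustrative sketch only, not the code in this change: every
type and function name here is hypothetical, the numeric ordering of
the states simply follows the order in which they are listed above,
and the "cleanup" is a stand-in flag rather than real buffer
releasing/unregistering.

```c
#include <stdatomic.h>
#include <stdbool.h>

// All names below are hypothetical and chosen for illustration only.

// Worker states, ordered by increasing overwrite priority.
typedef enum {
  WORKER_STATE_IDLE_WAITING = 0,
  WORKER_STATE_WORKLOAD_PENDING = 1,
  WORKER_STATE_EXIT_REQUESTED = 2,
  WORKER_STATE_EXIT_COMMITTED = 3,
  WORKER_STATE_EXIT_ERROR = 4,
} worker_state_t;

// Per-action state: alive until executed, zombie while awaiting cleanup.
typedef enum {
  ACTION_STATE_ALIVE = 0,
  ACTION_STATE_ZOMBIE = 1,
} action_state_t;

typedef struct {
  action_state_t state;
  bool resources_released;  // Stand-in for real GPU resource cleanup.
} action_t;

// Moves *state to |desired| only if |desired| has higher priority than the
// current value. Returns true if the state was changed. A lower-priority
// request (e.g. workload pending) never overwrites an exit state.
static bool worker_state_raise(atomic_int* state, worker_state_t desired) {
  int current = atomic_load(state);
  while (current < (int)desired) {
    // On failure |current| is refreshed and the priority check is redone.
    if (atomic_compare_exchange_weak(state, &current, (int)desired)) {
      return true;
    }
  }
  return false;
}

// Called by the worker right after waking up and before any real
// processing: flips workload pending back to idle waiting so that work
// enqueued during processing can mark the state pending again. Returns
// false if the state was something else (e.g. an exit state), which is
// left untouched.
static bool worker_begin_processing(atomic_int* state) {
  int expected = WORKER_STATE_WORKLOAD_PENDING;
  return atomic_compare_exchange_strong(state, &expected,
                                        WORKER_STATE_IDLE_WAITING);
}

// Body of the host launch function (runs on a driver thread): makes no GPU
// API calls and only marks the action as zombie. Real code would also
// enqueue the action again and notify the worker thread.
static void action_mark_done(action_t* action) {
  action->state = ACTION_STATE_ZOMBIE;
}

// Runs on the worker thread, where GPU API calls are allowed. Returns true
// if the action was a zombie and has been cleaned up.
static bool worker_cleanup_if_zombie(action_t* action) {
  if (action->state != ACTION_STATE_ZOMBIE) return false;
  action->resources_released = true;
  return true;
}
```

In this sketch an enqueue would call
`worker_state_raise(&state, WORKER_STATE_WORKLOAD_PENDING)`, which
has no effect once an exit state has been set, and the worker would
call `worker_begin_processing(&state)` first thing after waking up.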

With the above fixes, we can pass all stablehlo/tosa e2e op
tests on the hip driver without hangs or crashes. The same
change is mirrored to the cuda pending action queue.

Fixes https://github.com/openxla/iree/issues/15790
Progress towards https://github.com/openxla/iree/issues/16504