tree a31c75e4087d7e61934c010fa941961ce44ee79d
parent aa28b4a786d137716866239e3b98d3a2631782e3
author MaheshRavishankar <1663364+MaheshRavishankar@users.noreply.github.com> 1684518882 -0700
committer GitHub <noreply@github.com> 1684518882 -0400
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsBcBAABCAAQBQJkZ7fiCRBK7hj4Ov3rIwAAuDkIAA6iYckqNyQ0cvqn77MYhhZ5
 4kdJ+zuZ+8jqVG5lFRXBTKF5iJd4eCiErcsAsPWEKx9iljUQcGjk3mlkJcI2JoWf
 MkuojmDVR2+bwN91OCL5/aGcF7yRFqzG4VZwh8MGNuxr+jUH28hoPKUS+fWk+C7M
 KcV+T2RVV5v5ay8+mgzX6e9YCSTuDGHh6ECtD+zQAromovV5fix9s6K7Y82S/wp5
 ZwFsW6s7IXY29EYfQ68ktjPazxTkAiqGKa9z0Nb6EKkyp1V6ZNEROHv1URAm+cdR
 mUBLCiMWDqzzK1pwAZK93VXEjBPfjHZYHPsJqEhSOUzCgYrj+CKb4i606lV/rTE=
 =CBNR
 -----END PGP SIGNATURE-----
 

CPU ukernels as bitcode (x86-only for now) (#13460)

While #13433 enabled the use of micro kernels within codegen backends
through the plugin mechanism, this change compiles the ukernel code into
a bitcode library that is linked with the generated code at compilation
time. The lowering to LLVM then inlines the exact micro kernel needed
for a particular architecture from that library. Enabling this
end-to-end flow requires the following changes:

- Add an enum attribute to HAL, `IREE::HAL::CallingConvention`, that
specifies which calling convention to use for the micro kernel call.
`Default` leaves the parameters as is; `ParameterStruct` packs all the
returns and arguments into a parameter struct to mimic ABIs like
https://github.com/openxla/iree/blob/6cf092d022810d4347353b23e5ce2688a166dd67/runtime/src/iree/builtins/ukernel/mmt4d.h#L16
- Add a couple of patterns to the `ConvertToLLVM` pass to handle the
lowering of the function definition and function call in keeping with
the specified ABI.
- Allow `hal.import.fields` to specify `processor_data` and
`processor_id` on the ukernel function definition. This generates the
code that forwards this information to the micro kernels (similar to
what is done for external calls using the plugin mechanism).
- Propagate the target CPU features in `hal.executable.target` of the
dispatch into the micro kernel call. This allows the LLVM passes to walk
through the branching used to pick the right micro kernel function and
effectively inline the selected function.
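
As a rough illustration of the `ParameterStruct` convention and the
feature-based branching described above, here is a minimal C sketch. It
is not the actual IREE ukernel code: every name in it (`example_dot`,
`example_dot_params_t`, `EXAMPLE_CPU_FEATURE_WIDE`, the layout of
`cpu_data`) is hypothetical and chosen only for illustration. It assumes
the struct carries the arguments, the result slot, and the forwarded
processor data behind a single pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical feature bit; real CPU feature encodings will differ.
#define EXAMPLE_CPU_FEATURE_WIDE (1ull << 0)

// Hypothetical parameter struct: arguments and the result slot are packed
// together, so the entry point takes one pointer instead of a list of
// separate parameters.
typedef struct {
  const int32_t* lhs;        // input
  const int32_t* rhs;        // input
  int32_t* out;              // result slot (accumulated into)
  size_t size;               // number of elements
  const uint64_t* cpu_data;  // forwarded processor data (assumed layout)
} example_dot_params_t;

// Generic variant: plain scalar loop.
static void example_dot_generic(const example_dot_params_t* p) {
  for (size_t i = 0; i < p->size; ++i) *p->out += p->lhs[i] * p->rhs[i];
}

// "Wide" variant: stands in for an architecture-specific kernel by
// processing four elements per iteration, then a scalar tail.
static void example_dot_wide(const example_dot_params_t* p) {
  size_t i = 0;
  for (; i + 4 <= p->size; i += 4) {
    *p->out += p->lhs[i] * p->rhs[i] + p->lhs[i + 1] * p->rhs[i + 1] +
               p->lhs[i + 2] * p->rhs[i + 2] + p->lhs[i + 3] * p->rhs[i + 3];
  }
  for (; i < p->size; ++i) *p->out += p->lhs[i] * p->rhs[i];
}

// Entry point in the parameter-struct style. The branch picks a variant
// from the forwarded feature bits; when those bits are compile-time
// constants, an optimizer can fold the branch and inline one variant.
void example_dot(const example_dot_params_t* params) {
  assert(params != NULL);
  if (params->cpu_data[0] & EXAMPLE_CPU_FEATURE_WIDE) {
    example_dot_wide(params);
  } else {
    example_dot_generic(params);
  }
}

// Caller side: pack the separate values into the struct and pass a pointer.
int32_t example_call(const int32_t* lhs, const int32_t* rhs, size_t size,
                     uint64_t features) {
  int32_t acc = 0;
  uint64_t cpu_data[1] = {features};
  example_dot_params_t params = {lhs, rhs, &acc, size, cpu_data};
  example_dot(&params);
  return acc;
}
```

Under the `Default` convention the same hypothetical kernel would instead
take `lhs`, `rhs`, `out`, and `size` as separate parameters, with no
packing step on the caller side.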

Co-authored-by: Benoit Jacob <benoitjacob@google.com>