commit | 041b4e87c6fc9ec75b1612a68f5c24f4851f3028 | |
---|---|---|
author | bjacob <benoitjacob@google.com> | Tue May 30 19:59:26 2023 -0400 |
committer | GitHub <noreply@github.com> | Tue May 30 19:59:26 2023 -0400 |
tree | 2f735669a2315fac47eaf6213cbce5fa5cba265e | |
parent | f356ff26698ef8ad96986836f09f3fa7ade914e5 |
Separate architecture generic<->specific bitcode (#13825)

This is the main PR towards #13804.

`iree_bitcode_library` gains the ability to produce either arch-specific or generic bitcode. We separately build the architecture-specific parts of ukernel code (what's under `ukernel/arch/`) and the generic parts (what's directly in `ukernel/`). Then, in the compiler, we unconditionally load the generic bitcode, plus architecture-specific bitcode if any is available for the target architecture.

Before you ask: why not just produce N side-by-side, architecture-specific bitcode modules, one per architecture that we care about?

We want microkernels to just work, all the time, not be forever stuck in "advanced feature that may cause trouble" limbo. Since lacking a required microkernel is a linker error (unless perhaps you go through the trouble of linking a [plugin](https://github.com/openxla/iree/tree/main/experimental/cpu_ukernel) at runtime), we want to always unconditionally have bitcode for all ukernels for all architectures, even the ones for which we don't yet have really optimized microkernels and just want functional correctness. That means at least 8 architectures today (`{x86,arm,riscv,wasm}_{32,64}`), probably dozens in the future. So that would be a lot of side-by-side copies, and we would start to be reluctant to add more ukernels.

By contrast, if we can get architecture-generic bitcode to work (as this PR does), then we can have a single copy of that architecture-generic bitcode regardless of the number of target architectures supported, and any additional, architecture-specific bitcode is proportional to the engineering effort invested in optimizing for each target architecture. So that's why I think architecture-generic bitcode is worth the effort.

The central difficulty is that Clang doesn't have any switch to directly produce target-independent bitcode.
From Clang's perspective (which IIUC is well summarized by [this answer](https://stackoverflow.com/questions/71868733/how-to-make-target-independent-ir-with-llvm)), target-independence is a property of the source language, and C isn't a target-independent language in general.

But ukernel code isn't just any C code; it's C code that's carefully written to be target-independent outside of the `arch` subdir:

* We don't use target-dependent types (e.g. `ssize_t`), only fixed-width types (e.g. `iree_uk_ssize_t` is `iree_uk_int64_t`, see #13834).
* We do use pointers, which are technically target-dependent, but that target-dependence doesn't appear until later in the lowerings: as we are outputting LLVM IR here, pointers are still an opaque `ptr` type.
* We don't do `#if` based on target-dependent tokens. Selection of architecture-specific code paths has been reimplemented as strong symbols (in architecture-specific code) overriding weak symbols (in architecture-independent code) in #13715.
* We don't `#include` any standard library or system header, so our code is truly self-contained, and that's guarded by the flags we pass to Clang when compiling to bitcode.

So we are in a special case here, and it's not unreasonable to think that we know better than Clang and to try to work past its reluctance to produce target-independent IR.

Inspecting the IR produced from compiling our architecture-independent ukernel files showed that the target-dependence in the resulting IR is limited to a few target attributes and a target triple, which are added automatically but don't seem to play any role. Editing these away made `llc` happy to compile that IR to *another* target architecture.

This motivated the approach in this PR: a `strip_target_info.py` script simply drops the target details from LLVM IR.

`iree_bitcode_library` gains an `arch=` parameter. When not specified, IR is processed with `strip_target_info.py`.
When specified, IR is left unprocessed and the right `-target` flag is passed. Generally, all the copts are now set automatically by `iree_bitcode_library`, though each call site may still override anything as usual (rule copts are appended after the automatic ones).
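
To make the stripping idea concrete, here is a minimal sketch of what dropping target details from textual LLVM IR could look like. It is not the `strip_target_info.py` from this PR: the function name, the regex approach, and the exact set of attributes removed (`target-cpu`, `target-features`, `tune-cpu`) are assumptions chosen for illustration, and the sketch only touches the `target triple` line and those string attributes.

```python
import re

# Assumption: these are the per-function string attributes to drop.
# The script in the PR may handle a different set.
TARGET_ATTRS = ("target-cpu", "target-features", "tune-cpu")


def strip_target_info(ir_text: str) -> str:
    """Return textual LLVM IR with the target triple and target attributes removed."""
    kept_lines = []
    for line in ir_text.splitlines():
        # Drop the module-level `target triple = "..."` line entirely.
        if line.lstrip().startswith("target triple"):
            continue
        # Remove each `"name"="value"` target attribute, together with the
        # whitespace preceding it, wherever it appears on the line.
        for attr in TARGET_ATTRS:
            line = re.sub(r'\s*"%s"="[^"]*"' % re.escape(attr), "", line)
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"


# Hypothetical input resembling what Clang emits for a trivial function.
SAMPLE_IR = """\
target triple = "x86_64-unknown-linux-gnu"

define void @f() #0 {
  ret void
}

attributes #0 = { nounwind "target-cpu"="x86-64" "target-features"="+sse2" "tune-cpu"="generic" }
"""

print(strip_target_info(SAMPLE_IR))
```

In a flow like the one described above, such a filter would sit between Clang's IR output and the step that assembles the final bitcode, and would only be applied when no `arch=` is given.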
IREE (Intermediate Representation Execution Environment, pronounced as “eerie”) is an MLIR-based end-to-end compiler and runtime that lowers Machine Learning (ML) models to a unified IR that scales up to meet the needs of the datacenter and down to satisfy the constraints and special considerations of mobile and edge deployments.
See our website for project details, user guides, and instructions on building from source.
IREE is still in its early phase. We have settled on the overarching infrastructure and are actively improving various software components as well as project logistics. It is still quite far from ready for everyday use and is currently made available without any support. That said, we welcome any kind of feedback on any communication channel!
See our website for more information.
IREE is licensed under the terms of the Apache 2.0 License with LLVM Exceptions. See LICENSE for more information.