commit	041b4e87c6fc9ec75b1612a68f5c24f4851f3028
author	bjacob <benoitjacob@google.com>	Tue May 30 19:59:26 2023 -0400
committer	GitHub <noreply@github.com>	Tue May 30 19:59:26 2023 -0400
tree	2f735669a2315fac47eaf6213cbce5fa5cba265e
parent	f356ff26698ef8ad96986836f09f3fa7ade914e5

Separate architecture generic<->specific bitcode (#13825)

This is the main PR towards #13804. `iree_bitcode_library` gains the
ability to produce either arch-specific or generic bitcode. We build
the architecture-specific parts of ukernel code (what's under
`ukernel/arch/`) separately from the generic parts (what's directly in
`ukernel/`). Then in the compiler, we unconditionally load the generic
bitcode, plus architecture-specific bitcode if any is available for the
target architecture.
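
As a rough illustration of that loading rule, here is a minimal Python sketch. It is not the compiler's actual code: the function name and the `.bc` file names are hypothetical, chosen only to show "generic always, arch-specific when present".

```python
def bitcode_files_to_link(target_arch, available_arch_bitcode):
    """Return the bitcode files to load for `target_arch`.

    `available_arch_bitcode` maps an architecture name to the file holding
    its architecture-specific bitcode. All names here are hypothetical.
    """
    # The generic bitcode is loaded unconditionally.
    files = ["ukernel_generic.bc"]
    # Architecture-specific bitcode is added only if it exists.
    arch_file = available_arch_bitcode.get(target_arch)
    if arch_file is not None:
        files.append(arch_file)
    return files


available = {"x86_64": "ukernel_arch_x86_64.bc", "arm_64": "ukernel_arch_arm_64.bc"}
print(bitcode_files_to_link("x86_64", available))
print(bitcode_files_to_link("riscv_32", available))
```

An architecture with no optimized code still gets the generic file, which is what keeps a missing microkernel from turning into a linker error.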

Before you ask: why not just produce N side-by-side,
architecture-specific bitcode modules, one per architecture that we care
about? We want microkernels to just work, all the time, not be forever
stuck in "advanced feature that may cause trouble" limbo. Since lacking
a required microkernel is a linker error (unless perhaps you go through
the trouble of linking a
[plugin](https://github.com/openxla/iree/tree/main/experimental/cpu_ukernel)
at runtime), we want to always unconditionally have bitcode for all
ukernels for all architectures, even the ones that we don't have really
optimized microkernels for yet and just want functional correctness for.
That means at least 8 architectures today
(`{x86,arm,riscv,wasm}_{32,64}`), probably dozens in the future. So that
would be a lot of side-by-side copies, and we would start to become
reluctant to add more ukernels. By contrast, if we can get
architecture-generic bitcode to work (as this PR does) then we need only
a single copy of that architecture-generic bitcode regardless of the
number of target architectures supported; and the amount of any
additional, architecture-specific bitcode is proportional to the
engineering effort invested in optimizing for each target architecture.

So that's why I think architecture-generic bitcode is worth the effort.

The central difficulty is that Clang has no switch to directly produce
target-independent bitcode.

From Clang's perspective (which IIUC is well summarized by [this
answer](https://stackoverflow.com/questions/71868733/how-to-make-target-independent-ir-with-llvm)),
target-independence is a property of the source language, and C isn't a
target-independent language in general.

But ukernel code isn't just any C code, it's C code that's carefully
written to be target-independent outside of that `arch` subdir:
* We don't use target-dependent types (e.g. `ssize_t`), only fixed-width
types (e.g. `iree_uk_ssize_t` is `iree_uk_int64_t`, see #13834).
* We do use pointers, which are technically target-dependent, but that
target-dependence doesn't appear until later down the lowerings: as we
are outputting LLVM IR here, pointers are still an opaque `ptr` type.
* We don't do `#if` based on target-dependent tokens. Selection of
architecture-specific code paths has been reimplemented as strong
symbols (in architecture-specific code) overriding weak symbols (in
architecture-independent code) in #13715.
* We don't `#include` any standard library or system header, so our code
is truly self-contained, and that's guarded by the flags we pass Clang
when compiling to bitcode.

We are in a special case here, so it's not unreasonable to think that
we know better than Clang and to try to work past its reluctance to
produce target-independent IR.

Inspecting the IR produced from compiling our architecture-independent
ukernel files showed that the target-dependence in the resulting IR is
limited to a few target attributes and a target triple, that have been
automatically added but don't seem to play any role. Editing these away
made `llc` happy to compile that IR to *another* target architecture.

This motivated the approach in this PR: a `strip_target_info.py` script
simply drops the target details from LLVM IR.
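
To make the idea concrete, here is a minimal sketch of such a stripping pass over textual LLVM IR. It is not the `strip_target_info.py` from this PR. It assumes the target details are the `target triple` line plus the `"target-cpu"`, `"target-features"` and `"tune-cpu"` string attributes that Clang typically attaches to attribute groups. Dropping the `target datalayout` line as well is an additional assumption of this sketch, not something stated above.

```python
import re

# Assumption: these are the target-related string attributes to drop.
_TARGET_ATTRS = re.compile(
    r'\s*"(?:target-cpu|target-features|tune-cpu)"="[^"]*"')

# Assumption: module-level lines to drop entirely. The datalayout line is
# included here as a guess; the description above only names the triple.
_TARGET_LINE_PREFIXES = ("target triple", "target datalayout")


def strip_target_info(ir_text):
    """Return `ir_text` (textual LLVM IR) with target details removed."""
    out = []
    for line in ir_text.splitlines():
        if line.startswith(_TARGET_LINE_PREFIXES):
            continue  # Drop the whole module-level target line.
        if line.startswith("attributes #"):
            # Keep the attribute group, minus its target-specific entries.
            line = _TARGET_ATTRS.sub("", line)
        out.append(line)
    return "\n".join(out) + "\n"


example_ir = '''target triple = "x86_64-unknown-linux-gnu"

define void @f() #0 {
  ret void
}

attributes #0 = { nounwind "target-cpu"="x86-64" "tune-cpu"="generic" }
'''
print(strip_target_info(example_ir))
```

Working on the textual IR keeps the pass a plain line filter. The result would then be assembled back to bitcode in a separate step.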

`iree_bitcode_library` gains an `arch=` parameter. When not specified,
IR is processed with `strip_target_info.py`. When specified, IR is left
unprocessed and the right `-target` flag is passed. Generally, all the
copts are automatically set by `iree_bitcode_library` now, though each
call site may still override anything as usual (rule copts being
appended after).
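
The behavior of the `arch=` parameter can be sketched in Python as follows. This is a model of the logic described above, not the actual build rule: the function name, the default flags and the triple table are assumptions for illustration. Only the `-target` flag, the stripping step, and the "rule copts appended after" ordering come from the description.

```python
# Hypothetical arch -> triple table; the real triples may differ.
ARCH_TO_TRIPLE = {
    "x86_64": "x86_64-unknown-unknown-eabi-elf",
    "arm_64": "aarch64-unknown-unknown-eabi-elf",
}

# Assumed defaults; the real rule sets its own list of copts.
DEFAULT_COPTS = ["-emit-llvm", "-ffreestanding", "-nostdinc"]


def bitcode_compile_plan(arch=None, rule_copts=()):
    """Return (copts, strip_target_info) for one bitcode library.

    With no `arch`, the IR is generic and gets stripped afterwards.
    With an `arch`, the matching `-target` flag is passed and the IR is
    left unprocessed.
    """
    copts = list(DEFAULT_COPTS)
    if arch is not None:
        copts += ["-target", ARCH_TO_TRIPLE[arch]]
    # Call-site copts come last so that they can override the defaults.
    copts += list(rule_copts)
    return copts, arch is None


print(bitcode_compile_plan())
print(bitcode_compile_plan(arch="x86_64", rule_copts=["-mavx2"]))
```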

19 files changed
