# IREE Buildkite Automation
## Status
We are in the process of migrating our automation infrastructure to
[Buildkite](https://buildkite.com). `samples.yml` and everything under the
`cmake/` directory are legacy pipelines from our previous ad hoc usage. This
document describes everything else, which is still in development and can
generally be ignored for now.
## General Setup
All IREE automation pipelines are moving to Buildkite. The jobs in these
pipelines should avoid duplicating work and instead pass artifacts between
machines. The workhorse of this automation is a collection of x86-64 Linux build
machines (initially GCE VMs but probably moving to Kubernetes containers) which
perform all direct and cross compilation that is feasible. In cases where we
want to ensure compilation on a platform, not just for it (and/or
cross-compiling for it on Linux is difficult), we will also run build machines
of that platform (e.g. Windows).

Built artifacts will be farmed out to testing machines (some machines may play
double-duty as build and test), using Docker images and emulators where useful.

Bazel builds will use
[Bazel remote caching](https://bazel.build/docs/remote-caching), but not remote
execution, which we have found to be prohibitively expensive to configure and
maintain. CMake builds will initially be uncached, then move first to local
caches (ccache) and later to remote caching and/or distributed builds (more
research needed, but e.g. sccache or distcc).

Orchestration agents will take care of light tasks like uploading Buildkite
pipelines and polling for bigger jobs to finish. These will be run on
minimally sized machines (initially e2-micro GCE VMs, but likely moving to
Kubernetes containers).
## Presubmits
There is a strict separation between the machines and caches that act on code
already submitted to the IREE repository and those that act on code coming from
third-party forks. Buildkite agents are tagged with `security: "submitted"` or
`security: "presubmit"` to indicate the class of code that they run. Agents
running unsubmitted code may have read access to artifacts (e.g. build cache
entries) generated by agents running submitted code, but not write access.
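
As a rough illustration of the read-but-not-write rule for Bazel's remote cache, the sketch below picks client flags based on an agent's `security` tag. The function name and the cache URL are hypothetical and not part of the IREE scripts. A client-side flag like this is only a convenience: the actual guarantee would need to come from the cache's own access controls.

```python
from typing import List


def remote_cache_flags(security: str, cache_url: str) -> List[str]:
    """Return Bazel remote-cache flags for an agent of the given security class.

    `security` mirrors the agent tag described above. `cache_url` is a
    placeholder for wherever the remote cache lives (an assumption).
    """
    if security not in ("submitted", "presubmit"):
        raise ValueError(f"unknown security class: {security!r}")
    flags = [f"--remote_cache={cache_url}"]
    if security == "presubmit":
        # Agents running unsubmitted code may read cache entries but should
        # never upload results built from that code.
        flags.append("--remote_upload_local_results=false")
    return flags


if __name__ == "__main__":
    print(" ".join(remote_cache_flags("presubmit", "https://cache.example.com")))
```
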
### Vetting Unsubmitted Code
There's no getting around that presubmit testing is remote code execution (even
if there's no confidential information accessible by those remote executors),
and we don't want to give malicious or nuisance actors free compute.

At the same time, we want to have minimal friction for new contributors, and
especially for routine contributors, *independent of what company they work
for*. As a proxy for "real person who actually wants to contribute to the
project" we use having signed the
[Google Contributor License Agreement (CLA)](https://cla.developers.google.com),
which is already required to contribute to the project. This is checked using
already-checked-in scripts and configs before running anything using the code
from a pull request. Because CLA signing can sometimes create a roadblock
(especially in the case of corporate CLAs, which require getting lawyers
involved), if the CLA check fails, a
[block step](https://buildkite.com/docs/pipelines/block-step) is inserted in the
pipeline, allowing members of the IREE Buildkite org to unblock the runs
manually in the meantime. Additionally, we block any bad actors that crop up
using Buildkite
[conditional filtering](https://buildkite.com/docs/pipelines/conditionals#conditionals-in-pipelines)
to stop builds from triggering on their PRs at all. IREE Buildkite organization
admins can
[update those options here](https://buildkite.com/iree/presubmit/settings/repository#:~:text=Filter%20builds%20using%20a%20conditional).
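
The block-step gating described above can be sketched as follows. This is a minimal illustration, not the contents of `check_cla.py`: the function names, the block label, and the example step are all hypothetical. It relies only on the fact that a Buildkite pipeline is a list of steps and that a step with a `block` key pauses the build until someone unblocks it.

```python
import json
from typing import Any, Dict, List

Step = Dict[str, Any]


def gate_steps(cla_passed: bool, steps: List[Step]) -> List[Step]:
    """Prepend a block step when the CLA check has not passed."""
    if cla_passed:
        return list(steps)
    # Hypothetical label; everything after this step waits for a manual unblock.
    block: Step = {"block": "CLA check failed: unblock to run presubmit"}
    return [block] + list(steps)


def render_pipeline(cla_passed: bool, steps: List[Step]) -> str:
    """Serialize the (possibly gated) pipeline as JSON."""
    return json.dumps({"steps": gate_steps(cla_passed, steps)}, indent=2)


if __name__ == "__main__":
    # Hypothetical follow-on step, for illustration only.
    example = [{"label": "build", "command": "echo build"}]
    print(render_pipeline(cla_passed=False, steps=example))
```

The rendered JSON could then be handed to `buildkite-agent pipeline upload`, which accepts pipelines in JSON as well as YAML.
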
The [Presubmit pipeline](https://buildkite.com/iree/presubmit) runs on all PRs
sent to the main repository. The basic flow is that
[presubmit_bootstrap.yml](presubmit_bootstrap.yml) fetches and uploads
[presubmit.yml](presubmit.yml) from the main branch of the repository, which
similarly fetches [check_cla.py](check_cla.py) from the main branch and checks
whether the CLA check has passed on the target commit. If it hasn't, it inserts
a block step that prevents further execution until an authorized person in the
IREE Buildkite organization unblocks it. Subsequent steps fetch
[wait_for_pipeline_success.py](wait_for_pipeline_success.py) from the main
branch and use it to trigger and wait for other pipelines to execute on the PR
commit.
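
The "trigger and wait" part of this flow amounts to polling a build until it reaches a terminal state. The sketch below shows that loop in isolation. It is not the actual `wait_for_pipeline_success.py`: `get_state` is a hypothetical callable standing in for whatever query fetches the build's state, and the state names are assumptions modeled on Buildkite's build states.

```python
import time
from typing import Callable

# Assumed terminal states; anything else is treated as "still in progress".
TERMINAL_STATES = {"passed", "failed", "canceled"}


def wait_for_success(
    get_state: Callable[[], str],
    poll_interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the build finishes; return True only if it passed.

    Raises TimeoutError if no terminal state is seen within `timeout_s`.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state == "passed"
        if clock() >= deadline:
            raise TimeoutError(f"build still {state!r} after {timeout_s}s")
        sleep(poll_interval_s)
```

Because this only polls, it fits the minimally sized orchestration agents described under General Setup: the agent spends nearly all of its time sleeping between checks.
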
## Postsubmits
Since it doesn't have to deal with potentially untrusted code, the
[Postsubmit pipeline](https://buildkite.com/iree/postsubmit) is much simpler. It
triggers on each commit to the `main` branch.
[postsubmit_bootstrap.yml](postsubmit_bootstrap.yml) fetches the main branch and
uploads [postsubmit.yml](postsubmit.yml), which triggers and waits for all the
specified pipelines to complete on the target commit.
## Triggering Multiple Runs
Both the presubmit and postsubmit pipelines are designed to be idempotent.
Triggering them again on the same commit reruns only the orchestration
pipelines and does not trigger any new builds. This is one of the reasons we
use our own script rather than a Buildkite trigger step.
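
One plausible way to get this behavior is a "get or create" check before triggering: look for an existing build of the target pipeline on the commit and reuse it if one exists. The sketch below assumes that approach. This document does not specify how the script actually does it, and `list_builds` and `create_build` are hypothetical callables standing in for API queries.

```python
from typing import Callable, Dict, List

Build = Dict[str, str]


def get_or_create_build(
    pipeline: str,
    commit: str,
    list_builds: Callable[[str, str], List[Build]],
    create_build: Callable[[str, str], Build],
) -> Build:
    """Reuse an existing build of `pipeline` at `commit`, else create one.

    Which existing build to reuse (here simply the first one returned) is an
    assumption; a real script may want to skip canceled builds, for example.
    """
    existing = list_builds(pipeline, commit)
    if existing:
        return existing[0]
    return create_build(pipeline, commit)
```

Calling this twice with the same pipeline and commit creates at most one build, which is the idempotency property described above.
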