# IREE Buildkite Automation
## Status
We are in the process of migrating our automation infrastructure to
[Buildkite](https://buildkite.com). `samples.yml` and everything under the
`cmake/` directory are legacy pipelines from our previous ad hoc usage. This
document describes everything else, which is still in development and can
generally be ignored for now.
## General Setup
All IREE automation pipelines are moving to Buildkite. The jobs in these
pipelines should avoid duplicating work and instead pass artifacts between
machines. The workhorse of this automation is a collection of x86-64 Linux build
machines (initially GCE VMs but probably moving to Kubernetes containers) which
perform all direct and cross compilation that is feasible. In cases where we
want to ensure compilation on a platform, not just for it (and/or
cross-compiling for it on Linux is difficult), we will also run build machines
of that platform (e.g. Windows).
Built artifacts will be farmed out to testing machines (some machines may play
double duty as build and test), using Docker images and emulators where useful.
Bazel builds will use
[Bazel remote caching](https://bazel.build/docs/remote-caching), but not remote
execution, which we have found to be prohibitively expensive to configure and
maintain. CMake builds will initially be uncached, but will move first to local
caches (ccache) and then to remote caching and/or distributed builds (more
research needed, but e.g. sccache or distcc).
Orchestration agents will take care of light tasks like uploading Buildkite
pipelines and polling for bigger jobs to be finished. These will be run on
minimally sized machines (initially e2-micro GCE VMs, but likely moving to
Kubernetes containers).
## Security
There is a strict separation between machines and caches that are used for
releases and automation, which are limited to running code that has been
submitted to the IREE repository, and those that are used to run code coming
from third party forks. Buildkite agents are tagged with `security: "trusted"`
or `security: "untrusted"` to indicate the class of code that they run. Agents
running unsubmitted code may have read access to artifacts (e.g. build cache
entries) generated by trusted agents.
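As one reading of this policy, the access rules can be written down as a pair of predicates. This is a hypothetical sketch: the `Security` enum and both function names are illustrative, and two rules are assumptions inferred from the "strict separation" above rather than taken from IREE's configuration: trusted agents never read artifacts written by untrusted agents, and writes never cross the trust boundary in either direction.

```python
from enum import Enum


class Security(Enum):
    """Values of the `security` agent tag."""
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


def can_read(agent: Security, owner: Security) -> bool:
    """May an agent of class `agent` read artifacts written by class `owner`?"""
    if agent is Security.UNTRUSTED:
        # Untrusted agents may read their own caches and, e.g., build cache
        # entries generated by trusted agents.
        return True
    # Assumption: trusted agents never consume anything that unsubmitted
    # code could have influenced.
    return owner is Security.TRUSTED


def can_write(agent: Security, owner: Security) -> bool:
    """May an agent of class `agent` write resources owned by class `owner`?"""
    # Assumption: writes never cross the trust boundary in either direction.
    return agent is owner
```

The asymmetry is the point: reading trusted artifacts from an untrusted agent cannot corrupt a release, while the reverse could.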
### Vetting Unsubmitted Code
There's no getting around the fact that presubmit testing is remote code
execution (even if there's no confidential information accessible by those
remote executors), and we don't want to give malicious or nuisance actors free
compute. At the same time, we want to have minimal friction for new
contributors, and especially for routine contributors, *independent of what
company they work for*. As a proxy for "real person who actually wants to
contribute to the project", we check whether the author has signed the
[Google Contributor License Agreement (CLA)](https://cla.developers.google.com),
which is already required to contribute to the project. This check runs on
trusted runners, using configs and scripts that have already been submitted to
the repository, before anything from the pull request's code is run.
Because CLA signing can sometimes create a roadblock (especially in the case of
corporate CLAs, which require getting lawyers involved), if the CLA check fails,
a [block step](https://buildkite.com/docs/pipelines/block-step) is inserted in
the pipeline, allowing members of the IREE Buildkite org to unblock the runs
manually in the meantime. Additionally, we block any bad actors that crop up
using Buildkite
[conditional filtering](https://buildkite.com/docs/pipelines/conditionals#conditionals-in-pipelines)
to stop builds from triggering on their PRs at all. IREE Buildkite organization
admins can
[update those options here](https://buildkite.com/iree/presubmit/settings/repository#:~:text=Filter%20builds%20using%20a%20conditional).
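To make the flow concrete, here is a hypothetical sketch of how a presubmit step could assemble the steps it uploads after the CLA check. The `block`, `prompt`, `label`, and `command` keys are standard Buildkite step attributes, and `buildkite-agent pipeline upload` accepts JSON as well as YAML; the `presubmit_steps` helper and the placeholder command are assumptions, not the actual behavior of `check_cla.py`.

```python
import json


def presubmit_steps(cla_signed: bool) -> list:
    """Return the steps to upload once the CLA check has run (hypothetical)."""
    steps = []
    if not cla_signed:
        # Steps after a block step wait until someone in the Buildkite
        # organization unblocks it in the web UI.
        steps.append({
            "block": "CLA check failed",
            "prompt": "Unblock only after vetting the contributor.",
        })
    # Placeholder for the real downstream work.
    steps.append({
        "label": "Trigger downstream pipelines",
        "command": "echo 'placeholder: trigger downstream pipelines'",
    })
    return steps


if __name__ == "__main__":
    # Pipe this into `buildkite-agent pipeline upload`, which reads stdin.
    print(json.dumps({"steps": presubmit_steps(cla_signed=False)}, indent=2))
```

Because the block step is only emitted when the check fails, contributors with a signed CLA see no extra friction.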
### Bootstrapping
We want to enable testing changes to the pipeline configurations themselves on a
PR before it is submitted. To enable this, we don't register the pipeline
configurations themselves with Buildkite. Instead, we register one of
[bootstrap-trusted.yml](pipelines/fragment/bootstrap-trusted.yml) or
[bootstrap-untrusted.yml](pipelines/fragment/bootstrap-untrusted.yml) (depending
on whether they are in the `trusted/` or `untrusted/` directory) for each of our
pipelines. This registration happens as one of the jobs on the postsubmit
pipeline and is performed for each pipeline configuration file in the pipeline
directories. These bootstrap configurations just upload the relevant pipeline
configuration file. But then we have no way to test changes to the bootstrap
pipeline configurations themselves on presubmit. Although we do not expect them
to change frequently, this is still undesirable. To get around this infinite
regress, the bootstrap pipeline configurations first upload the newest version
of themselves from the target commit, setting an environment variable to prevent
further self-upload. Only then do they upload the target pipeline.
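The self-upload guard amounts to a two-pass decision. The sketch below is hypothetical: `IREE_BOOTSTRAP_SELF_UPLOADED` and `next_upload` are made-up names, and how the marker actually reaches the freshly uploaded job (for example through the uploaded configuration's `env`) is deliberately left out.

```python
# Hypothetical marker; the real bootstrap configs may use a different name.
SELF_UPLOADED_VAR = "IREE_BOOTSTRAP_SELF_UPLOADED"


def next_upload(env: dict, bootstrap_config: str, target_pipeline: str) -> tuple:
    """Return (file to upload, env vars to attach to the uploaded steps)."""
    if env.get(SELF_UPLOADED_VAR) != "1":
        # First pass, running the registered (possibly stale) bootstrap:
        # replace it with the target commit's copy and mark that copy so it
        # does not upload itself again.
        return bootstrap_config, {SELF_UPLOADED_VAR: "1"}
    # Second pass, running the up-to-date bootstrap: upload the real pipeline.
    return target_pipeline, {}
```

Without the marker, each copy would upload another copy of itself indefinitely; with it, the chain stops after exactly one self-upload.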
## Presubmits
The [Presubmit pipeline](https://buildkite.com/iree/presubmit) runs on all PRs
sent to the main repository.
If the PR is from a fork, bootstrapping checks out the `main` branch and uploads
the [presubmit.yml](pipelines/trusted/presubmit.yml) pipeline configuration from
there. This pipeline checks everything out from the `main` branch. It runs
[check_cla.py](scripts/check_cla.py) to determine if the CLA check has passed on
the target commit. If it hasn't, it inserts a block step that prevents further
execution until an authorized person in the IREE Buildkite organization
unblocks. Subsequent steps use
[wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) to trigger
and wait for other pipelines to execute on the PR commit.
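Waiting on another pipeline boils down to polling its build until it reaches a final state. This is a hypothetical sketch rather than the contents of `wait_for_pipeline_success.py`: `get_state` stands in for whatever fetches the build (e.g. a Buildkite REST API call), and the state strings are assumed to match the API's build `state` field. The sketch treats `blocked` as non-final, since someone may still unblock the build.

```python
import time
from typing import Callable

# Build states treated as final; anything else means "keep waiting".
FINAL_STATES = {"passed", "failed", "canceled", "skipped", "not_run"}


def wait_for_build(get_state: Callable[[], str],
                   poll_seconds: float = 30.0,
                   timeout_seconds: float = 4 * 60 * 60,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the build reaches a final state; return True iff it passed."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in FINAL_STATES:
            return state == "passed"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"build still {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Injecting `sleep` keeps the loop testable. A job that mostly sleeps between polls like this fits the minimally sized orchestration machines described above.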
It is important that the CLA check is performed on trusted runners and using
code from the main repository. This makes it difficult to test changes to the
presubmit pipeline configuration, however. To enable this, if the PR is coming
from another branch on the main repository, bootstrapping instead checks out the
commit on which it was triggered and instructs the presubmit pipeline to do the
same.
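The choice of which commit's configuration to use can be sketched as a pure function. This is hypothetical: the function names are illustrative, the URL normalization only handles common GitHub URL forms, and in a real job the inputs would presumably come from Buildkite's `BUILDKITE_PULL_REQUEST_REPO`, `BUILDKITE_REPO`, and `BUILDKITE_COMMIT` environment variables.

```python
def _normalize(url: str) -> str:
    """Crude normalization so https, ssh, and .git variants compare equal."""
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    if url.startswith("git@github.com:"):
        url = "https://github.com/" + url[len("git@github.com:"):]
    return url


def presubmit_checkout_ref(pr_repo: str, main_repo: str, pr_commit: str) -> str:
    """Pick the ref whose presubmit.yml and scripts the pipeline should run."""
    if _normalize(pr_repo) == _normalize(main_repo):
        # Branch on the main repository (pushing requires write access):
        # use the PR's own version so pipeline changes can be tested.
        return pr_commit
    # Fork: only run configuration and scripts already submitted to main.
    return "main"
```

Defaulting to `main` for anything that doesn't match keeps the failure mode safe: an unrecognized URL is treated as a fork.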
## Pipeline Files
The `pipelines/` directory contains Buildkite
[pipeline configuration YAML files](https://buildkite.com/docs/pipelines). The
trusted and untrusted ones are registered with Buildkite automatically by the
postsubmit pipeline. They are organized into three categories.
### Trusted
Pipelines that are allowed to run steps on trusted agents that have elevated
privileges. This is enforced by a pre-bootstrap hook deployed to these agents.
These agents should only run using configurations coming from trusted sources,
which right now means only commits that have been checked into the main IREE
repository.
### Untrusted
Pipelines that are only allowed to run steps on untrusted agents. The only risks
posed by access to these agents must be the ability to use their compute (and
that of other untrusted agents) and to write to resources (including caches)
used only by other untrusted agents. They should not have access to any sensitive
data or the ability to delete or corrupt unrecoverable artifacts. We should be
able to turn down all untrusted agents with the only consequence being a
temporary delay in presubmit builds.
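One way to keep untrusted pipelines honest is to lint their steps' agent targeting before upload. This is a hypothetical sketch, not an existing IREE script: it assumes steps select agents with the map form of Buildkite's `agents` attribute (e.g. `agents: {security: "untrusted"}`), treats a missing tag as a violation, and does not descend into `group` steps.

```python
from typing import Iterable, List


def misrouted_steps(category: str, steps: Iterable[dict]) -> List[str]:
    """Return labels of command steps an untrusted pipeline may not run."""
    bad = []
    for step in steps:
        if "command" not in step and "commands" not in step:
            continue  # wait, block, and trigger steps do not run on agents
        security = step.get("agents", {}).get("security")
        # Untrusted pipelines must pin every command step to untrusted agents;
        # an untagged step could land on any agent, so it is rejected too.
        if category == "untrusted" and security != "untrusted":
            bad.append(step.get("label", "<unlabeled>"))
    return bad
```

This sketch places no restriction on trusted pipelines; on the agent side, the pre-bootstrap hook remains the actual enforcement point.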
### Fragment
Buildkite pipeline configurations that are intended to be used dynamically or
inserted into other pipelines with the `buildkite-agent pipeline upload`
command. These are not registered with Buildkite.
## Creating a Pipeline
Creating a new pipeline is [almost](#making-pipeline-public) as simple as
sending a PR containing a new pipeline YAML configuration. Most pipelines
should be created as "untrusted" and invoked with `wait_for_pipeline_success.py`
from the presubmit and/or postsubmit pipelines.
### Unregistered Pipelines
When first introducing a new Buildkite pipeline, it will not be registered with
Buildkite yet. The presubmit pipeline therefore cannot trigger a run of this
pipeline. To enable testing of new pipelines, we have a special
[unregistered pipeline](https://buildkite.com/iree/unregistered). This pipeline
just uploads another pipeline file based on an environment variable. When a
pipeline with the given name doesn't exist,
[wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) instead
invokes the "unregistered" pipeline, which runs the given pipeline
configuration. This misses features like grouping of pipeline runs and avoiding
unnecessary reruns of the same pipeline, but is useful for testing.
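The fallback can be sketched as follows. Apart from the `unregistered` pipeline name, everything here is an assumption for illustration: the `pipeline_exists` callback (standing in for a lookup against the Buildkite REST API), the `choose_trigger` name, and the `REQUESTED_PIPELINE` variable through which the unregistered pipeline would learn which configuration to upload.

```python
from typing import Callable

UNREGISTERED_PIPELINE = "unregistered"
# Hypothetical env var read by the "unregistered" pipeline.
CONFIG_ENV_VAR = "REQUESTED_PIPELINE"


def choose_trigger(name: str, config_path: str,
                   pipeline_exists: Callable[[str], bool]) -> tuple:
    """Return (pipeline to trigger, extra build env) for pipeline `name`."""
    if pipeline_exists(name):
        return name, {}
    # Not registered yet (e.g. introduced by this very PR): run its config
    # through the catch-all pipeline instead.
    return UNREGISTERED_PIPELINE, {CONFIG_ENV_VAR: config_path}
```

The returned env would be attached to the build when it is created, which is how a single catch-all pipeline can run any configuration file.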
### Making Pipeline Public
Due to limitations in the Buildkite REST API, pipelines created via the API
can't be made publicly visible or made runnable by our presubmit bots. The
pipeline will still be created, but setting all the necessary permissions
requires manual intervention. Until then, it will run normally on postsubmit but
will not be visible to the public, and it will run on presubmit using the
["unregistered" pipeline](#unregistered-pipelines). To make it fully accessible,
you (or a member of the IREE team in the IREE Buildkite organization) must go to
the pipeline settings page and click "Make Pipeline Public". Then, *only if this
is a pipeline under `untrusted/`*, go to the Teams subsection of the pipeline
settings and give "Build & Read" access to the "Presubmit" team and the
"Everyone" team.
## Postsubmits
Since it doesn't have to deal with potentially untrusted code, the
[Postsubmit pipeline](https://buildkite.com/iree/postsubmit) is much simpler. It
triggers on each commit to the `main` branch. Bootstrapping uploads
[postsubmit.yml](pipelines/trusted/postsubmit.yml), which triggers and waits for
all the specified pipelines to complete on the target commit. In addition, the
postsubmit build registers pipelines with Buildkite based on the pipeline files
checked in.
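The registration step needs to know which checked-in files correspond to pipelines that already exist. The sketch below is hypothetical: it assumes a pipeline's Buildkite name is its file stem and that the set of registered names has already been fetched, and it only plans the work (create vs. update) rather than calling the API.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set, Tuple

REGISTERED_DIRS = ("trusted", "untrusted")  # fragment/ is never registered


def plan_registration(files: Iterable[str],
                      registered: Set[str]) -> Tuple[List[str], List[str]]:
    """Split pipeline files into names to create and names to update."""
    to_create, to_update = [], []
    for f in sorted(files):
        path = PurePosixPath(f)
        if path.suffix != ".yml" or path.parent.name not in REGISTERED_DIRS:
            continue
        name = path.stem  # assumption: pipeline name == file stem
        (to_update if name in registered else to_create).append(name)
    return to_create, to_update
```

In a Linux checkout, the file list could come from `[str(p) for p in pathlib.Path("pipelines").glob("*/*.yml")]`. Newly created pipelines still need the manual permission steps described under "Making Pipeline Public".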
## Idempotency
Both the presubmit and postsubmit pipelines are designed to be idempotent.
Triggering them again on the same commit will not start any new builds; only the
orchestration pipelines run again. This is one of the reasons we use our own
script rather than a Buildkite trigger step, which creates a new build every
time it runs.
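The idempotency described above reduces to a get-or-create lookup keyed on pipeline and commit. In this hypothetical sketch, `find_build` and `create_build` stand in for Buildkite REST API calls (for instance, listing a pipeline's builds filtered by commit). Whether a failed build should be reused or retried is a policy question the sketch sidesteps by reusing any match.

```python
from typing import Callable, Optional


def get_or_create_build(find_build: Callable[[str, str], Optional[dict]],
                        create_build: Callable[[str, str], dict],
                        pipeline: str, commit: str) -> dict:
    """Reuse an existing build of `pipeline` at `commit`, else start one."""
    existing = find_build(pipeline, commit)
    if existing is not None:
        # A rerun of presubmit/postsubmit lands here: no new build is started,
        # the orchestrator just waits on the one that already exists.
        return existing
    return create_build(pipeline, commit)
```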
## Agent Configuration
Files under the `agent/` directory are deployed to the Buildkite agents to
control their behavior. Deployment is currently manual, but the files are
checked in here for versioning and review.