IREE Buildkite Automation

Status

We are in the process of migrating our automation infrastructure to Buildkite. samples.yml and everything under the cmake/ directory are legacy pipelines from our previous ad hoc usage. This document describes everything else, which is still in development and can generally be ignored for now.

General Setup

All IREE automation pipelines are moving to Buildkite. The jobs in these pipelines should avoid duplicating work and should instead pass artifacts between machines. The workhorse of this automation is a collection of x86-64 Linux build machines (initially GCE VMs, but probably moving to Kubernetes containers), which perform all native and cross-compilation that is feasible. In cases where we want to ensure compilation on a platform, not just for it (and/or where cross-compiling for it on Linux is difficult), we will also run build machines of that platform (e.g. Windows).

Built artifacts will be farmed out to testing machines (some machines may play double duty as build and test machines), using Docker images and emulators where useful. Bazel builds will use Bazel remote caching, but not remote execution, which we have found to be prohibitively expensive to configure and maintain. CMake builds will initially be uncached, but will move first to local caching (ccache) and then to remote caching and/or distributed builds (more research is needed, but e.g. sccache or distcc).

Orchestration agents will take care of light tasks like uploading Buildkite pipelines and polling for bigger jobs to be finished. These will be run on minimally sized machines (initially e2-micro GCE VMs, but likely moving to Kubernetes containers).

Security

There is a strict separation between machines and caches that are used for releases and automation, which are limited to running code that has been submitted to the IREE repository, and those that are used to run code coming from third-party forks. Buildkite agents are tagged with security: "trusted" or security: "untrusted" to indicate the class of code that they run. Agents running unsubmitted code may have read access to artifacts (e.g. build cache entries) generated by trusted agents.

Vetting Unsubmitted Code

There's no getting around the fact that presubmit testing is remote code execution (even if no confidential information is accessible to those remote executors), and we don't want to give malicious or nuisance actors free compute. At the same time, we want minimal friction for new contributors, and especially for routine contributors, independent of what company they work for. As a proxy for “real person who actually wants to contribute to the project”, we use having signed the Google Contributor License Agreement (CLA), which is already required to contribute to the project. This is checked on trusted runners, using configs and scripts that have already been submitted to the repository, before running anything that uses code from a pull request.

Because CLA signing can sometimes create a roadblock (especially in the case of corporate CLAs, which require getting lawyers involved), if the CLA check fails, a block step is inserted into the pipeline, allowing members of the IREE Buildkite org to unblock the runs manually in the meantime. Additionally, we block any bad actors that crop up by using Buildkite conditional filtering to stop builds from triggering on their PRs at all. IREE Buildkite organization admins can update those options here.
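The gating decision can be sketched as a function that returns the extra steps to upload once the CLA result is known. This is a minimal, hypothetical sketch, not the contents of check_cla.py: the function name and the prompt text are assumptions, while `block` and `prompt` are standard Buildkite block-step fields and `BUILDKITE_PULL_REQUEST` is the agent's PR-number variable.

```python
import json
import subprocess


def cla_gate_steps(cla_passed: bool, pr_number: str) -> list:
    """Return extra steps to insert ahead of the PR's test pipelines.

    Hypothetical sketch: if the CLA check passed, nothing is inserted.
    Otherwise a Buildkite `block` step pauses the build; steps after a
    block step do not run until an org member unblocks it.
    """
    if cla_passed:
        return []
    return [
        {
            "block": f"CLA check failed for PR {pr_number}",
            "prompt": "Unblock only if the author is a genuine contributor.",
        }
    ]


def upload_steps(steps: list) -> None:
    """Pipe the steps to `buildkite-agent pipeline upload` (only works on an agent)."""
    if steps:
        subprocess.run(
            ["buildkite-agent", "pipeline", "upload"],
            input=json.dumps({"steps": steps}),
            text=True,
            check=True,
        )
```

On an agent this would be called as something like `upload_steps(cla_gate_steps(passed, os.environ["BUILDKITE_PULL_REQUEST"]))`, where `passed` comes from whatever CLA status lookup the real script performs.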

Bootstrapping

We want to be able to test changes to the pipeline configurations themselves on a PR before it is submitted. To enable this, we don't register the pipeline configurations themselves with Buildkite. Instead, for each of our pipelines we register either bootstrap-trusted.yml or bootstrap-untrusted.yml, depending on whether the pipeline is in the trusted/ or untrusted/ directory. This registration happens as one of the jobs in the postsubmit pipeline and is performed for each pipeline configuration file in the pipeline directories. The bootstrap configurations just upload the relevant pipeline configuration file.

On its own, this would leave no way to test changes to the bootstrap configurations themselves on presubmit. Although we do not expect them to change frequently, this is still undesirable. To get around this infinite regress, the bootstrap configurations first upload the newest version of themselves from the target commit, setting an environment variable to prevent further self-upload. Only then do they upload the target pipeline.
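The two-pass self-upload can be sketched as a small decision function. This is an illustrative sketch under assumptions: the guard variable name `IREE_BOOTSTRAP_SELF_UPLOADED` and the function name are hypothetical. The returned `extra_env` is meant to be merged into the uploaded pipeline's top-level `env` key, which Buildkite applies to every step in that upload.

```python
# Hypothetical guard variable; the real bootstrap configs may use another name.
GUARD_VAR = "IREE_BOOTSTRAP_SELF_UPLOADED"


def next_upload(env: dict, bootstrap_path: str, target_path: str) -> tuple:
    """Decide which file the bootstrap step should upload next.

    Returns (path, extra_env): the file to pass to
    `buildkite-agent pipeline upload`, plus entries to add to that
    upload's top-level `env` block.
    """
    if env.get(GUARD_VAR) == "1":
        # Second pass: we are already running the target commit's copy of
        # the bootstrap config, so hand off to the real pipeline.
        return target_path, {}
    # First pass: the registered (possibly stale) bootstrap config is
    # running. Re-upload the target commit's copy with the guard set so
    # that copy does not upload itself again.
    return bootstrap_path, {GUARD_VAR: "1"}
```

The guard makes the recursion terminate after exactly one self-upload, so edits to the bootstrap file on a PR take effect on that PR's own run.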

Presubmits

The Presubmit pipeline runs on all PRs sent to the main repository.

If the PR is from a fork, bootstrapping checks out the main branch and uploads the presubmit.yml pipeline configuration from there. This pipeline checks everything out from the main branch. It runs check_cla.py to determine if the CLA check has passed on the target commit. If it hasn't, it inserts a block step that prevents further execution until an authorized person in the IREE Buildkite organization unblocks. Subsequent steps use wait_for_pipeline_success.py to trigger and wait for other pipelines to execute on the PR commit.

It is important that the CLA check is performed on trusted runners and using code from the main repository. This makes it difficult to test changes to the presubmit pipeline configuration, however. To enable this, if the PR is coming from another branch on the main repository, bootstrapping instead checks out the commit on which it was triggered and instructs the presubmit pipeline to do the same.

Pipeline Files

The pipelines/ directory contains Buildkite pipeline configuration YAML files. Trusted and untrusted pipelines are registered with Buildkite automatically by the postsubmit pipeline. The files are organized into three categories.

Trusted

Pipelines that are allowed to run steps on trusted agents that have elevated privileges. This is enforced by a pre-bootstrap hook deployed to these agents. These agents should only run using configurations coming from trusted sources, which right now means only commits that have been checked into the main IREE repository.

Untrusted

Pipelines that are only allowed to run steps on untrusted agents. Access to these agents must pose no risk beyond the ability to use their compute (and that of other untrusted agents) and to write to resources, including caches, used only by other untrusted agents. They should not have access to any sensitive data or the ability to delete or corrupt unrecoverable artifacts. We should be able to turn down all untrusted agents with the only consequence being a temporary delay in presubmit builds.
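A lint over untrusted pipeline files could catch steps that might be scheduled on a trusted agent. This is a hypothetical sketch, not an existing IREE check: it takes an already-parsed pipeline as a dict, handles only the map form of `agents`, and conservatively requires the `security: untrusted` tag on every command step rather than relying on pipeline-level defaults.

```python
def untrusted_violations(pipeline: dict) -> list:
    """List command steps in an untrusted pipeline lacking security: untrusted.

    Steps that don't run on agents (wait, block, trigger) are skipped;
    group steps are checked recursively.
    """
    violations = []

    def visit(steps: list) -> None:
        for step in steps:
            if not isinstance(step, dict):
                continue  # e.g. the bare "wait" shorthand
            if "group" in step:
                visit(step.get("steps", []))
                continue
            if "command" not in step and "commands" not in step:
                continue  # block/wait/trigger steps don't run on agents
            agents = step.get("agents")
            if not isinstance(agents, dict) or agents.get("security") != "untrusted":
                violations.append(step.get("label", step.get("command", step.get("commands"))))

    visit(pipeline.get("steps", []))
    return violations
```

Such a check complements, rather than replaces, the enforcement on the agents themselves: the pre-bootstrap hook on trusted agents remains the actual security boundary.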

Fragment

Buildkite pipeline configurations that are intended to be used dynamically or inserted into other pipelines with the buildkite-agent pipeline upload command. These are not registered with Buildkite.

Creating a Pipeline

Creating a new pipeline is almost as simple as sending a PR containing a new pipeline YAML configuration. Most pipelines should be created as “untrusted” and invoked with wait_for_pipeline_success.py from the presubmit and/or postsubmit pipelines.

Unregistered Pipelines

When first introducing a new Buildkite pipeline, it will not be registered with Buildkite yet. The presubmit pipeline therefore cannot trigger a run of this pipeline. To enable testing of new pipelines, we have a special unregistered pipeline. This pipeline just uploads another pipeline file based on an environment variable. When a pipeline with the given name doesn't exist, wait_for_pipeline_success.py instead invokes the “unregistered” pipeline, which runs the given pipeline configuration. This misses features like grouping of pipeline runs and avoiding unnecessary reruns of the same pipeline, but is useful for testing.
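The fallback can be sketched as a function that builds the request for the Buildkite REST API's create-build endpoint (`POST /v2/organizations/{org}/pipelines/{pipeline}/builds`, with `commit`, `branch`, and `env` in the body). This is a hypothetical sketch, not wait_for_pipeline_success.py itself: the variable name `REQUESTED_PIPELINE_FILE` that the “unregistered” pipeline reads is an assumption, and the set of registered slugs would come from listing the organization's pipelines.

```python
API = "https://api.buildkite.com/v2"
UNREGISTERED_SLUG = "unregistered"
# Hypothetical name of the variable the "unregistered" pipeline reads.
PIPELINE_FILE_VAR = "REQUESTED_PIPELINE_FILE"


def trigger_request(org: str, slug: str, pipeline_file: str,
                    registered_slugs: set, commit: str, branch: str) -> tuple:
    """Return (url, json_body) for triggering `slug` on a commit.

    Registered pipelines are triggered directly. Unknown ones are routed
    through the "unregistered" pipeline, which uploads `pipeline_file`.
    """
    if slug in registered_slugs:
        target, env = slug, {}
    else:
        target, env = UNREGISTERED_SLUG, {PIPELINE_FILE_VAR: pipeline_file}
    url = f"{API}/organizations/{org}/pipelines/{target}/builds"
    return url, {"commit": commit, "branch": branch, "env": env}
```

The caller would POST the body with an `Authorization: Bearer <token>` header and then poll the returned build until it finishes.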

Making a Pipeline Public

Due to limitations in the Buildkite REST API, pipelines created via the API can't be made publicly visible or runnable by our presubmit bots. The pipeline will still be created, but setting all the necessary permissions requires manual intervention. Until then, it will run normally on postsubmit but will not be visible to the public, and it will run on presubmit via the “unregistered” pipeline. To make it fully accessible, you (or a member of the IREE team in the IREE Buildkite organization) must go to the pipeline settings page and click “Make Pipeline Public”. Then, only if the pipeline is under untrusted/, go to the Teams subsection of the pipeline settings and give “Build & Read” access to the “Presubmit” team and the “Everyone” team.

Postsubmits

Since it doesn't have to deal with potentially untrusted code, the Postsubmit pipeline is much simpler. It triggers on each commit to the main branch. Bootstrapping uploads postsubmit.yml, which triggers and waits for all the specified pipelines to complete on the target commit. In addition, the postsubmit build registers pipelines with Buildkite based on the pipeline files checked in.
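The registration job can be sketched as mapping each checked-in pipeline file to a request body for the Buildkite REST API's create-pipeline endpoint (`POST /v2/organizations/{org}/pipelines`, whose body takes `name`, `repository`, and a YAML `configuration` string). This is a hypothetical sketch: the function name, the way existing pipelines are detected, and the use of the file stem as the pipeline name are assumptions. Per the Bootstrapping section, the registered configuration is the bootstrap config for the file's category, not the pipeline file itself.

```python
from pathlib import PurePosixPath


def registration_specs(pipeline_files: list, repository: str,
                       bootstrap_configs: dict, already_registered: set) -> list:
    """Build create-pipeline request bodies for unregistered pipeline files.

    `bootstrap_configs` maps "trusted"/"untrusted" to the YAML text of the
    matching bootstrap config. Fragments are never registered.
    """
    specs = []
    for f in pipeline_files:
        path = PurePosixPath(f)
        category = path.parent.name
        if category not in bootstrap_configs:
            continue  # fragments and anything else are not registered
        if path.stem in already_registered:
            continue  # bootstrap self-upload keeps existing ones current
        specs.append({
            "name": path.stem,
            "repository": repository,
            "configuration": bootstrap_configs[category],
        })
    return specs
```

Skipping already-registered pipelines is safe here because the bootstrap configs re-upload their own newest version on every run, so the registered copy never needs updating.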

Idempotency

Both the presubmit and postsubmit pipelines are designed to be idempotent. Triggering them again on the same commit reruns only the orchestration pipelines themselves; it does not start new builds of the pipelines they trigger. This is one of the reasons we use our own script (wait_for_pipeline_success.py) rather than a Buildkite trigger step.
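The reuse logic can be sketched as a lookup over a pipeline's existing builds (for example, from the Buildkite REST API's list-builds endpoint filtered by `commit`). This is a hypothetical sketch, not the actual script: it assumes builds are ordered newest first and treats only `scheduled`, `running`, and `passed` builds as reusable, so a failed build is retried rather than reused. That policy is an illustrative assumption.

```python
# Assumed policy: in-progress or successful builds are reused; failures are retried.
REUSABLE_STATES = {"scheduled", "running", "passed"}


def find_reusable_build(builds: list, commit: str):
    """Return an existing build of `commit` to wait on, or None to trigger anew.

    `builds` are dicts with at least "commit" and "state" keys, assumed to
    be ordered newest first.
    """
    for build in builds:
        if build["commit"] == commit and build["state"] in REUSABLE_STATES:
            return build
    return None
```

An orchestration run would wait on the returned build if there is one, and only trigger a new build when this returns None.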

Agent Configuration

Files under the agent/ directory are those that are deployed to the Buildkite agents to control their behavior. This is currently done manually, but they are checked in here for versioning and review.