IREE Buildkite Automation

Status

We are in the process of migrating our automation infrastructure to Buildkite. samples.yml and everything under the cmake/ directory are legacy pipelines from our previous ad hoc usage. This document describes everything else, which is still in development and can generally be ignored for now.

General Setup

All IREE automation pipelines are moving to Buildkite. The jobs in these pipelines should avoid duplicating work and instead pass artifacts between machines. The workhorse of this automation is a collection of x86-64 Linux build machines (initially GCE VMs, but probably moving to Kubernetes containers) that perform all direct and cross compilation that is feasible. In cases where we want to ensure compilation on a platform, not just for it (and/or where cross-compiling for it on Linux is difficult), we will also run build machines of that platform (e.g. Windows).

Built artifacts will be farmed out to testing machines (some machines may do double duty as build and test machines), using Docker images and emulators where useful. Bazel builds will use Bazel remote caching, but not remote execution, which we have found to be prohibitively expensive to configure and maintain. CMake builds will initially be uncached, but will move first to local caches (ccache) and then to remote caching and/or distributed builds (more research needed, but e.g. sccache or distcc).

Orchestration agents will take care of light tasks like uploading Buildkite pipelines and polling for bigger jobs to finish. These will run on minimally sized machines (initially e2-micro GCE VMs, but likely moving to Kubernetes containers).

Presubmits

There is a strict separation between machines and caches that act on code that has been submitted to the IREE repository and those that act on code coming from third-party forks. Buildkite agents are tagged with security: "submitted" or security: "presubmit" to indicate the class of code that they run. Agents running unsubmitted code may have read access to artifacts (e.g. build cache entries) generated by agents running submitted code, but not write access.
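As a minimal sketch of the tagging scheme above, the following builds a Buildkite command step that targets agents by their security tag and records the cache access each class is allowed. The helper names (build_step, cache_access) are hypothetical and not part of the IREE repository; only the security tag values and the read/write rule come from this document.

```python
# Hypothetical helpers illustrating the security tagging described above.
SECURITY_CLASSES = ("submitted", "presubmit")


def build_step(label, command, security):
    """Return a Buildkite command step targeting agents with the given tag."""
    if security not in SECURITY_CLASSES:
        raise ValueError(f"unknown security class: {security!r}")
    return {
        "label": label,
        "command": command,
        # Buildkite routes the job to agents whose tags match this mapping.
        "agents": {"security": security},
    }


def cache_access(security):
    """Return the cache permissions allowed for a security class."""
    if security not in SECURITY_CLASSES:
        raise ValueError(f"unknown security class: {security!r}")
    # Both classes may read entries; only submitted code may write them.
    return {"read": True, "write": security == "submitted"}
```

In practice the write restriction would have to be enforced by the credentials available to each agent class, not by the pipeline definition alone.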

Vetting Unsubmitted Code

There's no getting around the fact that presubmit testing is remote code execution (even if there's no confidential information accessible by those remote executors), and we don't want to give malicious or nuisance actors free compute. At the same time, we want to have minimal friction for new contributors, and especially for routine contributors, independent of what company they work for. As a proxy for "real person who actually wants to contribute to the project" we use having signed the Google Contributor License Agreement (CLA), which is already required to contribute to the project. This is checked using already-checked-in scripts and configs before running anything that uses the code from a pull request. Because CLA signing can sometimes create a roadblock (especially in the case of corporate CLAs, which require getting lawyers involved), if the CLA check fails, a block step is inserted in the pipeline, allowing members of the IREE Buildkite org to unblock the runs manually in the meantime. Additionally, we block any bad actors that crop up by using Buildkite conditional filtering to stop builds from triggering on their PRs at all. IREE Buildkite organization admins can update those options here.

The Presubmit pipeline runs on all PRs sent to the main repository. The basic flow is that presubmit_bootstrap.yml fetches and uploads presubmit.yml from the main branch of the repository, which similarly fetches check_cla.py from the main branch and checks whether the CLA check has passed on the target commit. If it hasn't, it inserts a block step that prevents further execution until an authorized person in the IREE Buildkite organization unblocks it. Subsequent steps fetch wait_for_pipeline_success.py from the main branch and use it to trigger and wait for other pipelines to execute on the PR commit.
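The gating logic in this flow can be sketched as a function that produces the steps to upload. This is an illustration under stated assumptions, not the contents of check_cla.py or presubmit.yml: the function name presubmit_steps is hypothetical, and the echo commands are placeholders for the real steps that invoke wait_for_pipeline_success.py, whose command-line interface is not described here.

```python
# Hypothetical sketch of the CLA gate; not the actual check_cla.py logic.
def presubmit_steps(cla_passed, pipelines):
    """Return a Buildkite pipeline dict, gated on the CLA check result."""
    steps = []
    if not cla_passed:
        # A block step pauses the build until someone with permission
        # unblocks it manually.
        steps.append({
            "block": "CLA check failed",
            "prompt": "Unblock only after vetting the pull request.",
        })
    for name in pipelines:
        steps.append({
            "label": f"Run {name}",
            # Placeholder command: the real step triggers the pipeline and
            # waits for it to finish.
            "command": f"echo 'trigger and wait for {name}'",
        })
    return {"steps": steps}
```

Serialized with json.dumps, a dict of this shape can be passed to buildkite-agent pipeline upload, which accepts JSON as well as YAML.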

Postsubmits

Since it doesn't have to deal with potentially untrusted code, the Postsubmit pipeline is much simpler. It triggers on each commit to the main branch. postsubmit_bootstrap.yml fetches the main branch and uploads postsubmit.yml, which triggers and waits for all the specified pipelines to complete on the target commit.

Triggering Multiple Runs

Both the presubmit and postsubmit pipelines are designed to be idempotent. Triggering them again on the same commit will not trigger any new builds, only orchestration pipelines. This is one of the reasons we use our own script rather than a Buildkite trigger step.
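One way such idempotency could be achieved is to look for an existing build of the same pipeline on the same commit before creating a new one. The sketch below assumes exactly that; it is not the logic of wait_for_pipeline_success.py, and the function name ensure_build, the build dict shape, and the create_build callback are all hypothetical.

```python
# Hypothetical find-or-create sketch; build records are plain dicts here.
def ensure_build(existing_builds, pipeline, commit, create_build):
    """Return (build, created) for the pipeline at the given commit.

    Reuses a matching existing build if there is one; otherwise calls
    create_build(pipeline, commit) to start a new build.
    """
    for build in existing_builds:
        if build["pipeline"] == pipeline and build["commit"] == commit:
            return build, False
    return create_build(pipeline, commit), True
```

A plain trigger step always starts a new build, so a lookup of this kind has to live in a script. Whether a failed or canceled build should count as a match is a policy choice this sketch does not address.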