| # IREE Buildkite Automation |
| |
| ## Status |
| |
| We are in the process of migrating our automation infrastructure to |
| [Buildkite](https://buildkite.com). `samples.yml` and everything under the |
| `cmake/` directory are legacy pipelines from our previous ad hoc usage. This |
| document describes everything else, which is still in development and can |
| generally be ignored for now. |
| |
| ## General Setup |
| |
| All IREE automation pipelines are moving to Buildkite. The jobs in these |
| pipelines should avoid duplicating work and instead pass artifacts between |
| machines. The workhorse of this automation is a collection of x86-64 Linux build |
| machines (initially GCE VMs but probably moving to Kubernetes containers) which |
| perform all direct and cross compilation that is feasible. In cases where we |
| want to ensure compilation on a platform, not just for it (and/or |
| cross-compiling for it on Linux is difficult), we will also run build machines |
| of that platform (e.g. Windows). |
| |
| Built artifacts will be farmed out to testing machines (some machines may play |
| double-duty as build and test), using docker images and emulators where useful. |
| Bazel builds will use |
| [Bazel remote caching](https://bazel.build/docs/remote-caching), but not remote |
| execution, which we have found to be prohibitively expensive to configure and |
| maintain. CMake builds will initially be uncached, but will move first to local |
| caches (ccache) and then to remote caching and/or distributed builds (more |
| research needed, but e.g. sccache or distcc). |
| |
| Orchestration agents will take care of light tasks like uploading Buildkite |
| pipelines and polling for bigger jobs to be finished. These will be run on |
| minimally sized machines (initially e2-micro GCE VMs, but likely moving to |
| Kubernetes containers). |
| |
| ## Security |
| |
| There is a strict separation between machines and caches that are used for |
| releases and automation, which are limited to running code that has been |
| submitted to the IREE repository, and those that are used to run code coming |
| from third-party forks. Buildkite agents are tagged with `security: "trusted"` |
| or `security: "untrusted"` to indicate the class of code that they run. Agents |
| running unsubmitted code may have read access to artifacts (e.g. build cache |
| entries) generated by trusted agents. |
| |
| ### Vetting Unsubmitted Code |
| |
| There's no getting around that presubmit testing is remote code execution (even |
| if there's no confidential information accessible by those remote executors), |
| and we don't want to give malicious or nuisance actors free compute. At the same |
| time, we want to have minimal friction for new contributors, and especially for |
| routine contributors, *independent of what company they work for*. As a proxy |
| for "real person who actually wants to contribute to the project" we use having |
| signed the |
| [Google Contributor License Agreement (CLA)](https://cla.developers.google.com), |
| which is already required to contribute to the project. This is checked on |
| trusted runners using configs and scripts that have already been submitted to |
| the repository before running anything using the code from a pull request. |
| Because CLA signing can sometimes create a roadblock (especially in the case of |
| corporate CLAs, which require getting lawyers involved), if the CLA check fails, |
| a [block step](https://buildkite.com/docs/pipelines/block-step) is inserted in |
| the pipeline, allowing members of the IREE Buildkite org to unblock the runs |
| manually in the meantime. Additionally, we block any bad actors that crop up |
| using Buildkite |
| [conditional filtering](https://buildkite.com/docs/pipelines/conditionals#conditionals-in-pipelines) |
| to stop builds from triggering on their PRs at all. IREE Buildkite organization |
| admins can |
| [update those options here](https://buildkite.com/iree/presubmit/settings/repository#:~:text=Filter%20builds%20using%20a%20conditional). |
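| |
| The gating described above can be sketched as follows. This is a minimal |
| illustration under stated assumptions, not the contents of `check_cla.py`: the |
| helper name `gate_steps` and the step label are hypothetical, and only the |
| `block` step key is taken from Buildkite's pipeline format. |
| |
```python
def gate_steps(cla_passed, downstream_steps):
    """Return the steps to upload, gated by a block step if the CLA check failed.

    Hypothetical helper for illustration; the real check lives in
    scripts/check_cla.py and may be structured differently.
    """
    if cla_passed:
        # CLA signed: run the downstream steps without manual intervention.
        return list(downstream_steps)
    # CLA not (yet) signed: a block step pauses the build until a member of
    # the Buildkite organization unblocks it manually.
    block = {"block": "CLA check failed: manual approval required"}
    return [block] + list(downstream_steps)
```
| |
| Because the block step is prepended rather than replacing the downstream steps, |
| unblocking resumes the same build instead of requiring a new one. |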
| |
| ### Bootstrapping |
| |
| We want to enable testing changes to the pipeline configurations themselves on a |
| PR before it is submitted. To enable this, we don't register the pipeline |
| configurations themselves with Buildkite. Instead, we register one of |
| [bootstrap-trusted.yml](pipelines/fragment/bootstrap-trusted.yml) or |
| [bootstrap-untrusted.yml](pipelines/fragment/bootstrap-untrusted.yml) (depending |
| on whether they are in the `trusted/` or `untrusted/` directory) for each of our |
| pipelines. This registration happens as one of the jobs on the postsubmit |
| pipeline and is performed for each pipeline configuration file in the pipeline |
| directories. These bootstrap configurations just upload the relevant pipeline |
| configuration file. But then we have no way to test changes to the bootstrap |
| pipeline configurations themselves on presubmit. Although we do not expect them |
| to change frequently, this is still undesirable. To get around this infinite |
| regress, the bootstrap pipeline configurations first upload the newest version |
| of themselves from the target commit, setting an environment variable to prevent |
| further self-upload. Only then do they upload the target pipeline. |
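| |
| The two-pass upload can be sketched as below. The guard variable name |
| `EXAMPLE_BOOTSTRAP_SELF_UPLOADED` and the function `files_to_upload` are |
| hypothetical; the source does not say which environment variable the bootstrap |
| configurations actually use. |
| |
```python
# Hypothetical guard variable name, chosen only for this sketch.
SELF_UPLOAD_GUARD = "EXAMPLE_BOOTSTRAP_SELF_UPLOADED"


def files_to_upload(bootstrap_file, target_file, env):
    """Return (files to upload, env vars to set on the uploaded steps)."""
    if env.get(SELF_UPLOAD_GUARD) != "1":
        # First pass: upload the bootstrap config from the target commit and
        # set the guard so the re-uploaded copy does not upload itself again.
        return [bootstrap_file], {SELF_UPLOAD_GUARD: "1"}
    # Second pass: the bootstrap config is current, so upload the target.
    return [target_file], {}
```
| |
| In practice each returned file would be passed to |
| `buildkite-agent pipeline upload`; the sketch only captures the ordering and |
| the guard that stops the recursion. |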
| |
| ## Presubmits |
| |
| The [Presubmit pipeline](https://buildkite.com/iree/presubmit) runs on all PRs |
| sent to the main repository. |
| |
| If the PR is from a fork, bootstrapping checks out the `main` branch and uploads |
| the [presubmit.yml](pipelines/trusted/presubmit.yml) pipeline configuration from |
| there. This pipeline checks everything out from the `main` branch. It runs |
| [check_cla.py](scripts/check_cla.py) to determine if the CLA check has passed on |
| the target commit. If it hasn't, it inserts a block step that prevents further |
| execution until an authorized person in the IREE Buildkite organization |
| unblocks. Subsequent steps use |
| [wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) to trigger |
| and wait for other pipelines to execute on the PR commit. |
| |
| It is important that the CLA check is performed on trusted runners and using |
| code from the main repository. This makes it difficult to test changes to the |
| presubmit pipeline configuration, however. To enable this, if the PR is coming |
| from another branch on the main repository, bootstrapping instead checks out the |
| commit on which it was triggered and instructs the presubmit pipeline to do the |
| same. |
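| |
| The choice of which ref supplies the presubmit configuration and scripts can be |
| summarized in a few lines. The function name and its parameters are |
| hypothetical, and how the two repositories are identified and compared is an |
| assumption of this sketch. |
| |
```python
def presubmit_checkout_ref(pr_repo, main_repo, pr_commit):
    """Pick the ref whose pipeline config and scripts the presubmit uses.

    All parameter names are illustrative, not taken from the real scripts.
    """
    if pr_repo == main_repo:
        # PR from a branch on the main repository: use the triggering commit
        # so that changes to the presubmit pipeline itself can be tested.
        return pr_commit
    # PR from a fork: only use configuration and scripts from `main`, so the
    # CLA check cannot be altered by the PR under test.
    return "main"
```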
| |
| ## Pipeline Files |
| |
| The `pipelines/` directory contains Buildkite |
| [pipeline configuration YAML files](https://buildkite.com/docs/pipelines). They |
| are picked up automatically by Buildkite using automation in the postsubmit |
| pipeline. They are organized into three categories. |
| |
| ### Trusted |
| |
| Pipelines that are allowed to run steps on trusted agents that have elevated |
| privileges. This is enforced by a pre-bootstrap hook deployed to these agents. |
| These agents should only run using configurations coming from trusted sources, |
| which right now means only commits that have been checked into the main IREE |
| repository. |
| |
| ### Untrusted |
| |
| Pipelines that are only allowed to run steps on untrusted agents. The only risk |
| posed by access to these agents must be an ability to use their compute and |
| that of other untrusted agents and write to resources (including caches) used |
| only by other untrusted agents. They should not have access to any sensitive |
| data or the ability to delete or corrupt unrecoverable artifacts. We should be |
| able to turn down all untrusted agents with the only consequence being a |
| temporary delay in presubmit builds. |
| |
| ### Fragment |
| |
| Buildkite pipeline configurations that are intended to be used dynamically or |
| inserted into other pipelines with the `buildkite-agent pipeline upload` |
| command. These are not registered with Buildkite. |
| |
| ## Creating a Pipeline |
| |
| Creating a new pipeline is [almost](#making-pipeline-public) as simple as |
| sending a PR containing a new pipeline YAML configuration. Most pipelines |
| should be created as "untrusted" and invoked with `wait_for_pipeline_success.py` |
| from the presubmit and/or postsubmit pipelines. |
| |
| ### Unregistered Pipelines |
| |
| When first introducing a new Buildkite pipeline, it will not be registered with |
| Buildkite yet. The presubmit pipeline therefore cannot trigger a run of this |
| pipeline. To enable testing of new pipelines, we have a special |
| [unregistered pipeline](https://buildkite.com/iree/unregistered). This pipeline |
| just uploads another pipeline file based on an environment variable. When a |
| pipeline with the given name doesn't exist, |
| [wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) instead |
| invokes the "unregistered" pipeline, which runs the given pipeline |
| configuration. This misses features like grouping of pipeline runs and avoiding |
| unnecessary reruns of the same pipeline, but is useful for testing. |
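| |
| The fallback can be sketched as a small lookup. This is not the logic of |
| `wait_for_pipeline_success.py` itself: the function name and the environment |
| variable `EXAMPLE_PIPELINE_FILE` are hypothetical stand-ins for whatever the |
| real script and the "unregistered" pipeline agree on. |
| |
```python
def resolve_pipeline(name, registered, config_path):
    """Return (pipeline to trigger, extra env vars) for a requested pipeline."""
    if name in registered:
        # Normal case: the pipeline is already registered with Buildkite.
        return name, {}
    # Fallback: run the "unregistered" pipeline and tell it which
    # configuration file to upload (variable name is hypothetical).
    return "unregistered", {"EXAMPLE_PIPELINE_FILE": config_path}
```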
| |
| ### Making Pipeline Public |
| |
| Due to limitations in the Buildkite REST API, pipelines created via the API |
| can't be made publicly visible or accessible to be run by our presubmit bots. |
| The pipeline will still be created, but it will require manual intervention to |
| set all the necessary permissions. It will run normally on postsubmit, but will |
| not be visible to the public. It will run on presubmit using the |
| ["unregistered" pipeline](#unregistered-pipelines). To make it fully accessible, |
| you (or a member of the IREE team in the IREE Buildkite organization) must go to |
| the pipeline settings page and click "Make Pipeline Public". Then, *only if this |
| is a pipeline under untrusted/*, go to the Teams subsection of pipeline settings |
| and give "Build & Read" access to the "Presubmit" team and the "Everyone" team. |
| |
| ## Postsubmits |
| |
| Since it doesn't have to deal with potentially untrusted code, the |
| [Postsubmit pipeline](https://buildkite.com/iree/postsubmit) is much simpler. It |
| triggers on each commit to the `main` branch. Bootstrapping uploads |
| [postsubmit.yml](pipelines/trusted/postsubmit.yml), which triggers and waits for |
| all the specified pipelines to complete on the target commit. In addition, the |
| postsubmit build registers pipelines with Buildkite based on the pipeline files |
| checked in. |
| |
| ## Idempotency |
| |
| Both the presubmit and postsubmit pipelines are designed to be idempotent. |
| Triggering them again on the same commit will not trigger any new builds, only |
| orchestration pipelines. This is one of the reasons we use our own script rather |
| than a Buildkite trigger step. |
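| |
| One way to get this behavior is to look for an existing build of the target |
| commit before creating a new one. The sketch below assumes builds are |
| represented as dictionaries with a `commit` field, as in Buildkite REST API |
| build objects; the function name is hypothetical, and the real script may also |
| take build state into account. |
| |
```python
def find_existing_build(builds, commit):
    """Return a prior build of `commit` to reuse, or None if there is none."""
    for build in builds:
        if build["commit"] == commit:
            # A build for this commit already exists: wait on it rather than
            # triggering a duplicate.
            return build
    return None
```
| |
| A caller would create a new build only when this returns `None`, which is what |
| makes a repeated trigger on the same commit a no-op for the build pipelines. |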
| |
| ## Agent Configuration |
| |
| Files under the `agent/` directory are those that are deployed to the Buildkite |
| agents to control their behavior. This is currently done manually, but they are |
| checked in here for versioning and review. |