 # IREE Buildkite Automation

 ## Status

 We are in the process of migrating our automation infrastructure to
 [Buildkite](https://buildkite.com). `samples.yml` and everything under the
 `cmake/` directory are legacy pipelines from our previous ad hoc usage. This
 document describes everything else, which is still in development and can
 generally be ignored for now.

 ## General Setup

 All IREE automation pipelines are moving to Buildkite. The jobs in these
 pipelines should avoid duplicating work and instead pass artifacts between
 machines. The workhorse of this automation is a collection of x86-64 Linux build
 machines (initially GCE VMs but probably moving to Kubernetes containers) which
 perform all direct and cross compilation that is feasible. In cases where we
 want to ensure compilation on a platform, not just for it (and/or
 cross-compiling for it on Linux is difficult), we will also run build machines
 of that platform (e.g. Windows).

 Built artifacts will be farmed out to testing machines (some machines may play
 double-duty as build and test), using Docker images and emulators where useful.
 Bazel builds will use
 [Bazel remote caching](https://bazel.build/docs/remote-caching), but not remote
 execution, which we have found to be prohibitively expensive to configure and
 maintain. CMake builds will initially be uncached, but will move first to local
 caches (ccache) and then to remote caching and/or distributed builds (more
 research needed, but e.g. sccache or distcc).
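
As a minimal sketch of the local-cache stage, a build script could route compiler invocations through ccache using CMake's `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` variables. The helper name `cmake_configure_args` and its parameters are hypothetical and are not part of the IREE build scripts.

```python
def cmake_configure_args(source_dir, build_dir, launcher=None):
    """Return a CMake configure command, optionally using a compiler launcher.

    `launcher` is the name of a caching wrapper such as "ccache". When it is
    None the build is uncached, matching the initial setup described above.
    """
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    if launcher:
        # CMake prefixes every C/C++ compile command with the launcher.
        args += [
            f"-DCMAKE_C_COMPILER_LAUNCHER={launcher}",
            f"-DCMAKE_CXX_COMPILER_LAUNCHER={launcher}",
        ]
    return args


if __name__ == "__main__":
    print(" ".join(cmake_configure_args(".", "build", launcher="ccache")))
```

Swapping `"ccache"` for another wrapper such as `"sccache"` would use the same mechanism, although the remote-caching setup itself is outside the scope of this sketch.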

 Orchestration agents will take care of light tasks like uploading Buildkite
 pipelines and polling for bigger jobs to finish. These will run on minimally
 sized machines (initially e2-micro GCE VMs, but likely moving to Kubernetes
 containers).

 ## Security

 There is a strict separation between machines and caches that are used for
 releases and automation, which are limited to running code that has been
 submitted to the IREE repository, and those that are used to run code coming
 from third-party forks. Buildkite agents are tagged with `security: "trusted"`
 or `security: "untrusted"` to indicate the class of code that they run. Agents
 running unsubmitted code may have read access to artifacts (e.g. build cache
 entries) generated by trusted agents.
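
One way to picture the read-only access is a helper that chooses Bazel cache flags from the agent's `security` tag. This is only a sketch: the function name and the cache URL are hypothetical, and client-side flags are not a security boundary by themselves, so real enforcement would have to live in the cache's own access controls.

```python
def bazel_cache_flags(cache_url, security):
    """Return Bazel remote-cache flags for a trusted or untrusted agent."""
    if security not in ("trusted", "untrusted"):
        raise ValueError(f"unknown security class: {security!r}")
    flags = [f"--remote_cache={cache_url}"]
    if security == "untrusted":
        # Untrusted agents may read entries written by trusted agents but
        # must never upload their own results to the same cache.
        flags.append("--remote_upload_local_results=false")
    return flags


if __name__ == "__main__":
    # The URL below is a placeholder, not a real IREE cache endpoint.
    print(bazel_cache_flags("https://cache.example.com/trusted", "untrusted"))
```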

 ### Vetting Unsubmitted Code

 There's no getting around that presubmit testing is remote code execution (even
 if there's no confidential information accessible by those remote executors),
 and we don't want to give malicious or nuisance actors free compute. At the same
 time, we want to have minimal friction for new contributors, and especially for
 routine contributors, *independent of what company they work for*. As a proxy
 for "real person who actually wants to contribute to the project" we use having
 signed the
 [Google Contributor License Agreement (CLA)](https://cla.developers.google.com),
 which is already required to contribute to the project. This is checked on
 trusted runners using configs and scripts that have already been submitted to
 the repository before running anything using the code from a pull request.
 Because CLA signing can sometimes create a roadblock (especially in the case of
 corporate CLAs, which require getting lawyers involved), if the CLA check fails,
 a [block step](https://buildkite.com/docs/pipelines/block-step) is inserted in
 the pipeline, allowing members of the IREE Buildkite org to unblock the runs
 manually in the meantime. Additionally, we block any bad actors that crop up
 using Buildkite
 [conditional filtering](https://buildkite.com/docs/pipelines/conditionals#conditionals-in-pipelines)
 to stop builds from triggering on their PRs at all. IREE Buildkite organization
 admins can
 [update those options here](https://buildkite.com/iree/presubmit/settings/repository#:~:text=Filter%20builds%20using%20a%20conditional).
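
The gating described above can be sketched as a function that puts a block step in front of the remaining steps whenever the CLA check has not passed. This is an illustration rather than the logic in `check_cla.py`: the function name, the label text, and the example command are assumptions.

```python
import json


def gate_steps(cla_passed, downstream_steps):
    """Return a pipeline dict, gated by a block step if the CLA check failed."""
    steps = []
    if not cla_passed:
        # A Buildkite block step pauses the build until someone with
        # permission unblocks it; the steps after it wait for that.
        steps.append({"block": "CLA check failed: unblock to run presubmit"})
    steps.extend(downstream_steps)
    return {"steps": steps}


if __name__ == "__main__":
    pipeline = gate_steps(False, [{"command": "echo 'run presubmit'"}])
    # Buildkite accepts pipelines as JSON as well as YAML.
    print(json.dumps(pipeline, indent=2))
```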

 ### Bootstrapping

 We want to enable testing changes to the pipeline configurations themselves on a
 PR before it is submitted. To enable this, we don't register the pipeline
 configurations themselves with Buildkite. Instead, we register one of
 [bootstrap-trusted.yml](pipelines/fragment/bootstrap-trusted.yml) or
 [bootstrap-untrusted.yml](pipelines/fragment/bootstrap-untrusted.yml) (depending
 on whether they are in the `trusted/` or `untrusted/` directory) for each of our
 pipelines. This registration happens as one of the jobs on the postsubmit
 pipeline and is performed for each pipeline configuration file in the pipeline
 directories. These bootstrap configurations just upload the relevant pipeline
 configuration file. But then we have no way to test changes to the bootstrap
 pipeline configurations themselves on presubmit. Although we do not expect them
 to change frequently, this is still undesirable. To get around this infinite
 regress, the bootstrap pipeline configurations first upload the newest version
 of themselves from the target commit, setting an environment variable to prevent
 further self-upload. Only then do they upload the target pipeline.
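
The two-phase bootstrap can be sketched as below. The environment variable name `BOOTSTRAP_SELF_UPLOADED` and both function names are hypothetical; the actual bootstrap configurations may use different names and mechanics.

```python
# Hypothetical guard variable; the real name may differ.
SELF_UPLOADED_VAR = "BOOTSTRAP_SELF_UPLOADED"


def bootstrap_plan(env, bootstrap_file, target_file):
    """Decide which file this bootstrap run uploads and which env it sets."""
    if env.get(SELF_UPLOADED_VAR) != "true":
        # First pass: re-upload the bootstrap config from the target commit,
        # with the guard set so the next pass does not repeat this step.
        return {"upload": bootstrap_file, "env": {SELF_UPLOADED_VAR: "true"}}
    # Second pass: the guard is set, so upload the real pipeline.
    return {"upload": target_file, "env": {}}


def simulate(bootstrap_file, target_file):
    """Follow the plan until the target pipeline has been uploaded."""
    env, uploads = {}, []
    while True:
        action = bootstrap_plan(env, bootstrap_file, target_file)
        uploads.append(action["upload"])
        if action["upload"] == target_file:
            return uploads
        env = {**env, **action["env"]}


if __name__ == "__main__":
    print(simulate("pipelines/fragment/bootstrap-untrusted.yml",
                   "pipelines/untrusted/example.yml"))
```

The guard variable is what ends the regress: without it, every uploaded bootstrap configuration would upload itself again.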

 ## Presubmits

 The [Presubmit pipeline](https://buildkite.com/iree/presubmit) runs on all PRs
 sent to the main repository.

 If the PR is from a fork, bootstrapping checks out the `main` branch and uploads
 the [presubmit.yml](pipelines/trusted/presubmit.yml) pipeline configuration from
 there. This pipeline checks everything out from the `main` branch. It runs
 [check_cla.py](scripts/check_cla.py) to determine if the CLA check has passed on
 the target commit. If it hasn't, it inserts a block step that prevents further
 execution until an authorized person in the IREE Buildkite organization
 unblocks. Subsequent steps use
 [wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) to trigger
 and wait for other pipelines to execute on the PR commit.

 It is important that the CLA check is performed on trusted runners and using
 code from the main repository. This makes it difficult to test changes to the
 presubmit pipeline configuration, however. To enable this, if the PR is coming
 from another branch on the main repository, bootstrapping instead checks out the
 commit on which it was triggered and instructs the presubmit pipeline to do the
 same.
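
The choice of what to check out can be sketched as follows. The function names and the URL normalization are assumptions; in a real job the two URLs would plausibly come from Buildkite's `BUILDKITE_PULL_REQUEST_REPO` and `BUILDKITE_REPO` environment variables, and a production check would need to handle more URL forms than this sketch does.

```python
def _normalize(url):
    """Reduce trivial differences between two spellings of a repository URL."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url.lower()


def presubmit_checkout_ref(pr_repo_url, main_repo_url, commit):
    """Return the ref the trusted presubmit pipeline should check out."""
    if _normalize(pr_repo_url) == _normalize(main_repo_url):
        # Branch on the main repository: the commit is trusted, which makes
        # it possible to test changes to the presubmit pipeline itself.
        return commit
    # Fork (or unknown origin): only run configs and scripts from `main`.
    return "main"


if __name__ == "__main__":
    # Both URLs are illustrative placeholders.
    print(presubmit_checkout_ref("https://github.com/someone/iree.git",
                                 "https://github.com/openxla/iree",
                                 "abc123"))
```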

 ## Pipeline Files

 The `pipelines/` directory contains Buildkite
 [pipeline configuration yaml files](https://buildkite.com/docs/pipelines). They
 are picked up automatically by Buildkite using automation in the postsubmit
 pipeline. They are organized into three categories.

 ### Trusted

 Pipelines that are allowed to run steps on trusted agents that have elevated
 privileges. This is enforced by a pre-bootstrap hook deployed to these agents.
 These agents should only run using configurations coming from trusted sources,
 which right now means only commits that have been checked into the main IREE
 repository.

 ### Untrusted

 Pipelines that are only allowed to run steps on untrusted agents. The only risk
 posed by access to these agents must be the ability to use their compute and
 that of other untrusted agents, and to write to resources (including caches)
 used only by other untrusted agents. They should not have access to any
 sensitive data or the ability to delete or corrupt unrecoverable artifacts. We
 should be able to turn down all untrusted agents with the only consequence
 being a temporary delay in presubmit builds.

 ### Fragment

 Buildkite pipeline configurations that are intended to be used dynamically or
 inserted into other pipelines with the `buildkite-agent pipeline upload`
 command. These are not registered with Buildkite.
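
As a sketch of dynamic use, a script running inside a Buildkite job could generate steps and pipe them to `buildkite-agent pipeline upload`, which reads the pipeline from standard input when no file is given. The helper name `upload_steps` and its injectable `run` parameter are assumptions made so the example can be exercised without an agent installed.

```python
import json
import subprocess


def upload_steps(steps, run=subprocess.run):
    """Upload generated steps to the current build via the Buildkite agent."""
    payload = json.dumps({"steps": steps})
    # With no file argument, the agent reads the pipeline (YAML or JSON)
    # from stdin. check=True raises if the upload command fails.
    return run(
        ["buildkite-agent", "pipeline", "upload"],
        input=payload,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Dry run: print what would be sent instead of invoking the agent.
    upload_steps([{"command": "echo hello"}],
                 run=lambda cmd, **kw: print(cmd, kw["input"]))
```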

 ## Creating a Pipeline

 Creating a new pipeline is [almost](#making-pipeline-public) as simple as
 sending a PR containing a new pipeline YAML configuration. Most pipelines
 should be created as "untrusted" and invoked with `wait_for_pipeline_success.py`
 from the presubmit and/or postsubmit pipelines.

 ### Unregistered Pipelines

 When first introducing a new Buildkite pipeline, it will not be registered with
 Buildkite yet. The presubmit pipeline therefore cannot trigger a run of this
 pipeline. To enable testing of new pipelines, we have a special
 [unregistered pipeline](https://buildkite.com/iree/unregistered). This pipeline
 just uploads another pipeline file based on an environment variable. When a
 pipeline with the given name doesn't exist,
 [wait_for_pipeline_success.py](scripts/wait_for_pipeline_success.py) instead
 invokes the "unregistered" pipeline, which runs the given pipeline
 configuration. This misses features like grouping of pipeline runs and avoiding
 unnecessary reruns of the same pipeline, but is useful for testing.
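
The fallback can be sketched as a small lookup. This is not the code in `wait_for_pipeline_success.py`: the function name, the environment variable `PIPELINE_FILE`, and the example pipeline names are hypothetical, while `unregistered` matches the pipeline slug linked above.

```python
def resolve_pipeline(name, registered, pipeline_file):
    """Return the pipeline to trigger and any extra env it needs."""
    if name in registered:
        return {"pipeline": name, "env": {}}
    # Not registered yet: run the config through the special pipeline, which
    # uploads whichever file the environment variable points at.
    return {"pipeline": "unregistered", "env": {"PIPELINE_FILE": pipeline_file}}


if __name__ == "__main__":
    registered = {"presubmit", "postsubmit"}
    print(resolve_pipeline("new-pipeline", registered,
                           "pipelines/untrusted/new-pipeline.yml"))
```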

 ### Making Pipeline Public

 Due to limitations in the Buildkite REST API, pipelines created via the API
 can't be made publicly visible or accessible to be run by our presubmit bots.
 The pipeline will still be created, but it will require manual intervention to
 set all the necessary permissions. It will run normally on postsubmit, but will
 not be visible to the public. It will run on presubmit using the
 ["unregistered" pipeline](#unregistered-pipelines). To make it fully accessible,
 you (or a member of the IREE team in the IREE Buildkite organization) must go to
 the pipeline settings page and click "Make Pipeline Public". Then go to the
 Teams subsection of pipeline settings and, *only if this is a pipeline under
 untrusted/*, give "Build & Read" access to the "Presubmit" team and the
 "Everyone" team.

 ## Postsubmits

 Since it doesn't have to deal with potentially untrusted code, the
 [Postsubmit pipeline](https://buildkite.com/iree/postsubmit) is much simpler. It
 triggers on each commit to the `main` branch. Bootstrapping uploads
 [postsubmit.yml](pipelines/trusted/postsubmit.yml), which triggers and waits for
 all the specified pipelines to complete on the target commit. In addition, the
 postsubmit build registers pipelines with Buildkite based on the pipeline files
 checked in.

 ## Idempotency

 Both the presubmit and postsubmit pipelines are designed to be idempotent.
 Triggering them again on the same commit reruns only the orchestration
 pipelines themselves and does not trigger any new builds. This is one of the
 reasons we use our own script rather than a Buildkite trigger step.
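
A minimal sketch of the idea, using an in-memory list in place of the Buildkite API: before creating a build, look for an existing one for the same pipeline and commit and reuse it. The function name, the record layout, and the choice of which states count as reusable are all assumptions.

```python
# Assumed policy: builds in these states are reused rather than recreated.
REUSABLE_STATES = {"scheduled", "running", "passed"}


def ensure_build(builds, pipeline, commit):
    """Return (build, created), reusing a matching build when one exists."""
    for build in builds:
        if (build["pipeline"] == pipeline
                and build["commit"] == commit
                and build["state"] in REUSABLE_STATES):
            return build, False
    # No reusable build found: record a new one.
    new_build = {"pipeline": pipeline, "commit": commit, "state": "scheduled"}
    builds.append(new_build)
    return new_build, True


if __name__ == "__main__":
    builds = []
    print(ensure_build(builds, "example-pipeline", "abc123"))
    # The second call finds the build created by the first.
    print(ensure_build(builds, "example-pipeline", "abc123"))
```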

 ## Agent Configuration

 Files under the `agent/` directory are those that are deployed to the Buildkite
 agents to control their behavior. This is currently done manually, but they are
 checked in here for versioning and review.