# Work with hardware code in external repositories

OpenTitan is not a closed ecosystem: we incorporate code from third parties, and we split out pieces of our code to reach a wider audience.
In both cases, we need to import and use code from external repositories in our OpenTitan code base.
Read on for step-by-step instructions for common tasks, and for background information on the topic.

## Summary

Code in subdirectories of `hw/vendor` is imported (copied in) from external repositories (which may be provided by lowRISC or other sources).
The external repository is called "upstream".
Any development on imported in `hw/vendor` code should happen upstream when possible.
Files ending with `.vendor.hjson` indicate where the upstream repository is located.

In particular, this means:

- If you find a bug in imported code or want to enhance it, report it upstream.
- Follow the rules and style guides of the upstream project.
   They might differ from our own rules.
- Use the upstream mechanisms to do code changes. In many cases, upstream uses GitHub just like we do with Pull Requests.
- Work with upstream reviewers to get your changes merged into their code base.
- Once the change is part of the upstream repository, the `vendor_hw` tool can be used to copy the upstream code back into our OpenTitan repository.

Read on for the longer version of these guidelines.

Pushing changes upstream first isn't always possible or desirable: upstream might not accept changes, or be slow to respond.
In some cases, code changes are needed which are irrelevant for upstream and need to be maintained by us.
Our vendoring infrastructure is able to handle such cases, read on for more information on how to do it.

## Background

OpenTitan is developed in a "monorepo", a single repository containing all its source code.
This approach is beneficial for many reasons, ranging from an easier workflow to better reproducibility of the results, and that's why large companies like [Google](https://ai.google/research/pubs/pub45424) and Facebook are using monorepos.
Monorepos are even more compelling for hardware development, which cannot make use of a standardized language-specific package manager like npm or pip.

At the same time, open source is all about sharing and a free flow of code between projects.
We want to take in code from others, but also to give back and grow a wider ecosystem around our output.
To be able to do that, code repositories should be sufficiently modular and self-contained.
For example, if a CPU core is buried deep in a repository containing a full SoC design, people will have a hard time using this CPU core for their designs and contributing to it.

Our approach to this challenge: develop reusable parts of our code base in an external repository, and copy the source code back into our monorepo in an automated way.
The process of copying in external code is commonly called "vendoring".

Vendoring code is a good thing.
We continue to maintain a single code base which is easy to fork, tag and generally work with, as all the normal Git tooling works.
By explicitly importing code we also ensure that no unreviewed code sneaks into our code base, and a "always buildable" configuration is maintained.

But what happens if the imported code needs to be modified?
Ideally, all code changes are submitted upstream, integrated into the upstream code base, and then re-imported into our code base.
This development methodology is called "upstream first". 
History has shown repeatedly that an upstream first policy can help significantly with the long-term maintenance of code.

However, strictly following an upstream first policy isn't great either.
Some changes might not be useful for the upstream community, others might be not acceptable upstream or only applied after a long delay.
In these situations it must be possible to modify the code downstream, i.e. in our repository, as well.
Our setup includes multiple options to achieve this goal.
In many cases, applying patches on top of the imported code is the most sustainable option.

To ease the pain points of vendoring code we have developed tooling and continue to do so.
Please open an issue ticket if you see areas where the tooling could be improved.

## Basic concepts

This section gives a quick overview how we include code from other repositories into our repository.

All imported ("vendored") hardware code is by convention put into the `hw/vendor` directory.
(We have more conventions for file and directory names which are discussed below when the import of new code is described.)
To interact with code in this directory a tool called `vendor_hw` is used, which can be found in `util/vendor_hw.py`.
A "vendor description file" controls the vendoring process and serves as input to the `vendor_hw` tool.

In the simple, yet typical, case, the vendor description file is only a couple of lines of human-readable JSON:

```command
$ cat hw/vendor/lowrisc_ibex.vendor.hjson
{
  name: "lowrisc_ibex",
  target_dir: "lowrisc_ibex",

  upstream: {
    url: "https://github.com/lowRISC/ibex.git",
    rev: "master",
  },
}
```

This description file essentially says:
We vendor a component called "lowrisc_ibex" and place the code into the "lowrisc_ibex" directory (relative to the description file).
The code comes from the master branch of the Git repository found at https://github.com/lowRISC/ibex.git.

With this description file written, the `vendor_hw` tool can do its job.

```command
$ cd $REPO_TOP
$ ./util/vendor_hw.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose
INFO: Cloning upstream repository https://github.com/lowRISC/ibex.git @ master
INFO: Cloned at revision 7728b7b6f2318fb4078945570a55af31ee77537a
INFO: Copying upstream sources to /home/philipp/src/opentitan/hw/vendor/lowrisc_ibex
INFO: Changes since the last import:
* Typo fix in muldiv: Reminder->Remainder (Stefan Wallentowitz)
INFO: Wrote lock file /home/philipp/src/opentitan/hw/vendor/lowrisc_ibex.lock.hjson
INFO: Import finished
```

Looking at the output, you might wonder: how did the `vendor_hw` tool know what changed since the last import?
It knows because it records the commit hash of the last import in a file called the "lock file".
This file can be found along the `.vendor.hjson` file, it's named `.lock.hjson`.

In the example above, it looks roughly like this:

```command
$ cat hw/vendor/lowrisc_ibex.lock.hjson
{
  upstream:
  {
    url: https://github.com/lowRISC/ibex.git
    rev: 7728b7b6f2318fb4078945570a55af31ee77537a
  }
}
```

The lock file should be committed together with the code itself to make the import step reproducible at any time.

After running `vendor_hw`, the code in your local working copy is updated to the latest upstream version.
Next is testing: run simulations, syntheses, or other tests to ensure that the new code works as expected.
Once you're confident that the new code is good to be committed, do so using the normal Git commands.

```command
$ cd $REPO_TOP

$ # Stage all files in the vendored directory
$ git add -A hw/vendor/lowrisc_ibex

$ # Stage the lock file as well
$ git add hw/vendor/lowrisc_ibex.lock.hjson

$ # Now commit everything. Don't forget to write a useful commit message!
$ git commit
```

Instead of running `vendor_hw` first, and then manually creating a Git commit, you can also use the `--commit` flag.

```command
$ cd $REPO_TOP
$ ./util/vendor_hw.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --commit
```

This command updates the "lowrisc_ibex" code, and creates a Git commit from it.

Read on for a complete example how to efficiently update a vendored dependency, and how to make changes to such code.

## Update vendored code in our repository

A complete example to update a vendored dependency, commit its changes, and create a pull request from it, is given below.

```command
$ cd $REPO_TOP
$ # Ensure a clean working directory
$ git stash
$ # Create a new branch for the pull request
$ git checkout -b update-ibex-code upstream/master
$ # Update lowrisc_ibex and create a commit
$ ./util/vendor_hw.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --commit
$ # Push the new branch to your fork
$ git push origin update-ibex-code
$ # Restore changes in working directory (if anything was stashed before)
$ git stash pop
```

Now go to the GitHub web interface to open a Pull Request for the `update-ibex-code` branch.

## How to modify vendored code (fix a bug, improve it)

### Step 1: Get the vendored repository

1. Open the vendor description file (`.vendor.hjson`) of the dependency you want to update and take note of the `url` and the `branch` in the `upstream` section.

2. Clone the upstream repository and switch to the used branch:

   ```command
   $ # Go to your source directory (can be anywhere)
   $ cd ~/src
   $ # Clone the repository and switch the branch. Below is an example for ibex.
   $ git clone https://github.com/lowRISC/ibex.git
   $ cd ibex
   $ git checkout master
   ```

After this step you're ready to make your modifications.
You can do so *either* directly in the upstream repository, *or* start in the OpenTitan repository.

### Step 2a: Make modifications in the upstream repository

The easiest option is to modify the upstream repository directly as usual.

### Step 2b: Make modifications in the OpenTitan repository

Most changes to external code are motivated by our own needs.
Modifying the external code directly in the `hw/vendor` directory is therefore a sensible starting point.

1. Make your changes in the OpenTitan repository. Do not commit them.

2. Create a patch with your changes. The example below uses `lowrisc_ibex`.

   ```command
   $ cd hw/vendor/lowrisc_ibex
   $ git diff --relative . > changes.patch
   ```

3. Take note of the revision of the imported repository from the lock file.
   ```command
   $ cat hw/vendor/lowrisc_ibex.lock.hjson | grep rev
    rev: 7728b7b6f2318fb4078945570a55af31ee77537a
   ```

4. Switch to the checked out upstream repository and bring it into the same state as the imported repository.
   Again, the example below uses ibex, adjust as needed.

   ```command
   # Change to the upstream repository
   $ cd ~/src/ibex

   $ # Create a new branch for your patch
   $ # Use the revision you determined in the previous step!
   $ git checkout -b modify-ibex-somehow 7728b7b6f2318fb4078945570a55af31ee77537a
   $ git apply -p1 < $REPO_BASE/hw/vendor/lowrisc_ibex/changes.patch

   $ # Add and commit your changes as usual
   $ # You can create multiple commits with git add -p and committing
   $ # multiple times.
   $ git add -u
   $ git commit
   ```

### Step 3: Get your changes accepted upstream

You have now created a commit in the upstream repository.
Before submitting your changes upstream, rebase them on top of the upstream development branch, typically `master`, and ensure that all tests pass.
Now you need to follow the upstream guidelines on how to get the change accepted.
In many cases their workflow is similar to ours: push your changes to a repository fork on your namespace, create a pull request, work through review comments, and update it until the change is accepted and merged.

### Step 4: Update the vendored copy of the external dependency

After your change is accepted upstream, you can update our copy of the code using the `vendor_hw` tool as described before.

## How to vendor new code

Vendoring external code is done by creating a vendor description file, and then running the `vendor_hw` tool.

1. Create a vendor description file for the new dependency.
   1. Make note of the Git repository and the branch you want to vendor in.
   2. Choose a name for the external dependency.
      It is recommended to use the format `<vendor>_<name>`.
      Typically `<vendor>` is the lower-cased user or organization name on GitHub, and `<name>` is the lower-cased project name.
   3. Choose a target directory.
      It is recommended use the dependency name as directory name.
   4. Create the vendor description file in `hw/vendor/<vendor>_<name>.vendor.hjson` with the following contents (adjust as needed):

      ```
      // Copyright lowRISC contributors.
      // Licensed under the Apache License, Version 2.0, see LICENSE for details.
      // SPDX-License-Identifier: Apache-2.0
      {
        name: "lowrisc_ibex",
        target_dir: "lowrisc_ibex",

        upstream: {
          url: "https://github.com/lowRISC/ibex.git",
          rev: "master",
        },
      }
      ```

2. Create a new branch for a subsequent pull request

   ```command
   $ git checkout -b vendor-something upstream/master
   ```

3. Commit the vendor description file

   ```command
   $ git add hw/vendor/<vendor>_<name>.vendor.hjson
   $ git commit
   ```

4. Run the `vendor_hw` tool for the newly vendored code.

   ```command
   $ cd $REPO_TOP
   $ ./util/vendor_hw.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --commit
   ```

5. Push the branch to your fork for review (assuming `origin` is the remote name of your fork).

   ```command
   $ git push -u origin vendor-something
   ```

   Now go the GitHub web interface to create a Pull Request for the newly created branch.

## How to exclude some files from the upstream repository

You can exclude files from the upstream code base by listing them in the vendor description file under `exclude_from_upstream`.
Glob-style wildcards are supported (`*`, `?`, etc.), as known from shells.

Example:

```
// section of a .vendor.hjson file
exclude_from_upstream: [
  // exclude all *.h files in the src directory
  "src/*.h*",
  // exclude the src_files.yml file
  "src_files.yml",
  // exclude some_directory and all files below it
  "some_directory",
]
```

## How to add patches on top of the imported code

In some cases the upstream code must be modified before it can be used.
For this purpose, the `vendor_hw` tool can apply patches on top of imported code.
The patches are kept as separate files in our repository, making it easy to understand the differences to the upstream code, and to switch the upstream code to a newer version.

To apply patches on top of vendored code, do the following:

1. Extend the `.vendor.hjson` file of the dependency and add a `patch_dir` line pointing to a directory of patch files.
   It is recommended to place patches into the `patches/<vendor>_<name>` directory.

    ```
    patch_dir: "patches/lowrisc_ibex",
    ```

2. Place patch files with a `.patch` suffix in the `patch_dir`.

3. When running `vendor_hw`, patches are applied on top of the imported code according to the following rules.

   - Patches are applied alphabetical order according to the filename.
     Name patches like `0001-do-someting.patch` to apply them in a deterministic order.
   - Patches are applied relative to the base directory of the imported code.
   - The first directory component of the filename in a patch is stripped, i.e. they are applied with the `-p1` argument of `patch`.
   - Patches are applied with `git apply`, making all extended features of Git patches available (e.g. renames).

## How to manage patches in a Git repository

Managing patch series on top of code can be challenging.
As the underlying code changes, the patches need to be refreshed to continue to apply.
Adding new patches is a very manual process.
And so on.

Fortunately, Git can be used to simplify this task.
The idea:

- Create a forked Git repository of the upstream code
- Create a new branch in this fork.
- Commit all your changes on top of the upstream code into this branch.
- Convert all commits into patch files and store them where the `vendor_hw` tool can find and apply them.

The last step is automated by the `vendor_hw` tool through its `--refresh-patches` argument.

1. Modify the vendor description file to add a `patch_repo` section.
   - The `url` parameter specifies the URL to the fork of the upstream repository containing all modifications.
   - The `rev_base` is the base revision, typically the `master` branch.
   - The `rev_patched` is the patched revision, typically the name of the branch with your changes.

    ```
    patch_repo: {
      url: "https://github.com/lowRISC/riscv-dbg.git",
      rev_base: "master",
      rev_patched: "changes",
    },
    ```

2. Create commit and push to the forked repository.
   Make sure to push both branches to the fork: `rev_base` **and** `rev_patched`.
   In the example above, this would be (with `REMOTE_NAME_FORK` being the remote name of the fork):

   ```command
   git push REMOTE_NAME_FORK master changes
   ```

3. Run the `vendor_hw` tool with the `--refresh-patches` argument.
   It will first check out the patch repository and convert all commits which are in the `rev_patched` branch and not in the `rev_base` branch into patch files.
   These patch files are then stored in the patch directory.
   After that, the vendoring process continues as usual: all patches are applied and if instructed by the `--commit` flag, a commit is created.
   This commit now also includes the updated patch files.

To update the patches you can use all the usual Git tools in the forked repository.

- Use `git rebase` to refresh them on top of changes in the upstream repository.
- Add new patches with commits to the `rev_patched` fork.
- Remove patches or reorder them with Git interactive rebase (`git rebase -i`).

It is important to update and push *both* branches in the forked repository: the `rev_base` branch and the `rev_patched` branch.
Use `git log rev_base..rev_patched` (replace `rev_base` and `rev_patched` as needed) to show all commits which will be turned into patches.
