[util] vendor.py now re-clones by default

This commit adds the `--update` command line flag to `util/vendor.py`.

Before this change, the vendor tool would always download the revision
specified in the `.vendor.hjson` file, and if that pointed to a branch,
the repository would be updated with all the upstream changes.

After this change, users have to explicitly pass `--update` to get this
behaviour.

Now, by default, the vendor tool will re-clone the upstream repository
as specified in the `.lock.hjson` file, and apply any new patches or
file exclusions to what should be the existing version of the
repository. This means you can easily re-vendor existing versions of
repositories without having to integrate upstream changes.
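
As a rough illustration of the selection rule described above, the
following minimal sketch shows which upstream information a run would
use. It is not the code in `util/vendor.py`: the helper name
`select_upstream` and the example URL and revisions are made up for
this illustration, and `desc` and `lock` are assumed to be the parsed
contents of the `.vendor.hjson` and `.lock.hjson` files.

```python
import copy


def select_upstream(desc, lock, update):
    """Return (upstream_info, effective_update) for one vendor run.

    desc:   parsed .vendor.hjson contents (dict)
    lock:   parsed .lock.hjson contents (dict), or None if it is missing
    update: True if --update was passed on the command line
    (hypothetical helper, for illustration only)
    """
    if lock is None or update:
        # No lock file yet, or an explicit update was requested:
        # follow the upstream information in the vendor description.
        return copy.deepcopy(desc['upstream']), True
    # Default: re-clone exactly what was locked previously.
    return copy.deepcopy(lock['upstream']), False


# Placeholder data, not taken from a real vendor file.
desc = {'upstream': {'url': 'https://example.com/repo.git', 'rev': 'master'}}
lock = {'upstream': {'url': 'https://example.com/repo.git', 'rev': '0123abc'}}

default_upstream, default_update = select_upstream(desc, lock, update=False)
updated_upstream, updated_update = select_upstream(desc, lock, update=True)
first_upstream, first_update = select_upstream(desc, None, update=False)
```

With a lock file present and no `--update`, the locked revision is
used; with `--update`, or when no lock file exists, the revision from
the vendor description is used instead.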

Signed-off-by: Sam Elliott <selliott@lowrisc.org>
diff --git a/doc/rm/vendor_in_tool.md b/doc/rm/vendor_in_tool.md
index d33f0fb..9eec02a 100644
--- a/doc/rm/vendor_in_tool.md
+++ b/doc/rm/vendor_in_tool.md
@@ -20,6 +20,7 @@
 
 optional arguments:
   -h, --help         show this help message and exit
+  --update, -U       Update locked version of repository with upstream changes
   --refresh-patches  Refresh the patches from the patch repository
   --commit, -c       Commit the changes
   --verbose, -v      Verbose
@@ -100,10 +101,31 @@
 }
 ```
 
+## Updating and the Vendor Lock File
+
+In order to document which version of a repository has been cloned and committed to the repository with the vendor tool, a vendor lock file is stored in `vendor/<vendor>_<name>.lock.hjson`.
+This contains only the upstream information, including the URL and the exact git revision that was cloned.
+
+Beyond just documentation, this enables users to re-clone the previously-cloned upstream repository -- including re-applying patches, choosing subdirectories, and excluding additional files -- without having to integrate any upstream changes.
+Indeed, the default behaviour of the vendor tool is to use the upstream information from `<vendor>_<name>.lock.hjson` if this file exists.
+
+Once the lock file exists, the vendor tool will only use the upstream information in `<vendor>_<name>.vendor.hjson` if the `--update` command-line option is used.
+
 ## Examples
 
-### Update code and commit the new code
+### Re-clone code and apply new file exclusions or patches
+
 ```command
 $ cd $REPO_TOP
-$ ./util/vendor.py hw/vendor/google_riscv-dv.vendor.hjson -v --commit
+$ ./util/vendor.py hw/vendor/google_riscv-dv.vendor.hjson -v
+```
+
+### Update code and commit the new code
+
+This will generate a commit message based on the git shortlog between the
+previously cloned revision and the newly cloned revision of the repository.
+
+```command
+$ cd $REPO_TOP
+$ ./util/vendor.py hw/vendor/google_riscv-dv.vendor.hjson -v --update --commit
 ```
diff --git a/doc/ug/vendor_hw.md b/doc/ug/vendor_hw.md
index fe95dda..9f3c1f0 100644
--- a/doc/ug/vendor_hw.md
+++ b/doc/ug/vendor_hw.md
@@ -90,7 +90,7 @@
 
 ```command
 $ cd $REPO_TOP
-$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose
+$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --update
 INFO: Cloning upstream repository https://github.com/lowRISC/ibex.git @ master
 INFO: Cloned at revision 7728b7b6f2318fb4078945570a55af31ee77537a
 INFO: Copying upstream sources to /home/philipp/src/opentitan/hw/vendor/lowrisc_ibex
@@ -118,6 +118,7 @@
 ```
 
 The lock file should be committed together with the code itself to make the import step reproducible at any time.
+This import step can be reproduced by running the `util/vendor` tool without the `--update` flag.
 
 After running `util/vendor`, the code in your local working copy is updated to the latest upstream version.
 Next is testing: run simulations, syntheses, or other tests to ensure that the new code works as expected.
@@ -140,7 +141,8 @@
 
 ```command
 $ cd $REPO_TOP
-$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --commit
+$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson \
+    --verbose --update --commit
 ```
 
 This command updates the "lowrisc_ibex" code, and creates a Git commit from it.
@@ -158,7 +160,8 @@
 $ # Create a new branch for the pull request
 $ git checkout -b update-ibex-code upstream/master
 $ # Update lowrisc_ibex and create a commit
-$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson --verbose --commit
+$ ./util/vendor.py hw/vendor/lowrisc_ibex.vendor.hjson \
+    --verbose --update --commit
 $ # Push the new branch to your fork
 $ git push origin update-ibex-code
 $ # Restore changes in working directory (if anything was stashed before)
@@ -316,6 +319,9 @@
 ]
 ```
 
+If you want to add more files to `exclude_from_upstream`, just update this section of the `.vendor.hjson` file and re-run the vendor tool without `--update`.
+The repository will be re-cloned without pulling in upstream updates, and the file exclusions and patches specified in the vendor file will be applied.
+
 ## How to add patches on top of the imported code
 
 In some cases the upstream code must be modified before it can be used.
@@ -341,6 +347,8 @@
    - The first directory component of the filename in a patch is stripped, i.e. they are applied with the `-p1` argument of `patch`.
    - Patches are applied with `git apply`, making all extended features of Git patches available (e.g. renames).
 
+If you want to add more patches and re-apply them without updating the upstream repository, add them to the patches directory and re-run the vendor tool without `--update`.
+
 ## How to manage patches in a Git repository
 
 Managing patch series on top of code can be challenging.
@@ -382,7 +390,7 @@
 3. Run the `util/vendor` tool with the `--refresh-patches` argument.
    It will first check out the patch repository and convert all commits which are in the `rev_patched` branch and not in the `rev_base` branch into patch files.
    These patch files are then stored in the patch directory.
-   After that, the vendoring process continues as usual: all patches are applied and if instructed by the `--commit` flag, a commit is created.
+   After that, the vendoring process continues as usual: changes from the upstream repository are downloaded if `--update` is passed, all patches are applied, and if instructed by the `--commit` flag, a commit is created.
    This commit now also includes the updated patch files.
 
 To update the patches you can use all the usual Git tools in the forked repository.
diff --git a/util/vendor.py b/util/vendor.py
index 9a1721d..2a0bcb4 100755
--- a/util/vendor.py
+++ b/util/vendor.py
@@ -37,7 +37,7 @@
     """Check if the git working directory is clean (no unstaged or staged changes)"""
     cmd = ['git', 'status', '--untracked-files=no', '--porcelain']
     modified_files = subprocess.run(cmd,
-                                    cwd=git_workdir,
+                                    cwd=str(git_workdir),
                                     check=True,
                                     stdout=subprocess.PIPE,
                                     stderr=subprocess.PIPE).stdout.strip()
@@ -153,7 +153,7 @@
     ]
     try:
         proc = subprocess.run(cmd,
-                              cwd=clone_dir,
+                              cwd=str(clone_dir),
                               check=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
@@ -199,7 +199,7 @@
         cmd = ['git', 'format-patch', '-o', str(target_patch_dir), rev_range]
         if not verbose:
             cmd += ['-q']
-        subprocess.run(cmd, cwd=clone_dir, check=True)
+        subprocess.run(cmd, cwd=str(clone_dir), check=True)
 
     finally:
         shutil.rmtree(str(clone_dir), ignore_errors=True)
@@ -215,18 +215,24 @@
 
 
 def apply_patch(basedir, patchfile, strip_level=1):
-    cmd = ['git', 'apply', '-p' + str(strip_level), patchfile]
+    cmd = ['git', 'apply', '-p' + str(strip_level), str(patchfile)]
     if verbose:
         cmd += ['--verbose']
-    subprocess.run(cmd, cwd=basedir, check=True)
+    subprocess.run(cmd, cwd=str(basedir), check=True)
 
 
 def clone_git_repo(repo_url, clone_dir, rev='master'):
     log.info('Cloning upstream repository %s @ %s', repo_url, rev)
 
-    cmd = [
-        'git', 'clone', '--no-single-branch', '-b', rev, repo_url, clone_dir
-    ]
+    # Clone the whole repository
+    cmd = ['git', 'clone', '--no-single-branch']
+    if not verbose:
+        cmd += ['-q']
+    cmd += [repo_url, str(clone_dir)]
+    subprocess.run(cmd, check=True)
+
+    # Check out exactly the revision requested
+    cmd = ['git', '-C', str(clone_dir), 'reset', '--hard', rev]
     if not verbose:
         cmd += ['-q']
     subprocess.run(cmd, check=True)
@@ -297,6 +303,12 @@
 
 def main(argv):
     parser = argparse.ArgumentParser(prog="vendor", description=DESC)
+    parser.add_argument(
+        '--update',
+        '-U',
+        dest='update',
+        action='store_true',
+        help='Update locked version of repository with upstream changes')
     parser.add_argument('--refresh-patches',
                         action='store_true',
                         help='Refresh the patches from the patch repository')
@@ -340,71 +352,86 @@
         raise SystemExit(sys.exc_info()[1])
     desc['_base_dir'] = vendor_file_base_dir
 
+
+    desc_file_stem = desc_file_path.name.rsplit('.', 2)[0]
+    lock_file_path = desc_file_path.with_name(desc_file_stem + '.lock.hjson')
+
+    # Importing may use lock file upstream information, so make a copy now
+    # which we can overwrite with the upstream information from the lock file.
+    import_desc = desc.copy()
+
     # Load lock file contents (if possible)
-    desc_file_path_str = str(desc_file_path)
-    lock_file_path = Path(
-        desc_file_path_str[:desc_file_path_str.find('.vendor.hjson')] +
-        '.lock.hjson')
     try:
-        with open(lock_file_path, 'r') as f:
+        with open(str(lock_file_path), 'r') as f:
             lock = hjson.loads(f.read(), use_decimal=True)
+
+        # Use lock file information for import
+        if not args.update:
+            import_desc['upstream'] = lock['upstream'].copy()
     except FileNotFoundError:
-        log.warning(
-            "Unable to read lock file %s. Assuming this is the first import.",
-            lock_file_path)
         lock = None
+        if not args.update:
+            log.warning("Updating upstream repo as lock file %s not found.",
+                        str(lock_file_path))
+            args.update = True
 
     if args.refresh_patches:
-        refresh_patches(desc)
+        refresh_patches(import_desc)
 
     clone_dir = Path(tempfile.mkdtemp())
     try:
         # clone upstream repository
-        upstream_new_rev = clone_git_repo(desc['upstream']['url'], clone_dir,
-                                          desc['upstream']['rev'])
+        upstream_new_rev = clone_git_repo(import_desc['upstream']['url'],
+                                          clone_dir,
+                                          rev=import_desc['upstream']['rev'])
+
+        if not args.update:
+            if upstream_new_rev != lock['upstream']['rev']:
+                log.fatal(
+                    "Revision mismatch. Unable to re-clone locked version of repository."
+                )
+                log.fatal("Attempted revision: %s", import_desc['upstream']['rev'])
+                log.fatal("Re-cloned revision: %s", upstream_new_rev)
+                raise SystemExit(1)
 
         upstream_only_subdir = ''
         clone_subdir = clone_dir
-        if 'only_subdir' in desc['upstream']:
-            upstream_only_subdir = desc['upstream']['only_subdir']
+        if 'only_subdir' in import_desc['upstream']:
+            upstream_only_subdir = import_desc['upstream']['only_subdir']
             clone_subdir = clone_dir / upstream_only_subdir
             if not clone_subdir.is_dir():
-                log.fatal("subdir '%s' does not exist in repo", upstream_only_subdir)
+                log.fatal("subdir '%s' does not exist in repo",
+                          upstream_only_subdir)
                 raise SystemExit(1)
 
-
         # apply patches to upstream sources
-        if 'patch_dir' in desc:
-            patches = path_resolve(desc['patch_dir'],
+        if 'patch_dir' in import_desc:
+            patches = path_resolve(import_desc['patch_dir'],
                                    vendor_file_base_dir).glob('*.patch')
             for patch in sorted(patches):
                 log.info("Applying patch %s" % str(patch))
-                apply_patch(clone_subdir, str(patch))
+                apply_patch(clone_subdir, patch)
 
         # import selected (patched) files from upstream repo
         exclude_files = []
-        if 'exclude_from_upstream' in desc:
-            exclude_files += desc['exclude_from_upstream']
+        if 'exclude_from_upstream' in import_desc:
+            exclude_files += import_desc['exclude_from_upstream']
         exclude_files += EXCLUDE_ALWAYS
 
         import_from_upstream(
-            clone_subdir, path_resolve(desc['target_dir'], vendor_file_base_dir),
-            exclude_files)
+            clone_subdir, path_resolve(import_desc['target_dir'],
+                                       vendor_file_base_dir), exclude_files)
 
         # get shortlog
-        get_shortlog = True
-        if not lock:
+        get_shortlog = bool(args.update)
+        if lock is None:
             get_shortlog = False
-            log.warning(
-                "No lock file exists. Unable to get the log of changes.")
-        elif lock['upstream']['url'] != desc['upstream']['url']:
+            log.warning("No lock file %s: unable to summarize changes.", str(lock_file_path))
+        elif lock['upstream']['url'] != import_desc['upstream']['url']:
             get_shortlog = False
             log.warning(
                 "The repository URL changed since the last run. Unable to get log of changes."
             )
-        elif upstream_new_rev == lock['upstream']['rev']:
-            get_shortlog = False
-            log.warning("Re-importing upstream revision %s", upstream_new_rev)
 
         shortlog = None
         if get_shortlog:
@@ -412,7 +439,7 @@
                                         upstream_new_rev)
 
             # Ensure fully-qualified issue/PR references for GitHub repos
-            gh_repo_info = github_parse_url(desc['upstream']['url'])
+            gh_repo_info = github_parse_url(import_desc['upstream']['url'])
             if gh_repo_info:
                 shortlog = github_qualify_references(shortlog, gh_repo_info[0],
                                                      gh_repo_info[1])
@@ -421,30 +448,31 @@
                      format_list_to_str(shortlog))
 
         # write lock file
-        lock = {}
-        lock['upstream'] = desc['upstream']
-        lock['upstream']['rev'] = upstream_new_rev
-        with open(lock_file_path, 'w', encoding='UTF-8') as f:
-            f.write(LOCK_FILE_HEADER)
-            hjson.dump(lock, f)
-            f.write("\n")
-            log.info("Wrote lock file %s", lock_file_path)
+        if args.update:
+            lock = {}
+            lock['upstream'] = import_desc['upstream'].copy()
+            lock['upstream']['rev'] = upstream_new_rev
+            with open(str(lock_file_path), 'w', encoding='UTF-8') as f:
+                f.write(LOCK_FILE_HEADER)
+                hjson.dump(lock, f)
+                f.write("\n")
+                log.info("Wrote lock file %s", str(lock_file_path))
 
         # Commit changes
         if args.commit:
             sha_short = git_get_short_rev(clone_subdir, upstream_new_rev)
 
-            repo_info = github_parse_url(desc['upstream']['url'])
+            repo_info = github_parse_url(import_desc['upstream']['url'])
             if repo_info is not None:
                 sha_short = "%s/%s@%s" % (repo_info[0], repo_info[1],
                                           sha_short)
 
-            commit_msg_subject = 'Update %s to %s' % (desc['name'], sha_short)
+            commit_msg_subject = 'Update %s to %s' % (import_desc['name'], sha_short)
             subdir_msg = ' '
             if upstream_only_subdir:
                 subdir_msg = ' subdir %s in ' % upstream_only_subdir
             intro = 'Update code from%supstream repository %s to revision %s' % (
-                subdir_msg, desc['upstream']['url'], upstream_new_rev)
+                subdir_msg, import_desc['upstream']['url'], upstream_new_rev)
             commit_msg_body = textwrap.fill(intro, width=70)
 
             if shortlog:
@@ -455,10 +483,10 @@
 
             commit_paths = []
             commit_paths.append(
-                path_resolve(desc['target_dir'], vendor_file_base_dir))
+                path_resolve(import_desc['target_dir'], vendor_file_base_dir))
             if args.refresh_patches:
                 commit_paths.append(
-                    path_resolve(desc['patch_dir'], vendor_file_base_dir))
+                    path_resolve(import_desc['patch_dir'], vendor_file_base_dir))
             commit_paths.append(lock_file_path)
 
             git_add_commit(vendor_file_base_dir, commit_paths, commit_msg)