Fix low-frequency typos in runtime, docs, and build tools. NFC. (5/6) (#23605)

Preparation for adding a typos pre-commit spell checker (6/6).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/build_tools/bazel/iree_check_test.bzl b/build_tools/bazel/iree_check_test.bzl
index 57bebe0..5cb428e 100644
--- a/build_tools/bazel/iree_check_test.bzl
+++ b/build_tools/bazel/iree_check_test.bzl
@@ -34,7 +34,7 @@
       src: source mlir file containing the module.
       target_backend: target backend to compile for.
       driver: driver to run the module with. This can be omitted to test only
-          compilation, but consider omiting the driver as a hacky abuse of the
+          compilation, but consider omitting the driver as a hacky abuse of the
           rule since compilation on its own not use iree-check-module.
       compiler_flags: additional flags to pass to the compiler. Bytecode output
           format and backend flags are passed automatically.
@@ -101,7 +101,7 @@
       srcs: source mlir files containing the module.
       target_backend: target backend to compile for.
       driver: driver to run the module with. This can be omitted to test only
-          compilation, but consider omiting the driver as a hacky abuse of the
+          compilation, but consider omitting the driver as a hacky abuse of the
           rule since compilation on its own not use iree-check-module.
       compiler_flags: additional flags to pass to the compiler. Bytecode output
           format and backend flags are passed automatically.
diff --git a/build_tools/cmake/build_and_test_tsan.sh b/build_tools/cmake/build_and_test_tsan.sh
index 7fed4ff..8211a8f 100755
--- a/build_tools/cmake/build_and_test_tsan.sh
+++ b/build_tools/cmake/build_and_test_tsan.sh
@@ -8,7 +8,7 @@
 # Build and test, using CMake/CTest, with ThreadSanitizer instrumentation.
 #
 # See https://clang.llvm.org/docs/ThreadSanitizer.html. Some tests are run many
-# times to flush out non-determinstic failures.
+# times to flush out non-deterministic failures.
 #
 # The desired build directory can be passed as the first argument. Otherwise, it
 # uses the environment variable IREE_TSAN_BUILD_DIR, defaulting to "build-tsan".
diff --git a/docs/website/docs/community/blog/posts/cuda-backend.md b/docs/website/docs/community/blog/posts/cuda-backend.md
index 4966bd7..7d8eaad 100644
--- a/docs/website/docs/community/blog/posts/cuda-backend.md
+++ b/docs/website/docs/community/blog/posts/cuda-backend.md
@@ -137,7 +137,7 @@
 ## Performance
 
 Now that we have enabled functionality we need to look at the performance. Once
-again we can leverage existing MLIR transformations to speed up the developement
+again we can leverage existing MLIR transformations to speed up the development
 work.
 
 ### Tiling and distribution
diff --git a/docs/website/docs/community/blog/posts/mmt4d.md b/docs/website/docs/community/blog/posts/mmt4d.md
index 5ecda0a..3aa0ee2 100644
--- a/docs/website/docs/community/blog/posts/mmt4d.md
+++ b/docs/website/docs/community/blog/posts/mmt4d.md
@@ -116,7 +116,7 @@
     sufficed, as this could have been done as a pre-processing step on
     O(N<sup>2</sup>) data.
 
-- **Inefficent memory traversal:** For efficiency reasons, we always need
+- **Inefficient memory traversal:** For efficiency reasons, we always need
     `tile_m_v>1` and `tile_n_v>1`. That is because the higher these values, the
     fewer memory-load instructions are needed overall; and this is also dictated
     by the SIMD instructions that we want to use. But that means that the kernel
@@ -275,7 +275,7 @@
 ## Conclusion
 
 We introduced a 4d tiled representation for 2d matrix-matrix multiplication with
-a decomposable algebric transformations that requires only reshape and transpose
+a decomposable algebraic transformation that requires only reshape and transpose
 of input operands, we discussed and empirically showed how that solves major
 drawbacks in row-major linear matmul by providing a flexible way to match
 different ISA layout along with better cache locality achieving near peak
diff --git a/docs/website/docs/developers/building/cmake-options.md b/docs/website/docs/developers/building/cmake-options.md
index 56d90f5..309e66d 100644
--- a/docs/website/docs/developers/building/cmake-options.md
+++ b/docs/website/docs/developers/building/cmake-options.md
@@ -176,7 +176,7 @@
 
 * type: BOOL
 
-Enable [undefiend behavior sanitizer](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)
+Enable [undefined behavior sanitizer](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)
 if the current build type is Debug and the compiler supports it.
 
 ### `IREE_ENABLE_RUNTIME_COVERAGE`
diff --git a/docs/website/docs/developers/debugging/gpu.md b/docs/website/docs/developers/debugging/gpu.md
index bff9e61..0509c0c 100644
--- a/docs/website/docs/developers/debugging/gpu.md
+++ b/docs/website/docs/developers/debugging/gpu.md
@@ -65,7 +65,7 @@
 when bringing up a new model end-to-end via IREE.
 
 Though most of the time, we can leverage existing facilities to avoid going down
-the full top-down hiearchical debugging procedure.
+the full top-down hierarchical debugging procedure.
 For example, for regression happening on an existing model, CI or `git bitsect`
 might tell us directly the culprit commit.
 
@@ -193,7 +193,7 @@
 ## Pinpointing runtime issues
 
 On the other side, if we suspect that it's a runtime issue, here are some
-useful approachs and tips:
+useful approaches and tips:
 
 !!! tip "[correctness/performance]"
 
@@ -247,12 +247,12 @@
       `--metal_command_buffer_retain_resources=true`, or
       `--metal_resource_hazard_tracking=true` to `iree-run-module` to see
       if any of the above "fixes" the issue.
-      It can help to isolate the pontential problem.
+      It can help to isolate the potential problem.
     * [:simple-vulkan:] Use `--vulkan_robust_buffer_access=true` to `iree-run-module`
       especially when seeing undeterministic/corrupted contents in buffers and
       suspecting there are buffer allocation/indexing issues.
 
-## Binary substiution for ROCm
+## Binary substitution for ROCm
 
 [:simple-amd:] The AMD ROCm target supports binary substitution on HSA code objects
 (`.hsaco` files).
@@ -300,7 +300,7 @@
   --iree-hal-substitute-executable-object=[dispatch_name]=altered.hsaco
 ```
 
-If successful, `iree-complie` will print a message stating
+If successful, `iree-compile` will print a message stating
 
 ``` shell
 NOTE: hal.executable `[executable name]` substituted with object file at`altered.hsaco`
diff --git a/docs/website/docs/developers/debugging/model-development.md b/docs/website/docs/developers/debugging/model-development.md
index b6c0c97..fa2aa9e 100644
--- a/docs/website/docs/developers/debugging/model-development.md
+++ b/docs/website/docs/developers/debugging/model-development.md
@@ -57,7 +57,7 @@
 
 Executable sources can be dumped, edited, and then loaded back into a program
 using `--iree-hal-dump-executable-sources-to` and
-`--iree-hal-substitute-executable-source`. This can be used for performace
+`--iree-hal-substitute-executable-source`. This can be used for performance
 tuning or for debugging (e.g. by replacing a complicated dispatch with a
 simpler one).
 
diff --git a/docs/website/docs/developers/debugging/releases.md b/docs/website/docs/developers/debugging/releases.md
index a851462..37d2e70 100644
--- a/docs/website/docs/developers/debugging/releases.md
+++ b/docs/website/docs/developers/debugging/releases.md
@@ -17,7 +17,7 @@
 
 ## Mapping releases back to git commits
 
-The source IREE commit SHA is embeded into pip releases in a few places.
+The source IREE commit SHA is embedded into pip releases in a few places.
 Starting in a python venv, you can find the IREE commit from both the shell:
 
 ```shell
diff --git a/docs/website/docs/developers/design-docs/cuda-hal-driver.md b/docs/website/docs/developers/design-docs/cuda-hal-driver.md
index ec17254..23f13b0 100644
--- a/docs/website/docs/developers/design-docs/cuda-hal-driver.md
+++ b/docs/website/docs/developers/design-docs/cuda-hal-driver.md
@@ -8,7 +8,7 @@
 
 # CUDA HAL driver
 
-This document lists technical details regarding the CUDA implemenation of
+This document lists technical details regarding the CUDA implementation of
 IREE's Hardware Abstraction Layer, called a CUDA HAL driver.
 
 IREE provides a [Hardware Abstraction Layer (HAL)][iree-hal] as a common
@@ -46,7 +46,7 @@
 `iree_hal_cuda_device_t` implements [`iree_hal_device_t`][hal-device] to provide
 the interface to CUDA GPU device by wrapping a [`CUdevice`][cu-device].
 For each device, right now we create two `CUstream`s--one for issuing commands
-for memory allocation and kernel lauches as instructed by the program; the other
+for memory allocation and kernel launches as instructed by the program; the other
 for issue host callback functions after dispatched command buffers completes.
 See [synchronization](#synchronization) section regarding the details.
 
@@ -137,7 +137,7 @@
 needed by the HAL semaphore abstraction:
 
 * [Stream memory operations][cu-mem-ops] provides `cuStreamWriteValue64()` and
-  `cuStreamWaitValue64()`, which can implment HAL semaphore 64-bit integer value
+  `cuStreamWaitValue64()`, which can implement HAL semaphore 64-bit integer value
   signal and wait. Though these operations require device pointers and cannot
   accepts pointers to managed memory buffers, meaning no support for the host.
   Additionally, per the spec, "synchronization ordering established through
@@ -148,7 +148,7 @@
 * For [external resource interoperability][cu-external-resource], we have APIs
   like `cuSignalExternalSemaphoresAsync()` and `cuWaitExternalSemaphoresAsync()`,
   which can directly map to Vulkan timeline semaphores. Though these APIs are
-  meant to handle exernal resources--there is no way to create
+  meant to handle external resources--there is no way to create
   `CUexternalSemaphore` objects directly other than `cuImportExternalSemaphore()`.
 
 Therefore, to implement the support, we need to leverage multiple native CPU or
@@ -215,7 +215,7 @@
 "the host function must not make any CUDA API calls." So we cannot do that
 directly inside `cuLaunchHostFunc()`; we need to notify another separate
 thread to call CUDA APIs to push more work to the GPU. So the deferred/pending
-action queue should have an associcated thread.
+action queue should have an associated thread.
 
 For GPU waits, we can also leverage the same logic--using CPU signaling to
 unblock deferred GPU queue actions. Though this is performant, given that
diff --git a/docs/website/docs/developers/general/contributing.md b/docs/website/docs/developers/general/contributing.md
index f28a860..1a8f69f 100644
--- a/docs/website/docs/developers/general/contributing.md
+++ b/docs/website/docs/developers/general/contributing.md
@@ -415,7 +415,7 @@
 ??? info - "Using `skip-ci`"
 
     `skip-ci` skips all jobs. It is mutually exclusive with the other `ci-*`
-    options and is synonomous with `ci-skip: all`.
+    options and is synonymous with `ci-skip: all`.
 
     ``` text
     skip-ci: free form reason
diff --git a/docs/website/docs/developers/general/developer-tips.md b/docs/website/docs/developers/general/developer-tips.md
index 205cc7a..e8990b9 100644
--- a/docs/website/docs/developers/general/developer-tips.md
+++ b/docs/website/docs/developers/general/developer-tips.md
@@ -516,7 +516,7 @@
 pipeline phases.
 
 Compilation can be continued from any intermediate phase. This allows for
-interative workflows - compile to a phase, make edits to the `.mlir` file,
+interactive workflows - compile to a phase, make edits to the `.mlir` file,
 then resume compilation and continue through the pipeline:
 
 ```console
diff --git a/docs/website/docs/developers/general/github-actions.md b/docs/website/docs/developers/general/github-actions.md
index 3631b77..d43a69e 100644
--- a/docs/website/docs/developers/general/github-actions.md
+++ b/docs/website/docs/developers/general/github-actions.md
@@ -9,7 +9,7 @@
 We use [GitHub Actions](https://docs.github.com/en/actions) for continuous
 automation (CI) and continuous delivery (CD) workflows:
 
-* Code formating and linting.
+* Code formatting and linting.
 * Building from source and running tests.
 * Building packages for testing and releases.
 * Testing packages across a variety of platforms.
diff --git a/docs/website/docs/developers/general/testing-guide.md b/docs/website/docs/developers/general/testing-guide.md
index 0e43f5d..d9d9562 100644
--- a/docs/website/docs/developers/general/testing-guide.md
+++ b/docs/website/docs/developers/general/testing-guide.md
@@ -399,7 +399,7 @@
 our `CMakeLists.txt` file by
 [bazel_to_cmake](https://github.com/iree-org/iree/tree/main/build_tools/bazel_to_cmake/bazel_to_cmake.py).
 
-There are other test targets that generate tests based on template configuraton
+There are other test targets that generate tests based on template configuration
 and platform detection, such as `iree_static_linker_test`. Those targets are
 not supported by Bazel rules at this point.
 
diff --git a/docs/website/docs/developers/performance/profiling-with-tracy.md b/docs/website/docs/developers/performance/profiling-with-tracy.md
index ff35282..c6e2eee 100644
--- a/docs/website/docs/developers/performance/profiling-with-tracy.md
+++ b/docs/website/docs/developers/performance/profiling-with-tracy.md
@@ -42,7 +42,7 @@
     needs:
 
     * Debug information from `-DCMAKE_BUILD_TYPE=RelWithDebInfo` or `Debug`
-    * Privilege elevation from `sudo` on Unix or adminstrator on Windows
+    * Privilege elevation from `sudo` on Unix or administrator on Windows
 
 ### :material-connection: Remote or embedded telemetry
 
@@ -397,7 +397,7 @@
 
     Tracy keeps a number of file descriptors open that, depending on the
     machine and its settings, may exceed the limit allowed by the system
-    resulting in IREE failing to open more files. In particular, it is commom
+    resulting in IREE failing to open more files. In particular, it is common
     to have a relatively low limit when running with `sudo`.
 
 ---
diff --git a/docs/website/docs/developers/vulkan-environment-setup.md b/docs/website/docs/developers/vulkan-environment-setup.md
index 1fa757a..4082fa1 100644
--- a/docs/website/docs/developers/vulkan-environment-setup.md
+++ b/docs/website/docs/developers/vulkan-environment-setup.md
@@ -30,13 +30,13 @@
 ![High Level View of Loader][VulkanArchPicture]
 
 The Vulkan loader sits between the Vulkan application, which calls Vulkan APIs,
-and the ICDs, which implements these Vulkan APIs. Vulkan layers agument the
+and the ICDs, which implement these Vulkan APIs. Vulkan layers augment the
 Vulkan system to provide optional features like validation and debugging. The
 Vulkan loader composes a chain of requested layers, which processes the Vulkan
 application's API calls one by one, and finally redirects the API calls made by
 the Vulkan application to one or more ICDs.
 
-It's highly recommned to read the
+It's highly recommended to read the
 [Architecture of the Vulkan Loader Interfaces Overview][VulkanArchOverview] to
 get a general understanding of what these components are and how they interact
 with one another.
diff --git a/docs/website/docs/guides/deployment-configurations/gpu-cuda.md b/docs/website/docs/guides/deployment-configurations/gpu-cuda.md
index 4f73dbd..cc23b88 100644
--- a/docs/website/docs/guides/deployment-configurations/gpu-cuda.md
+++ b/docs/website/docs/guides/deployment-configurations/gpu-cuda.md
@@ -117,7 +117,7 @@
 | NVIDIA RTX40 series | `sm_89`             | `ada`
 
 In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
-also supports two additonal schemes to make a better developer experience:
+also supports two additional schemes for a better developer experience:
 
 * Architecture code names like `volta` or `ampere`
 * GPU product names like `a100` or `rtx3090`
diff --git a/docs/website/docs/guides/deployment-configurations/gpu-vulkan.md b/docs/website/docs/guides/deployment-configurations/gpu-vulkan.md
index 5b530c0..061e3ef 100644
--- a/docs/website/docs/guides/deployment-configurations/gpu-vulkan.md
+++ b/docs/website/docs/guides/deployment-configurations/gpu-vulkan.md
@@ -208,7 +208,7 @@
     the allowed variances on extensions, properties, limits, etc. So the target
     triple is just an approximation for usage. This is more of a mechanism to
     help us develop IREE itself. In the long term we want to perform
-    multi-targetting to generate code for multiple architectures if no explicit
+    multi-targeting to generate code for multiple architectures if no explicit
     target is given.
 
 ### :octicons-terminal-16: Run a compiled program
diff --git a/docs/website/docs/guides/index.md b/docs/website/docs/guides/index.md
index 00e6c5d..fa59c75 100644
--- a/docs/website/docs/guides/index.md
+++ b/docs/website/docs/guides/index.md
@@ -18,7 +18,7 @@
 
 !!! info ""
 
-    Start here: [Deplyment configurations overview](./deployment-configurations/index.md)
+    Start here: [Deployment configurations overview](./deployment-configurations/index.md)
 
 Guides for specific configurations:
 
diff --git a/docs/website/docs/guides/ml-frameworks/index.md b/docs/website/docs/guides/ml-frameworks/index.md
index 9177f5e..8235643 100644
--- a/docs/website/docs/guides/ml-frameworks/index.md
+++ b/docs/website/docs/guides/ml-frameworks/index.md
@@ -57,7 +57,7 @@
 4. Legalize the graph's operations so only IREE-compatible operations remain
 5. Write the imported MLIR to a file
 
-This fully imported form can then be compiled indepedently of the source
+This fully imported form can then be compiled independently of the source
 language and framework.
 
 ## :octicons-gear-16: Compilation
diff --git a/docs/website/docs/index.md b/docs/website/docs/index.md
index 337aa6f..b3cb1a3 100644
--- a/docs/website/docs/index.md
+++ b/docs/website/docs/index.md
@@ -5,7 +5,7 @@
 
 # IREE
 
 IREE (**I**ntermediate **R**epresentation **E**xecution **E**nvironment[^1]) is
 an [MLIR](https://mlir.llvm.org/)-based end-to-end compiler and runtime that
 lowers Machine Learning (ML) models to a unified IR that scales up to meet the
 needs of the datacenter and down to satisfy the constraints and special
@@ -147,7 +147,7 @@
 
 IREE provides a flexible set of tools for various
 [deployment scenarios](./guides/deployment-configurations/index.md). Fully
 featured environments can use IREE for dynamic model deployments taking
 advantage of multi-threaded hardware, while embedded systems can bypass IREE's
 runtime entirely or interface with custom accelerators.
 
diff --git a/docs/website/docs/reference/optimization-options.md b/docs/website/docs/reference/optimization-options.md
index 527da28..97068eb 100644
--- a/docs/website/docs/reference/optimization-options.md
+++ b/docs/website/docs/reference/optimization-options.md
@@ -81,7 +81,7 @@
 
 - `iree-opt-outer-dim-concat` (enabled at `O1`)
 
-    Transpose concat operations to ocurr along the outermost dimension. The
+    Transpose concat operations to occur along the outermost dimension. The
     resulting concat will now be contiguous and the inserted transposes can
     possibly be fused with surrounding ops.
 
diff --git a/docs/website/docs/reference/tuning.md b/docs/website/docs/reference/tuning.md
index 921ba69..6294cf0 100644
--- a/docs/website/docs/reference/tuning.md
+++ b/docs/website/docs/reference/tuning.md
@@ -30,7 +30,7 @@
     accDescr {
      A generic tuning workflow consists of compiling a model, benchmarking the
      performance with current choice of parameters, than changing the
-     parameters before begining the next iteration of this loop.
+     parameters before beginning the next iteration of this loop.
     }
     A[Compile]-->B;
     B[Benchmark]-->C;
diff --git a/experimental/README.md b/experimental/README.md
index e5623c7..1c38338 100644
--- a/experimental/README.md
+++ b/experimental/README.md
@@ -1,7 +1,7 @@
 This folder contains experimental subprojects related to IREE and MLIR. These
 are not yet stable and supported and may not always be working. We may keep the
 build bots green for certain configurations but would prefer not to take on too
-much maintence overhead for things unless they are on a path to leaving
+much maintenance overhead for things unless they are on a path to leaving
 experimental. Please use forks of the repository for purely
 experimental/personal work.
 
diff --git a/experimental/hal_executable_library_call_hooks/perf_event_linux.cc b/experimental/hal_executable_library_call_hooks/perf_event_linux.cc
index 8f3210a..b6d1ab3 100644
--- a/experimental/hal_executable_library_call_hooks/perf_event_linux.cc
+++ b/experimental/hal_executable_library_call_hooks/perf_event_linux.cc
@@ -432,7 +432,7 @@
       {"ls_inef_sw_pref.data_pipe_sw_pf_dc_hit", PERF_TYPE_RAW, 0x152, "AMD",
        "Software prefetches that did not fetch data outside of the processor "
        "core as the PREFETCH instruction saw a data cache hit."},
       {"ls_inef_sw_pref.mab_mch_cnt", PERF_TYPE_RAW, 0x252, "AMD",
        "Software prefetches that did not fetch data outside of the processor "
        "core as the PREFETCH instruction saw a match on an already allocated "
        "Miss Address Buffer (MAB)."},
diff --git a/integrations/pjrt/src/iree_pjrt/common/api_impl.cc b/integrations/pjrt/src/iree_pjrt/common/api_impl.cc
index afc02b8..ed76c8a 100644
--- a/integrations/pjrt/src/iree_pjrt/common/api_impl.cc
+++ b/integrations/pjrt/src/iree_pjrt/common/api_impl.cc
@@ -554,7 +554,7 @@
     auto* copy_data = static_cast<CopyToHostData*>(user_data);
 
     if (!error) {
-      // If there is an allocated buffer we need to copy to the destinaton.
+      // If there is an allocated buffer we need to copy to the destination.
       if (copy_data->alloc) {
         std::memcpy(copy_data->dst, copy_data->aligned, copy_data->size);
       }
diff --git a/integrations/pjrt/src/iree_pjrt/common/api_impl.h b/integrations/pjrt/src/iree_pjrt/common/api_impl.h
index bf60d17..95d8f4d 100644
--- a/integrations/pjrt/src/iree_pjrt/common/api_impl.h
+++ b/integrations/pjrt/src/iree_pjrt/common/api_impl.h
@@ -346,7 +346,7 @@
 // data until loaded onto a device context. We call this a ResidentExecutable
 // to avoid name collisions.
 //
-// Correspondance:
+// Correspondence:
 //   PJRT_Executable -> ExecutableImage
 //   PJRT_LoadedExecutable -> LoadedExecutableInstance
 //   <None> -> ResidentExecutable
diff --git a/integrations/pjrt/src/iree_pjrt/common/command_line_utils.cc b/integrations/pjrt/src/iree_pjrt/common/command_line_utils.cc
index 31f6af0..a18414b 100644
--- a/integrations/pjrt/src/iree_pjrt/common/command_line_utils.cc
+++ b/integrations/pjrt/src/iree_pjrt/common/command_line_utils.cc
@@ -10,7 +10,7 @@
 namespace pjrt {
 
 // TODO: currently this function doesn't handle escape sequences,
-// it just ensure that single/double quotes are interpreted corrently.
+// it just ensures that single/double quotes are interpreted correctly.
 std::optional<std::vector<std::string>> ParseOptionsFromCommandLine(
     std::string_view options_str) {
   std::vector<std::string> options;
diff --git a/llvm-external-projects/iree-dialects/include/iree-dialects/Transforms/TransformMatchers.h b/llvm-external-projects/iree-dialects/include/iree-dialects/Transforms/TransformMatchers.h
index 35ec6d3..8f0f202 100644
--- a/llvm-external-projects/iree-dialects/include/iree-dialects/Transforms/TransformMatchers.h
+++ b/llvm-external-projects/iree-dialects/include/iree-dialects/Transforms/TransformMatchers.h
@@ -980,7 +980,7 @@
   }
 
 private:
-  /// The flat list of all payload opreations. `payloadGroupLengths` can be used
+  /// The flat list of all payload operations. `payloadGroupLengths` can be used
   /// to compute the sublist that corresponds to one nested list.
   // TODO: if somebody implements such a flattened vector generically, use it.
   SmallVector<Operation *> payloadOperations;
diff --git a/runtime/bindings/python/hal.cc b/runtime/bindings/python/hal.cc
index 17e4492..5582d06 100644
--- a/runtime/bindings/python/hal.cc
+++ b/runtime/bindings/python/hal.cc
@@ -1999,7 +1999,8 @@
               if (resolved_length !=
                   iree_hal_buffer_byte_length(target_buffer.raw_ptr())) {
                 throw std::invalid_argument(
-                    "If length is not provided, source and target bufer length "
+                    "If length is not provided, source and target buffer "
+                    "length "
                     "must match and it does not. Provide explicit length=");
               }
             }
diff --git a/runtime/bindings/python/iree/runtime/array_interop.py b/runtime/bindings/python/iree/runtime/array_interop.py
index 85a8753..c160acf 100644
--- a/runtime/bindings/python/iree/runtime/array_interop.py
+++ b/runtime/bindings/python/iree/runtime/array_interop.py
@@ -108,7 +108,7 @@
 
     def to_host(self) -> np.ndarray:
         """Return the array as host accessible NumPy ndarray.
-        This may map the memory or create a copy depending on wether the array is
+        This may map the memory or create a copy depending on whether the array is
         mappable to the host."""
         return self._transfer_to_host(False)
 
diff --git a/runtime/bindings/python/iree/runtime/benchmark.py b/runtime/bindings/python/iree/runtime/benchmark.py
index 35d5f27..8b6ef26 100644
--- a/runtime/bindings/python/iree/runtime/benchmark.py
+++ b/runtime/bindings/python/iree/runtime/benchmark.py
@@ -135,7 +135,7 @@
     if "INVALID_ARGUMENT;" in err:
         raise ValueError("Invalid inputs specified for benchmarking")
 
-    # In the event benchmarking runs but encounteres an internal error,
+    # In the event benchmarking runs but encounters an internal error,
     # return the internal error instead of benchmark results.
     if "INTERNAL; CUDA driver error" in out:
         raise BenchmarkToolError(out)
diff --git a/runtime/bindings/python/iree/runtime/system_setup.py b/runtime/bindings/python/iree/runtime/system_setup.py
index 8cd117d..c592e46 100644
--- a/runtime/bindings/python/iree/runtime/system_setup.py
+++ b/runtime/bindings/python/iree/runtime/system_setup.py
@@ -65,7 +65,7 @@
     """Gets the first valid (cached) device for a prioritized list of names.
 
     If no driver_names are given, and an environment variable of
-    IREE_DEFAULT_DEVICE is available, then it is treated as a comma delimitted
+    IREE_DEFAULT_DEVICE is available, then it is treated as a comma delimited
     list of driver names to try.
 
     This is meant to be used for default/automagic startup and is not suitable
diff --git a/runtime/bindings/python/local_dlpack.h b/runtime/bindings/python/local_dlpack.h
index c3ff747..9b92e7a 100644
--- a/runtime/bindings/python/local_dlpack.h
+++ b/runtime/bindings/python/local_dlpack.h
@@ -110,7 +110,7 @@
    */
   kDLCUDAManaged = 13,
   /*!
-   * \brief Unified shared memory allocated on a oneAPI non-partititioned
+   * \brief Unified shared memory allocated on a oneAPI non-partitioned
    * device. Call to oneAPI runtime is required to determine the device
    * type, the USM allocation type and the sycl context it is bound to.
    *
@@ -203,7 +203,7 @@
    * `byte_offset` field should be used to point to the beginning of the data.
    *
    * Note that as of Nov 2021, multiply libraries (CuPy, PyTorch, TensorFlow,
-   * TVM, perhaps others) do not adhere to this 256 byte aligment requirement
+   * TVM, perhaps others) do not adhere to this 256 byte alignment requirement
    * on CPU/CUDA/ROCm, and always use `byte_offset=0`.  This must be fixed
    * (after which this note will be updated); at the moment it is recommended
    * to not rely on the data pointer being correctly aligned.
diff --git a/runtime/bindings/tflite/include/tensorflow/lite/c/common.h b/runtime/bindings/tflite/include/tensorflow/lite/c/common.h
index d739f47..ae425d8 100644
--- a/runtime/bindings/tflite/include/tensorflow/lite/c/common.h
+++ b/runtime/bindings/tflite/include/tensorflow/lite/c/common.h
@@ -827,7 +827,7 @@
   // }
   //
   // NOTE: The context owns the memory referenced by partition_params_array. It
-  // will be cleared with another call to PreviewDelegateParitioning, or after
+  // will be cleared with another call to PreviewDelegatePartitioning, or after
   // TfLiteDelegateParams::Prepare returns.
   //
   // WARNING: This is an experimental interface that is subject to change.
diff --git a/runtime/bindings/tflite/java/README.md b/runtime/bindings/tflite/java/README.md
index 26cfdfa..7df4c06 100644
--- a/runtime/bindings/tflite/java/README.md
+++ b/runtime/bindings/tflite/java/README.md
@@ -5,7 +5,7 @@
 Process for building the AAR library:
 
 1. Start AndroidStudio. Select _Open File or Project_ then choose `runtime/bindings/tflite/java/gragle.build`
-2. AndroidStudio should sync the project and setup gradlew uner `runtime/bindings/tflite/java`
+2. AndroidStudio should sync the project and set up gradlew under `runtime/bindings/tflite/java`
 3. Make the project using AndroidStudio or run the build directly in terminal:
 ```shell
 ./gradlew build
diff --git a/runtime/bindings/tflite/java/org/tensorflow/lite/Interpreter.java b/runtime/bindings/tflite/java/org/tensorflow/lite/Interpreter.java
index 80c0b05..69d09b0 100644
--- a/runtime/bindings/tflite/java/org/tensorflow/lite/Interpreter.java
+++ b/runtime/bindings/tflite/java/org/tensorflow/lite/Interpreter.java
@@ -19,7 +19,7 @@
  * creation and inference for IREE compatible TFLite models.
  *
  * <p>This shim aims to mimic the functionality of Tensorflow Lite's
- * Interpeter.java class, however, there are a few notable features IREE doesn't
+ * Interpreter.java class, however, there are a few notable features IREE doesn't
  * support:
  *
  * <ul>
@@ -325,9 +325,9 @@
   }
 
   /**
-   * Gets the Tensor associated with the provdied input index.
+   * Gets the Tensor associated with the provided input index.
    *
-   * @throws IllegalArgumentException if {@code inputIndex} is negtive or is not smaller than the
+   * @throws IllegalArgumentException if {@code inputIndex} is negative or is not smaller than the
    *     number of model inputs.
    */
   public Tensor getInputTensor(int index) {
@@ -356,7 +356,7 @@
   }
 
   /**
-   * Gets the Tensor associated with the provdied output index.
+   * Gets the Tensor associated with the provided output index.
    *
    * <p>Note: Output tensor details (e.g., shape) may not be fully populated until after inference
    * is executed. If you need updated details *before* running inference (e.g., after resizing an
diff --git a/runtime/bindings/tflite/java/org/tensorflow/lite/Tensor.java b/runtime/bindings/tflite/java/org/tensorflow/lite/Tensor.java
index 9a3b50e..d0f03ae 100644
--- a/runtime/bindings/tflite/java/org/tensorflow/lite/Tensor.java
+++ b/runtime/bindings/tflite/java/org/tensorflow/lite/Tensor.java
@@ -16,7 +16,7 @@
 import java.nio.LongBuffer;
 
 /**
- * A typed multi-dimensional array used in the IREE Java comptability shim.
+ * A typed multi-dimensional array used in the IREE Java compatibility shim.
  *
  * <p>The native handle of a tensor is managed by {@link Interpreter}, and does not needed to be
  * closed by the client. However, once the {@link Interpreter} has been closed, the tensor will be
diff --git a/runtime/src/iree/async/cts/socket/send_flags_test.cc b/runtime/src/iree/async/cts/socket/send_flags_test.cc
index 5df396b..0fad667 100644
--- a/runtime/src/iree/async/cts/socket/send_flags_test.cc
+++ b/runtime/src/iree/async/cts/socket/send_flags_test.cc
@@ -150,7 +150,7 @@
 //===----------------------------------------------------------------------===//
 
 // Send two messages with MORE flag on first, verify both arrive.
-TEST_P(SendFlagsTest, MorFlagCoalesces) {
+TEST_P(SendFlagsTest, MoreFlagCoalesces) {
   // Create connected socket pair.
   iree_async_socket_t* client = nullptr;
   iree_async_socket_t* server = nullptr;
@@ -215,7 +215,7 @@
 // Zero-copy send combined with MORE flag.
 // Tests that socket option ZC works with per-send MORE flag and data arrives
 // correctly.
-TEST_P(SendFlagsTest, ZeroCopyWithMoreeFlag) {
+TEST_P(SendFlagsTest, ZeroCopyWithMoreFlag) {
   // Create connected socket pair with ZERO_COPY enabled on client.
   iree_async_socket_t* client = nullptr;
   iree_async_socket_t* server = nullptr;
@@ -239,7 +239,7 @@
   IREE_ASSERT_OK(iree_async_proactor_submit_one(proactor_, &send_op1.base));
 
   // Send second message without MORE (uncorks). ZC is from socket option.
-  const char* send_data2 = "AndMoree";
+  const char* send_data2 = "AndMore";
   iree_async_span_t send_span2 =
       iree_async_span_from_ptr((void*)send_data2, strlen(send_data2));
 
@@ -270,7 +270,7 @@
       RecvAll(server, reinterpret_cast<uint8_t*>(recv_buffer), total_expected);
 
   EXPECT_EQ(total_received, total_expected);
-  EXPECT_EQ(memcmp(recv_buffer, "ZeroCopyAndMoree", total_expected), 0);
+  EXPECT_EQ(memcmp(recv_buffer, "ZeroCopyAndMore", total_expected), 0);
 
   iree_async_socket_release(server);
   iree_async_socket_release(client);
diff --git a/runtime/src/iree/base/allocator.h b/runtime/src/iree/base/allocator.h
index fb03d0e..5e5901c 100644
--- a/runtime/src/iree/base/allocator.h
+++ b/runtime/src/iree/base/allocator.h
@@ -26,7 +26,7 @@
 //===----------------------------------------------------------------------===//
 
 #if IREE_STATISTICS_ENABLE
-// Evalutes the expression code only if statistics are enabled.
+// Evaluates the expression code only if statistics are enabled.
 //
 // Example:
 //  struct {
diff --git a/runtime/src/iree/base/internal/math.h b/runtime/src/iree/base/internal/math.h
index fc8ed06..e9394f6 100644
--- a/runtime/src/iree/base/internal/math.h
+++ b/runtime/src/iree/base/internal/math.h
@@ -100,14 +100,14 @@
 static inline int iree_math_count_leading_zeros_u64(uint64_t n) {
 #if defined(IREE_COMPILER_MSVC_COMPAT) && \
     (defined(IREE_ARCH_ARM_64) || defined(IREE_ARCH_X86_64))
-  // MSVC does not have __buitin_clzll. Use _BitScanReverse64.
+  // MSVC does not have __builtin_clzll. Use _BitScanReverse64.
   unsigned long result = 0;  // NOLINT(runtime/int)
   if (_BitScanReverse64(&result, n)) {
     return (int)(63 - result);
   }
   return 64;
 #elif defined(IREE_COMPILER_MSVC_COMPAT)
-  // MSVC does not have __buitin_clzll. Compose two calls to _BitScanReverse
+  // MSVC does not have __builtin_clzll. Compose two calls to _BitScanReverse
   unsigned long result = 0;  // NOLINT(runtime/int)
   if ((n >> 32) && _BitScanReverse(&result, n >> 32)) {
     return (int)(31 - result);
@@ -435,12 +435,12 @@
       if (biased_f32_mantissa > f32_mantissa_mask) {
         // Note: software implementations that try to be fast tend to get this
         // conditional increment of exp and zeroing of mantissa for free by
-        // simplying incrementing the whole uint32 encoding of the float value,
-        // so that the mantissa overflows into the exponent bits.
-        // This results in magical-looking code like in the following links.
-        // We'd rather not care too much about performance of this function;
-        // we should only care about fp16 performance on fp16 hardware, and
-        // then, we should use hardware instructions.
+        // simply incrementing the whole uint32 encoding of the float
+        // value, so that the mantissa overflows into the exponent bits. This
+        // results in magical-looking code like in the following links. We'd
+        // rather not care too much about performance of this function; we
+        // should only care about fp16 performance on fp16 hardware, and then,
+        // we should use hardware instructions.
         // https://github.com/pytorch/pytorch/blob/e1502c0cdbfd17548c612f25d5a65b1e4b86224d/c10/util/BFloat16.h#L76
         // https://gitlab.com/libeigen/eigen/-/blob/21cd3fe20990a5ac1d683806f605110962aac3f1/Eigen/src/Core/arch/Default/BFloat16.h#L565
         biased_f32_mantissa = 0;
diff --git a/runtime/src/iree/base/internal/wait_handle_win32.c b/runtime/src/iree/base/internal/wait_handle_win32.c
index 1729099..17f99a8 100644
--- a/runtime/src/iree/base/internal/wait_handle_win32.c
+++ b/runtime/src/iree/base/internal/wait_handle_win32.c
@@ -322,7 +322,7 @@
     return iree_status_from_code(IREE_STATUS_DEADLINE_EXCEEDED);
   } else if (result >= WAIT_OBJECT_0 &&
              result < WAIT_OBJECT_0 + set->handle_count) {
-    // One (or more) handles were signaled sucessfully.
+    // One (or more) handles were signaled successfully.
     if (out_wake_handle) {
       DWORD wake_index = result - WAIT_OBJECT_0;
       iree_wait_primitive_value_t wake_value;
@@ -338,7 +338,7 @@
     return iree_ok_status();
   } else if (result >= WAIT_ABANDONED_0 &&
              result < WAIT_ABANDONED_0 + set->handle_count) {
-    // One (or more) mutex handles were abandonded during the wait.
+    // One (or more) mutex handles were abandoned during the wait.
     // This happens when a thread holding the mutex dies without releasing it.
     // This is less common in-process and more for the cross-process situations
     // where we have duped/opened a remote handle and the remote process dies.
@@ -352,7 +352,7 @@
     DWORD wake_index = result - WAIT_ABANDONED_0;
     return iree_make_status(
         IREE_STATUS_DATA_LOSS,
-        "mutex native handle %lu abanonded; shared state is "
+        "mutex native handle %lu abandoned; shared state is "
         "(likely) inconsistent",
         wake_index);
   } else if (result == WAIT_FAILED) {
@@ -408,7 +408,7 @@
     // Handle was signaled successfully.
     status = iree_ok_status();
   } else if (result == WAIT_ABANDONED_0) {
-    // The mutex handle was abandonded during the wait.
+    // The mutex handle was abandoned during the wait.
     // This happens when a thread holding the mutex dies without releasing it.
     // This is less common in-process and more for the cross-process situations
     // where we have duped/opened a remote handle and the remote process dies.
@@ -420,7 +420,7 @@
     // that mutex abandonment is exceptional. If you see this you are probably
     // going to want to look for thread exit messages or zombie processes.
     status = iree_make_status(IREE_STATUS_DATA_LOSS,
-                              "mutex native handle abanonded; shared state is "
+                              "mutex native handle abandoned; shared state is "
                               "(likely) inconsistent");
   } else if (result == WAIT_FAILED) {
     status = iree_make_status(iree_status_code_from_win32_error(GetLastError()),
diff --git a/runtime/src/iree/base/string_view_test.cc b/runtime/src/iree/base/string_view_test.cc
index 890a901..cb1eb2e 100644
--- a/runtime/src/iree/base/string_view_test.cc
+++ b/runtime/src/iree/base/string_view_test.cc
@@ -683,7 +683,7 @@
 
   // Exact matches.
   EXPECT_TRUE(match("abc", "abc"));
   EXPECT_FALSE(match("abc", "abd"));
   EXPECT_FALSE(match("abc", "ab"));
   EXPECT_FALSE(match("ab", "abc"));
 
diff --git a/runtime/src/iree/builtins/ukernel/arch/riscv_64/CMakeLists.txt b/runtime/src/iree/builtins/ukernel/arch/riscv_64/CMakeLists.txt
index 9469b86..146ec63 100644
--- a/runtime/src/iree/builtins/ukernel/arch/riscv_64/CMakeLists.txt
+++ b/runtime/src/iree/builtins/ukernel/arch/riscv_64/CMakeLists.txt
@@ -135,7 +135,7 @@
 
 # Compiler emits '__extendhfsf2' call for the
 # C code below. So, the following test checks whether
-# compiler can link '__extendhfsf2' succesfully.
+# compiler can link '__extendhfsf2' successfully.
 # This is needed because riscv toolchain used by
 # 'Test RISC-V 64 CI job' fails to link '__extendhfsf2'
 # builtin. See: https://github.com/iree-org/iree/issues/22303.
diff --git a/runtime/src/iree/builtins/ukernel/arch/x86_64/mmt4d_x86_64_avx512_vnni.c b/runtime/src/iree/builtins/ukernel/arch/x86_64/mmt4d_x86_64_avx512_vnni.c
index f8042f4..67a35c8 100644
--- a/runtime/src/iree/builtins/ukernel/arch/x86_64/mmt4d_x86_64_avx512_vnni.c
+++ b/runtime/src/iree/builtins/ukernel/arch/x86_64/mmt4d_x86_64_avx512_vnni.c
@@ -205,7 +205,7 @@
 // Meanwhile, when we split the LHS s16 values into high and low 8bit components
 // the high 8bits are signed s8 and the low 8bit are unsigned u8. So, for each
 // of the combinations of operands that we have to feed _mm512_dpbusd_epi32, we
-// manage to find an operand order that accomodates the instruction's
+// manage to find an operand order that accommodates the instruction's
 // requirements on signednesses.
 void iree_uk_mmt4d_tile_s16u4s32_1x32x8_x86_64_avx512_vnni(
     void* IREE_UK_RESTRICT out_tile, const void* IREE_UK_RESTRICT lhs_panel,
diff --git a/runtime/src/iree/builtins/ukernel/common.h b/runtime/src/iree/builtins/ukernel/common.h
index 312ab6a..ee7f3cf 100644
--- a/runtime/src/iree/builtins/ukernel/common.h
+++ b/runtime/src/iree/builtins/ukernel/common.h
@@ -206,7 +206,7 @@
 }
 
 //===----------------------------------------------------------------------===//
-// Architecture detection (copied from target_platorm.h)
+// Architecture detection (copied from target_platform.h)
 //===----------------------------------------------------------------------===//
 
 #if defined(__arm64) || defined(__aarch64__) || defined(_M_ARM64) || \
@@ -744,12 +744,12 @@
       if (biased_f32_mantissa > f32_mantissa_mask) {
         // Note: software implementations that try to be fast tend to get this
         // conditional increment of exp and zeroing of mantissa for free by
-        // simplying incrementing the whole uint32 encoding of the float value,
-        // so that the mantissa overflows into the exponent bits.
-        // This results in magical-looking code like in the following links.
-        // We'd rather not care too much about performance of this function;
-        // we should only care about fp16 performance on fp16 hardware, and
-        // then, we should use hardware instructions.
+        // simply incrementing the whole uint32 encoding of the float
+        // value, so that the mantissa overflows into the exponent bits. This
+        // results in magical-looking code like in the following links. We'd
+        // rather not care too much about performance of this function; we
+        // should only care about fp16 performance on fp16 hardware, and then,
+        // we should use hardware instructions.
         // https://github.com/pytorch/pytorch/blob/e1502c0cdbfd17548c612f25d5a65b1e4b86224d/c10/util/BFloat16.h#L76
         // https://gitlab.com/libeigen/eigen/-/blob/21cd3fe20990a5ac1d683806f605110962aac3f1/Eigen/src/Core/arch/Default/BFloat16.h#L565
         biased_f32_mantissa = 0;
diff --git a/runtime/src/iree/hal/buffer_transfer.c b/runtime/src/iree/hal/buffer_transfer.c
index 8d95ff0..ccbe1e2 100644
--- a/runtime/src/iree/hal/buffer_transfer.c
+++ b/runtime/src/iree/hal/buffer_transfer.c
@@ -267,7 +267,7 @@
   iree_hal_buffer_t* target_buffer = target.device_buffer;
   if (iree_status_is_ok(status) && !target_buffer) {
     // Allocate uninitialized staging memory for the transfer target.
-    // We only allocate enough for the portion we are transfering.
+    // We only allocate enough for the portion we are transferring.
     // TODO(benvanik): use import if supported to avoid the allocation/copy.
     const iree_hal_buffer_params_t target_params = {
         .type = IREE_HAL_MEMORY_TYPE_HOST_LOCAL |
diff --git a/runtime/src/iree/hal/cts/allocator_test.h b/runtime/src/iree/hal/cts/allocator_test.h
index b6a1e50..679b11d 100644
--- a/runtime/src/iree/hal/cts/allocator_test.h
+++ b/runtime/src/iree/hal/cts/allocator_test.h
@@ -84,7 +84,7 @@
   IREE_ASSERT_OK(iree_hal_allocator_allocate_buffer(device_allocator_, params,
                                                     kAllocationSize, &buffer));
 
-  // At a mimimum, the requested memory type should be respected.
+  // At a minimum, the requested memory type should be respected.
   // Additional bits may be optionally set depending on the allocator.
   EXPECT_TRUE(
       iree_all_bits_set(iree_hal_buffer_memory_type(buffer), params.type));
diff --git a/runtime/src/iree/hal/drivers/amdgpu/command_buffer.c b/runtime/src/iree/hal/drivers/amdgpu/command_buffer.c
index bc42852..320af6d 100644
--- a/runtime/src/iree/hal/drivers/amdgpu/command_buffer.c
+++ b/runtime/src/iree/hal/drivers/amdgpu/command_buffer.c
@@ -575,7 +575,7 @@
 
   // --- WARNING ------------------------------------------------------------ //
   // The split above may have reset encoder state and any state loaded from it
-  // must be requeried for subsequent usage.
+  // must be re-queried for subsequent usage.
   // --- WARNING ------------------------------------------------------------ //
 
   // We update a scratch header and copy it to each device at the end.
diff --git a/runtime/src/iree/hal/drivers/amdgpu/device/command_buffer.h b/runtime/src/iree/hal/drivers/amdgpu/device/command_buffer.h
index 92cf827..4d3c212 100644
--- a/runtime/src/iree/hal/drivers/amdgpu/device/command_buffer.h
+++ b/runtime/src/iree/hal/drivers/amdgpu/device/command_buffer.h
@@ -302,7 +302,7 @@
   // This flushes the I/K/L1 caches.
   IREE_HAL_AMDGPU_DEVICE_CMD_FLAG_FENCE_RELEASE_AGENT =
       IREE_HSA_FENCE_SCOPE_AGENT << 3,
-  // Sets HSA_FENCE_SCOPE_SYTEM on the AQL packet release scope.
+  // Sets HSA_FENCE_SCOPE_SYSTEM on the AQL packet release scope.
   // This flushes the L1/L2 caches.
   IREE_HAL_AMDGPU_DEVICE_CMD_FLAG_FENCE_RELEASE_SYSTEM =
       IREE_HSA_FENCE_SCOPE_SYSTEM << 3,
diff --git a/runtime/src/iree/hal/drivers/cuda/cuda_device.c b/runtime/src/iree/hal/drivers/cuda/cuda_device.c
index c99f30c..2223de1 100644
--- a/runtime/src/iree/hal/drivers/cuda/cuda_device.c
+++ b/runtime/src/iree/hal/drivers/cuda/cuda_device.c
@@ -73,7 +73,7 @@
   // Timepoint pools, shared by various semaphores.
   iree_hal_cuda_timepoint_pool_t* timepoint_pool;
 
-  // A queue to order device workloads and relase to the GPU when constraints
+  // A queue to order device workloads and release to the GPU when constraints
   // are met. It buffers submissions and allocations internally before they
   // are ready. This queue couples with HAL semaphores backed by iree_event_t
   // and CUevent objects.
@@ -554,7 +554,7 @@
         driver, identifier, params, device, dispatch_stream, context,
         cuda_symbols, nccl_symbols, host_allocator, out_device);
   } else {
-    // Release resources we have accquired thus far.
+    // Release resources we have acquired thus far.
     if (dispatch_stream) cuda_symbols->cuStreamDestroy(dispatch_stream);
     if (context) cuda_symbols->cuDevicePrimaryCtxRelease(device);
   }
@@ -586,7 +586,7 @@
     cuda_device->device_event_pool = device_event_pool;
     cuda_device->timepoint_pool = timepoint_pool;
   } else {
-    // Release resources we have accquired after HAL device creation.
+    // Release resources we have acquired after HAL device creation.
     if (timepoint_pool) iree_hal_cuda_timepoint_pool_free(timepoint_pool);
     if (device_event_pool) iree_hal_cuda_event_pool_release(device_event_pool);
     if (host_event_pool) iree_event_pool_free(host_event_pool);
@@ -911,7 +911,7 @@
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
   return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "event not yet implmeneted");
+                          "event not yet implemented");
 }
 
 static iree_status_t iree_hal_cuda_device_create_executable_cache(
diff --git a/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c b/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
index d57da7e..54ea602 100644
--- a/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
@@ -202,7 +202,7 @@
   //       operations. In a real command buffer we would be this stream command
   //       buffer is strictly used to perform inline execution/replay of
   //       deferred command buffers that are retaining the resources already.
-  // NOTE: reseting the arena invalidates the collective batch.
+  // NOTE: resetting the arena invalidates the collective batch.
   iree_arena_reset(&command_buffer->arena);
   iree_hal_collective_batch_deinitialize(&command_buffer->collective_batch);
   iree_hal_resource_set_free(command_buffer->resource_set);
diff --git a/runtime/src/iree/hal/drivers/hip/event_semaphore.c b/runtime/src/iree/hal/drivers/hip/event_semaphore.c
index 9cb5e6b..8aed3b7 100644
--- a/runtime/src/iree/hal/drivers/hip/event_semaphore.c
+++ b/runtime/src/iree/hal/drivers/hip/event_semaphore.c
@@ -60,7 +60,7 @@
 // will also be cleaned up at this time. If the semaphore is failed,
 // the callbacks will be called with the status code of the failure.
 // If the semaphore is destroyed while callbacks are active,
-// they will be called with the CANCELLED erorr.
+// they will be called with the CANCELLED error.
 // The |cpu_event| is a value for the CPU to wait on when
 // we may not have to wait infinitely. For example with a multi
 // wait or a non-infinite timeout.
diff --git a/runtime/src/iree/hal/drivers/hip/hip_device.c b/runtime/src/iree/hal/drivers/hip/hip_device.c
index e117dca..adc0b87 100644
--- a/runtime/src/iree/hal/drivers/hip/hip_device.c
+++ b/runtime/src/iree/hal/drivers/hip/hip_device.c
@@ -990,7 +990,7 @@
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
   return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "event not yet implmeneted");
+                          "event not yet implemented");
 }
 
 static iree_status_t iree_hal_hip_device_import_file(
@@ -1154,7 +1154,7 @@
   iree_hal_semaphore_list_t signal_semaphore_list;
   iree_slim_mutex_t status_mutex;
   iree_status_t status;
-  // This is null unless we are running with an extenal stream,
+  // This is null unless we are running with an external stream,
   // at which point this is valid.
   iree_hal_hip_dispatch_completed_data_t* external_stream_dispatch_data;
 } iree_hal_hip_semaphore_callback_data_t;
diff --git a/runtime/src/iree/hal/drivers/metal/builtin/fill_buffer_generic.metal b/runtime/src/iree/hal/drivers/metal/builtin/fill_buffer_generic.metal
index e31b782..281bd05 100644
--- a/runtime/src/iree/hal/drivers/metal/builtin/fill_buffer_generic.metal
+++ b/runtime/src/iree/hal/drivers/metal/builtin/fill_buffer_generic.metal
@@ -16,7 +16,7 @@
 // Fills target |buffer| with the given |spec|ification.
 //
 // The target |buffer| is assumed to have 16-byte aligned offset/length.
-// Each thread fills one 4-compoment 32-bit element vector.
+// Each thread fills one 4-component 32-bit element vector.
 kernel void fill_buffer_16byte(device uint4* buffer [[buffer(0)]],
                                constant FillSpec& spec [[buffer(1)]],
                                uint id [[thread_position_in_grid]]) {
@@ -54,7 +54,7 @@
   // 3. Right bytes: containing (0 to 3) bytes since the last 4-byte aligned
   // address
   //
-  // Threads are distributed from the perspecitve of handling middle 32-bit
+  // Threads are distributed from the perspective of handling middle 32-bit
   // scalars. We use the first thread to *additionally* handle left and right
   // bytes.
   uint8_t left_byte_count = spec.buffer_offset % 4;
@@ -69,7 +69,7 @@
   uint32_t left_mask = ~((uint64_t(1) << (8 * (4 - left_byte_count))) - 1);
   uint32_t right_mask = (uint64_t(1) << (8 * right_byte_count)) - 1;
 
-  // Indexing start points in |buffer| for the threee parts.
+  // Indexing start points in |buffer| for the three parts.
   uint64_t left_start = spec.buffer_offset / 4;
   uint64_t middle_start = (spec.buffer_offset + 3) / 4;
   uint64_t right_start = (spec.buffer_offset + spec.buffer_length) / 4;
diff --git a/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m b/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
index b353362..fe417be 100644
--- a/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
+++ b/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
@@ -646,7 +646,7 @@
       z0, iree_arena_allocate(&command_buffer->arena, sizeof(*segment) + pattern_length,
                               (void**)&storage_base));
 
-  // Copy the patttern to the end of the segment for later access.
+  // Copy the pattern to the end of the segment for later access.
   uint8_t* pattern_ptr = storage_base + sizeof(*segment);
   memcpy(pattern_ptr, (const uint8_t*)pattern, pattern_length);
 
diff --git a/runtime/src/iree/hal/drivers/vulkan/debug_reporter.cc b/runtime/src/iree/hal/drivers/vulkan/debug_reporter.cc
index aee9455..0147dba 100644
--- a/runtime/src/iree/hal/drivers/vulkan/debug_reporter.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/debug_reporter.cc
@@ -102,7 +102,7 @@
   IREE_TRACE_ZONE_BEGIN(z0);
 
   // Allocate our struct first as we need to pass the pointer to the userdata
-  // of the messager instance when we create it.
+  // of the messenger instance when we create it.
   iree_hal_vulkan_debug_reporter_t* reporter = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_allocator_malloc(host_allocator, sizeof(*reporter),
diff --git a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
index 5ca0f62..07bb002 100644
--- a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
@@ -49,7 +49,7 @@
       // allocations; here we just need to provide the proper "view" to Vulkan
       // drivers over the allocated memory.
       //
-      // Note this is needed because we can see unusal buffers like
+      // Note this is needed because we can see unusual buffers like
       // tensor<3xi8>. Depending on GPU capabilities, this might not always be
       // directly supported by the hardware. Under such circumstances, we need
       // to emulate i8 support with i32. Shader CodeGen takes care of that: the
diff --git a/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h b/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
index 611d90d..a4f4ed1 100644
--- a/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
+++ b/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
@@ -146,7 +146,7 @@
   // Note that i32 or i1 is assumed to always exist and does not appear in
   // this bitfield.
   uint32_t compute_int : 8;
-  // Storage bitwidth requirement bitfiled:
+  // Storage bitwidth requirement bitfield:
   // * 0b01: 8-bit
   // * 0b10: 16-bit
   uint32_t storage : 8;
diff --git a/runtime/src/iree/hal/drivers/vulkan/status_util.c b/runtime/src/iree/hal/drivers/vulkan/status_util.c
index b7536fd..7f585dc 100644
--- a/runtime/src/iree/hal/drivers/vulkan/status_util.c
+++ b/runtime/src/iree/hal/drivers/vulkan/status_util.c
@@ -248,7 +248,7 @@
     case VK_ERROR_FULL_SCREEN_EXCLUSIVE_MODE_LOST_EXT:
       // An operation on a swapchain created with
       // VK_FULL_SCREEN_EXCLUSIVE_APPLICATION_CONTROLLED_EXT failed as it did
-      // not have exlusive full-screen access. This may occur due to
+      // not have exclusive full-screen access. This may occur due to
       // implementation-dependent reasons, outside of the application’s control.
       return iree_make_status_with_location(
           file, line, IREE_STATUS_UNAVAILABLE,
diff --git a/runtime/src/iree/hal/drivers/vulkan/tracing.cc b/runtime/src/iree/hal/drivers/vulkan/tracing.cc
index 0337e27..8eab5bb 100644
--- a/runtime/src/iree/hal/drivers/vulkan/tracing.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/tracing.cc
@@ -123,7 +123,7 @@
   context->maintenance_command_pool->Free(command_buffer);
 }
 
-// Synchronously resets a range of querys in a query pool.
+// Synchronously resets a range of queries in a query pool.
 // This may submit commands to the queue.
 static void iree_hal_vulkan_tracing_reset_query_pool(
     iree_hal_vulkan_tracing_context_t* context, uint32_t query_index,
@@ -249,7 +249,7 @@
   // implementation and platform specific reasons. It is the application’s
   // responsibility to assess whether the returned maximum deviation makes the
   // timestamp values suitable for any particular purpose and can choose to
-  // re-issue the timestamp calibration call pursuing a lower devation value.
+  // re-issue the timestamp calibration call pursuing a lower deviation value.
   // https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkGetCalibratedTimestampsEXT.html
   //
   // We perform a small number of queries here and find the minimum deviation
diff --git a/runtime/src/iree/hal/utils/deferred_work_queue.h b/runtime/src/iree/hal/utils/deferred_work_queue.h
index c076a8d..e035927 100644
--- a/runtime/src/iree/hal/utils/deferred_work_queue.h
+++ b/runtime/src/iree/hal/utils/deferred_work_queue.h
@@ -24,7 +24,7 @@
 // This interface is used to allow the deferred work queue to interact with
 // a specific driver.
 // Calls to this vtable may be made from the deferred work queue on
-// multile threads simultaneously and so these functions must be thread
+// multiple threads simultaneously and so these functions must be thread
 // safe.
 // Calls to this interface will either come from a thread that has had
 // bind_to_thread called on it or as a side-effect from one of the public
@@ -39,7 +39,7 @@
 typedef struct iree_hal_deferred_work_queue_device_interface_vtable_t {
   void(IREE_API_PTR* destroy)(
       iree_hal_deferred_work_queue_device_interface_t* device_interface);
-  // Binds the device work queue to a thread. May be simulatneously
+  // Binds the device work queue to a thread. May be simultaneously
   // bound to multiple threads.
   iree_status_t(IREE_API_PTR* bind_to_thread)(
       iree_hal_deferred_work_queue_device_interface_t* device_interface);
@@ -76,7 +76,7 @@
       struct iree_hal_semaphore_t*, uint64_t,
       iree_hal_deferred_work_queue_native_event_t* out_event);
 
-  // Get the device to wait on the event associated wit hthe host event.
+  // Get the device to wait on the event associated with the host event.
   iree_status_t(IREE_API_PTR* device_wait_on_host_event)(
       iree_hal_deferred_work_queue_device_interface_t* device_interface,
       iree_hal_deferred_work_queue_host_device_event_t event);
diff --git a/runtime/src/iree/hal/utils/stream_tracing.c b/runtime/src/iree/hal/utils/stream_tracing.c
index a60bdbb..a2b317e 100644
--- a/runtime/src/iree/hal/utils/stream_tracing.c
+++ b/runtime/src/iree/hal/utils/stream_tracing.c
@@ -24,9 +24,9 @@
 // command_buffer        command_buffer          command_buffer
 //
 // The submission list is owned by the tracing context and elements are
-// inserted and removed as commmand_buffers are submitted and when they
+// inserted and removed as command_buffers are submitted and when they
 // complete. This is a list of the head elements for each command buffer.
-// The commnad buffer list is owned by the command buffer. It is the list of
+// The command buffer list is owned by the command buffer. It is the list of
 // events used to trace command buffer dispatches.
 //
 // When the event is in the freelist, next_submission should be null, and
diff --git a/runtime/src/iree/hal/utils/stream_tracing.h b/runtime/src/iree/hal/utils/stream_tracing.h
index c80cabc..de7df55 100644
--- a/runtime/src/iree/hal/utils/stream_tracing.h
+++ b/runtime/src/iree/hal/utils/stream_tracing.h
@@ -136,7 +136,7 @@
 iree_status_t iree_hal_stream_tracing_context_collect(
     iree_hal_stream_tracing_context_t* context);
 
-// Notifies that the given list of events has been dispached on to the gpu.
+// Notifies that the given list of events has been dispatched on to the gpu.
 void iree_hal_stream_tracing_notify_submitted(
     iree_hal_stream_tracing_context_t* context,
     iree_hal_stream_tracing_context_event_list_t* event_list);
@@ -247,7 +247,7 @@
     context, event_list, out_node, graph, verbosity, dependency_nodes,        \
     dependency_nodes_count, file_name, file_name_length, line, function_name, \
     function_name_length, name, name_length)
-#define IREE_HAL_STREAM_TRACE_ZONE_END(context, evnet_list, verbosity)
+#define IREE_HAL_STREAM_TRACE_ZONE_END(context, event_list, verbosity)
 #define IREE_HAL_GRAPH_TRACE_ZONE_END(context, event_list, out_node, graph, \
                                       verbosity, dependency_nodes,          \
                                       dependency_nodes_count)
diff --git a/runtime/src/iree/task/executor.c b/runtime/src/iree/task/executor.c
index 6fc98e2..ca719d1 100644
--- a/runtime/src/iree/task/executor.c
+++ b/runtime/src/iree/task/executor.c
@@ -121,7 +121,7 @@
   // enough to ensure each worker gets a sufficiently random seed for itself to
   // then generate entropy with. As a hack we use out_executor's address, as
   // that should live on the caller stack and with ASLR that's likely pretty
-  // random itself. I'm sure somewhere a mathemetician just cringed :)
+  // random itself. I'm sure somewhere a mathematician just cringed :)
   iree_prng_splitmix64_state_t seed_prng;
   iree_prng_splitmix64_initialize(/*seed=*/(uint64_t)(out_executor),
                                   &seed_prng);
diff --git a/runtime/src/iree/task/executor.h b/runtime/src/iree/task/executor.h
index a97283f..3dc5c09 100644
--- a/runtime/src/iree/task/executor.h
+++ b/runtime/src/iree/task/executor.h
@@ -233,7 +233,7 @@
 //
 // If more than 64 unique L1/L2 caches (or realistically more than probably ~32)
 // are available *and* all of them are attached to the same memory controllers
-// (no NUMA involved) then the solution is straightfoward: use multiple IREE
+// (no NUMA involved) then the solution is straightforward: use multiple IREE
 // task executors. Either within a process or in separate processes the
 // granularity is coarse enough to not be a burden and changes the problem from
 // needing 100% perfect work scaling of a single task to needing a naive
diff --git a/runtime/src/iree/task/pool.h b/runtime/src/iree/task/pool.h
index de9d5e9..c740f42 100644
--- a/runtime/src/iree/task/pool.h
+++ b/runtime/src/iree/task/pool.h
@@ -94,7 +94,7 @@
 
 // Acquires a set of tasks from the task pool. The returned tasks will have
 // undefined contents besides their intrusive next pointers and must be
-// intialized by the caller.
+// initialized by the caller.
 //
 // WARNING: this may cause growth during races if multiple threads are trying to
 // acquire at the same time. Our usage patterns here are such that this is never
diff --git a/runtime/src/iree/task/queue.h b/runtime/src/iree/task/queue.h
index 161845d..9c8c15a 100644
--- a/runtime/src/iree/task/queue.h
+++ b/runtime/src/iree/task/queue.h
@@ -61,7 +61,7 @@
 // of tasks in any given flush is low(ish) and by walking in reverse order to
 // then process forward the cache should be hot as the worker starts making its
 // way back through the tasks. As we walk forward we'll be using the task fields
-// for execution and retiring of tasks (notifing dependencies/etc) and the
+// for execution and retiring of tasks (notifying dependencies/etc) and the
 // intrusive next pointer sitting next to those should be in-cache when we need
 // to access it. This, combined with slab allocation of tasks in command buffers
 // to begin with gives us the (probabilistically) same characteristics of a flat
@@ -94,7 +94,7 @@
 // Unlike that implementation, though, our task list is unbounded because we use
 // a linked list. To keep our options open, though, I've left the API of this
 // implementation compatible with classic atomic work-stealing queues. I'm
-// hopeful this will not need to be revisted for awhile, though!
+// hopeful this will not need to be revisited for a while, though!
 //
 // Future improvement idea: have the owner of the queue maintain a theft point
 // skip list that makes it possible for thieves to quickly come in and slice
diff --git a/runtime/src/iree/task/scope.h b/runtime/src/iree/task/scope.h
index acab8a2..361ddbe 100644
--- a/runtime/src/iree/task/scope.h
+++ b/runtime/src/iree/task/scope.h
@@ -112,7 +112,7 @@
 // describing the failure and subsequent calls will return the status code.
 bool iree_task_scope_has_failed(iree_task_scope_t* scope);
 
-// Returns the permanent scope failure status to the caller (transfering
+// Returns the permanent scope failure status to the caller (transferring
 // ownership). The scope will remain in a failed state with the status code.
 iree_status_t iree_task_scope_consume_status(iree_task_scope_t* scope);
 
diff --git a/runtime/src/iree/task/topology.h b/runtime/src/iree/task/topology.h
index 502411d..239a521 100644
--- a/runtime/src/iree/task/topology.h
+++ b/runtime/src/iree/task/topology.h
@@ -206,7 +206,7 @@
   IREE_TASK_TOPOLOGY_PERFORMANCE_LEVEL_ANY = 0,
   // Selects "E(fficiency)" cores that favor lower power/thermal load.
   IREE_TASK_TOPOLOGY_PERFORMANCE_LEVEL_LOW,
-  // Selects "P(erformance)" cores that favor higher power/thermal load.
+  // Selects "P(performance)" cores that favor higher power/thermal load.
   IREE_TASK_TOPOLOGY_PERFORMANCE_LEVEL_HIGH,
 } iree_task_topology_performance_level_t;
 
diff --git a/runtime/src/iree/task/topology_cpuinfo.c b/runtime/src/iree/task/topology_cpuinfo.c
index 15c83df..1bbac46 100644
--- a/runtime/src/iree/task/topology_cpuinfo.c
+++ b/runtime/src/iree/task/topology_cpuinfo.c
@@ -91,7 +91,7 @@
   out_affinity->id_assigned = 1;
   out_affinity->id = processor->linux_id;
 #else
-  // WASM? Unusued today.
+  // WASM? Unused today.
   out_affinity->id_assigned = 0;
 #endif  // cpuinfo-like platform field
 
diff --git a/runtime/src/iree/task/worker.c b/runtime/src/iree/task/worker.c
index 8c2b4d3..e829d23 100644
--- a/runtime/src/iree/task/worker.c
+++ b/runtime/src/iree/task/worker.c
@@ -301,7 +301,7 @@
   while (true) {
     // If we fail to find any work to do we'll wait at the end of this loop.
     // In order not to not miss any work that is enqueued after we've already
-    // checked a particular source we use an interruptable wait token that
+    // checked a particular source we use an interruptible wait token that
     // will prevent the wait from happening if anyone touches the data
     // structures we use.
     iree_wait_token_t wait_token =
diff --git a/runtime/src/iree/testing/benchmark.h b/runtime/src/iree/testing/benchmark.h
index 4b1b339..d565e7b 100644
--- a/runtime/src/iree/testing/benchmark.h
+++ b/runtime/src/iree/testing/benchmark.h
@@ -152,7 +152,7 @@
 void iree_benchmark_skip(iree_benchmark_state_t* state, const char* message);
 
 // Suspends the benchmark timer until iree_benchmark_resume_timing is called.
-// This can be used to guard per-step code that is required to initialze the
+// This can be used to guard per-step code that is required to initialize the
 // work but not something that needs to be accounted for in the benchmark
 // timing. Introduces non-trivial overhead: only use this ~once per step when
 // then going on to perform large amounts of batch work in the step.
diff --git a/runtime/src/iree/vm/bytecode/disassembler.c b/runtime/src/iree/vm/bytecode/disassembler.c
index b997eba..fb63f82 100644
--- a/runtime/src/iree/vm/bytecode/disassembler.c
+++ b/runtime/src/iree/vm/bytecode/disassembler.c
@@ -1659,7 +1659,7 @@
         IREE_RETURN_IF_ERROR(iree_string_builder_append_cstring(b, " = "));
       }
       IREE_RETURN_IF_ERROR(
-          iree_string_builder_append_cstring(b, "vm.call.varadic @"));
+          iree_string_builder_append_cstring(b, "vm.call.variadic @"));
       IREE_RETURN_IF_ERROR(iree_vm_bytecode_disassembler_print_function_name(
           module, module_state, function_ordinal, b));
       IREE_RETURN_IF_ERROR(iree_string_builder_append_cstring(b, "("));
diff --git a/runtime/src/iree/vm/bytecode/disassembler.h b/runtime/src/iree/vm/bytecode/disassembler.h
index 8adae99..b71aa3d 100644
--- a/runtime/src/iree/vm/bytecode/disassembler.h
+++ b/runtime/src/iree/vm/bytecode/disassembler.h
@@ -23,8 +23,9 @@
 } iree_vm_bytecode_disassembly_format_t;
 
 // Disassembles the bytecode operation at |pc| using the provided module state.
-// Appends the disasembled op to |string_builder| in a format based on |format|.
-// If |regs| are available then values can be added using the format mode.
+// Appends the disassembled op to |string_builder| in a format based on
+// |format|. If |regs| are available then values can be added using the format
+// mode.
 //
 // Example: `%i0 <= ShrI32U %i2, %i3`
 //
diff --git a/runtime/src/iree/vm/bytecode/dispatch.c b/runtime/src/iree/vm/bytecode/dispatch.c
index 7dcd5b7..a6a3c04 100644
--- a/runtime/src/iree/vm/bytecode/dispatch.c
+++ b/runtime/src/iree/vm/bytecode/dispatch.c
@@ -357,7 +357,7 @@
   caller_storage->return_registers = dst_reg_list;
 
   // NOTE: after this call the caller registers may be invalid and need to be
-  // requeried.
+  // re-queried.
   iree_vm_function_t function;
   function.module = module;
   function.linkage = IREE_VM_FUNCTION_LINKAGE_INTERNAL;
diff --git a/runtime/src/iree/vm/ref_cc.h b/runtime/src/iree/vm/ref_cc.h
index d919a6d..03db634 100644
--- a/runtime/src/iree/vm/ref_cc.h
+++ b/runtime/src/iree/vm/ref_cc.h
@@ -191,7 +191,7 @@
 // The ref wrapper calls the iree_vm_ref_* functions and uses the
 // iree_vm_ref_type_descriptor_t registered for the type T to manipulate the
 // reference counter and, when needed, destroy the object using
-// iree_vm_ref_destroy_t. Any iree_vm_ref_t can be used interchangably with
+// iree_vm_ref_destroy_t. Any iree_vm_ref_t can be used interchangeably with
 // ref<T> when RAII is needed.
 //
 // Example:
diff --git a/samples/custom_dispatch/cpu/mlp_plugin/CMakeLists.txt b/samples/custom_dispatch/cpu/mlp_plugin/CMakeLists.txt
index a1ca996..28c78ec 100644
--- a/samples/custom_dispatch/cpu/mlp_plugin/CMakeLists.txt
+++ b/samples/custom_dispatch/cpu/mlp_plugin/CMakeLists.txt
@@ -12,7 +12,7 @@
 
 ## Current support is only for x86.
 if(NOT IREE_ARCH STREQUAL "x86_64")
-  message(STATUS "IREE mlp_pluging sample ignored -- only builds for x86_64 (today)")
+  message(STATUS "IREE mlp_plugin sample ignored -- only builds for x86_64 (today)")
   return()
 endif()
 
diff --git a/tests/e2e/collectives/run_rank.py b/tests/e2e/collectives/run_rank.py
index 7fdda25..cd604f7 100644
--- a/tests/e2e/collectives/run_rank.py
+++ b/tests/e2e/collectives/run_rank.py
@@ -12,7 +12,7 @@
 import test_utils
 
 """
-Run 1 rank in a destributed context.
+Run 1 rank in a distributed context.
 To start 4 ranks you would use
 ```
 mpirun -n 4 python run_rank.py ...
@@ -21,7 +21,7 @@
 
 
 def parse_args():
-    parser = argparse.ArgumentParser(description="Run 1 rank in a destributed context.")
+    parser = argparse.ArgumentParser(description="Run 1 rank in a distributed context.")
     parser.add_argument("--driver", type=str, default="local-task", help="Device URI.")
     parser.add_argument(
         "--module_filepath", type=str, required=True, help="Path to IREE module."
diff --git a/tests/e2e/convolution/generate_e2e_conv2d_tests.py b/tests/e2e/convolution/generate_e2e_conv2d_tests.py
index f611f0e..98f0d6e 100644
--- a/tests/e2e/convolution/generate_e2e_conv2d_tests.py
+++ b/tests/e2e/convolution/generate_e2e_conv2d_tests.py
@@ -524,7 +524,7 @@
             )
             # Different testcases may differ only by runtime parameters but
             # share the same code. For example, dynamic-shapes testcases
-            # share the same code involing tensor<?x?xf32> even though the runtime
+            # share the same code involving tensor<?x?xf32> even though the runtime
             # value in the trace are different. That's why we append conditionally
             # to calls, but unconditionally to function_definitions.
             if function.name not in functions:
diff --git a/tests/e2e/linalg/fp_to_subbyte.mlir b/tests/e2e/linalg/fp_to_subbyte.mlir
index ed9fb7d..4a3ceb3 100644
--- a/tests/e2e/linalg/fp_to_subbyte.mlir
+++ b/tests/e2e/linalg/fp_to_subbyte.mlir
@@ -9,7 +9,7 @@
     linalg.yield %3 : i4
   } -> tensor<8xi4>
 
-  // TODO(#14996): Remove the signed extention and directly check with i4 types.
+  // TODO(#14996): Remove the signed extension and directly check with i4 types.
   %blocker = util.optimization_barrier %res : tensor<8xi4>
   %init1 = tensor.empty() : tensor<8xi8>
   %exti8 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]}
@@ -34,7 +34,7 @@
     linalg.yield %3 : i2
   } -> tensor<8xi2>
 
-  // TODO(#14996): Remove the signed extention and directly check with i2 types.
+  // TODO(#14996): Remove the signed extension and directly check with i2 types.
   %blocker = util.optimization_barrier %res : tensor<8xi2>
   %init1 = tensor.empty() : tensor<8xi8>
   %exti8 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]}
diff --git a/tests/e2e/linalg/narrow_n_matmuls.mlir b/tests/e2e/linalg/narrow_n_matmuls.mlir
index 578d7f7..86a01ad 100644
--- a/tests/e2e/linalg/narrow_n_matmuls.mlir
+++ b/tests/e2e/linalg/narrow_n_matmuls.mlir
@@ -1,4 +1,4 @@
-// Test various forms of matmuls with narrow N, in particual matvec/batch_matvec
+// Test various forms of matmuls with narrow N, in particular matvec/batch_matvec
 // (implicitly N=1) and matmuls with N=1 and N=2.
 //
 // The reason why this needs extensive e2e testing is the transposition of
diff --git a/tests/external/iree-test-suites/test_suite_files/attention_and_matmul_spec_punet_mi300.mlir b/tests/external/iree-test-suites/test_suite_files/attention_and_matmul_spec_punet_mi300.mlir
index 2ca2cf8..5f1a1e6 100644
--- a/tests/external/iree-test-suites/test_suite_files/attention_and_matmul_spec_punet_mi300.mlir
+++ b/tests/external/iree-test-suites/test_suite_files/attention_and_matmul_spec_punet_mi300.mlir
@@ -572,7 +572,7 @@
         //, @match_broadcast_rhs_mmt_Bx64x640x2480 -> @apply_op_config
 
 
-        // Contration.
+        // Contraction.
         , @match_matmul_like_Bx20x1024x64x1280_i8xi8xi32 -> @apply_op_config
         , @match_matmul_like_Bx10x4096x64x640_i8xi8xi32 -> @apply_op_config
         , @match_matmul_like_Bx20x64x64x2048_i8xi8xi32 -> @apply_op_config