Update documentation for local-task and local-sync drivers. (#9399)

Part of https://github.com/google/iree/issues/4298, follow-up to https://github.com/google/iree/pull/9365#discussion_r891700391

The main change here is renaming the `cpu-dylib` page to just `cpu` and adding some explanations for `local-sync` vs `local-task` and the executable format options.

There's still a bit of cleanup to do around the `dylib-llvm-aot` compiler target name and a few lingering places in the code that assume a dependency on threading.
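
For context, the end-to-end flow the renamed page documents now looks roughly like this (a sketch based on the updated docs; the compiler target is still named `dylib-llvm-aot` pending the cleanup noted above):

```shell
# Compile for local CPU execution (native code AOT-compiled via LLVM).
iree-compile \
    --iree-mlir-to-vm-bytecode-module \
    --iree-hal-target-backends=dylib-llvm-aot \
    iree_input.mlir -o mobilenet_cpu.vmfb

# Run on the multithreaded 'local-task' driver; pass --driver=local-sync
# instead for synchronous, single-threaded execution.
tools/iree-run-module \
    --driver=local-task \
    --module_file=mobilenet_cpu.vmfb \
    --entry_function=predict \
    --function_input="1x224x224x3xf32=0"
```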
diff --git a/benchmarks/dashboard.md b/benchmarks/dashboard.md
index ff4bb14..8689eee 100644
--- a/benchmarks/dashboard.md
+++ b/benchmarks/dashboard.md
@@ -81,19 +81,14 @@
 
 This field specifies the IREE HAL driver:
 
-* [`Dylib`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
-  For CPU via dynamic library. Kernels contain CPU native instructions AOT
-  compiled using LLVM. This driver issues workload to the CPU in async
-  manner and supports multithreading.
-* [`Dylib-Sync`](https://google.github.io/iree/deployment-configurations/cpu-dylib/):
-  For CPU via dynamic library. Kernels contain contain CPU native instructions
-  AOT compiled using LLVM. This driver issues workload to the CPU in sync
-  manner.
-* [`VMVX`](https://github.com/google/iree/issues/5123):
-  For CPU via dynamic library. Kernels contain vector-level intrinsics that
-  are backed by fast implementations ([WIP](https://github.com/google/iree/issues/5819)).
-  This driver issues workload to the CPU in async manner and supports
-  multithreading.
+* [`local-task`](https://google.github.io/iree/deployment-configurations/cpu/):
+  For CPU via the local task system. Kernels contain CPU native instructions AOT
+  compiled using LLVM. This driver issues workloads to the CPU asynchronously
+  and supports multithreading.
+* [`local-sync`](https://google.github.io/iree/deployment-configurations/cpu/):
+  For CPU via the local 'sync' device. Kernels contain CPU native
+  instructions AOT compiled using LLVM. This driver issues workloads to the
+  CPU synchronously.
 * [`Vulkan`](https://google.github.io/iree/deployment-configurations/gpu-vulkan/):
   For GPU via Vulkan. Kernels contain SPIR-V. This driver issues workload to
   the GPU via the Vulkan API.
diff --git a/docs/website/docs/building-from-source/riscv.md b/docs/website/docs/building-from-source/riscv.md
index 54a38ac..db4b282 100644
--- a/docs/website/docs/building-from-source/riscv.md
+++ b/docs/website/docs/building-from-source/riscv.md
@@ -141,7 +141,7 @@
 [https://github.com/sifive/qemu/tree/v5.2.0-rvv-rvb-zfh](https://github.com/sifive/qemu/tree/v5.2.0-rvv-rvb-zfh).
 
 The SIMD code can be generated following the
-[IREE dynamic library CPU HAL driver flow](../deployment-configurations/cpu-dylib.md)
+[IREE CPU flow](../deployment-configurations/cpu.md)
 with the additional command-line flags
 
 ```shell hl_lines="3 4 5 6 7 8"
@@ -153,7 +153,7 @@
   --iree-llvm-target-abi=lp64d \
   --iree-llvm-target-cpu-features="+m,+a,+f,+d,+v" \
   --riscv-v-vector-bits-min=256 --riscv-v-fixed-length-vector-lmul-max=8 \
-  iree_input.mlir -o mobilenet-dylib.vmfb
+  iree_input.mlir -o mobilenet_cpu.vmfb
 ```
 
 Then run on the RISC-V QEMU:
@@ -164,7 +164,7 @@
   -L ${RISCV_TOOLCHAIN_ROOT}/sysroot/ \
   ../iree-build-riscv/tools/iree-run-module \
   --driver=local-task \
-  --module_file=mobilenet-dylib.vmfb \
+  --module_file=mobilenet_cpu.vmfb \
   --entry_function=predict \
   --function_input="1x224x224x3xf32=0"
 ```
diff --git a/docs/website/docs/deployment-configurations/bare-metal.md b/docs/website/docs/deployment-configurations/bare-metal.md
index d1a3ecf..5415944 100644
--- a/docs/website/docs/deployment-configurations/bare-metal.md
+++ b/docs/website/docs/deployment-configurations/bare-metal.md
@@ -1,13 +1,12 @@
 # Run on a Bare-Metal Platform
 
-IREE supports CPU model execution on a bare-metal platform. That is, a platform
-without operating system support, and the executable is built with the
-machine-specific linker script and/or the board support package (BSP).
+IREE supports CPU model execution on bare-metal platforms. That is, platforms
+without operating system support, for which executables are built using
+machine-specific linker scripts and/or board support packages (BSPs).
 
 Bare-metal deployment typically uses IREE's LLVM compiler target much like the
-[CPU - Dylib](./cpu-dylib.md)
-configuration, but using a limited subset of IREE's CPU HAL driver at runtime to
-load and execute compiled programs.
+[CPU configuration](./cpu.md), but using a limited subset of IREE's CPU HAL
+driver code at runtime to load and execute compiled programs.
 
 ## Prerequisites
 
@@ -19,7 +18,7 @@
 * Firmware libraries
 
 Please follow the
-[instructions](./cpu-dylib.md#get-compiler-for-cpu-native-instructions)
+[instructions](./cpu.md#get-compiler-for-cpu-native-instructions)
 to retrieve the IREE compiler.
 
 ## Compile the model for bare-metal
@@ -35,7 +34,7 @@
     --iree-llvm-target-triple=x86_64-pc-linux-elf \
     --iree-llvm-debug-symbols=false \
     samples/models/simple_abs.mlir \
-    -o /tmp/simple_abs_dylib.vmfb
+    -o /tmp/simple_abs_cpu.vmfb
 
 ```
 
@@ -55,7 +54,7 @@
 for example command-line instructions of some common architectures
 
 You can replace the MLIR file with the other MLIR model files, following the
-[instructions](./cpu-dylib.md#compile-the-model)
+[instructions](./cpu.md#compile-the-model)
 
 ### Compiling the bare-metal model for static-library support
 
diff --git a/docs/website/docs/deployment-configurations/cpu-dylib.md b/docs/website/docs/deployment-configurations/cpu.md
similarity index 62%
rename from docs/website/docs/deployment-configurations/cpu-dylib.md
rename to docs/website/docs/deployment-configurations/cpu.md
index 46f71c7..533c558 100644
--- a/docs/website/docs/deployment-configurations/cpu-dylib.md
+++ b/docs/website/docs/deployment-configurations/cpu.md
@@ -1,11 +1,24 @@
-# Dynamic Library CPU HAL Driver
+# CPU Deployment
 
-IREE supports efficient model execution on CPU. IREE uses [LLVM][llvm] to
-compile dense computation in the model into highly optimized CPU native
-instruction streams, which are embedded in IREE's deployable format as dynamic
-libraries (dylibs). IREE uses its own low-overhead minimal dynamic library
-loader to load them and then schedule them with concrete workloads onto various
-CPU cores.
+IREE supports efficient program execution on CPU devices by using [LLVM][llvm]
+to compile all dense computations in each program into highly optimized CPU
+native instruction streams, which are embedded in one of IREE's deployable
+formats.
+
+To compile a program for CPU execution, pick one of IREE's supported executable
+formats:
+
+| Executable Format | Description                                           |
+| ----------------- | ----------------------------------------------------- |
+| embedded ELF      | portable, high-performance dynamic library            |
+| system library    | platform-specific dynamic library (.so, .dll, etc.)   |
+| VMVX              | reference target                                      |
+
+At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
+
+* `local-task`: asynchronous, multithreaded driver built on IREE's "task"
+   system
+* `local-sync`: synchronous, single-threaded driver that executes work inline
 
 !!! todo
 
@@ -14,35 +27,10 @@
 
 <!-- TODO(??): when to use CPU vs GPU vs other backends -->
 
-## Get runtime and compiler
-
-### Get IREE runtime with dylib HAL driver
-
-You will need to get an IREE runtime that supports the dylib HAL driver
-so it can execute the model on CPU via dynamic libraries containing native
-CPU instructions.
-
-<!-- TODO(??): vcpkg -->
-
-
-#### Build runtime from source
-
-Please make sure you have followed the [Getting started][get-started] page
-to build IREE for your host platform and the
-[Android cross-compilation][android-cc]
-page if you are cross compiling for Android. The dylib HAL driver is compiled
-in by default on all platforms.
-
-<!-- TODO(??): a way to verify dylib is compiled in and supported -->
-
-Ensure that the `IREE_HAL_DRIVER_LOCAL_TASK` and
-`IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF` CMake options are `ON` when
-configuring for the target.
+## Get compiler and runtime
 
 ### Get compiler for CPU native instructions
 
-<!-- TODO(??): vcpkg -->
-
 #### Download as Python package
 
 Python packages for various IREE functionalities are regularly published
@@ -66,9 +54,9 @@
 
 Please make sure you have followed the [Getting started][get-started] page
 to build IREE for your host platform and the
-[Android cross-compilation][android-cc]
-page if you are cross compiling for Android. The dylib compiler backend is
-compiled in by default on all platforms.
+[Android cross-compilation][android-cc] page if you are cross compiling for
+Android. The LLVM (CPU) compiler backend is compiled in by default on all
+platforms.
 
 Ensure that the `IREE_TARGET_BACKEND_DYLIB_LLVM_AOT` CMake option is `ON` when
 configuring for the host.
@@ -79,15 +67,15 @@
 
 ## Compile and run the model
 
-With the compiler and runtime for dynamic libraries, we can now compile a model
-and run it on the CPU.
+With the compiler and runtime for local CPU execution, we can now compile a
+model and run it.
 
 ### Compile the model
 
-IREE compilers transform a model into its final deployable format in many
+The IREE compiler transforms a model into its final deployable format in many
 sequential steps. A model authored with Python in an ML framework should use the
 corresponding framework's import tool to convert into a format (i.e.,
-[MLIR][mlir]) expected by main IREE compilers first.
+[MLIR][mlir]) expected by the IREE compiler first.
 
 Using MobileNet v2 as an example, you can download the SavedModel with trained
 weights from [TensorFlow Hub][tf-hub-mobilenetv2] and convert it using IREE's
@@ -102,7 +90,7 @@
 iree-compile \
     --iree-mlir-to-vm-bytecode-module \
     --iree-hal-target-backends=dylib-llvm-aot \
-    iree_input.mlir -o mobilenet-dylib.vmfb
+    iree_input.mlir -o mobilenet_cpu.vmfb
 ```
 
 !!! todo
@@ -112,6 +100,24 @@
 where `iree_input.mlir` is the model's initial MLIR representation generated by
 IREE's TensorFlow importer.
 
+### Get IREE runtime with local CPU HAL driver
+
+You will need to get an IREE runtime that supports the local CPU HAL driver,
+along with the appropriate executable loaders for your application.
+
+#### Build runtime from source
+
+Please make sure you have followed the [Getting started][get-started] page
+to build IREE for your host platform and the
+[Android cross-compilation][android-cc] page if you are cross compiling for
+Android. The local CPU HAL drivers are compiled in by default on all platforms.
+
+<!-- TODO(??): a way to verify the driver is compiled in and supported -->
+
+Ensure that the `IREE_HAL_DRIVER_LOCAL_TASK` and
+`IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF` (or other executable loader) CMake
+options are `ON` when configuring for the target.
+
 ### Run the model
 
 #### Run using the command-line
@@ -121,7 +127,7 @@
 ``` shell hl_lines="2"
 tools/iree-run-module \
     --driver=local-task \
-    --module_file=mobilenet-dylib.vmfb \
+    --module_file=mobilenet_cpu.vmfb \
     --entry_function=predict \
     --function_input="1x224x224x3xf32=0"
 ```
diff --git a/docs/website/docs/deployment-configurations/index.md b/docs/website/docs/deployment-configurations/index.md
index 1b203a2..c9ecb45 100644
--- a/docs/website/docs/deployment-configurations/index.md
+++ b/docs/website/docs/deployment-configurations/index.md
@@ -1,13 +1,13 @@
 # Deployment configurations
 
 IREE provides a flexible set of tools for various deployment scenarios.
-Fully featured environments can use IREE for dynamic model deployments taking
+Fully featured environments can use IREE to load programs on demand and to take
 advantage of multi-threaded hardware, while embedded systems can bypass IREE's
 runtime entirely or interface with custom accelerators.
 
 ## Stable configurations
 
-* [CPU - Dylib](./cpu-dylib.md)
+* [CPU](./cpu.md) for general-purpose CPU deployment
 * [CPU - Bare-Metal](./bare-metal.md) with minimal platform dependencies
 * [GPU - Vulkan](./gpu-vulkan.md)
 * [GPU - CUDA/ROCm](./gpu-cuda-rocm.md)
diff --git a/docs/website/docs/getting-started/tensorflow.md b/docs/website/docs/getting-started/tensorflow.md
index c4ec7a5..3a0e118 100644
--- a/docs/website/docs/getting-started/tensorflow.md
+++ b/docs/website/docs/getting-started/tensorflow.md
@@ -83,7 +83,7 @@
     and v2 if you see one of them gives an empty dump.
 
 Afterwards you can further compile the model in `iree_input.mlir` for
-[CPU](../deployment-configurations/cpu-dylib.md) or
+[CPU](../deployment-configurations/cpu.md) or
 [GPU](../deployment-configurations/gpu-vulkan.md).
 
 <!-- TODO(??): overview of APIs available, code snippets (lift from Colab?) -->
diff --git a/docs/website/docs/index.md b/docs/website/docs/index.md
index a938b45..b530da0 100644
--- a/docs/website/docs/index.md
+++ b/docs/website/docs/index.md
@@ -74,7 +74,7 @@
     Develop your program using one of the [supported frameworks](./getting-started/#supported-frameworks), then run your model
     using one of IREE's import tools.
 
-2. **Select your [deployment configuration](./deployment-configurations)**
+2. **Select your [deployment configuration](./deployment-configurations/)**
 
     Identify your target platform, accelerator(s), and other constraints.
 
@@ -121,7 +121,7 @@
 For example, compiling for
 [GPU execution](deployment-configurations/gpu-vulkan.md) using Vulkan generates
 SPIR-V kernels and Vulkan API calls. For
-[CPU execution](deployment-configurations/cpu-dylib.md), native code with
+[CPU execution](deployment-configurations/cpu.md), native code with
 static or dynamic linkage and the associated function calls are generated.
 
 ### Running models
diff --git a/docs/website/mkdocs.yml b/docs/website/mkdocs.yml
index 48c94ff..a8fa9bd 100644
--- a/docs/website/mkdocs.yml
+++ b/docs/website/mkdocs.yml
@@ -101,7 +101,7 @@
       - JAX: 'getting-started/jax.md'
   - 'Deployment configurations':
       - 'deployment-configurations/index.md'
-      - CPU - Dylib: 'deployment-configurations/cpu-dylib.md'
+      - CPU: 'deployment-configurations/cpu.md'
       - CPU - Bare-Metal: 'deployment-configurations/bare-metal.md'
       - GPU - Vulkan: 'deployment-configurations/gpu-vulkan.md'
       - GPU - CUDA/ROCm: 'deployment-configurations/gpu-cuda-rocm.md'
diff --git a/samples/dynamic_shapes/README.md b/samples/dynamic_shapes/README.md
index 55c805f..962fa8d 100644
--- a/samples/dynamic_shapes/README.md
+++ b/samples/dynamic_shapes/README.md
@@ -77,8 +77,8 @@
     ```
 
 3. Compile the `dynamic_shapes.mlir` file using `iree-compile`. The
-    [dylib-llvm-aot](https://google.github.io/iree/deployment-configurations/cpu-dylib/)
-    configuration has the best support for dynamic shapes:
+    [CPU configuration](https://google.github.io/iree/deployment-configurations/cpu/)
+    has the best support for dynamic shapes:
 
     ```
     ../iree-build/tools/iree-compile \
diff --git a/samples/variables_and_state/README.md b/samples/variables_and_state/README.md
index a2d1178..04b0d41 100644
--- a/samples/variables_and_state/README.md
+++ b/samples/variables_and_state/README.md
@@ -87,7 +87,7 @@
 
 For example, to use IREE's `dylib-llvm-aot` target, which is optimized for CPU
 execution using LLVM, refer to the
-[documentation](https://google.github.io/iree/deployment-configurations/cpu-dylib/)
+[documentation](https://google.github.io/iree/deployment-configurations/cpu/)
 and compile the imported `counter.mlir` file using `iree-compile`:
 
 ```