Merging simplified HAL bindings branch. (#18366)

This removes the concept of descriptor sets and pipeline layouts from
the HAL and switches programs to using a flat list of bindings per
dispatch, as real programs rarely benefited from descriptor sets and
will benefit even less once command buffer reuse is enabled. Command
buffer recording is now stateless: the information previously provided
via push constant and push descriptor set commands is carried per
dispatch, so there's no need to track pipeline layouts. Pipeline
layouts are still present in a reduced form in the compiler IR in order
to handle the dispatch ABI in a normalized way, but it's up to the
TargetBackends to encode them. Encoded pipeline layout metadata is now
embedded in the target-specific executables for targets that require it
(Metal/Vulkan/WebGPU/D3D12) and a reduced set of information is
embedded for the others. This simplifies the HAL API quite a bit, makes
implementing the HAL easier as targets have more freedom in how
constants and bindings are mapped to lower-level implementations, and
in practice improves command buffer recording latency as there are
fewer VM calls per dispatch on average.

Since the flatbuffers needed to change to include the new metadata that
was previously handled via the HAL APIs, this branch also modernizes
and normalizes the flatbuffers across targets to both better match the
implementation and support features like multiple shader modules/kernel
libraries/etc. per HAL executable (even if the compiler isn't linking
them yet). This reorganization is required to effectively manage cached
resources: before, the compiler would deduplicate pipeline layouts
across all executables, but now that each executable is responsible for
its own, having 1000 executables means there will be 1000 pipeline
layouts even if most are the same. Perhaps this will serve as good
motivation to finally finish linking in all backends :) Debug info is
also consistently added for all targets, processed for tracing, and
factored such that new debug info can be added per exported function
without needing to change per-target code.

There are many IR changes and many test updates here: most of the
updated tests are in codegen and should not be using HAL ops at all. As
codegen test cleanup continues to switch from HAL ops to basic
functions, future changes to the HAL IR will become easier. Notable
changes include:
* Renamed `push_constants` to `constants` (as there is no longer a
`push_constants` API)
* Dropped `#hal.descriptor_set.layout`
* Removed ordinal from `#hal.descriptor_set.binding` (as ordinals are
now implicit)
* Renamed `#hal.descriptor_set.binding` to `#hal.pipeline.binding`
* Removed `set` from `hal.interface.binding.subspan`
* Removed `#hal.interface.binding` and the spooky action-at-a-distance
`hal.interface.binding` attr now that ordinals are implicit
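To illustrate the attribute changes above, a layout that previously
nested bindings under `#hal.descriptor_set.layout` now flattens to
something like the following (a hand-written sketch, not taken from a
test in this branch):

```mlir
// Constants replace push_constants; bindings are listed in order so
// ordinals are implicit, and #hal.pipeline.binding replaces
// #hal.descriptor_set.binding (no set/ordinal fields).
#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
  #hal.pipeline.binding<storage_buffer, ReadOnly>,
  #hal.pipeline.binding<storage_buffer>
]>
```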

Metal/CUDA/Vulkan/HIP/CPU have all been updated to the new binding
model. WebGPU has had some changes applied but needs significant
specialized work because its existing push constant emulation requires
compiler-side descriptor sets. That's left for future work when that
experimental backend is revived.

This bumps the HAL version to 5 (types removed and methods changed) and
the CPU executable library version to 5 (added reserved per-export
fields for future use).

Fixes #18154.
diff --git a/build_tools/bazel/iree_flatcc.bzl b/build_tools/bazel/iree_flatcc.bzl
index 80e66d6..de5ddb7 100644
--- a/build_tools/bazel/iree_flatcc.bzl
+++ b/build_tools/bazel/iree_flatcc.bzl
@@ -10,12 +10,14 @@
         name,
         srcs,
         flatcc_args = ["--common", "--reader"],
+        includes = [],
         testonly = False,
         **kwargs):
     flatcc = "@com_github_dvidelabs_flatcc//:flatcc"
 
     flags = [
         "-o$(RULEDIR)",
+        "-I runtime/src",
     ] + flatcc_args
 
     out_stem = "%s" % (srcs[0].replace(".fbs", ""))
@@ -34,10 +36,10 @@
 
     native.genrule(
         name = name + "_gen",
-        srcs = srcs,
+        srcs = srcs + includes,
         outs = outs,
         tools = [flatcc],
-        cmd = "$(location %s) %s $(SRCS)" % (flatcc, " ".join(flags)),
+        cmd = "$(location %s) %s %s" % (flatcc, " ".join(flags), " ".join(["$(location {})".format(src) for src in srcs])),
         testonly = testonly,
     )
     native.cc_library(
diff --git a/build_tools/bazel_to_cmake/bazel_to_cmake_converter.py b/build_tools/bazel_to_cmake/bazel_to_cmake_converter.py
index 9a1796b..eb5d2b1 100644
--- a/build_tools/bazel_to_cmake/bazel_to_cmake_converter.py
+++ b/build_tools/bazel_to_cmake/bazel_to_cmake_converter.py
@@ -662,16 +662,18 @@
             f"  PUBLIC\n)\n\n"
         )
 
-    def iree_flatbuffer_c_library(self, name, srcs, flatcc_args=None):
+    def iree_flatbuffer_c_library(self, name, srcs, flatcc_args=None, includes=None):
         name_block = self._convert_string_arg_block("NAME", name, quote=False)
         srcs_block = self._convert_srcs_block(srcs)
         flatcc_args_block = self._convert_string_list_block("FLATCC_ARGS", flatcc_args)
+        includes_block = self._convert_srcs_block(includes, block_name="INCLUDES")
 
         self._converter.body += (
             f"flatbuffer_c_library(\n"
             f"{name_block}"
             f"{srcs_block}"
             f"{flatcc_args_block}"
+            f"{includes_block}"
             f"  PUBLIC\n)\n\n"
         )
 
diff --git a/build_tools/cmake/flatbuffer_c_library.cmake b/build_tools/cmake/flatbuffer_c_library.cmake
index fe0913c..2016cdf 100644
--- a/build_tools/cmake/flatbuffer_c_library.cmake
+++ b/build_tools/cmake/flatbuffer_c_library.cmake
@@ -48,7 +48,7 @@
   cmake_parse_arguments(_RULE
     "PUBLIC;TESTONLY"
     "NAME"
-    "SRCS;FLATCC_ARGS"
+    "SRCS;FLATCC_ARGS;INCLUDES"
     ${ARGN}
   )
 
@@ -94,6 +94,7 @@
       iree-flatcc-cli
           -o "${CMAKE_CURRENT_BINARY_DIR}"
           -I "${IREE_ROOT_DIR}"
+          -I "${IREE_ROOT_DIR}/runtime/src"
           ${_RULE_FLATCC_ARGS}
           "${_RULE_SRCS}"
     WORKING_DIRECTORY
diff --git a/compiler/plugins/target/CUDA/BUILD.bazel b/compiler/plugins/target/CUDA/BUILD.bazel
index 2d475bf..b694187 100644
--- a/compiler/plugins/target/CUDA/BUILD.bazel
+++ b/compiler/plugins/target/CUDA/BUILD.bazel
@@ -33,11 +33,13 @@
         "//compiler/src/iree/compiler/Codegen/LLVMGPU",
         "//compiler/src/iree/compiler/Codegen/Utils",
         "//compiler/src/iree/compiler/Dialect/HAL/Target",
+        "//compiler/src/iree/compiler/Dialect/HAL/Utils:ExecutableDebugInfoUtils",
         "//compiler/src/iree/compiler/Dialect/HAL/Utils:LLVMLinkerUtils",
         "//compiler/src/iree/compiler/PluginAPI",
         "//compiler/src/iree/compiler/Utils",
         "//runtime/src/iree/base/internal/flatcc:building",
         "//runtime/src/iree/schemas:cuda_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
         "@iree_cuda//:libdevice_embedded",
         "@llvm-project//llvm:Analysis",
         "@llvm-project//llvm:BitReader",
diff --git a/compiler/plugins/target/CUDA/CMakeLists.txt b/compiler/plugins/target/CUDA/CMakeLists.txt
index 214f78b..70c6dc6 100644
--- a/compiler/plugins/target/CUDA/CMakeLists.txt
+++ b/compiler/plugins/target/CUDA/CMakeLists.txt
@@ -57,10 +57,12 @@
     iree::compiler::Codegen::LLVMGPU
     iree::compiler::Codegen::Utils
     iree::compiler::Dialect::HAL::Target
+    iree::compiler::Dialect::HAL::Utils::ExecutableDebugInfoUtils
     iree::compiler::Dialect::HAL::Utils::LLVMLinkerUtils
     iree::compiler::PluginAPI
     iree::compiler::Utils
     iree::schemas::cuda_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
     iree_cuda::libdevice_embedded
   PUBLIC
 )
diff --git a/compiler/plugins/target/CUDA/CUDATarget.cpp b/compiler/plugins/target/CUDA/CUDATarget.cpp
index 331b5a1..18896f2 100644
--- a/compiler/plugins/target/CUDA/CUDATarget.cpp
+++ b/compiler/plugins/target/CUDA/CUDATarget.cpp
@@ -10,6 +10,7 @@
 #include "iree/compiler/Codegen/LLVMGPU/Passes.h"
 #include "iree/compiler/Codegen/Utils/GPUUtils.h"
 #include "iree/compiler/Dialect/HAL/Target/TargetRegistry.h"
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
 #include "iree/compiler/Dialect/HAL/Utils/LLVMLinkerUtils.h"
 #include "iree/compiler/PluginAPI/Client.h"
 #include "iree/compiler/Utils/FlatbufferUtils.h"
@@ -54,7 +55,6 @@
 
 namespace {
 struct CUDAOptions {
-  bool dumpPtx = false;
   std::string clTarget = "sm_60";
   std::string clTargetFeatures = "+ptx76";
   bool clUsePtxas = false;
@@ -63,8 +63,6 @@
 
   void bindOptions(OptionsBinder &binder) {
     static llvm::cl::OptionCategory category("CUDA HAL Target");
-    binder.opt<bool>("iree-cuda-dump-ptx", dumpPtx, llvm::cl::cat(category),
-                     llvm::cl::desc("Dump ptx to the debug stream."));
 
     binder.opt<std::string>(
         "iree-cuda-target", clTarget, llvm::cl::cat(category),
@@ -258,26 +256,14 @@
   return ptxImage;
 }
 
-static void dumpLLVMModuleToPath(StringRef path, StringRef baseName,
-                                 StringRef suffix, StringRef extPrefix,
-                                 llvm::Module &module) {
-  // Dump disassembly to path.
-  llvm::SmallVector<char> textData;
-  llvm::raw_svector_ostream textOstream(textData);
-
-  module.print(textOstream, nullptr);
-  std::string textExtension = extPrefix.str() + ".ll";
-  dumpDataToPath(path, baseName, suffix, textExtension,
-                 StringRef(textData.data(), textData.size()));
-
-  // Dump bitcode to path.
-  llvm::SmallVector<char> binaryData;
-  llvm::raw_svector_ostream binaryOstream(binaryData);
-  // Write the specified module to the specified output stream.
-  llvm::WriteBitcodeToFile(module, binaryOstream);
-  std::string binaryExtension = extPrefix.str() + ".bc";
-  dumpDataToPath(path, baseName, suffix, binaryExtension,
-                 StringRef(binaryData.data(), binaryData.size()));
+static void dumpModuleToPath(StringRef path, StringRef baseName,
+                             StringRef suffix, StringRef extension,
+                             llvm::Module &module) {
+  llvm::SmallVector<char, 0> data;
+  llvm::raw_svector_ostream ostream(data);
+  module.print(ostream, nullptr);
+  dumpDataToPath(path, baseName, suffix, extension,
+                 StringRef(data.data(), data.size()));
 }
 
 static std::string translateModuleToISA(llvm::Module &module,
@@ -398,19 +384,22 @@
   getDefaultDeviceTarget(MLIRContext *context,
                          const TargetRegistry &targetRegistry) const override {
     Builder b(context);
-    SmallVector<NamedAttribute> configItems;
 
-    // TODO: device configuration attrs.
-    auto configAttr = b.getDictionaryAttr(configItems);
+    SmallVector<NamedAttribute> deviceConfigAttrs;
+    auto deviceConfigAttr = b.getDictionaryAttr(deviceConfigAttrs);
+
+    SmallVector<NamedAttribute> executableConfigAttrs;
+    auto executableConfigAttr = b.getDictionaryAttr(executableConfigAttrs);
 
     // If we had multiple target environments we would generate one target attr
     // per environment, with each setting its own environment attribute.
     SmallVector<IREE::HAL::ExecutableTargetAttr> executableTargetAttrs;
     targetRegistry.getTargetBackend("cuda")->getDefaultExecutableTargets(
-        context, "cuda", configAttr, executableTargetAttrs);
+        context, "cuda", executableConfigAttr, executableTargetAttrs);
 
     return IREE::HAL::DeviceTargetAttr::get(context, b.getStringAttr("cuda"),
-                                            configAttr, executableTargetAttrs);
+                                            deviceConfigAttr,
+                                            executableTargetAttrs);
   }
 
 private:
@@ -475,6 +464,7 @@
   LogicalResult serializeExecutable(const SerializationOptions &serOptions,
                                     IREE::HAL::ExecutableVariantOp variantOp,
                                     OpBuilder &executableBuilder) override {
+    ModuleOp innerModuleOp = variantOp.getInnerModule();
     auto targetAttr = variantOp.getTargetAttr();
     StringRef targetArch = options.clTarget;
     StringRef targetFeatures = options.clTargetFeatures;
@@ -483,10 +473,6 @@
       targetFeatures = attr.getFeatures();
     }
 
-    // Perform the translation in a separate context to avoid any
-    // multi-threading issues.
-    llvm::LLVMContext context;
-
     // We name our files after the executable name so that they are easy to
     // track both during compilation (logs/artifacts/etc), as outputs (final
     // intermediate code/binary files), and at runtime (loaded
@@ -494,37 +480,16 @@
     auto libraryName =
         variantOp->getParentOfType<IREE::HAL::ExecutableOp>().getName().str();
 
-    // TODO(thomasraoux): property handle export ordinals; this code is assuming
-    // that ordinals are dense starting at 0 but that is not required.
-
-    // Collect all the entry point parameters.
-    SmallVector<std::array<int32_t, 3>> workgroupSizes;
-    SmallVector<uint32_t> workgroupLocalMemories;
-    for (auto exportOp : variantOp.getExportOps()) {
-      std::array<int32_t, 3> workgroupSize;
-      if (std::optional<ArrayAttr> workgroupSizeAttr =
-              exportOp.getWorkgroupSize()) {
-        for (auto it : llvm::enumerate(workgroupSizeAttr.value())) {
-          workgroupSize[it.index()] =
-              llvm::cast<IntegerAttr>(it.value()).getInt();
-        }
-      } else {
-        workgroupSize = {1, 1, 1};
-      }
-      workgroupSizes.push_back(workgroupSize);
-      uint32_t workgroupLocalMemory = 0;
-      if (auto workgroupLocalMemoryAttr = exportOp.getWorkgroupLocalMemory()) {
-        workgroupLocalMemory = workgroupLocalMemoryAttr->getSExtValue();
-      }
-      workgroupLocalMemories.push_back(workgroupLocalMemory);
+    // Collect all the entry point names.
+    auto exportOps = llvm::to_vector_of<IREE::HAL::ExecutableExportOp>(
+        variantOp.getExportOps());
+    llvm::StringMap<IREE::HAL::ExecutableExportOp> exportOpMap;
+    for (IREE::HAL::ExecutableExportOp exportOp : exportOps) {
+      exportOpMap[exportOp.getSymName()] = exportOp;
     }
 
-    FlatbufferBuilder builder;
-    iree_hal_cuda_ExecutableDef_start_as_root(builder);
-
-    SmallVector<std::string> entryPointNames;
-    std::string ptxImage;
-    SmallVector<iree_hal_cuda_FileLineLocDef_ref_t> sourceLocationRefs;
+    std::array<int32_t, 3> maxWorkgroupSize = {1, 1, 1};
+    std::string targetPTX;
     if (variantOp.isExternal()) {
       if (!variantOp.getObjects().has_value()) {
         return variantOp.emitOpError()
@@ -537,41 +502,35 @@
                                           "supported for external variants";
       }
 
-      // Take exported names verbatim. The user must have already sanitized
-      // these to match the names in their kernels. We don't support any kind of
-      // mangling and if the user was silly enough to rely on nvcc C++ mangling
-      // they'll have to figure that out.
-      for (auto exportOp : variantOp.getExportOps()) {
-        entryPointNames.emplace_back(exportOp.getSymName());
-      }
-
+      // Read the PTX from the object file.
       auto objectAttr = llvm::cast<IREE::HAL::ExecutableObjectAttr>(
           variantOp.getObjects()->getValue().front());
       if (auto data = objectAttr.loadData()) {
-        ptxImage = data.value();
+        targetPTX = data.value();
       } else {
         return variantOp.emitOpError()
                << "object file could not be loaded: " << objectAttr;
       }
     } else {
-      ModuleOp innerModuleOp = variantOp.getInnerModule();
+      // Perform the translation in a separate context to avoid any
+      // multi-threading issues.
+      llvm::LLVMContext context;
 
-      auto llvmModule =
+      std::unique_ptr<llvm::Module> llvmModule =
           mlir::translateModuleToLLVMIR(innerModuleOp, context, libraryName);
       if (!llvmModule) {
         return variantOp.emitError() << "failed to translate the MLIR LLVM "
                                         "dialect to the native llvm::Module";
       }
 
-      for (auto [exportOp, workgroupSize] :
-           llvm::zip_equal(variantOp.getExportOps(), workgroupSizes)) {
-        auto *llvmFunc = llvmModule->getFunction(exportOp.getName());
-        if (llvmFunc->isDeclaration())
+      for (auto funcOp : innerModuleOp.getOps<LLVM::LLVMFuncOp>()) {
+        llvm::Function *llvmFunc = llvmModule->getFunction(funcOp.getName());
+        if (llvmFunc->isDeclaration()) {
           continue;
+        }
 
-        // setName will make sure the function name is unique.
-        llvmFunc->setName(sanitizeSymbolName(exportOp.getName()));
-        entryPointNames.emplace_back(llvmFunc->getName());
+        // Sanitize the function name as PTX has strict requirements.
+        llvmFunc->setName(sanitizeSymbolName(funcOp.getName()));
 
         auto *annotations =
             llvmModule->getOrInsertNamedMetadata("nvvm.annotations");
@@ -584,20 +543,28 @@
           annotations->addOperand(
               llvm::MDNode::get(llvmModule->getContext(), llvmMetadata));
         };
+
         // Mark the entry point as a kernel.
         setMetadataValueI32("kernel", 1);
-        // Set the maximum number of threads in the thread block (CTA).
-        setMetadataValueI32("maxntidx", workgroupSize[0]);
-        setMetadataValueI32("maxntidy", workgroupSize[1]);
-        setMetadataValueI32("maxntidz", workgroupSize[2]);
 
-        // Optional source location information for debugging/profiling.
-        if (serOptions.debugLevel >= 1) {
-          if (auto loc = findFirstFileLoc(exportOp.getLoc())) {
-            auto filenameRef = builder.createString(loc->getFilename());
-            sourceLocationRefs.push_back(iree_hal_cuda_FileLineLocDef_create(
-                builder, filenameRef, loc->getLine()));
-          }
+        // Set the maximum number of threads in the thread block (CTA).
+        auto exportOp = exportOpMap[funcOp.getName()];
+        if (auto workgroupSizeAttr = exportOp.getWorkgroupSize()) {
+          auto workgroupSizeValues = workgroupSizeAttr->getValue();
+          std::array<int32_t, 3> workgroupSize = {
+              static_cast<int32_t>(
+                  cast<IntegerAttr>(workgroupSizeValues[0]).getInt()),
+              static_cast<int32_t>(
+                  cast<IntegerAttr>(workgroupSizeValues[1]).getInt()),
+              static_cast<int32_t>(
+                  cast<IntegerAttr>(workgroupSizeValues[2]).getInt()),
+          };
+          maxWorkgroupSize[0] = std::max(maxWorkgroupSize[0], workgroupSize[0]);
+          maxWorkgroupSize[1] = std::max(maxWorkgroupSize[1], workgroupSize[1]);
+          maxWorkgroupSize[2] = std::max(maxWorkgroupSize[2], workgroupSize[2]);
+          setMetadataValueI32("maxntidx", workgroupSize[0]);
+          setMetadataValueI32("maxntidy", workgroupSize[1]);
+          setMetadataValueI32("maxntidz", workgroupSize[2]);
         }
       }
 
@@ -617,11 +584,10 @@
         }
       }
 
-      // Dump just the codegen bitcode before linking and optimization.
-      if (!serOptions.dumpIntermediatesPath.empty()) {
-        dumpLLVMModuleToPath(serOptions.dumpIntermediatesPath,
-                             serOptions.dumpBaseName, variantOp.getName(),
-                             ".codegen", *llvmModule);
+      llvmModule->setDataLayout(targetMachine->createDataLayout());
+
+      for (llvm::Function &llvmFunc : llvmModule->functions()) {
+        llvmFunc.addFnAttr(llvm::Attribute::AlwaysInline);
       }
 
       // Link user and device bitcode alongside the generated module.
@@ -630,76 +596,123 @@
         return failure();
       }
 
-      // Dump all linked bitcode prior to optimization.
       if (!serOptions.dumpIntermediatesPath.empty()) {
-        dumpLLVMModuleToPath(serOptions.dumpIntermediatesPath,
-                             serOptions.dumpBaseName, variantOp.getName(),
-                             ".linked", *llvmModule);
+        dumpModuleToPath(serOptions.dumpIntermediatesPath,
+                         serOptions.dumpBaseName, variantOp.getName(),
+                         ".linked.ll", *llvmModule);
       }
 
-      std::array<int32_t, 3> maxWorkgroupSize = {1, 1, 1};
-      for (int64_t i = 0, e = workgroupSizes.size(); i < e; i++) {
-        for (int64_t j = 0; j < maxWorkgroupSize.size(); j++) {
-          maxWorkgroupSize[j] =
-              std::max(maxWorkgroupSize[j], workgroupSizes[i][j]);
-        }
-      }
-      // Run LTO-style full optimization on the linked modules.
+      // Run LLVM optimization passes.
       optimizeModule(*llvmModule, *targetMachine, maxWorkgroupSize);
-
-      // Dump bitcode post-linking and optimization.
       if (!serOptions.dumpIntermediatesPath.empty()) {
-        dumpLLVMModuleToPath(serOptions.dumpIntermediatesPath,
-                             serOptions.dumpBaseName, variantOp.getName(),
-                             ".optimized", *llvmModule);
+        dumpModuleToPath(serOptions.dumpIntermediatesPath,
+                         serOptions.dumpBaseName, variantOp.getName(),
+                         ".optimized.ll", *llvmModule);
       }
 
-      // Serialize CUDA kernel into the binary that we will embed in the
+      // Serialize the PTX kernel into the binary that we will embed in the
       // final FlatBuffer.
-      ptxImage = translateModuleToISA(*llvmModule, *targetMachine);
+      targetPTX = translateModuleToISA(*llvmModule, *targetMachine);
+      if (targetPTX.empty()) {
+        return failure();
+      }
     }
 
-    if (options.dumpPtx) {
-      llvm::dbgs() << ptxImage;
-    }
     if (!serOptions.dumpBinariesPath.empty()) {
       dumpDataToPath(serOptions.dumpBinariesPath, serOptions.dumpBaseName,
-                     variantOp.getName(), ".ptx", ptxImage);
+                     variantOp.getName(), ".ptx", targetPTX);
     }
 
-    std::string gpuImage = produceGpuImage(options, targetArch, ptxImage);
-    auto gpuImageRef =
-        flatbuffers_string_create(builder, gpuImage.c_str(), gpuImage.size());
-    iree_hal_cuda_BlockSizeDef_vec_start(builder);
-    for (const auto &workgroupSize : workgroupSizes) {
-      iree_hal_cuda_BlockSizeDef_vec_push_create(
-          builder, workgroupSize[0], workgroupSize[1], workgroupSize[2]);
-    }
-    auto blockSizesRef = iree_hal_cuda_BlockSizeDef_vec_end(builder);
-    auto workgroupLocalMemoriesRef =
-        builder.createInt32Vec(workgroupLocalMemories);
-    auto entryPointsRef = builder.createStringVec(entryPointNames);
+    FlatbufferBuilder builder;
+    iree_hal_cuda_ExecutableDef_start_as_root(builder);
 
-    iree_hal_cuda_ExecutableDef_entry_points_add(builder, entryPointsRef);
-    iree_hal_cuda_ExecutableDef_block_sizes_add(builder, blockSizesRef);
-    iree_hal_cuda_ExecutableDef_shared_memory_size_add(
-        builder, workgroupLocalMemoriesRef);
-    iree_hal_cuda_ExecutableDef_ptx_image_add(builder, gpuImageRef);
-    if (!sourceLocationRefs.empty()) {
-      auto sourceLocationsRef =
-          builder.createOffsetVecDestructive(sourceLocationRefs);
-      iree_hal_cuda_ExecutableDef_source_locations_add(builder,
-                                                       sourceLocationsRef);
+    auto sourceFilesRef = createSourceFilesVec(
+        serOptions.debugLevel, variantOp.getSourcesAttr(), builder);
+
+    // Only a single module today.
+    SmallVector<iree_hal_cuda_ModuleDef_ref_t> moduleRefs;
+    {
+      auto ptxImageRef = flatbuffers_string_create(builder, targetPTX.c_str(),
+                                                   targetPTX.size());
+      moduleRefs.push_back(
+          iree_hal_cuda_ModuleDef_create(builder, ptxImageRef));
     }
+    auto modulesRef = builder.createOffsetVecDestructive(moduleRefs);
+
+    // Generate optional per-export debug information.
+    // May be empty if no debug information was requested.
+    auto exportDebugInfos =
+        createExportDefs(serOptions.debugLevel, exportOps, builder);
+
+    SmallVector<iree_hal_cuda_ExportDef_ref_t> exportRefs;
+    exportRefs.resize(exportOps.size(), 0);
+    for (auto exportOp : exportOps) {
+      auto ordinalAttr = exportOp.getOrdinalAttr();
+      if (!ordinalAttr) {
+        return mlir::emitError(exportOp.getLoc())
+               << "could not compile cuda binary: export op is missing ordinal";
+      }
+      int64_t ordinal = ordinalAttr.getInt();
+
+      auto kernelNameRef =
+          builder.createString(sanitizeSymbolName(exportOp.getName()));
+
+      iree_hal_cuda_BlockDims_t blockDims = {0};
+      if (auto workgroupSizeAttr = exportOp.getWorkgroupSize()) {
+        auto workgroupSize = workgroupSizeAttr->getValue();
+        blockDims.x = cast<IntegerAttr>(workgroupSize[0]).getInt();
+        blockDims.y = cast<IntegerAttr>(workgroupSize[1]).getInt();
+        blockDims.z = cast<IntegerAttr>(workgroupSize[2]).getInt();
+      }
+
+      uint32_t blockSharedMemorySize = 0;
+      if (std::optional<APInt> workgroupLocalMemoryAttr =
+              exportOp.getWorkgroupLocalMemory()) {
+        blockSharedMemorySize = workgroupLocalMemoryAttr->getSExtValue();
+      }
+
+      auto layoutAttr = exportOp.getLayoutAttr();
+      uint32_t constantCount = static_cast<uint32_t>(layoutAttr.getConstants());
+      SmallVector<iree_hal_cuda_BindingBits_enum_t> bindingFlags;
+      for (auto bindingAttr : layoutAttr.getBindings()) {
+        iree_hal_cuda_BindingBits_enum_t flags = 0;
+        if (allEnumBitsSet(bindingAttr.getFlags(),
+                           IREE::HAL::DescriptorFlags::ReadOnly)) {
+          flags |= iree_hal_cuda_BindingBits_READ_ONLY;
+        }
+        if (allEnumBitsSet(bindingAttr.getFlags(),
+                           IREE::HAL::DescriptorFlags::Indirect)) {
+          flags |= iree_hal_cuda_BindingBits_INDIRECT;
+        }
+        bindingFlags.push_back(flags);
+      }
+      auto bindingFlagsRef = iree_hal_cuda_BindingBits_vec_create(
+          builder, bindingFlags.data(), bindingFlags.size());
+
+      iree_hal_cuda_ExportDef_start(builder);
+      iree_hal_cuda_ExportDef_module_ordinal_add(builder, 0); // always 0 today
+      iree_hal_cuda_ExportDef_kernel_name_add(builder, kernelNameRef);
+      iree_hal_cuda_ExportDef_block_dims_add(builder, &blockDims);
+      iree_hal_cuda_ExportDef_block_shared_memory_size_add(
+          builder, blockSharedMemorySize);
+      iree_hal_cuda_ExportDef_constant_count_add(builder, constantCount);
+      iree_hal_cuda_ExportDef_binding_flags_add(builder, bindingFlagsRef);
+      iree_hal_cuda_ExportDef_debug_info_add(builder,
+                                             exportDebugInfos[ordinal]);
+      exportRefs[ordinal] = iree_hal_cuda_ExportDef_end(builder);
+    }
+    auto exportsRef = builder.createOffsetVecDestructive(exportRefs);
+
+    iree_hal_cuda_ExecutableDef_exports_add(builder, exportsRef);
+    iree_hal_cuda_ExecutableDef_modules_add(builder, modulesRef);
+    iree_hal_cuda_ExecutableDef_source_files_add(builder, sourceFilesRef);
     iree_hal_cuda_ExecutableDef_end_as_root(builder);
 
     // Add the binary data to the target executable.
-    auto binaryOp = executableBuilder.create<IREE::HAL::ExecutableBinaryOp>(
+    executableBuilder.create<IREE::HAL::ExecutableBinaryOp>(
         variantOp.getLoc(), variantOp.getSymName(),
         variantOp.getTarget().getFormat(),
         builder.getBufferAttr(executableBuilder.getContext()));
-    binaryOp.setMimeTypeAttr(
-        executableBuilder.getStringAttr("application/x-flatbuffers"));
 
     return success();
   }
diff --git a/compiler/plugins/target/CUDA/test/smoketest.mlir b/compiler/plugins/target/CUDA/test/smoketest.mlir
index 7843f1f..6e6fa94 100644
--- a/compiler/plugins/target/CUDA/test/smoketest.mlir
+++ b/compiler/plugins/target/CUDA/test/smoketest.mlir
@@ -1,5 +1,5 @@
 // RUN: iree-opt --split-input-file --iree-hal-transformation-pipeline --iree-gpu-test-target=sm_60 %s | FileCheck %s
-// RUN: iree-opt --split-input-file --iree-hal-transformation-pipeline --iree-gpu-test-target=sm_60 --iree-cuda-dump-ptx %s 2>&1 | FileCheck %s --check-prefix=PTX
+// RUN: iree-opt --split-input-file --iree-hal-transformation-pipeline --iree-gpu-test-target=sm_60 --iree-hal-dump-executable-binaries-to=- %s 2>&1 | FileCheck %s --check-prefix=PTX
 
 #map = affine_map<(d0) -> (d0)>
 
diff --git a/compiler/plugins/target/LLVMCPU/LLVMCPUTarget.cpp b/compiler/plugins/target/LLVMCPU/LLVMCPUTarget.cpp
index 3a6337e..7db50ac 100644
--- a/compiler/plugins/target/LLVMCPU/LLVMCPUTarget.cpp
+++ b/compiler/plugins/target/LLVMCPU/LLVMCPUTarget.cpp
@@ -398,10 +398,10 @@
 
       // Specify the constant and binding information used to validate
       // dispatches.
-      // TODO(#18189): pack per-binding information bitfields.
-      dispatchAttrs.constantCount = exportOp.getLayout().getPushConstants();
-      dispatchAttrs.bindingCount =
-          exportOp.getLayout().getSetLayout(0).getBindings().size();
+      if (auto layoutAttr = exportOp.getLayout()) {
+        dispatchAttrs.constantCount = layoutAttr.getConstants();
+        dispatchAttrs.bindingCount = layoutAttr.getBindings().size();
+      }
 
       LibraryBuilder::SourceLocation sourceLocation;
       if (options.debugLevel >= 1) {
diff --git a/compiler/plugins/target/LLVMCPU/LibraryBuilder.cpp b/compiler/plugins/target/LLVMCPU/LibraryBuilder.cpp
index 21621b9..a4402ed 100644
--- a/compiler/plugins/target/LLVMCPU/LibraryBuilder.cpp
+++ b/compiler/plugins/target/LLVMCPU/LibraryBuilder.cpp
@@ -112,7 +112,9 @@
 // %struct.iree_hal_executable_dispatch_attrs_v0_t = type {
 //   i16,
 //   i8,
-//   i8
+//   i8,
+//   i32,
+//   i64[8]
 // }
 static llvm::StructType *makeDispatchAttrsType(llvm::LLVMContext &context) {
   if (auto *existingType = llvm::StructType::getTypeByName(
@@ -121,12 +123,20 @@
   }
   auto *i8Type = llvm::IntegerType::getInt8Ty(context);
   auto *i16Type = llvm::IntegerType::getInt16Ty(context);
+  auto *i32Type = llvm::IntegerType::getInt32Ty(context);
+  auto *i64Type = llvm::IntegerType::getInt64Ty(context);
   auto *type =
       llvm::StructType::create(context,
                                {
-                                   i16Type,
-                                   i8Type,
-                                   i8Type,
+                                   i16Type, i8Type, i8Type, i32Type,
+                                   i64Type, // [0]
+                                   i64Type, // [1]
+                                   i64Type, // [2]
+                                   i64Type, // [3]
+                                   i64Type, // [4]
+                                   i64Type, // [5]
+                                   i64Type, // [6]
+                                   i64Type, // [7]
                                },
                                "iree_hal_executable_dispatch_attrs_v0_t",
                                /*isPacked=*/false);
@@ -490,6 +500,7 @@
   auto *i8Type = llvm::IntegerType::getInt8Ty(context);
   auto *i16Type = llvm::IntegerType::getInt16Ty(context);
   auto *i32Type = llvm::IntegerType::getInt32Ty(context);
+  auto *i64Type = llvm::IntegerType::getInt64Ty(context);
 
   // iree_hal_executable_export_table_v0_t::ptrs
   SmallVector<llvm::Constant *> exportPtrValues;
@@ -520,6 +531,24 @@
               llvm::ConstantInt::get(i8Type, dispatch.attrs.constantCount),
               // binding_count=
               llvm::ConstantInt::get(i8Type, dispatch.attrs.bindingCount),
+              // reserved_0=
+              llvm::ConstantInt::get(i32Type, 0),
+              // reserved_1[0]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[1]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[2]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[3]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[4]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[5]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[6]=
+              llvm::ConstantInt::get(i64Type, 0),
+              // reserved_1[7]=
+              llvm::ConstantInt::get(i64Type, 0),
           }));
     }
     exportAttrs = createArrayConstant(libraryName + "_attrs", dispatchAttrsType,
diff --git a/compiler/plugins/target/LLVMCPU/LibraryBuilder.h b/compiler/plugins/target/LLVMCPU/LibraryBuilder.h
index 6b1ee87..7998099 100644
--- a/compiler/plugins/target/LLVMCPU/LibraryBuilder.h
+++ b/compiler/plugins/target/LLVMCPU/LibraryBuilder.h
@@ -48,10 +48,11 @@
     // or some semantic versioning we track in whatever spec we end up having.
     V_0_3 = 0x0000'0003u, // v0.3 - ~2022-08-08
     V_0_4 = 0x0000'0004u, // v0.4 - ~2024-03-12
+    V_0_5 = 0x0000'0005u, // v0.5 - ~2024-08-25
 
     // Pinned to the latest version.
     // Requires that the runtime be compiled with the same version.
-    LATEST = V_0_4,
+    LATEST = V_0_5,
   };
 
   // iree_hal_executable_library_features_t
diff --git a/compiler/plugins/target/LLVMCPU/test/materialize_homogeneous_encodings.mlir b/compiler/plugins/target/LLVMCPU/test/materialize_homogeneous_encodings.mlir
index 0d16e3b..1819b0d 100644
--- a/compiler/plugins/target/LLVMCPU/test/materialize_homogeneous_encodings.mlir
+++ b/compiler/plugins/target/LLVMCPU/test/materialize_homogeneous_encodings.mlir
@@ -6,7 +6,7 @@
 #map2 = affine_map<(d0, d1, d2) -> (d2, d1)>
 #map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
 #encoding = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], user_indexing_maps = [#map1, #map2, #map3], round_dims_to = array<i64: 16, 16, 16>>
-#device_target_llvm_cpu = #hal.device.target<"llvm-cpu", [#executable_target_embedded_elf_x86_64_]> : !hal.device
+#device_target_llvm_cpu = #hal.device.target<"local", [#executable_target_embedded_elf_x86_64_]> : !hal.device
 module attributes {hal.device.targets = [#device_target_llvm_cpu]} {
   util.func public @lhs_encoding(%arg0: tensor<?x?xf32>) -> tensor<?x?xf32> {
     %3 = iree_encoding.set_encoding %arg0 : tensor<?x?xf32> -> tensor<?x?xf32, #encoding>
diff --git a/compiler/plugins/target/LLVMCPU/test/smoketest_embedded.mlir b/compiler/plugins/target/LLVMCPU/test/smoketest_embedded.mlir
index f9e0a4b..abab6cd 100644
--- a/compiler/plugins/target/LLVMCPU/test/smoketest_embedded.mlir
+++ b/compiler/plugins/target/LLVMCPU/test/smoketest_embedded.mlir
@@ -3,7 +3,7 @@
 
 module attributes {
   hal.device.targets = [
-    #hal.device.target<"llvm-cpu", [
+    #hal.device.target<"local", [
       #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
         native_vector_size = 16 : index
       }>
diff --git a/compiler/plugins/target/LLVMCPU/test/smoketest_system.mlir b/compiler/plugins/target/LLVMCPU/test/smoketest_system.mlir
index d6c6658..2bdc7aa 100644
--- a/compiler/plugins/target/LLVMCPU/test/smoketest_system.mlir
+++ b/compiler/plugins/target/LLVMCPU/test/smoketest_system.mlir
@@ -5,7 +5,7 @@
 
 module attributes {
   hal.device.targets = [
-    #hal.device.target<"llvm-cpu", [
+    #hal.device.target<"local", [
       #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
         native_vector_size = 16 : index
       }>
diff --git a/compiler/plugins/target/MetalSPIRV/BUILD.bazel b/compiler/plugins/target/MetalSPIRV/BUILD.bazel
index ede5566..ae750cb 100644
--- a/compiler/plugins/target/MetalSPIRV/BUILD.bazel
+++ b/compiler/plugins/target/MetalSPIRV/BUILD.bazel
@@ -31,8 +31,10 @@
         "//compiler/src/iree/compiler/Codegen/Utils",
         "//compiler/src/iree/compiler/Dialect/Flow/IR",
         "//compiler/src/iree/compiler/Dialect/HAL/Target",
+        "//compiler/src/iree/compiler/Dialect/HAL/Utils:ExecutableDebugInfoUtils",
         "//compiler/src/iree/compiler/PluginAPI",
         "//compiler/src/iree/compiler/Utils",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
         "//runtime/src/iree/schemas:metal_executable_def_c_fbs",
         "@llvm-project//llvm:Support",
         "@llvm-project//llvm:TargetParser",
diff --git a/compiler/plugins/target/MetalSPIRV/CMakeLists.txt b/compiler/plugins/target/MetalSPIRV/CMakeLists.txt
index 678a37a..3a7b0e6 100644
--- a/compiler/plugins/target/MetalSPIRV/CMakeLists.txt
+++ b/compiler/plugins/target/MetalSPIRV/CMakeLists.txt
@@ -41,8 +41,10 @@
     iree::compiler::Codegen::Utils
     iree::compiler::Dialect::Flow::IR
     iree::compiler::Dialect::HAL::Target
+    iree::compiler::Dialect::HAL::Utils::ExecutableDebugInfoUtils
     iree::compiler::PluginAPI
     iree::compiler::Utils
+    iree::schemas::executable_debug_info_c_fbs
     iree::schemas::metal_executable_def_c_fbs
   PUBLIC
 )
diff --git a/compiler/plugins/target/MetalSPIRV/MetalSPIRVTarget.cpp b/compiler/plugins/target/MetalSPIRV/MetalSPIRVTarget.cpp
index 4fa2b03..eb8918d 100644
--- a/compiler/plugins/target/MetalSPIRV/MetalSPIRVTarget.cpp
+++ b/compiler/plugins/target/MetalSPIRV/MetalSPIRVTarget.cpp
@@ -12,6 +12,7 @@
 #include "iree/compiler/Codegen/SPIRV/Passes.h"
 #include "iree/compiler/Dialect/Flow/IR/FlowDialect.h"
 #include "iree/compiler/Dialect/HAL/Target/TargetRegistry.h"
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
 #include "iree/compiler/PluginAPI/Client.h"
 #include "iree/compiler/Utils/FlatbufferUtils.h"
 #include "iree/schemas/metal_executable_def_builder.h"
@@ -131,6 +132,18 @@
                                     IREE::HAL::ExecutableVariantOp variantOp,
                                     OpBuilder &executableBuilder) override {
     ModuleOp innerModuleOp = variantOp.getInnerModule();
+
+    // TODO: rework this to compile all modules into the same metallib and
+    // source the entry points from them. Or use a linking tool (metal-ar) to
+    // link the compiled metallibs together. If we were not using spirv-cross
+    // we'd never do it like this with one module per function.
+    //
+    // Currently this is _really_ bad because it doesn't support linking like
+    // the Vulkan SPIR-V target does: that target allows multiple
+    // spirv::ModuleOps so we at least only have a single HAL executable. This
+    // should all be reworked to have multiple SPIR-V modules in a single
+    // executable and then, even if each passes through spirv-cross
+    // independently, link the resulting metallibs together.
     auto spvModuleOp = *innerModuleOp.getOps<spirv::ModuleOp>().begin();
     if (!serOptions.dumpIntermediatesPath.empty()) {
       std::string assembly;
@@ -140,14 +153,6 @@
                      variantOp.getName(), ".mlir", assembly);
     }
 
-    // The runtime use ordinals instead of names but Metal requires function
-    // names for constructing pipeline states. Get an ordered list of the entry
-    // point names.
-    SmallVector<StringRef, 8> spirvEntryPointNames;
-    spvModuleOp.walk([&](spirv::EntryPointOp exportOp) {
-      spirvEntryPointNames.push_back(exportOp.getFn());
-    });
-
     // 1. Serialize the spirv::ModuleOp into binary format.
     SmallVector<uint32_t, 0> spvBinary;
     if (failed(spirv::serialize(spvModuleOp, spvBinary))) {
@@ -159,6 +164,14 @@
                                ".spv", spvBinary);
     }
 
+    // The runtime uses ordinals instead of names but Metal requires function
+    // names for constructing pipeline states. Get an ordered list of the
+    // entry point names.
+    SmallVector<StringRef, 8> spirvEntryPointNames;
+    spvModuleOp.walk([&](spirv::EntryPointOp exportOp) {
+      spirvEntryPointNames.push_back(exportOp.getFn());
+    });
+
     // 2. Cross compile SPIR-V to MSL source code.
     SmallVector<MetalShader, 2> mslShaders;
     SmallVector<std::string, 2> mslEntryPointNames;
@@ -187,15 +200,17 @@
     }
 
     // 3. Compile MSL to MTLLibrary.
-    SmallVector<std::unique_ptr<llvm::MemoryBuffer>> metalLibs;
+    SmallVector<std::unique_ptr<llvm::MemoryBuffer>> metallibs;
+    metallibs.resize(mslShaders.size());
     if (options.compileToMetalLib) {
       // We need to use offline Metal shader compilers.
       // TODO(#14048): The toolchain can also exist on other platforms. Probe
       // the PATH instead.
       auto hostTriple = llvm::Triple(llvm::sys::getProcessTriple());
       if (hostTriple.isMacOSX()) {
-        for (auto [shader, entryPoint] :
-             llvm::zip(mslShaders, mslEntryPointNames)) {
+        for (auto [i, shader, entryPoint] :
+             llvm::zip_equal(llvm::seq(mslShaders.size()), mslShaders,
+                             mslEntryPointNames)) {
           std::unique_ptr<llvm::MemoryBuffer> lib = compileMSLToMetalLib(
               options.targetPlatform, shader.source, entryPoint);
           if (!lib) {
@@ -203,7 +218,7 @@
                    << "failed to compile to MTLLibrary from MSL:\n\n"
                    << shader.source << "\n\n";
           }
-          metalLibs.push_back(std::move(lib));
+          metallibs[i] = std::move(lib);
         }
       }
     }
@@ -212,36 +227,88 @@
     FlatbufferBuilder builder;
     iree_hal_metal_ExecutableDef_start_as_root(builder);
 
-    auto entryPointNamesRef = builder.createStringVec(mslEntryPointNames);
-    iree_hal_metal_ExecutableDef_entry_points_add(builder, entryPointNamesRef);
+    // Attach embedded source file contents.
+    auto sourceFilesRef = createSourceFilesVec(
+        serOptions.debugLevel, variantOp.getSourcesAttr(), builder);
 
-    iree_hal_metal_ThreadgroupSize_vec_start(builder);
-    for (auto &shader : mslShaders) {
-      iree_hal_metal_ThreadgroupSize_vec_push_create(
-          builder, shader.threadgroupSize.x, shader.threadgroupSize.y,
-          shader.threadgroupSize.z);
+    // Each library may provide multiple functions, so we encode them
+    // independently.
+    SmallVector<iree_hal_metal_LibraryDef_ref_t> libraryRefs;
+    for (auto [shader, metallib] : llvm::zip_equal(mslShaders, metallibs)) {
+      const bool embedSource = !metallib || serOptions.debugLevel > 1;
+      iree_hal_metal_MSLSourceDef_ref_t sourceRef = 0;
+      if (embedSource) {
+        // TODO: pull this from an attribute?
+        // https://developer.apple.com/documentation/metal/mtllanguageversion
+        unsigned version = 196608; // MTLLanguageVersion3_0
+        auto sourceStrRef = builder.createString(shader.source);
+        sourceRef =
+            iree_hal_metal_MSLSourceDef_create(builder, version, sourceStrRef);
+      }
+      flatbuffers_string_ref_t metallibRef = 0;
+      if (metallib) {
+        metallibRef = flatbuffers_string_create(
+            builder, metallib->getBufferStart(), metallib->getBufferSize());
+      }
+      iree_hal_metal_LibraryDef_start(builder);
+      iree_hal_metal_LibraryDef_source_add(builder, sourceRef);
+      iree_hal_metal_LibraryDef_metallib_add(builder, metallibRef);
+      libraryRefs.push_back(iree_hal_metal_LibraryDef_end(builder));
     }
-    auto threadgroupSizesRef = iree_hal_metal_ThreadgroupSize_vec_end(builder);
-    iree_hal_metal_ExecutableDef_threadgroup_sizes_add(builder,
-                                                       threadgroupSizesRef);
+    auto librariesRef = builder.createOffsetVecDestructive(libraryRefs);
 
-    if (metalLibs.empty()) {
-      auto shaderSourcesRef = builder.createStringVec(
-          llvm::map_range(mslShaders, [&](const MetalShader &shader) {
-            return shader.source;
-          }));
-      iree_hal_metal_ExecutableDef_shader_sources_add(builder,
-                                                      shaderSourcesRef);
-    } else {
-      auto refs = llvm::to_vector<8>(llvm::map_range(
-          metalLibs, [&](const std::unique_ptr<llvm::MemoryBuffer> &buffer) {
-            return flatbuffers_string_create(builder, buffer->getBufferStart(),
-                                             buffer->getBufferSize());
-          }));
-      auto libsRef =
-          flatbuffers_string_vec_create(builder, refs.data(), refs.size());
-      iree_hal_metal_ExecutableDef_shader_libraries_add(builder, libsRef);
+    // Generate optional per-export debug information.
+    // May be empty if no debug information was requested.
+    auto exportOps = llvm::to_vector_of<IREE::HAL::ExecutableExportOp>(
+        variantOp.getExportOps());
+    auto exportDebugInfos =
+        createExportDefs(serOptions.debugLevel, exportOps, builder);
+
+    SmallVector<iree_hal_metal_PipelineDef_ref_t> pipelineRefs;
+    for (auto [i, shader, entryPoint, exportOp] :
+         llvm::zip_equal(llvm::seq(mslShaders.size()), mslShaders,
+                         mslEntryPointNames, exportOps)) {
+      auto entryPointRef = builder.createString(entryPoint);
+
+      iree_hal_metal_ThreadgroupSize_t threadgroupSize = {
+          shader.threadgroupSize.x,
+          shader.threadgroupSize.y,
+          shader.threadgroupSize.z,
+      };
+
+      auto layoutAttr = exportOp.getLayoutAttr();
+      uint32_t constantCount = static_cast<uint32_t>(layoutAttr.getConstants());
+      SmallVector<iree_hal_metal_BindingBits_enum_t> bindingFlags;
+      for (auto bindingAttr : layoutAttr.getBindings()) {
+        iree_hal_metal_BindingBits_enum_t flags = 0;
+        if (allEnumBitsSet(bindingAttr.getFlags(),
+                           IREE::HAL::DescriptorFlags::ReadOnly)) {
+          flags |= iree_hal_metal_BindingBits_IMMUTABLE;
+        }
+        bindingFlags.push_back(flags);
+      }
+      auto bindingFlagsRef = iree_hal_metal_BindingBits_vec_create(
+          builder, bindingFlags.data(), bindingFlags.size());
+
+      iree_hal_metal_PipelineDef_start(builder);
+      iree_hal_metal_PipelineDef_library_ordinal_add(builder, i);
+      iree_hal_metal_PipelineDef_entry_point_add(builder, entryPointRef);
+      iree_hal_metal_PipelineDef_threadgroup_size_add(builder,
+                                                      &threadgroupSize);
+      // TODO: embed additional metadata on threadgroup info if available.
+      // iree_hal_metal_PipelineDef_max_threads_per_threadgroup_add(builder, 0);
+      // iree_hal_metal_PipelineDef_threadgroup_size_aligned_add(builder,
+      // false);
+      iree_hal_metal_PipelineDef_constant_count_add(builder, constantCount);
+      iree_hal_metal_PipelineDef_binding_flags_add(builder, bindingFlagsRef);
+      iree_hal_metal_PipelineDef_debug_info_add(builder, exportDebugInfos[i]);
+      pipelineRefs.push_back(iree_hal_metal_PipelineDef_end(builder));
     }
+    auto pipelinesRef = builder.createOffsetVecDestructive(pipelineRefs);
+
+    iree_hal_metal_ExecutableDef_pipelines_add(builder, pipelinesRef);
+    iree_hal_metal_ExecutableDef_libraries_add(builder, librariesRef);
+    iree_hal_metal_ExecutableDef_source_files_add(builder, sourceFilesRef);
 
     iree_hal_metal_ExecutableDef_end_as_root(builder);
 
diff --git a/compiler/plugins/target/ROCM/BUILD.bazel b/compiler/plugins/target/ROCM/BUILD.bazel
index 75296b8..7962cf8 100644
--- a/compiler/plugins/target/ROCM/BUILD.bazel
+++ b/compiler/plugins/target/ROCM/BUILD.bazel
@@ -35,10 +35,12 @@
         "//compiler/src/iree/compiler/Codegen/Utils",
         "//compiler/src/iree/compiler/Dialect/HAL/IR",
         "//compiler/src/iree/compiler/Dialect/HAL/Target",
+        "//compiler/src/iree/compiler/Dialect/HAL/Utils:ExecutableDebugInfoUtils",
         "//compiler/src/iree/compiler/Dialect/HAL/Utils:LLVMLinkerUtils",
         "//compiler/src/iree/compiler/PluginAPI",
         "//compiler/src/iree/compiler/Utils",
-        "//runtime/src/iree/schemas:rocm_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+        "//runtime/src/iree/schemas:hip_executable_def_c_fbs",
         "@llvm-project//llvm:AMDGPUCodeGen",
         "@llvm-project//llvm:Analysis",
         "@llvm-project//llvm:BitWriter",
diff --git a/compiler/plugins/target/ROCM/CMakeLists.txt b/compiler/plugins/target/ROCM/CMakeLists.txt
index b3e8fd5..9430dca 100644
--- a/compiler/plugins/target/ROCM/CMakeLists.txt
+++ b/compiler/plugins/target/ROCM/CMakeLists.txt
@@ -60,10 +60,12 @@
     iree::compiler::Codegen::Utils
     iree::compiler::Dialect::HAL::IR
     iree::compiler::Dialect::HAL::Target
+    iree::compiler::Dialect::HAL::Utils::ExecutableDebugInfoUtils
     iree::compiler::Dialect::HAL::Utils::LLVMLinkerUtils
     iree::compiler::PluginAPI
     iree::compiler::Utils
-    iree::schemas::rocm_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::hip_executable_def_c_fbs
   PUBLIC
 )
 
diff --git a/compiler/plugins/target/ROCM/ROCMTarget.cpp b/compiler/plugins/target/ROCM/ROCMTarget.cpp
index de7d3d7..04e87bb 100644
--- a/compiler/plugins/target/ROCM/ROCMTarget.cpp
+++ b/compiler/plugins/target/ROCM/ROCMTarget.cpp
@@ -18,12 +18,13 @@
 #include "iree/compiler/Codegen/Utils/Utils.h"
 #include "iree/compiler/Dialect/HAL/IR/HALOps.h"
 #include "iree/compiler/Dialect/HAL/Target/TargetRegistry.h"
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
 #include "iree/compiler/Dialect/HAL/Utils/LLVMLinkerUtils.h"
 #include "iree/compiler/PluginAPI/Client.h"
 #include "iree/compiler/Utils/FlatbufferUtils.h"
 #include "iree/compiler/Utils/ModuleUtils.h"
 #include "iree/compiler/Utils/ToolUtils.h"
-#include "iree/schemas/rocm_executable_def_builder.h"
+#include "iree/schemas/hip_executable_def_builder.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
@@ -243,23 +244,28 @@
   getDefaultDeviceTarget(MLIRContext *context,
                          const TargetRegistry &targetRegistry) const override {
     Builder b(context);
-    SmallVector<NamedAttribute> configAttrItems;
+
+    SmallVector<NamedAttribute> deviceConfigAttrs;
     if (options.legacySync) {
       // Indicates that the runtime HAL driver operates only in the legacy
       // synchronous mode.
-      configAttrItems.emplace_back(b.getStringAttr("legacy_sync"),
-                                   b.getUnitAttr());
+      deviceConfigAttrs.emplace_back(b.getStringAttr("legacy_sync"),
+                                     b.getUnitAttr());
     }
-    DictionaryAttr configAttr = b.getDictionaryAttr(configAttrItems);
+    auto deviceConfigAttr = b.getDictionaryAttr(deviceConfigAttrs);
+
+    SmallVector<NamedAttribute> executableConfigAttrs;
+    auto executableConfigAttr = b.getDictionaryAttr(executableConfigAttrs);
 
     // If we had multiple target environments we would generate one target attr
     // per environment, with each setting its own environment attribute.
     SmallVector<IREE::HAL::ExecutableTargetAttr> executableTargetAttrs;
     targetRegistry.getTargetBackend("rocm")->getDefaultExecutableTargets(
-        context, "rocm", configAttr, executableTargetAttrs);
+        context, "rocm", executableConfigAttr, executableTargetAttrs);
 
     return IREE::HAL::DeviceTargetAttr::get(context, b.getStringAttr("hip"),
-                                            configAttr, executableTargetAttrs);
+                                            deviceConfigAttr,
+                                            executableTargetAttrs);
   }
 
 private:
@@ -386,35 +392,27 @@
     auto exportOps = llvm::to_vector_of<IREE::HAL::ExecutableExportOp>(
         variantOp.getExportOps());
     llvm::StringMap<IREE::HAL::ExecutableExportOp> exportOpMap;
-    std::vector<std::array<int32_t, 3>> workgroupSizes;
-    SmallVector<uint32_t> workgroupLocalMemories;
-    uint32_t subgroupSize = 64;
+    std::optional<uint32_t> subgroupSize;
     for (IREE::HAL::ExecutableExportOp exportOp : exportOps) {
       exportOpMap[exportOp.getSymName()] = exportOp;
 
-      std::array<int32_t, 3> workgroupSize = {1, 1, 1};
-      if (std::optional<ArrayAttr> workgroupSizeAttr =
-              exportOp.getWorkgroupSize()) {
-        for (auto [value, sizeAttr] :
-             llvm::zip_equal(workgroupSize, *workgroupSizeAttr))
-          value = cast<IntegerAttr>(sizeAttr).getInt();
-      }
-      workgroupSizes.push_back(workgroupSize);
-
+      // TODO: put this either on the variant or propagate as a function
+      // attribute instead - today this *must* be consistent across all exports
+      // and it shouldn't need to be.
       if (auto setSubgroupSize = exportOp.getSubgroupSizeAsUInt()) {
         if (setSubgroupSize.value() != 32 && setSubgroupSize.value() != 64) {
           return variantOp.emitError()
                  << "invalid subgroup size " << setSubgroupSize.value();
         }
+        if (subgroupSize.has_value() &&
+            setSubgroupSize.value() != subgroupSize.value()) {
+          return variantOp.emitError()
+                 << "multiple exports with different subgroup sizes; this is a "
+                    "limitation of the IREE compilation process and should be "
+                    "fixed";
+        }
         subgroupSize = setSubgroupSize.value();
       }
-
-      uint32_t workgroupLocalMemory = 0;
-      if (std::optional<APInt> workgroupLocalMemoryAttr =
-              exportOp.getWorkgroupLocalMemory()) {
-        workgroupLocalMemory = workgroupLocalMemoryAttr->getSExtValue();
-      }
-      workgroupLocalMemories.push_back(workgroupLocalMemory);
     }
 
     std::string targetHSACO;
@@ -499,10 +497,15 @@
         std::string features;
         if (targetArch.starts_with("gfx10") ||
             targetArch.starts_with("gfx11")) {
-          if (subgroupSize == 32)
+          switch (subgroupSize.value_or(64)) {
+          case 32:
             features = "+wavefrontsize32";
-          if (subgroupSize == 64)
+            break;
+          default:
+          case 64:
             features = "+wavefrontsize64";
+            break;
+          }
         }
         if (!targetFeatures.empty()) {
           features += (features.empty() ? "" : ",") + targetFeatures.str();
@@ -604,30 +607,29 @@
     }
 
     iree_compiler::FlatbufferBuilder builder;
-    iree_hal_rocm_ExecutableDef_start_as_root(builder);
+    iree_hal_hip_ExecutableDef_start_as_root(builder);
 
     // Attach embedded source file contents.
-    SmallVector<iree_hal_rocm_SourceFileDef_ref_t> sourceFileRefs;
-    if (auto sourcesAttr = variantOp.getSourcesAttr()) {
-      for (auto sourceAttr : llvm::reverse(sourcesAttr.getValue())) {
-        if (auto resourceAttr = dyn_cast_if_present<DenseResourceElementsAttr>(
-                sourceAttr.getValue())) {
-          auto filenameRef = builder.createString(sourceAttr.getName());
-          auto contentRef = builder.streamUint8Vec([&](llvm::raw_ostream &os) {
-            auto blobData = resourceAttr.getRawHandle().getBlob()->getData();
-            os.write(blobData.data(), blobData.size());
-            return true;
-          });
-          sourceFileRefs.push_back(iree_hal_rocm_SourceFileDef_create(
-              builder, filenameRef, contentRef));
-        }
-      }
-      std::reverse(sourceFileRefs.begin(), sourceFileRefs.end());
-    }
+    auto sourceFilesRef = createSourceFilesVec(
+        serOptions.debugLevel, variantOp.getSourcesAttr(), builder);
 
-    SmallVector<StringRef> entryPointNames;
-    SmallVector<iree_hal_rocm_FileLineLocDef_ref_t> sourceLocationRefs;
-    entryPointNames.resize(exportOps.size());
+    // Only a single module today.
+    SmallVector<iree_hal_hip_ModuleDef_ref_t> moduleRefs;
+    {
+      auto hsacoImageRef = flatbuffers_string_create(
+          builder, targetHSACO.c_str(), targetHSACO.size());
+      moduleRefs.push_back(
+          iree_hal_hip_ModuleDef_create(builder, hsacoImageRef));
+    }
+    auto modulesRef = builder.createOffsetVecDestructive(moduleRefs);
+
+    // Generate optional per-export debug information.
+    // May be empty if no debug information was requested.
+    auto exportDebugInfos =
+        createExportDefs(serOptions.debugLevel, exportOps, builder);
+
+    SmallVector<iree_hal_hip_ExportDef_ref_t> exportRefs;
+    exportRefs.resize(exportOps.size(), 0);
     for (auto exportOp : exportOps) {
       auto ordinalAttr = exportOp.getOrdinalAttr();
       if (!ordinalAttr) {
@@ -635,86 +637,58 @@
                << "could not compile rocm binary: export op is missing ordinal";
       }
       int64_t ordinal = ordinalAttr.getInt();
-      entryPointNames[ordinal] = exportOp.getName();
 
-      // Optional source location information for debugging/profiling.
-      if (serOptions.debugLevel >= 1) {
-        if (auto loc = findFirstFileLoc(exportOp.getLoc())) {
-          // We only ever resize to the maximum -- so all previous data will
-          // be kept as-is.
-          sourceLocationRefs.resize(exportOps.size());
-          auto filenameRef = builder.createString(loc->getFilename());
-          sourceLocationRefs[ordinal] = iree_hal_rocm_FileLineLocDef_create(
-              builder, filenameRef, loc->getLine());
-        }
+      auto kernelNameRef = builder.createString(exportOp.getName());
+
+      iree_hal_hip_BlockDims_t blockDims = {0};
+      if (auto workgroupSizeAttr = exportOp.getWorkgroupSize()) {
+        auto workgroupSize = workgroupSizeAttr->getValue();
+        blockDims.x = cast<IntegerAttr>(workgroupSize[0]).getInt();
+        blockDims.y = cast<IntegerAttr>(workgroupSize[1]).getInt();
+        blockDims.z = cast<IntegerAttr>(workgroupSize[2]).getInt();
       }
-    }
 
-    // Optional compilation stage source files.
-    SmallVector<iree_hal_rocm_StageLocationsDef_ref_t> stageLocationsRefs;
-    if (serOptions.debugLevel >= 3) {
-      for (auto exportOp : exportOps) {
-        SmallVector<iree_hal_rocm_StageLocationDef_ref_t> stageLocationRefs;
-        if (auto locsAttr = exportOp.getSourceLocsAttr()) {
-          for (auto locAttr : locsAttr.getValue()) {
-            if (auto loc =
-                    findFirstFileLoc(cast<LocationAttr>(locAttr.getValue()))) {
-              auto stageNameRef = builder.createString(locAttr.getName());
-              auto filenameRef = builder.createString(loc->getFilename());
-              stageLocationRefs.push_back(iree_hal_rocm_StageLocationDef_create(
-                  builder, stageNameRef,
-                  iree_hal_rocm_FileLineLocDef_create(builder, filenameRef,
-                                                      loc->getLine())));
-            }
-          }
-        }
-        if (!stageLocationRefs.empty()) {
-          // We only ever resize to the maximum -- so all previous data will
-          // be kept as-is.
-          stageLocationsRefs.resize(exportOps.size());
-          int64_t ordinal = exportOp.getOrdinalAttr().getInt();
-          stageLocationsRefs[ordinal] = iree_hal_rocm_StageLocationsDef_create(
-              builder, builder.createOffsetVecDestructive(stageLocationRefs));
-        }
+      uint32_t blockSharedMemorySize = 0;
+      if (std::optional<APInt> workgroupLocalMemoryAttr =
+              exportOp.getWorkgroupLocalMemory()) {
+        blockSharedMemorySize = workgroupLocalMemoryAttr->getSExtValue();
       }
-    }
 
-    auto hsacoRef = flatbuffers_string_create(builder, targetHSACO.c_str(),
-                                              targetHSACO.size());
+      auto layoutAttr = exportOp.getLayoutAttr();
+      uint32_t constantCount = static_cast<uint32_t>(layoutAttr.getConstants());
+      SmallVector<iree_hal_hip_BindingBits_enum_t> bindingFlags;
+      for (auto bindingAttr : layoutAttr.getBindings()) {
+        iree_hal_hip_BindingBits_enum_t flags = 0;
+        if (allEnumBitsSet(bindingAttr.getFlags(),
+                           IREE::HAL::DescriptorFlags::ReadOnly)) {
+          flags |= iree_hal_hip_BindingBits_READ_ONLY;
+        }
+        if (allEnumBitsSet(bindingAttr.getFlags(),
+                           IREE::HAL::DescriptorFlags::Indirect)) {
+          flags |= iree_hal_hip_BindingBits_INDIRECT;
+        }
+        bindingFlags.push_back(flags);
+      }
+      auto bindingFlagsRef = iree_hal_hip_BindingBits_vec_create(
+          builder, bindingFlags.data(), bindingFlags.size());
 
-    auto entryPointsRef = builder.createStringVec(entryPointNames);
-    iree_hal_rocm_BlockSizeDef_vec_start(builder);
-    auto blockSizes = workgroupSizes.begin();
-    for (int i = 0, e = entryPointNames.size(); i < e; ++i) {
-      iree_hal_rocm_BlockSizeDef_vec_push_create(
-          builder, (*blockSizes)[0], (*blockSizes)[1], (*blockSizes)[2]);
-      ++blockSizes;
+      iree_hal_hip_ExportDef_start(builder);
+      iree_hal_hip_ExportDef_module_ordinal_add(builder, 0); // always 0 today
+      iree_hal_hip_ExportDef_kernel_name_add(builder, kernelNameRef);
+      iree_hal_hip_ExportDef_block_dims_add(builder, &blockDims);
+      iree_hal_hip_ExportDef_block_shared_memory_size_add(
+          builder, blockSharedMemorySize);
+      iree_hal_hip_ExportDef_constant_count_add(builder, constantCount);
+      iree_hal_hip_ExportDef_binding_flags_add(builder, bindingFlagsRef);
+      iree_hal_hip_ExportDef_debug_info_add(builder, exportDebugInfos[ordinal]);
+      exportRefs[ordinal] = iree_hal_hip_ExportDef_end(builder);
     }
-    auto workgroupLocalMemoriesRef =
-        builder.createInt32Vec(workgroupLocalMemories);
-    auto blockSizesRef = iree_hal_rocm_BlockSizeDef_vec_end(builder);
-    iree_hal_rocm_ExecutableDef_entry_points_add(builder, entryPointsRef);
-    iree_hal_rocm_ExecutableDef_block_sizes_add(builder, blockSizesRef);
-    iree_hal_rocm_ExecutableDef_shared_memory_sizes_add(
-        builder, workgroupLocalMemoriesRef);
-    iree_hal_rocm_ExecutableDef_hsaco_image_add(builder, hsacoRef);
-    if (!sourceLocationRefs.empty()) {
-      auto sourceLocationsRef =
-          builder.createOffsetVecDestructive(sourceLocationRefs);
-      iree_hal_rocm_ExecutableDef_source_locations_add(builder,
-                                                       sourceLocationsRef);
-    }
-    if (!stageLocationsRefs.empty()) {
-      auto stageLocationsRef =
-          builder.createOffsetVecDestructive(stageLocationsRefs);
-      iree_hal_rocm_ExecutableDef_stage_locations_add(builder,
-                                                      stageLocationsRef);
-    }
-    if (!sourceFileRefs.empty()) {
-      auto sourceFilesRef = builder.createOffsetVecDestructive(sourceFileRefs);
-      iree_hal_rocm_ExecutableDef_source_files_add(builder, sourceFilesRef);
-    }
-    iree_hal_rocm_ExecutableDef_end_as_root(builder);
+    auto exportsRef = builder.createOffsetVecDestructive(exportRefs);
+
+    iree_hal_hip_ExecutableDef_exports_add(builder, exportsRef);
+    iree_hal_hip_ExecutableDef_modules_add(builder, modulesRef);
+    iree_hal_hip_ExecutableDef_source_files_add(builder, sourceFilesRef);
+    iree_hal_hip_ExecutableDef_end_as_root(builder);
 
     // Add the binary data to the target executable.
     executableBuilder.create<iree_compiler::IREE::HAL::ExecutableBinaryOp>(
diff --git a/compiler/plugins/target/VMVX/VMVXTarget.cpp b/compiler/plugins/target/VMVX/VMVXTarget.cpp
index 831eb8c..daba862 100644
--- a/compiler/plugins/target/VMVX/VMVXTarget.cpp
+++ b/compiler/plugins/target/VMVX/VMVXTarget.cpp
@@ -132,15 +132,14 @@
 
       // Specify the constant and binding information used to validate
       // dispatches.
-      // TODO(#18189): pack per-binding information bitfields.
       if (auto layoutAttr = exportOp.getLayout()) {
-        int64_t constantCount = layoutAttr.getPushConstants();
+        int64_t constantCount = layoutAttr.getConstants();
         if (constantCount > 0) {
           funcOp.setReflectionAttr("constant_count",
                                    executableBuilder.getI8IntegerAttr(
                                        static_cast<uint8_t>(constantCount)));
         }
-        size_t bindingCount = layoutAttr.getSetLayout(0).getBindings().size();
+        size_t bindingCount = layoutAttr.getBindings().size();
         if (bindingCount > 0) {
           funcOp.setReflectionAttr("binding_count",
                                    executableBuilder.getI8IntegerAttr(
diff --git a/compiler/plugins/target/VMVX/test/smoketest.mlir b/compiler/plugins/target/VMVX/test/smoketest.mlir
index 44b3208..6bc3062 100644
--- a/compiler/plugins/target/VMVX/test/smoketest.mlir
+++ b/compiler/plugins/target/VMVX/test/smoketest.mlir
@@ -38,11 +38,10 @@
 // CHECK-LABEL: hal.executable public @add_dispatch_0
 //  CHECK-NEXT:   hal.executable.variant public @vmvx_bytecode_fb target(<"vmvx", "vmvx-bytecode-fb">) {
 //  CHECK-NEXT:     hal.executable.export public @add_dispatch_0 ordinal(0)
-//  CHECK-SAME:       layout(#hal.pipeline.layout<push_constants = 0, sets = [
-//  CHECK-SAME:         <0, bindings = [
-//  CHECK-SAME:           <0, storage_buffer>,
-//  CHECK-SAME:           <1, storage_buffer>,
-//  CHECK-SAME:           <2, storage_buffer>
+//  CHECK-SAME:       layout(#hal.pipeline.layout<bindings = [
+//  CHECK-SAME:           #hal.pipeline.binding<storage_buffer>,
+//  CHECK-SAME:           #hal.pipeline.binding<storage_buffer>,
+//  CHECK-SAME:           #hal.pipeline.binding<storage_buffer>
 //       CHECK:     module attributes {vm.toplevel} {
 //  CHECK-NEXT:       vm.module public @module {
 //  CHECK-NEXT:         vm.func private @add_dispatch_0(
diff --git a/compiler/plugins/target/VulkanSPIRV/BUILD.bazel b/compiler/plugins/target/VulkanSPIRV/BUILD.bazel
index 984bef9..fb53170 100644
--- a/compiler/plugins/target/VulkanSPIRV/BUILD.bazel
+++ b/compiler/plugins/target/VulkanSPIRV/BUILD.bazel
@@ -29,9 +29,11 @@
         "//compiler/src/iree/compiler/Codegen/SPIRV",
         "//compiler/src/iree/compiler/Codegen/Utils",
         "//compiler/src/iree/compiler/Dialect/HAL/Target",
+        "//compiler/src/iree/compiler/Dialect/HAL/Utils:ExecutableDebugInfoUtils",
         "//compiler/src/iree/compiler/PluginAPI",
         "//compiler/src/iree/compiler/Utils",
-        "//runtime/src/iree/schemas:spirv_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+        "//runtime/src/iree/schemas:vulkan_executable_def_c_fbs",
         "@llvm-project//llvm:Support",
         "@llvm-project//mlir:AsmParser",
         "@llvm-project//mlir:GPUDialect",
diff --git a/compiler/plugins/target/VulkanSPIRV/CMakeLists.txt b/compiler/plugins/target/VulkanSPIRV/CMakeLists.txt
index 958e277..3ef8e75 100644
--- a/compiler/plugins/target/VulkanSPIRV/CMakeLists.txt
+++ b/compiler/plugins/target/VulkanSPIRV/CMakeLists.txt
@@ -37,9 +37,11 @@
     iree::compiler::Codegen::SPIRV
     iree::compiler::Codegen::Utils
     iree::compiler::Dialect::HAL::Target
+    iree::compiler::Dialect::HAL::Utils::ExecutableDebugInfoUtils
     iree::compiler::PluginAPI
     iree::compiler::Utils
-    iree::schemas::spirv_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::vulkan_executable_def_c_fbs
   PUBLIC
 )
 
diff --git a/compiler/plugins/target/VulkanSPIRV/VulkanSPIRVTarget.cpp b/compiler/plugins/target/VulkanSPIRV/VulkanSPIRVTarget.cpp
index 45bbdf3..58137fd 100644
--- a/compiler/plugins/target/VulkanSPIRV/VulkanSPIRVTarget.cpp
+++ b/compiler/plugins/target/VulkanSPIRV/VulkanSPIRVTarget.cpp
@@ -8,10 +8,11 @@
 #include "iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.h"
 #include "iree/compiler/Codegen/SPIRV/Passes.h"
 #include "iree/compiler/Dialect/HAL/Target/TargetRegistry.h"
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
 #include "iree/compiler/PluginAPI/Client.h"
 #include "iree/compiler/Utils/FlatbufferUtils.h"
 #include "iree/compiler/Utils/ModuleUtils.h"
-#include "iree/schemas/spirv_executable_def_builder.h"
+#include "iree/schemas/vulkan_executable_def_builder.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/raw_ostream.h"
@@ -60,6 +61,95 @@
 };
 } // namespace
 
+using DescriptorSetLayout = std::pair<unsigned, ArrayRef<PipelineBindingAttr>>;
+
+static std::tuple<iree_hal_vulkan_DescriptorSetLayoutDef_vec_ref_t,
+                  iree_hal_vulkan_PipelineLayoutDef_vec_ref_t,
+                  DenseMap<IREE::HAL::PipelineLayoutAttr, uint32_t>>
+createPipelineLayoutDefs(ArrayRef<IREE::HAL::ExecutableExportOp> exportOps,
+                         FlatbufferBuilder &fbb) {
+  DenseMap<DescriptorSetLayout, size_t> descriptorSetLayoutMap;
+  DenseMap<IREE::HAL::PipelineLayoutAttr, uint32_t> pipelineLayoutMap;
+  SmallVector<iree_hal_vulkan_DescriptorSetLayoutDef_ref_t>
+      descriptorSetLayoutRefs;
+  SmallVector<iree_hal_vulkan_PipelineLayoutDef_ref_t> pipelineLayoutRefs;
+  for (auto exportOp : exportOps) {
+    auto pipelineLayoutAttr = exportOp.getLayout();
+    if (pipelineLayoutMap.contains(pipelineLayoutAttr)) {
+      continue; // already present
+    }
+
+    // Currently there is only one descriptor set on the compiler side. We
+    // could partition it by binding type (direct vs indirect, etc).
+    SmallVector<uint32_t> descriptorSetLayoutOrdinals;
+    auto descriptorSetLayout =
+        DescriptorSetLayout(0, pipelineLayoutAttr.getBindings());
+    auto it = descriptorSetLayoutMap.find(descriptorSetLayout);
+    if (it != descriptorSetLayoutMap.end()) {
+      descriptorSetLayoutOrdinals.push_back(it->second);
+    } else {
+      SmallVector<iree_hal_vulkan_DescriptorSetLayoutBindingDef_ref_t>
+          bindingRefs;
+      for (auto [i, bindingAttr] :
+           llvm::enumerate(pipelineLayoutAttr.getBindings())) {
+        uint32_t ordinal = static_cast<uint32_t>(i);
+        iree_hal_vulkan_VkDescriptorType_enum_t descriptorType = 0;
+        switch (bindingAttr.getType()) {
+        case IREE::HAL::DescriptorType::UniformBuffer:
+          descriptorType = iree_hal_vulkan_VkDescriptorType_UNIFORM_BUFFER;
+          break;
+        case IREE::HAL::DescriptorType::StorageBuffer:
+          descriptorType = iree_hal_vulkan_VkDescriptorType_STORAGE_BUFFER;
+          break;
+        }
+        uint32_t descriptorCount = 1;
+        uint32_t stageFlags = 0x00000020u; // VK_SHADER_STAGE_COMPUTE_BIT
+        bindingRefs.push_back(
+            iree_hal_vulkan_DescriptorSetLayoutBindingDef_create(
+                fbb, ordinal, descriptorType, descriptorCount, stageFlags));
+      }
+      auto bindingsRef = fbb.createOffsetVecDestructive(bindingRefs);
+
+      descriptorSetLayoutOrdinals.push_back(descriptorSetLayoutRefs.size());
+      descriptorSetLayoutMap[descriptorSetLayout] =
+          descriptorSetLayoutRefs.size();
+      descriptorSetLayoutRefs.push_back(
+          iree_hal_vulkan_DescriptorSetLayoutDef_create(fbb, bindingsRef));
+    }
+    auto descriptorSetLayoutOrdinalsRef =
+        fbb.createInt32Vec(descriptorSetLayoutOrdinals);
+
+    iree_hal_vulkan_PushConstantRange_vec_ref_t pushConstantRangesRef = 0;
+    if (int64_t pushConstantCount = pipelineLayoutAttr.getConstants()) {
+      SmallVector<iree_hal_vulkan_PushConstantRange> pushConstantRanges;
+      iree_hal_vulkan_PushConstantRange range0;
+      range0.stage_flags = 0x00000020u; // VK_SHADER_STAGE_COMPUTE_BIT
+      range0.offset = 0;
+      range0.size = pushConstantCount * sizeof(uint32_t);
+      pushConstantRanges.push_back(range0);
+      pushConstantRangesRef = iree_hal_vulkan_PushConstantRange_vec_create(
+          fbb, pushConstantRanges.data(), pushConstantRanges.size());
+    }
+
+    pipelineLayoutMap[pipelineLayoutAttr] =
+        static_cast<uint32_t>(pipelineLayoutRefs.size());
+    iree_hal_vulkan_PipelineLayoutDef_start(fbb);
+    iree_hal_vulkan_PipelineLayoutDef_descriptor_set_layout_ordinals_add(
+        fbb, descriptorSetLayoutOrdinalsRef);
+    if (pushConstantRangesRef) {
+      iree_hal_vulkan_PipelineLayoutDef_push_constant_ranges_add(
+          fbb, pushConstantRangesRef);
+    }
+    pipelineLayoutRefs.push_back(iree_hal_vulkan_PipelineLayoutDef_end(fbb));
+  }
+
+  auto descriptorSetLayoutsRef =
+      fbb.createOffsetVecDestructive(descriptorSetLayoutRefs);
+  auto pipelineLayoutsRef = fbb.createOffsetVecDestructive(pipelineLayoutRefs);
+  return std::make_tuple(descriptorSetLayoutsRef, pipelineLayoutsRef,
+                         pipelineLayoutMap);
+}
+
 // TODO: VulkanOptions for choosing the Vulkan version and extensions/features.
 class VulkanTargetDevice : public TargetDevice {
 public:
@@ -70,17 +160,21 @@
   getDefaultDeviceTarget(MLIRContext *context,
                          const TargetRegistry &targetRegistry) const override {
     Builder b(context);
-    SmallVector<NamedAttribute> configItems;
 
-    auto configAttr = b.getDictionaryAttr(configItems);
+    SmallVector<NamedAttribute> deviceConfigAttrs;
+    auto deviceConfigAttr = b.getDictionaryAttr(deviceConfigAttrs);
+
+    SmallVector<NamedAttribute> executableConfigAttrs;
+    auto executableConfigAttr = b.getDictionaryAttr(executableConfigAttrs);
 
     SmallVector<IREE::HAL::ExecutableTargetAttr> executableTargetAttrs;
     targetRegistry.getTargetBackend("vulkan-spirv")
-        ->getDefaultExecutableTargets(context, "vulkan", configAttr,
+        ->getDefaultExecutableTargets(context, "vulkan", executableConfigAttr,
                                       executableTargetAttrs);
 
     return IREE::HAL::DeviceTargetAttr::get(context, b.getStringAttr("vulkan"),
-                                            configAttr, executableTargetAttrs);
+                                            deviceConfigAttr,
+                                            executableTargetAttrs);
   }
 
 private:
@@ -161,190 +255,141 @@
       return variantOp.emitError() << "should contain some spirv.module ops";
     }
 
-    DenseMap<StringRef, uint64_t> entryPointOrdinals;
-
-    SmallVector<IREE::HAL::ExecutableExportOp> exportOps =
+    // Map each executable export (by ordinal) to the SPIR-V module and
+    // entry point that defines it.
+    auto unsortedExportOps =
         llvm::to_vector(variantOp.getOps<IREE::HAL::ExecutableExportOp>());
-    for (auto exportOp : exportOps) {
+    DenseMap<StringRef, std::tuple<IREE::HAL::ExecutableExportOp, uint64_t>>
+        exportOrdinalMap;
+    for (auto exportOp : variantOp.getOps<IREE::HAL::ExecutableExportOp>()) {
       uint64_t ordinal = 0;
       if (std::optional<APInt> optionalOrdinal = exportOp.getOrdinal()) {
         ordinal = optionalOrdinal->getZExtValue();
       } else {
-        // For executables with only one entry point, linking doesn't kick in at
+        // For executables with only one entry point linking doesn't kick in at
         // all. So the ordinal can be missing for this case.
-        if (!llvm::hasSingleElement(exportOps)) {
+        if (!llvm::hasSingleElement(unsortedExportOps)) {
           return exportOp.emitError() << "should have ordinal attribute";
         }
       }
-      entryPointOrdinals[exportOp.getSymName()] = ordinal;
+      exportOrdinalMap[exportOp.getSymName()] =
+          std::make_tuple(exportOp, ordinal);
     }
-    uint64_t ordinalCount = entryPointOrdinals.size();
-
-    FlatbufferBuilder builder;
-    iree_hal_spirv_ExecutableDef_start_as_root(builder);
-
-    // Attach embedded source file contents.
-    SmallVector<iree_hal_spirv_SourceFileDef_ref_t> sourceFileRefs;
-    if (auto sourcesAttr = variantOp.getSourcesAttr()) {
-      for (auto sourceAttr : llvm::reverse(sourcesAttr.getValue())) {
-        if (auto resourceAttr = dyn_cast_if_present<DenseResourceElementsAttr>(
-                sourceAttr.getValue())) {
-          auto filenameRef = builder.createString(sourceAttr.getName());
-          auto contentRef = builder.streamUint8Vec([&](llvm::raw_ostream &os) {
-            auto blobData = resourceAttr.getRawHandle().getBlob()->getData();
-            os.write(blobData.data(), blobData.size());
-            return true;
-          });
-          sourceFileRefs.push_back(iree_hal_spirv_SourceFileDef_create(
-              builder, filenameRef, contentRef));
-        }
-      }
-      std::reverse(sourceFileRefs.begin(), sourceFileRefs.end());
-    }
-
-    // The list of shader modules.
-    SmallVector<iree_hal_spirv_ShaderModuleDef_ref_t> shaderModuleRefs;
-
-    // Per entry-point data.
-    // Note that the following vectors should all be of the same size and
-    // element at index #i is for entry point with ordinal #i!
-    SmallVector<StringRef> entryPointNames;
-    SmallVector<uint32_t> subgroupSizes;
-    SmallVector<uint32_t> shaderModuleIndices;
-    SmallVector<iree_hal_spirv_FileLineLocDef_ref_t> sourceLocationRefs;
-    entryPointNames.resize(ordinalCount);
-    subgroupSizes.resize(ordinalCount);
-    shaderModuleIndices.resize(ordinalCount);
-
-    // Iterate over all spirv.module ops and encode them into the FlatBuffer
-    // data structure.
-    bool hasAnySubgroupSizes = false;
-    for (spirv::ModuleOp spvModuleOp : spirvModuleOps) {
-      // Currently the spirv.module op should only have one entry point. Get it.
-      auto spirvEntryPoints = spvModuleOp.getOps<spirv::EntryPointOp>();
+    SmallVector<IREE::HAL::ExecutableExportOp> sortedExportOps;
+    sortedExportOps.resize(unsortedExportOps.size());
+    SmallVector<std::tuple<IREE::HAL::ExecutableExportOp, spirv::ModuleOp,
+                           spirv::EntryPointOp>>
+        exportOps;
+    exportOps.resize(unsortedExportOps.size());
+    for (spirv::ModuleOp spirvModuleOp : spirvModuleOps) {
+      // Currently the spirv.module op should only have one entry point.
+      auto spirvEntryPoints = spirvModuleOp.getOps<spirv::EntryPointOp>();
       if (!llvm::hasSingleElement(spirvEntryPoints)) {
-        return spvModuleOp.emitError()
+        // TODO: support multiple entry points. We only need them here to get
+        // the module name for dumping files.
+        return spirvModuleOp.emitError()
                << "expected to contain exactly one entry point";
       }
-      spirv::EntryPointOp spvEntryPoint = *spirvEntryPoints.begin();
-      uint64_t ordinal = entryPointOrdinals.at(spvEntryPoint.getFn());
+      spirv::EntryPointOp spirvEntryPointOp = *spirvEntryPoints.begin();
+      auto [exportOp, ordinal] = exportOrdinalMap.at(spirvEntryPointOp.getFn());
+      sortedExportOps[ordinal] = exportOp;
+      exportOps[ordinal] =
+          std::make_tuple(exportOp, spirvModuleOp, spirvEntryPointOp);
+    }
 
+    FlatbufferBuilder builder;
+    iree_hal_vulkan_ExecutableDef_start_as_root(builder);
+
+    // Attach embedded source file contents.
+    auto sourceFilesRef = createSourceFilesVec(
+        options.debugLevel, variantOp.getSourcesAttr(), builder);
+
+    // Generate optional per-export debug information.
+    // May be empty if no debug information was requested.
+    auto exportDebugInfos =
+        createExportDefs(options.debugLevel, sortedExportOps, builder);
+
+    // Create a list of all serialized SPIR-V modules.
+    // TODO: unique the modules when each contains multiple entry points.
+    DenseMap<spirv::EntryPointOp, uint32_t> entryPointToModuleMap;
+    SmallVector<iree_hal_vulkan_ShaderModuleDef_ref_t> shaderModuleRefs;
+    for (auto [exportOp, spirvModuleOp, spirvEntryPointOp] : exportOps) {
       if (!options.dumpIntermediatesPath.empty()) {
         std::string assembly;
         llvm::raw_string_ostream os(assembly);
-        spvModuleOp.print(os, OpPrintingFlags().useLocalScope());
+        spirvModuleOp.print(os, OpPrintingFlags().useLocalScope());
         dumpDataToPath(options.dumpIntermediatesPath, options.dumpBaseName,
-                       spvEntryPoint.getFn(), ".spirv.mlir", assembly);
+                       spirvEntryPointOp.getFn(), ".spirv.mlir", assembly);
       }
 
       // Serialize the spirv::ModuleOp into the binary blob.
-      SmallVector<uint32_t, 0> spvBinary;
-      if (failed(spirv::serialize(spvModuleOp, spvBinary)) ||
-          spvBinary.empty()) {
-        return spvModuleOp.emitError() << "failed to serialize";
+      SmallVector<uint32_t, 0> spirvBinary;
+      if (failed(spirv::serialize(spirvModuleOp, spirvBinary)) ||
+          spirvBinary.empty()) {
+        return spirvModuleOp.emitError() << "failed to serialize";
       }
       if (!options.dumpBinariesPath.empty()) {
         dumpDataToPath<uint32_t>(options.dumpBinariesPath, options.dumpBaseName,
-                                 spvEntryPoint.getFn(), ".spv", spvBinary);
+                                 spirvEntryPointOp.getFn(), ".spv",
+                                 spirvBinary);
       }
-      auto spvCodeRef = flatbuffers_uint32_vec_create(builder, spvBinary.data(),
-                                                      spvBinary.size());
-      shaderModuleIndices[ordinal] = shaderModuleRefs.size();
+      auto spirvCodeRef = flatbuffers_uint32_vec_create(
+          builder, spirvBinary.data(), spirvBinary.size());
+      entryPointToModuleMap[spirvEntryPointOp] =
+          static_cast<uint32_t>(shaderModuleRefs.size());
       shaderModuleRefs.push_back(
-          iree_hal_spirv_ShaderModuleDef_create(builder, spvCodeRef));
-
-      // The IREE runtime uses ordinals instead of names. We need to attach the
-      // entry point name for VkShaderModuleCreateInfo.
-      entryPointNames[ordinal] = spvEntryPoint.getFn();
-
-      // If there are subgroup size requests, we need to pick up too.
-      auto fn = spvModuleOp.lookupSymbol<spirv::FuncOp>(spvEntryPoint.getFn());
-      auto abi = fn->getAttrOfType<spirv::EntryPointABIAttr>(
-          spirv::getEntryPointABIAttrName());
-      if (abi && abi.getSubgroupSize()) {
-        subgroupSizes[ordinal] = *abi.getSubgroupSize();
-        hasAnySubgroupSizes = true;
-      } else {
-        subgroupSizes[ordinal] = 0;
-      }
-
-      // Optional source location information for debugging/profiling.
-      if (options.debugLevel >= 1) {
-        if (auto loc = findFirstFileLoc(spvEntryPoint.getLoc())) {
-          // We only ever resize to the maximum -- so all previous data will be
-          // kept as-is.
-          sourceLocationRefs.resize(ordinalCount);
-          auto filenameRef = builder.createString(loc->getFilename());
-          sourceLocationRefs[ordinal] = iree_hal_spirv_FileLineLocDef_create(
-              builder, filenameRef, loc->getLine());
-        }
-      }
+          iree_hal_vulkan_ShaderModuleDef_create(builder, spirvCodeRef));
     }
-
-    // Optional compilation stage source files.
-    SmallVector<iree_hal_spirv_StageLocationsDef_ref_t> stageLocationsRefs;
-    if (options.debugLevel >= 3) {
-      for (auto exportOp : exportOps) {
-        SmallVector<iree_hal_spirv_StageLocationDef_ref_t> stageLocationRefs;
-        if (auto locsAttr = exportOp.getSourceLocsAttr()) {
-          for (auto locAttr : locsAttr.getValue()) {
-            if (auto loc =
-                    findFirstFileLoc(cast<LocationAttr>(locAttr.getValue()))) {
-              auto stageNameRef = builder.createString(locAttr.getName());
-              auto filenameRef = builder.createString(loc->getFilename());
-              stageLocationRefs.push_back(
-                  iree_hal_spirv_StageLocationDef_create(
-                      builder, stageNameRef,
-                      iree_hal_spirv_FileLineLocDef_create(builder, filenameRef,
-                                                           loc->getLine())));
-            }
-          }
-        }
-        if (!stageLocationRefs.empty()) {
-          // We only ever resize to the maximum -- so all previous data will
-          // be kept as-is.
-          stageLocationsRefs.resize(ordinalCount);
-          int64_t ordinal = exportOp.getOrdinalAttr().getInt();
-          stageLocationsRefs[ordinal] = iree_hal_spirv_StageLocationsDef_create(
-              builder, builder.createOffsetVecDestructive(stageLocationRefs));
-        }
-      }
-    }
-
-    // Add top-level executable fields following their order of definition.
-    auto entryPointsRef = builder.createStringVec(entryPointNames);
-    flatbuffers_int32_vec_ref_t subgroupSizesRef =
-        hasAnySubgroupSizes ? builder.createInt32Vec(subgroupSizes) : 0;
-    flatbuffers_int32_vec_ref_t shaderModuleIndicesRef =
-        builder.createInt32Vec(shaderModuleIndices);
-    iree_hal_spirv_ExecutableDef_entry_points_add(builder, entryPointsRef);
-    if (subgroupSizesRef) {
-      iree_hal_spirv_ExecutableDef_subgroup_sizes_add(builder,
-                                                      subgroupSizesRef);
-    }
-    iree_hal_spirv_ExecutableDef_shader_module_indices_add(
-        builder, shaderModuleIndicesRef);
     auto shaderModulesRef =
         builder.createOffsetVecDestructive(shaderModuleRefs);
-    iree_hal_spirv_ExecutableDef_shader_modules_add(builder, shaderModulesRef);
-    if (!sourceLocationRefs.empty()) {
-      auto sourceLocationsRef =
-          builder.createOffsetVecDestructive(sourceLocationRefs);
-      iree_hal_spirv_ExecutableDef_source_locations_add(builder,
-                                                        sourceLocationsRef);
-    }
-    if (!stageLocationsRefs.empty()) {
-      auto stageLocationsRef =
-          builder.createOffsetVecDestructive(stageLocationsRefs);
-      iree_hal_spirv_ExecutableDef_stage_locations_add(builder,
-                                                       stageLocationsRef);
-    }
-    if (!sourceFileRefs.empty()) {
-      auto sourceFilesRef = builder.createOffsetVecDestructive(sourceFileRefs);
-      iree_hal_spirv_ExecutableDef_source_files_add(builder, sourceFilesRef);
-    }
 
-    iree_hal_spirv_ExecutableDef_end_as_root(builder);
+    // Create deduplicated descriptor set and pipeline layouts for all
+    // entry points.
+    auto [descriptorSetLayoutsRef, pipelineLayoutsRef, pipelineLayoutMap] =
+        createPipelineLayoutDefs(sortedExportOps, builder);
+
+    // Create pipelines representing entry points.
+    // Note that the element at index #i is for entry point with ordinal #i.
+    SmallVector<iree_hal_vulkan_PipelineDef_ref_t> pipelineRefs;
+    for (auto [exportOp, spirvModuleOp, spirvEntryPointOp] : exportOps) {
+      int64_t ordinal = exportOp.getOrdinalAttr().getInt();
+
+      uint32_t shaderModuleOrdinal =
+          entryPointToModuleMap.at(spirvEntryPointOp);
+      uint32_t pipelineLayoutOrdinal =
+          pipelineLayoutMap.at(exportOp.getLayout());
+
+      // Subgroup size requests are optional.
+      auto spirvFuncOp =
+          spirvModuleOp.lookupSymbol<spirv::FuncOp>(spirvEntryPointOp.getFn());
+      auto abiAttr = spirvFuncOp->getAttrOfType<spirv::EntryPointABIAttr>(
+          spirv::getEntryPointABIAttrName());
+      uint32_t subgroupSize =
+          abiAttr ? abiAttr.getSubgroupSize().value_or(0) : 0;
+
+      auto entryPointRef = builder.createString(spirvEntryPointOp.getFn());
+      iree_hal_vulkan_PipelineDef_start(builder);
+      iree_hal_vulkan_PipelineDef_shader_module_ordinal_add(
+          builder, shaderModuleOrdinal);
+      iree_hal_vulkan_PipelineDef_entry_point_add(builder, entryPointRef);
+      iree_hal_vulkan_PipelineDef_pipeline_layout_ordinal_add(
+          builder, pipelineLayoutOrdinal);
+      iree_hal_vulkan_PipelineDef_subgroup_size_add(builder, subgroupSize);
+      iree_hal_vulkan_PipelineDef_debug_info_add(builder,
+                                                 exportDebugInfos[ordinal]);
+      pipelineRefs.push_back(iree_hal_vulkan_PipelineDef_end(builder));
+    }
+    auto pipelinesRef = builder.createOffsetVecDestructive(pipelineRefs);
+
+    // Add top-level executable fields following their order of definition.
+    iree_hal_vulkan_ExecutableDef_pipelines_add(builder, pipelinesRef);
+    iree_hal_vulkan_ExecutableDef_descriptor_set_layouts_add(
+        builder, descriptorSetLayoutsRef);
+    iree_hal_vulkan_ExecutableDef_pipeline_layouts_add(builder,
+                                                       pipelineLayoutsRef);
+    iree_hal_vulkan_ExecutableDef_shader_modules_add(builder, shaderModulesRef);
+    iree_hal_vulkan_ExecutableDef_source_files_add(builder, sourceFilesRef);
+
+    iree_hal_vulkan_ExecutableDef_end_as_root(builder);
 
     // Add the binary data to the target executable.
     auto binaryOp = executableBuilder.create<IREE::HAL::ExecutableBinaryOp>(
@@ -372,51 +417,71 @@
                                         "supported for external variants";
     }
 
-    // Take exported names verbatim for passing into VkShaderModuleCreateInfo.
-    SmallVector<StringRef, 8> entryPointNames;
-    for (auto exportOp : variantOp.getExportOps()) {
-      entryPointNames.emplace_back(exportOp.getSymName());
-    }
-    // We only have one object file for now. So all entry points have shader
-    // module index 0.
-    SmallVector<uint32_t, 8> shaderModuleIndices(entryPointNames.size(), 0);
-
     // Load .spv object file.
     auto objectAttr = llvm::cast<IREE::HAL::ExecutableObjectAttr>(
         variantOp.getObjects()->getValue().front());
-    std::string spvBinary;
+    std::string spirvBinary;
     if (auto data = objectAttr.loadData()) {
-      spvBinary = data.value();
+      spirvBinary = data.value();
     } else {
       return variantOp.emitOpError()
              << "object file could not be loaded: " << objectAttr;
     }
-    if (spvBinary.size() % 4 != 0) {
+    if (spirvBinary.size() % 4 != 0) {
       return variantOp.emitOpError()
              << "object file is not 4-byte aligned as expected for SPIR-V";
     }
 
     FlatbufferBuilder builder;
-    iree_hal_spirv_ExecutableDef_start_as_root(builder);
+    iree_hal_vulkan_ExecutableDef_start_as_root(builder);
 
-    auto spvCodeRef = flatbuffers_uint32_vec_create(
-        builder, reinterpret_cast<const uint32_t *>(spvBinary.data()),
-        spvBinary.size() / sizeof(uint32_t));
-    SmallVector<iree_hal_spirv_ShaderModuleDef_ref_t> shaderModuleRefs;
+    // Wrap and embed shader module binary.
+    auto spirvCodeRef = flatbuffers_uint32_vec_create(
+        builder, reinterpret_cast<const uint32_t *>(spirvBinary.data()),
+        spirvBinary.size() / sizeof(uint32_t));
+    SmallVector<iree_hal_vulkan_ShaderModuleDef_ref_t> shaderModuleRefs;
     shaderModuleRefs.push_back(
-        iree_hal_spirv_ShaderModuleDef_create(builder, spvCodeRef));
-
-    // Add top-level executable fields following their order of definition.
-    auto entryPointsRef = builder.createStringVec(entryPointNames);
-    auto shaderModuleIndicesRef = builder.createInt32Vec(shaderModuleIndices);
-    iree_hal_spirv_ExecutableDef_entry_points_add(builder, entryPointsRef);
-    iree_hal_spirv_ExecutableDef_shader_module_indices_add(
-        builder, shaderModuleIndicesRef);
+        iree_hal_vulkan_ShaderModuleDef_create(builder, spirvCodeRef));
     auto shaderModulesRef =
         builder.createOffsetVecDestructive(shaderModuleRefs);
-    iree_hal_spirv_ExecutableDef_shader_modules_add(builder, shaderModulesRef);
 
-    iree_hal_spirv_ExecutableDef_end_as_root(builder);
+    // Generate descriptor set and pipeline layouts from export ops.
+    auto exportOps = llvm::to_vector(variantOp.getExportOps());
+    auto [descriptorSetLayoutsRef, pipelineLayoutsRef, pipelineLayoutMap] =
+        createPipelineLayoutDefs(exportOps, builder);
+
+    // Create a pipeline for each export.
+    SmallVector<iree_hal_vulkan_PipelineDef_ref_t> pipelineRefs;
+    for (auto exportOp : exportOps) {
+      uint32_t shaderModuleOrdinal = 0; // only one shader module today
+      uint32_t pipelineLayoutOrdinal =
+          pipelineLayoutMap.at(exportOp.getLayout());
+
+      // Subgroup size requests are optional.
+      // TODO: support an attribute annotation so users can specify a
+      // subgroup size.
+      uint32_t subgroupSize = 0;
+
+      auto entryPointRef = builder.createString(exportOp.getName());
+      iree_hal_vulkan_PipelineDef_start(builder);
+      iree_hal_vulkan_PipelineDef_shader_module_ordinal_add(
+          builder, shaderModuleOrdinal);
+      iree_hal_vulkan_PipelineDef_entry_point_add(builder, entryPointRef);
+      iree_hal_vulkan_PipelineDef_pipeline_layout_ordinal_add(
+          builder, pipelineLayoutOrdinal);
+      iree_hal_vulkan_PipelineDef_subgroup_size_add(builder, subgroupSize);
+      pipelineRefs.push_back(iree_hal_vulkan_PipelineDef_end(builder));
+    }
+    auto pipelinesRef = builder.createOffsetVecDestructive(pipelineRefs);
+
+    // Add top-level executable fields following their order of definition.
+    iree_hal_vulkan_ExecutableDef_pipelines_add(builder, pipelinesRef);
+    iree_hal_vulkan_ExecutableDef_descriptor_set_layouts_add(
+        builder, descriptorSetLayoutsRef);
+    iree_hal_vulkan_ExecutableDef_pipeline_layouts_add(builder,
+                                                       pipelineLayoutsRef);
+    iree_hal_vulkan_ExecutableDef_shader_modules_add(builder, shaderModulesRef);
+
+    iree_hal_vulkan_ExecutableDef_end_as_root(builder);
 
     // Add the binary data to the target executable.
     auto binaryOp = executableBuilder.create<IREE::HAL::ExecutableBinaryOp>(
diff --git a/compiler/plugins/target/VulkanSPIRV/test/materialize_homogeneous_encodings.mlir b/compiler/plugins/target/VulkanSPIRV/test/materialize_homogeneous_encodings.mlir
index 3033264..992d9c7 100644
--- a/compiler/plugins/target/VulkanSPIRV/test/materialize_homogeneous_encodings.mlir
+++ b/compiler/plugins/target/VulkanSPIRV/test/materialize_homogeneous_encodings.mlir
@@ -26,7 +26,7 @@
 #map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
 #encoding = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], user_indexing_maps = [#map1, #map2, #map3], round_dims_to = array<i64: 16, 16, 16>>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {target_triple = "x86_64-none-elf", cpu_features = "+avx512f"}>
-#device_target_llvm_cpu = #hal.device.target<"llvm-cpu", [#executable_target_embedded_elf_x86_64_]> : !hal.device
+#device_target_llvm_cpu = #hal.device.target<"local", [#executable_target_embedded_elf_x86_64_]> : !hal.device
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb">
 #device_target_vulkan = #hal.device.target<"vulkan", [#executable_target_vulkan_spirv_fb]> : !hal.device
 module attributes {hal.device.targets = [#hal.device.select<[#device_target_vulkan, #device_target_llvm_cpu]> : !hal.device]} {
diff --git a/compiler/plugins/target/WebGPUSPIRV/CMakeLists.txt b/compiler/plugins/target/WebGPUSPIRV/CMakeLists.txt
index caf4460..64b6b4c 100644
--- a/compiler/plugins/target/WebGPUSPIRV/CMakeLists.txt
+++ b/compiler/plugins/target/WebGPUSPIRV/CMakeLists.txt
@@ -52,9 +52,11 @@
     iree::compiler::Codegen::SPIRV
     iree::compiler::Dialect::Flow::IR
     iree::compiler::Dialect::HAL::Target
+    iree::compiler::Dialect::HAL::Utils::ExecutableDebugInfoUtils
     iree::compiler::PluginAPI
     iree::compiler::Utils
-    iree::schemas::wgsl_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::webgpu_executable_def_c_fbs
     libtint
   PUBLIC
 )
diff --git a/compiler/plugins/target/WebGPUSPIRV/WebGPUSPIRVTarget.cpp b/compiler/plugins/target/WebGPUSPIRV/WebGPUSPIRVTarget.cpp
index 8fd7c53..9d3057c 100644
--- a/compiler/plugins/target/WebGPUSPIRV/WebGPUSPIRVTarget.cpp
+++ b/compiler/plugins/target/WebGPUSPIRV/WebGPUSPIRVTarget.cpp
@@ -11,9 +11,10 @@
 #include "iree/compiler/Codegen/WGSL/Passes.h"
 #include "iree/compiler/Dialect/Flow/IR/FlowDialect.h"
 #include "iree/compiler/Dialect/HAL/Target/TargetRegistry.h"
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
 #include "iree/compiler/PluginAPI/Client.h"
 #include "iree/compiler/Utils/FlatbufferUtils.h"
-#include "iree/schemas/wgsl_executable_def_builder.h"
+#include "iree/schemas/webgpu_executable_def_builder.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FormatVariadic.h"
 #include "mlir/Dialect/Func/IR/FuncOps.h"
@@ -118,13 +119,6 @@
 
   void buildTranslationPassPipeline(IREE::HAL::ExecutableTargetAttr targetAttr,
                                     OpPassManager &passManager) override {
-    // WebGPU does not support push constants (yet?), so replace loads from
-    // push constants with loads from uniform buffers.
-    // The corresponding runtime code must perform similar emulation, based
-    // on the push constant count listed in the executable layout.
-    passManager.nest<ModuleOp>().nest<func::FuncOp>().addPass(
-        createWGSLReplacePushConstantsPass());
-
     buildSPIRVCodegenPassPipeline(passManager);
 
     // Prepare SPIR-V for WebGPU by expanding or removing unsupported ops.
@@ -236,23 +230,28 @@
 
     // Pack the WGSL and metadata into a FlatBuffer.
     FlatbufferBuilder builder;
-    iree_hal_wgsl_ExecutableDef_start_as_root(builder);
+    iree_hal_webgpu_ExecutableDef_start_as_root(builder);
 
-    iree_hal_wgsl_ShaderModuleDef_start(builder);
+    // Attach embedded source file contents.
+    auto sourceFilesRef = createSourceFilesVec(
+        serOptions.debugLevel, variantOp.getSourcesAttr(), builder);
+
+    iree_hal_webgpu_ShaderModuleDef_start(builder);
     auto wgslRef = builder.createString(wgsl.value());
-    iree_hal_wgsl_ShaderModuleDef_code_add(builder, wgslRef);
+    iree_hal_webgpu_ShaderModuleDef_wgsl_source_add(builder, wgslRef);
     // TODO(scotttodd): populate source map
-    auto shaderModuleRef = iree_hal_wgsl_ShaderModuleDef_end(builder);
+    auto shaderModuleRef = iree_hal_webgpu_ShaderModuleDef_end(builder);
 
-    auto shaderModulesVec = iree_hal_wgsl_ShaderModuleDef_vec_create(
+    auto shaderModulesVec = iree_hal_webgpu_ShaderModuleDef_vec_create(
         builder, &shaderModuleRef, /*len=*/1);
-    iree_hal_wgsl_ExecutableDef_shader_modules_add(builder, shaderModulesVec);
+    iree_hal_webgpu_ExecutableDef_shader_modules_add(builder, shaderModulesVec);
 
     auto entryPointsRef = flatbuffers_uint32_vec_create(
         builder, entryPointOrdinals.data(), entryPointOrdinals.size());
-    iree_hal_wgsl_ExecutableDef_entry_points_add(builder, entryPointsRef);
+    iree_hal_webgpu_ExecutableDef_entry_points_add(builder, entryPointsRef);
+    iree_hal_webgpu_ExecutableDef_source_files_add(builder, sourceFilesRef);
 
-    iree_hal_wgsl_ExecutableDef_end_as_root(builder);
+    iree_hal_webgpu_ExecutableDef_end_as_root(builder);
 
     // Add the binary data to the target executable.
     auto binaryOp = executableBuilder.create<IREE::HAL::ExecutableBinaryOp>(
diff --git a/compiler/src/iree/compiler/Codegen/Common/BufferizationAnalysis.cpp b/compiler/src/iree/compiler/Codegen/Common/BufferizationAnalysis.cpp
index 54094f8..089f6a2 100644
--- a/compiler/src/iree/compiler/Codegen/Common/BufferizationAnalysis.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/BufferizationAnalysis.cpp
@@ -137,8 +137,7 @@
   if (!v1InterfaceBinding || !v2InterfaceBinding) {
     return true;
   }
-  if (v1InterfaceBinding.getSet() != v2InterfaceBinding.getSet() ||
-      v1InterfaceBinding.getBinding() != v2InterfaceBinding.getBinding() ||
+  if (v1InterfaceBinding.getBinding() != v2InterfaceBinding.getBinding() ||
       v1InterfaceBinding.getByteOffset() !=
           v2InterfaceBinding.getByteOffset()) {
-    // If the set, binding or offsets are different, map these to different
+    // If the binding or offsets are different, map these to different
diff --git a/compiler/src/iree/compiler/Codegen/Common/CPU/test/llvmcpu_materialize_encoding.mlir b/compiler/src/iree/compiler/Codegen/Common/CPU/test/llvmcpu_materialize_encoding.mlir
index 6dd1983..4491ab1 100644
--- a/compiler/src/iree/compiler/Codegen/Common/CPU/test/llvmcpu_materialize_encoding.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/CPU/test/llvmcpu_materialize_encoding.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-cpu-materialize-device-encoding),canonicalize,cse)" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [bf16, bf16, bf16], original_type = tensor<1x1000xbf16>, matmul_narrow_M = 1 : index, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>], round_dims_to = array<i64: 16, 16, 16>>
 func.func @set_encoding_with_padding_semantics_bf16_x86_64_avx512f() attributes {
   hal.executable.target = #hal.executable.target<"xyz", "xyz", {target_triple="x86_64-xyz-xyz", cpu_features="+avx512f"}>
 }{
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1000xbf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1000xbf16, #encoding>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1000xbf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1000xbf16, #encoding>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1000], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1000xbf16>> -> tensor<1x1000xbf16>
   %3 = iree_encoding.set_encoding %2 : tensor<1x1000xbf16> -> tensor<1x1000xbf16, #encoding>
   flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [1, 1000], strides = [1, 1] : tensor<1x1000xbf16, #encoding> -> !flow.dispatch.tensor<writeonly:tensor<1x1000xbf16,  #encoding>>
@@ -37,11 +35,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -51,8 +47,8 @@
    hal.executable.target = #hal.executable.target<"xyz", "xyz", {target_triple="x86_64-xyz-xyz", cpu_features="+avx,+avx2,+fma"}>
 } {
   %c0 = arith.constant 0 : index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x7xf32>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<7x7xf32, #encoding>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x7xf32>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<7x7xf32, #encoding>>
   %14 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [7, 7], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<7x7xf32>> -> tensor<7x7xf32>
   %17 = iree_encoding.set_encoding %14 : tensor<7x7xf32> -> tensor<7x7xf32, #encoding>
   flow.dispatch.tensor.store %17, %11, offsets = [0, 0], sizes = [7, 7], strides = [1, 1] : tensor<7x7xf32, #encoding> -> !flow.dispatch.tensor<writeonly:tensor<7x7xf32, #encoding>>
@@ -69,11 +65,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -83,8 +77,8 @@
    hal.executable.target = #hal.executable.target<"xyz", "xyz", {target_triple="x86_64-xyz-xyz", cpu_features="+avx,+avx2,+fma"}>
 } {
   %c0 = arith.constant 0 : index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x32xf32, #encoding>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x32xf32, #encoding>>
   %14 = flow.dispatch.tensor.load %8, offsets = [0, 0, 0], sizes = [128, 80, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>> -> tensor<128x80x32xf32>
   %17 = iree_encoding.set_encoding %14 : tensor<128x80x32xf32> -> tensor<128x80x32xf32, #encoding>
   flow.dispatch.tensor.store %17, %11, offsets = [0, 0, 0], sizes = [128, 80, 32], strides = [1, 1, 1]
@@ -93,8 +87,8 @@
   return
 }
 // CHECK-LABEL:    func @set_encoding_128x80x32_batch_matmul_LHS(
-//       CHECK:      %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.*}} !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
-//       CHECK:      %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.*}} !flow.dispatch.tensor<writeonly:tensor<128x10x32x8x1xf32>>
+//       CHECK:      %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.*}} !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
+//       CHECK:      %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.*}} !flow.dispatch.tensor<writeonly:tensor<128x10x32x8x1xf32>>
 //       CHECK:      %[[INPUT:.+]] = flow.dispatch.tensor.load %[[INPUT_BINDING]], offsets = [0, 0, 0], sizes = [128, 80, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>> -> tensor<128x80x32xf32>
 //       CHECK:      %[[EMPTY:.+]] = tensor.empty() : tensor<128x10x32x8x1xf32>
 //       CHECK:      %[[PACK:.+]] = tensor.pack %[[INPUT]] outer_dims_perm = [0, 1, 2] inner_dims_pos = [1, 2] inner_tiles = [8, 1] into %[[EMPTY]] : tensor<128x80x32xf32> -> tensor<128x10x32x8x1xf32>
@@ -102,11 +96,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -118,8 +110,8 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %5 = arith.index_castui %0 {stream.alignment = 64 : index} : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128x32x320xf32, #encoding>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128x32x320xf32, #encoding>>
   %16 = flow.dispatch.tensor.load %10, offsets = [0, 0, 0], sizes = [128, 32, 320], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>> -> tensor<128x32x320xf32>
   %19 = iree_encoding.set_encoding %16 : tensor<128x32x320xf32> -> tensor<128x32x320xf32, #encoding>
   flow.dispatch.tensor.store %19, %13, offsets = [0, 0, 0], sizes = [128, 32, 320], strides = [1, 1, 1]
@@ -128,8 +120,8 @@
   return
 }
 // CHECK-LABEL:    func @set_encoding_128x32x320_batch_matmul_RHS(
-//       CHECK:      %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.*}} !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
-//       CHECK:      %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.*}} !flow.dispatch.tensor<writeonly:tensor<128x40x32x8x1xf32>>
+//       CHECK:      %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.*}} !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
+//       CHECK:      %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.*}} !flow.dispatch.tensor<writeonly:tensor<128x40x32x8x1xf32>>
 //       CHECK:      %[[INPUT:.+]] = flow.dispatch.tensor.load %[[INPUT_BINDING]], offsets = [0, 0, 0], sizes = [128, 32, 320], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>> -> tensor<128x32x320xf32>
 //       CHECK:      %[[EMPTY:.+]] = tensor.empty() : tensor<128x40x32x8x1xf32>
 //       CHECK:      %[[PACK:.+]] = tensor.pack %[[INPUT]] outer_dims_perm = [0, 2, 1] inner_dims_pos = [2, 1] inner_tiles = [8, 1] into %[[EMPTY]] : tensor<128x32x320xf32> -> tensor<128x40x32x8x1xf32>
@@ -137,11 +129,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -153,8 +143,8 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %3 = arith.index_castui %0 : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x320xf32>>
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x320xf32, #encoding>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x320xf32>>
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x320xf32, #encoding>>
   %10 = flow.dispatch.tensor.load %9, offsets = [0, 0, 0], sizes = [128, 80, 320], strides = [1, 1, 1]
       : !flow.dispatch.tensor<readonly:tensor<128x80x320xf32, #encoding>>
       -> tensor<128x80x320xf32, #encoding>
@@ -166,9 +156,9 @@
 //   CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[D0:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(0)
 //       CHECK:   %[[CAST:.+]] = arith.index_castui %[[D0]] : i32 to index
-//       CHECK:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]])
+//       CHECK:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]])
 //  CHECK-SAME:       : !flow.dispatch.tensor<writeonly:tensor<128x80x320xf32>>
-//       CHECK:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[CAST]])
+//       CHECK:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[CAST]])
 //  CHECK-SAME:       : !flow.dispatch.tensor<readonly:tensor<128x10x40x8x8xf32>>
 //       CHECK:   %[[INPUT:.+]] = flow.dispatch.tensor.load %[[INPUT_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0, 0], sizes = [128, 10, 40, 8, 8], strides = [1, 1, 1, 1, 1]
@@ -255,12 +245,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -275,11 +263,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -307,12 +295,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 8, 1], strides = [1, 1, 1, 1]
@@ -352,12 +340,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -369,11 +355,11 @@
   hal.executable.target = #hal.executable.target<"xyz", "xyz", {target_triple="aarch64-xyz-xyz"}>
 } {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<16x16xf32, #encoding_lhs>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<16x1xf32, #encoding_rhs>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<16x1xf32, #encoding_result>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [16, 16], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<16x16xf32, #encoding_lhs>>
@@ -396,11 +382,11 @@
 }
 // CHECK-LABEL: func @matvec_lowering_f32f32f32_aarch64()
 //   CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<2x16x8x1xf32>>
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<1x16x1x1xf32>>
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<1x2x1x8xf32>>
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [2, 16, 8, 1], strides = [1, 1, 1, 1]
@@ -416,12 +402,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -436,11 +420,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf16, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
@@ -468,12 +452,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf16>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf16>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf16>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 8, 1], strides = [1, 1, 1, 1]
@@ -489,12 +473,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -509,11 +491,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -542,12 +524,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP1]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x4x1xf32>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x4xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 8, 1], strides = [1, 1, 1, 1]
@@ -563,12 +545,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -583,11 +563,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -615,12 +595,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 8, 1], strides = [1, 1, 1, 1]
@@ -636,12 +616,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -656,11 +634,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -688,12 +666,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 16, 1], strides = [1, 1, 1, 1]
@@ -709,12 +687,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -729,11 +705,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
@@ -761,12 +737,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 16, 1], strides = [1, 1, 1, 1]
@@ -782,12 +758,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -802,11 +776,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf16, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_lhs>>{%M, %K}
@@ -834,12 +808,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf16>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 16, 1], strides = [1, 1, 1, 1]
@@ -855,12 +829,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -875,11 +847,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
@@ -907,12 +879,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xbf16>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xbf16>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 16, 1], strides = [1, 1, 1, 1]
@@ -928,12 +900,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -948,11 +918,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xbf16, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
@@ -980,12 +950,12 @@
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xbf16>>{%[[TILED_M]], %[[K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x1xbf16>>{%[[TILED_N]], %[[K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xbf16>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[K]], 16, 1], strides = [1, 1, 1, 1]
@@ -1001,12 +971,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1021,11 +989,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
@@ -1055,12 +1023,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xbf16>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xbf16>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 16, 2], strides = [1, 1, 1, 1]
@@ -1076,12 +1044,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1096,11 +1062,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xbf16, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xbf16, #encoding_lhs>>{%M, %K}
@@ -1130,12 +1096,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xbf16>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xbf16>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xbf16>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 16, 2], strides = [1, 1, 1, 1]
@@ -1151,12 +1117,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1171,11 +1135,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf16, #encoding_result>>{%M, %N}
   %lhs_f32 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -1213,9 +1177,9 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
 //   CHECK-DAG:   %[[M_CEILDIV_8:.+]] = affine.apply #[[$MAP_CEILDIV_8]]()[%[[M]]]
 //   CHECK-DAG:   %[[N_CEILDIV_8:.+]] = affine.apply #[[$MAP_CEILDIV_8]]()[%[[N]]]
-//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[M_CEILDIV_8]], %[[K]]}
-//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf16>>{%[[N_CEILDIV_8]], %[[K]]}
-//   CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf16>>{%[[M_CEILDIV_8]], %[[N_CEILDIV_8]]}
+//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf32>>{%[[M_CEILDIV_8]], %[[K]]}
+//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x8x1xf16>>{%[[N_CEILDIV_8]], %[[K]]}
+//   CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf16>>{%[[M_CEILDIV_8]], %[[N_CEILDIV_8]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M_CEILDIV_8]], %[[K]], 8, 1], {{.*}} -> tensor<?x?x8x1xf32>
 //       CHECK:   %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[N_CEILDIV_8]], %[[K]], 8, 1], {{.*}} -> tensor<?x?x8x1xf16>
 //       CHECK:   %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M_CEILDIV_8]], %[[N_CEILDIV_8]], 8, 8], {{.*}} -> tensor<?x?x8x8xf16>
@@ -1226,12 +1190,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1246,11 +1208,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf16, #encoding_result>>{%M, %N}
   %lhs_f32 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -1289,9 +1251,9 @@
 //   CHECK-DAG: %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
 //   CHECK-DAG: %[[M_CEILDIV_16:.+]] = affine.apply #[[$MAP_CEILDIV_16]]()[%[[M]]]
 //   CHECK-DAG: %[[N_CEILDIV_16:.+]] = affine.apply #[[$MAP_CEILDIV_16]]()[%[[N]]]
-//   CHECK-DAG: %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%[[M_CEILDIV_16]], %[[K]]}
-//   CHECK-DAG: %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[N_CEILDIV_16]], %[[K]]}
-//   CHECK-DAG: %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf16>>{%[[M_CEILDIV_16]], %[[N_CEILDIV_16]]}
+//   CHECK-DAG: %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%[[M_CEILDIV_16]], %[[K]]}
+//   CHECK-DAG: %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf16>>{%[[N_CEILDIV_16]], %[[K]]}
+//   CHECK-DAG: %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf16>>{%[[M_CEILDIV_16]], %[[N_CEILDIV_16]]}
 //       CHECK: %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M_CEILDIV_16]], %[[K]], 16, 1], {{.*}} -> tensor<?x?x16x1xf32>
 //       CHECK: %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[N_CEILDIV_16]], %[[K]], 16, 1], {{.*}} -> tensor<?x?x16x1xf16>
 //       CHECK: %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M_CEILDIV_16]], %[[N_CEILDIV_16]], 16, 16], {{.*}} -> tensor<?x?x16x16xf16>
@@ -1302,12 +1264,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1322,11 +1282,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1352,11 +1312,11 @@
 //   CHECK-DAG:   %[[M:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%[[M]], %[[K]]}
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%[[K]], %[[N]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%[[M]], %[[N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0], sizes = [%[[M]], %[[K]]], strides = [1, 1]
@@ -1372,12 +1332,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1392,11 +1350,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1426,12 +1384,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 4], strides = [1, 1, 1, 1]
@@ -1447,12 +1405,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1467,11 +1423,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1500,12 +1456,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP0]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x8xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x8xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 8], strides = [1, 1, 1, 1]
@@ -1521,12 +1477,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1541,11 +1495,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi4, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1576,12 +1530,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //       CHECK:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x4x2xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP2]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xi4>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x4x16xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 4, 2], strides = [1, 1, 1, 1]
@@ -1597,12 +1551,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1617,11 +1569,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi4, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1650,12 +1602,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP0]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x8xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x8xi4>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 8], strides = [1, 1, 1, 1]
@@ -1671,12 +1623,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1691,11 +1641,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi4, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1726,12 +1676,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x4x16xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP2]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x16xi4>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x4x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 4, 16], strides = [1, 1, 1, 1]
@@ -1802,12 +1752,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1822,11 +1770,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1856,12 +1804,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 4], strides = [1, 1, 1, 1]
@@ -1877,12 +1825,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1897,11 +1843,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -1931,12 +1877,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x2xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x2xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 2], strides = [1, 1, 1, 1]
@@ -1952,12 +1898,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1972,11 +1916,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -2006,12 +1950,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 16, 2], strides = [1, 1, 1, 1]
@@ -2027,12 +1971,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -2047,11 +1989,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -2081,12 +2023,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xi8>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x16x2xi8>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 16, 2], strides = [1, 1, 1, 1]
@@ -2160,12 +2102,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -2180,11 +2120,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi16, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi16, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi16, #encoding_lhs>>{%M, %K}
@@ -2214,12 +2154,12 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[$MAP0]]()[%[[M]]]
 //   CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[$MAP1]]()[%[[K]]]
-//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x2xi16>>{%[[TILED_M]], %[[TILED_K]]}
 //       CHECK:   %[[TILED_N:.+]] = affine.apply #[[$MAP0]]()[%[[N]]]
-//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x2xi16>>{%[[TILED_N]], %[[TILED_K]]}
-//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xi32>>{%[[TILED_M]], %[[TILED_N]]}
 //       CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 2], strides = [1, 1, 1, 1]
@@ -2235,12 +2175,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -2255,11 +2193,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %lhs_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi16, #encoding_lhs>>{%M, %K}
-  %rhs_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %rhs_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi4, #encoding_rhs>>{%K, %N}
-  %out_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %out_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %lhs = flow.dispatch.tensor.load %lhs_binding, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi16, #encoding_lhs>>{%M, %K}
@@ -2297,9 +2235,9 @@
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
 //   CHECK-DAG:   %[[K_CEILDIV_8:.+]] = affine.apply #[[$MAP_CEILDIV_8]]()[%[[K]]]
 //   CHECK-DAG:   %[[N_CEILDIV_32:.+]] = affine.apply #[[$MAP_CEILDIV_32]]()[%[[N]]]
-//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x1x8xi16>>{%[[M]], %[[K_CEILDIV_8]]}
-//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x32x8xi4>>{%[[N_CEILDIV_32]], %[[K_CEILDIV_8]]}
-//   CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x1x32xi32>>{%[[M]], %[[N_CEILDIV_32]]}
+//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x1x8xi16>>{%[[M]], %[[K_CEILDIV_8]]}
+//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.*}} : !flow.dispatch.tensor<readonly:tensor<?x?x32x8xi4>>{%[[N_CEILDIV_32]], %[[K_CEILDIV_8]]}
+//   CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) {{.*}} : !flow.dispatch.tensor<readwrite:tensor<?x?x1x32xi32>>{%[[M]], %[[N_CEILDIV_32]]}
 //   CHECK-DAG:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M]], %[[K_CEILDIV_8]], 1, 8], {{.*}} -> tensor<?x?x1x8xi16>
 //   CHECK-DAG:   %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[N_CEILDIV_32]], %[[K_CEILDIV_8]], 32, 8], {{.*}} -> tensor<?x?x32x8xi4>
 //   CHECK-DAG:   %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]], offsets = [0, 0, 0, 0], sizes = [%[[M]], %[[N_CEILDIV_32]], 1, 32], {{.*}} -> tensor<?x?x1x32xi32>
@@ -2792,13 +2730,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
 #encoding_bcast = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d2)>, round_dims_to = array<i64: 16, 16, 16>>
@@ -2807,10 +2743,10 @@
 } {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x128x64xi8, #encoding>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x128x64xi8, #encoding>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
   %7 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 128, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x128x64xi8, #encoding>> -> tensor<2x128x64xi8, #encoding>
   %8 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>> -> tensor<2x64xf32, #encoding_bcast>
   %9 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [2, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>> -> tensor<2x64xf32, #encoding_bcast>
@@ -2851,11 +2787,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
 #encoding_bcast = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
@@ -2864,8 +2798,8 @@
 } {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x64xf32, #encoding_bcast>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x64xf32, #encoding_bcast>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
   %8 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [128, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x64xf32, #encoding_bcast>> -> tensor<128x64xf32, #encoding_bcast>
   %13 = tensor.empty() : tensor<2x128x64xf32, #encoding>
   %14 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%8 : tensor<128x64xf32, #encoding_bcast>) outs(%13 : tensor<2x128x64xf32, #encoding>) {
@@ -2892,11 +2826,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
 #encoding_bcast = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1)>, round_dims_to = array<i64: 16, 16, 16>>
@@ -2905,8 +2837,8 @@
 } {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x128xf32, #encoding_bcast>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x128xf32, #encoding_bcast>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
   %8 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x128xf32, #encoding_bcast>> -> tensor<2x128xf32, #encoding_bcast>
   %13 = tensor.empty() : tensor<2x128x64xf32, #encoding>
   %14 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%8 : tensor<2x128xf32, #encoding_bcast>) outs(%13 : tensor<2x128x64xf32, #encoding>) {
@@ -2933,11 +2865,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 1 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
 #encoding_bcast = #iree_encoding.encoding<operand_index = 1 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d2)>, round_dims_to = array<i64: 16, 16, 16>>
@@ -2946,8 +2876,8 @@
 } {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
   %8 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>> -> tensor<2x64xf32, #encoding_bcast>
   %13 = tensor.empty() : tensor<2x128x64xf32, #encoding>
   %14 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%8 : tensor<2x64xf32, #encoding_bcast>) outs(%13 : tensor<2x128x64xf32, #encoding>) {
@@ -2974,11 +2904,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, round_dims_to = array<i64: 16, 16, 16>>
 #encoding_bcast = #iree_encoding.encoding<operand_index = 0 : index, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<2x128x64xf32>, user_indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>], bcast_map = affine_map<(d0, d1, d2) -> (d0, d2)>, round_dims_to = array<i64: 16, 16, 16>>
@@ -2987,8 +2915,8 @@
 } {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x128x64xf32, #encoding>>
   %8 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64xf32, #encoding_bcast>> -> tensor<2x64xf32, #encoding_bcast>
   %13 = tensor.empty() : tensor<2x128x64xf32, #encoding>
   %14 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%8 : tensor<2x64xf32, #encoding_bcast>) outs(%13 : tensor<2x128x64xf32, #encoding>) {
diff --git a/compiler/src/iree/compiler/Codegen/Common/CPU/test/vmvx_materialize_encoding.mlir b/compiler/src/iree/compiler/Codegen/Common/CPU/test/vmvx_materialize_encoding.mlir
index 0464a42..10ceeaa 100644
--- a/compiler/src/iree/compiler/Codegen/Common/CPU/test/vmvx_materialize_encoding.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/CPU/test/vmvx_materialize_encoding.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-cpu-materialize-device-encoding),canonicalize,cse)" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -20,11 +18,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xi8, #encoding_lhs>>{%M, %K}
@@ -55,17 +53,17 @@
 //      CHECK:   %[[LHS_TILE_SIZES:.+]]:2 = iree_codegen.query_tile_sizes tensor<?x?xi8, #iree_encoding.encoding<operand_index = 0 : i64, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> -> index, index
 //  CHECK-DAG:   %[[LHS_OUTER_SIZE0:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[M]], %[[LHS_TILE_SIZES]]#0]
 //  CHECK-DAG:   %[[LHS_OUTER_SIZE1:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[K]], %[[LHS_TILE_SIZES]]#1]
-//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%[[LHS_OUTER_SIZE0]], %[[LHS_OUTER_SIZE1]], %[[LHS_TILE_SIZES]]#0, %[[LHS_TILE_SIZES]]#1}
 //      CHECK:   %[[RHS_TILE_SIZES:.+]]:2 = iree_codegen.query_tile_sizes tensor<?x?xi8, #iree_encoding.encoding<operand_index = 1 : i64, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> -> index, index
 //  CHECK-DAG:   %[[RHS_OUTER_SIZE0:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[N]], %[[RHS_TILE_SIZES]]#0]
 //  CHECK-DAG:   %[[RHS_OUTER_SIZE1:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[K]], %[[RHS_TILE_SIZES]]#1]
-//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%[[RHS_OUTER_SIZE0]], %[[RHS_OUTER_SIZE1]], %[[RHS_TILE_SIZES]]#0, %[[RHS_TILE_SIZES]]#1}
 //      CHECK:   %[[RESULT_TILE_SIZES:.+]]:2 = iree_codegen.query_tile_sizes tensor<?x?xi32, #iree_encoding.encoding<operand_index = 2 : i64, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2], round_dims_to = array<i64: 16, 16, 16>>> -> index, index
 //  CHECK-DAG:   %[[RESULT_OUTER_SIZE0:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[M]], %[[RESULT_TILE_SIZES]]#0]
 //  CHECK-DAG:   %[[RESULT_OUTER_SIZE1:.+]] = affine.apply #[[MAP_CEILDIV]]()[%[[N]], %[[RESULT_TILE_SIZES]]#1]
-//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 // CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xi32>>{%[[RESULT_OUTER_SIZE0]], %[[RESULT_OUTER_SIZE1]], %[[RESULT_TILE_SIZES]]#0, %[[RESULT_TILE_SIZES]]#1}
 //      CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 // CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[LHS_OUTER_SIZE0]], %[[LHS_OUTER_SIZE1]], %[[LHS_TILE_SIZES]]#0, %[[LHS_TILE_SIZES]]#1], strides = [1, 1, 1, 1]
@@ -81,12 +79,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> ((3 ceildiv s0) * s0)>
 #map1 = affine_map<()[s0] -> ((1 ceildiv s0) * s0)>
@@ -102,9 +98,9 @@
   %c32_i64 = arith.constant 32 : i64
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x2xf32, #encoding_lhs>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x3xf32, #encoding_rhs>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x3xf32, #encoding_result>>{%arg4, %arg5}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x2xf32, #encoding_lhs>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x3xf32, #encoding_rhs>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x3xf32, #encoding_result>>{%arg4, %arg5}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x2xf32, #encoding_lhs>> -> tensor<1x2xf32, #encoding_lhs>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2, 3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x3xf32, #encoding_rhs>> -> tensor<2x3xf32, #encoding_rhs>
   %7 = tensor.empty() : tensor<1x3xf32, #encoding_result>
@@ -115,11 +111,11 @@
 }
 //      CHECK: func.func @fill_matmul
 //  CHECK-DAG:   %[[ZERO:.+]] = arith.constant 0.000000e+00 : f32
-//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<1x1x8x4xf32>>
-//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<1x1x8x4xf32>>
-//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 // CHECK-SAME:       !flow.dispatch.tensor<writeonly:tensor<1x1x8x8xf32>>
 //      CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 // CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [1, 1, 8, 4], strides = [1, 1, 1, 1]
@@ -137,11 +133,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -153,9 +147,9 @@
   %c0 = arith.constant 0 : index
   %d0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %d1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<writeonly:tensor<?x?xf32, #encoding_lhs>>{%d0, %d1}
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%d0, %d1], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1} -> tensor<?x?xf32>
@@ -172,10 +166,10 @@
 //   CHECK-DAG:   %[[CST:.+]] = arith.constant 0.0
 //   CHECK-DAG:   %[[D0:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //   CHECK-DAG:   %[[D1:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
-//       CHECK:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //   CHECK-DAG:   %[[TILED_D0:.+]] = affine.apply #[[MAP0]]()[%[[D0]]]
 //   CHECK-DAG:   %[[TILED_D1:.+]] = affine.apply #[[MAP1]]()[%[[D1]]]
-//   CHECK-DAG:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:       !flow.dispatch.tensor<writeonly:tensor<?x?x8x4xf32>>{%[[TILED_D0]], %[[TILED_D1]]}
 //       CHECK:   %[[INPUT:.+]] = flow.dispatch.tensor.load %[[INPUT_BINDING]]
 //       CHECK:   %[[EMPTY:.+]] = tensor.empty
@@ -187,11 +181,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -204,9 +196,9 @@
   %cst = arith.constant 0.000000e+00 : f32
   %d0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %d1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%d0, %d1}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%d0, %d1}
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%d0, %d1], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%d0, %d1}
@@ -226,9 +218,9 @@
 //   CHECK-DAG:   %[[D1:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[TILED_D0:.+]] = affine.apply #[[MAP0]]()[%[[D0]]]
 //   CHECK-DAG:   %[[TILED_D1:.+]] = affine.apply #[[MAP1]]()[%[[D1]]]
-//   CHECK-DAG:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//   CHECK-DAG:   %[[INPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xf32>>{%[[TILED_D0]], %[[TILED_D1]]}
-//   CHECK-DAG:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[OUTPUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[INPUT:.+]] = flow.dispatch.tensor.load %[[INPUT_BINDING]]
 //  CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_D0]], %[[TILED_D1]], 8, 4], strides = [1, 1, 1, 1]
 //       CHECK:   %[[EMPTY:.+]] = tensor.empty(%[[D0]], %[[D1]])
@@ -238,12 +230,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -258,11 +248,11 @@
   %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_rhs>>{%K, %N}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xf32, #encoding_result>>{%M, %N}
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [%M, %K], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32, #encoding_lhs>>{%M, %K}
@@ -292,12 +282,12 @@
 //  CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //  CHECK-DAG:   %[[TILED_M:.+]] = affine.apply #[[MAP0]]()[%[[M]]]
 //  CHECK-DAG:   %[[TILED_K:.+]] = affine.apply #[[MAP1]]()[%[[K]]]
-//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//      CHECK:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xf32>>{%[[TILED_M]], %[[TILED_K]]}
 //      CHECK:   %[[TILED_N:.+]] = affine.apply #[[MAP0]]()[%[[N]]]
-//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//      CHECK:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<?x?x8x4xf32>>{%[[TILED_N]], %[[TILED_K]]}
-//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//      CHECK:   %[[OUTS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 // CHECK-SAME:       !flow.dispatch.tensor<readwrite:tensor<?x?x8x8xf32>>{%[[TILED_M]], %[[TILED_N]]}
 //      CHECK:   %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]]
 // CHECK-SAME:       offsets = [0, 0, 0, 0], sizes = [%[[TILED_M]], %[[TILED_K]], 8, 4], strides = [1, 1, 1, 1]
diff --git a/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ToUInt16Buffers.cpp b/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ToUInt16Buffers.cpp
index 5f25384..bc9420f 100644
--- a/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ToUInt16Buffers.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/ConvertBf16ToUInt16Buffers.cpp
@@ -92,10 +92,9 @@
 
     auto newOp =
         rewriter.replaceOpWithNewOp<IREE::HAL::InterfaceBindingSubspanOp>(
-            op, newResultTy, adaptor.getLayout(), adaptor.getSet(),
-            adaptor.getBinding(), adaptor.getByteOffset(),
-            adaptor.getDynamicDims(), adaptor.getAlignmentAttr(),
-            adaptor.getDescriptorFlagsAttr());
+            op, newResultTy, adaptor.getLayout(), adaptor.getBinding(),
+            adaptor.getByteOffset(), adaptor.getDynamicDims(),
+            adaptor.getAlignmentAttr(), adaptor.getDescriptorFlagsAttr());
     LLVM_DEBUG(llvm::dbgs() << "Bf16Emulation: new op: " << newOp << "\n");
     (void)newOp;
     return success();
diff --git a/compiler/src/iree/compiler/Codegen/Common/EmulateNarrowType.cpp b/compiler/src/iree/compiler/Codegen/Common/EmulateNarrowType.cpp
index e2c8805..772faf4 100644
--- a/compiler/src/iree/compiler/Codegen/Common/EmulateNarrowType.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/EmulateNarrowType.cpp
@@ -81,9 +81,9 @@
     }
 
     rewriter.replaceOpWithNewOp<IREE::HAL::InterfaceBindingSubspanOp>(
-        op, newResultType, adaptor.getLayout(), adaptor.getSet(),
-        adaptor.getBinding(), byteOffset, dynamicLinearizedSize,
-        adaptor.getAlignmentAttr(), adaptor.getDescriptorFlagsAttr());
+        op, newResultType, adaptor.getLayout(), adaptor.getBinding(),
+        byteOffset, dynamicLinearizedSize, adaptor.getAlignmentAttr(),
+        adaptor.getDescriptorFlagsAttr());
     return success();
   }
 };
diff --git a/compiler/src/iree/compiler/Codegen/Common/FlattenMemRefSubspanPass.cpp b/compiler/src/iree/compiler/Codegen/Common/FlattenMemRefSubspanPass.cpp
index 2f7a39e..aa9e124 100644
--- a/compiler/src/iree/compiler/Codegen/Common/FlattenMemRefSubspanPass.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/FlattenMemRefSubspanPass.cpp
@@ -285,7 +285,7 @@
 
     auto newOffset = rewriter.create<arith::ConstantIndexOp>(loc, 0);
     auto newOp = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
-        subspanOp.getLoc(), newType, subspanOp.getLayout(), subspanOp.getSet(),
+        subspanOp.getLoc(), newType, subspanOp.getLayout(),
         subspanOp.getBinding(), newOffset, dynamicShape,
         subspanOp.getAlignmentAttr(), subspanOp.getDescriptorFlagsAttr());
 
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_tiling_level.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_tiling_level.mlir
index 7c4cd2f..5bdfb14 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_tiling_level.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_tiling_level.mlir
@@ -268,13 +268,6 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
-]>
 #config = #iree_gpu.derived_thread_config
 module {
   func.func @inferred_im2col(%2: tensor<2x34x34x128xf16>, %3: tensor<2x128x8xf16>) -> tensor<2x128x8xf16>
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_create_fast_slow_path.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_create_fast_slow_path.mlir
index 126c6f5..40ac0b7 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_create_fast_slow_path.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_create_fast_slow_path.mlir
@@ -1,22 +1,20 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-create-fast-slow-path))" --mlir-print-local-scope %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @padded_conv() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c32 = arith.constant 32 : index
   %c112 = arith.constant 112 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x224x224x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x224x224x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_distribute.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_distribute.mlir
index 0831e5b..7a3ab28 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_distribute.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_distribute.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-distribute, cse))" %s --split-input-file | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 256)>
 #map1 = affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>
@@ -15,11 +13,11 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c64 = arith.constant 64 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %0, 64 : memref<233x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %1, 64 : memref<233x1024xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %2, 64 : memref<233x1024xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -51,12 +49,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 256)>
 #map1 = affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>
@@ -66,11 +62,11 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c64 = arith.constant 64 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %0, 64 : memref<233x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %1, 64 : memref<233x1024xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<233x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<233x1024xf32>
   memref.assume_alignment %2, 64 : memref<233x1024xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pipeline.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pipeline.mlir
index 3535a86..b483dda 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pipeline.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pipeline.mlir
@@ -2,12 +2,10 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-pipelining{epilogue-peeling=false}))" --split-input-file %s | FileCheck %s
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-pipelining{pipeline-depth=3 schedule-index=2 epilogue-peeling=false}))" --split-input-file %s | FileCheck -check-prefix=CHECK-NV %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @_matmul_f16_f16_dispatch_0_fill_3456x1024() {
   %c2048 = arith.constant 2048 : index
@@ -21,11 +19,11 @@
   %3 = gpu.thread_id  z
   %4 = memref.alloc() : memref<4x32x40xf16, 3>
   %5 = memref.alloc() : memref<4x32x40xf16, 3>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<3456x2048xf16>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<3456x2048xf16>
   memref.assume_alignment %6, 64 : memref<3456x2048xf16>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2048x1024xf16>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2048x1024xf16>
   memref.assume_alignment %7, 64 : memref<2048x1024xf16>
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf16>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf16>
   memref.assume_alignment %8, 64 : memref<3456x1024xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -63,12 +61,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @nvidia_tenscore_schedule_f16() {
   %c3 = arith.constant 3 : index
@@ -86,11 +82,11 @@
   %alloc = memref.alloc() : memref<128x256xf16, #gpu.address_space<workgroup>>
   %alloc_1 = memref.alloc() : memref<3x128x32xf16, #gpu.address_space<workgroup>>
   %alloc_2 = memref.alloc() : memref<3x32x256xf16, #gpu.address_space<workgroup>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<512x1280xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<512x1280xf16>
   memref.assume_alignment %3, 64 : memref<512x1280xf16>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<1280x1280xf16>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<1280x1280xf16>
   memref.assume_alignment %4, 64 : memref<1280x1280xf16>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<512x1280xf16>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<512x1280xf16>
   memref.assume_alignment %5, 64 : memref<512x1280xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -517,12 +513,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @nvidia_tenscore_schedule_f32() {
   %c31 = arith.constant 31 : index
@@ -540,11 +534,11 @@
   %alloc = memref.alloc() : memref<128x128xf32, #gpu.address_space<workgroup>>
   %alloc_2 = memref.alloc() : memref<3x128x32xf32, #gpu.address_space<workgroup>>
   %alloc_3 = memref.alloc() : memref<3x32x128xf32, #gpu.address_space<workgroup>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<256x256xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<256x256xf32>
   memref.assume_alignment %3, 64 : memref<256x256xf32>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<256x256xf32>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<256x256xf32>
   memref.assume_alignment %4, 64 : memref<256x256xf32>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<256x256xf32>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<256x256xf32>
   memref.assume_alignment %5, 64 : memref<256x256xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups.mlir
index 9377136..7a0334c 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups.mlir
@@ -4,20 +4,18 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-reorder-workgroups{strategy=transpose}))" \
 // RUN:   --split-input-file %s | FileCheck --check-prefix=TRANSPOSE %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul() {
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   %c96 = arith.constant 96 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x4096xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x4096xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96xf32>>
   %3 = tensor.empty() : tensor<128x96xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups_static.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups_static.mlir
index 04ffb5b..a07b26f 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups_static.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_reorder_workgroups_static.mlir
@@ -33,11 +33,9 @@
 // TRANSPOSE-DAG:               affine.apply #{{.+}}()[%[[REM]]]
 // TRANSPOSE:                   return
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @main_dispatch_0 {
 hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -54,8 +52,8 @@
       %c64 = arith.constant 64 : index
       %cst = arith.constant 0.000000e+00 : f16
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32000x32000xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32000x32000xf16>>
       %workgroup_id_x = hal.interface.workgroup.id[0] : index
       %workgroup_id_y = hal.interface.workgroup.id[1] : index
       %2 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_y]
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_alloc.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_alloc.mlir
index 527c083..7598316 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_alloc.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_alloc.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt %s --allow-unregistered-dialect --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-tensor-tile-to-serial-loops,iree-codegen-gpu-tensor-alloc))" | FileCheck %s
 // RUN: iree-opt %s --allow-unregistered-dialect --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-tensor-tile-to-serial-loops{coalesce-loops},iree-codegen-gpu-tensor-alloc))" | FileCheck %s --check-prefix=COALESCE_LOOPS
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_2048x512x1024() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %3 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_y]
@@ -38,19 +36,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1x384x384() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x384xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x384xf32>> -> tensor<1x384xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %4 = affine.apply affine_map<()[s0] -> (s0 * 128)>()[%workgroup_id_x]
@@ -68,19 +64,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_multi_uses() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %3 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_y]
@@ -107,12 +101,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_33x33x903168_f32() {
   %c0 = arith.constant 0 : index
@@ -130,10 +122,10 @@
     %5 = arith.index_castui %2 {stream.alignment = 4096 : index, stream.values = [1240289280 : index, 1789415424 : index]} : i32 to index
     %6 = arith.index_castui %3 {stream.alignment = 8192 : index, stream.values = [633077760 : index, 752295936 : index]} : i32 to index
     %7 = arith.index_castui %4 {stream.alignment = 64 : index, stream.values = [1486349952 : index, 1486358464 : index]} : i32 to index
-    %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<33x903168xf32>>
-    %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<903168x33xf32>>
-    %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<33x33xf32>>
-    %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<33x33xf32>>
+    %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<33x903168xf32>>
+    %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<903168x33xf32>>
+    %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<33x33xf32>>
+    %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<33x33xf32>>
     %12 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_x]
     %13 = flow.dispatch.tensor.load %11, offsets = [%12, 0], sizes = [32, 33], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<33x33xf32>> -> tensor<32x33xf32>
     %14 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [903168, 33], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<903168x33xf32>> -> tensor<903168x33xf32>
@@ -160,23 +152,21 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @weight_dequant_matmul() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128x2048xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xi4>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128x2048xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xi4>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %5 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_y]
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -235,19 +225,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv() attributes {translation_info = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [256, 1, 1] subgroup_size = 64, {mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>, subgroup_m_count = 1, subgroup_n_count = 4>}>} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x1280xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x1280x1280xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32x32x1280xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x1280xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x1280x1280xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32x32x1280xf32>>
   %workgroup_id_z = hal.interface.workgroup.id[2] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_tile.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_tile.mlir
index 364c25b..233c5b6 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_tile.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tensor_tile.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-tensor-tile, cse))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 256]]>
 #map = affine_map<()[s0] -> (s0 * 256)>
@@ -14,9 +12,9 @@
 module {
   func.func @add_tensor() attributes {translation_info = #translation} {
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<233x1024xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<233x1024xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<233x1024xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<233x1024xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<233x1024xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<233x1024xf32>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_id_y = hal.interface.workgroup.id[1] : index
     %3 = affine.apply #map()[%workgroup_id_x]
@@ -35,9 +33,9 @@
 
 //         CHECK: #[[$MAP:.*]] = affine_map<(d0) -> (d0 * 4)>
 //   CHECK-LABEL: func.func @add_tensor
-//     CHECK-DAG:   %[[A:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:   %[[B:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//     CHECK-DAG:   %[[C:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//     CHECK-DAG:   %[[A:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:   %[[B:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//     CHECK-DAG:   %[[C:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //     CHECK-DAG:   %[[LA:.*]] = flow.dispatch.tensor.load %[[A]]
 //     CHECK-DAG:   %[[LB:.*]] = flow.dispatch.tensor.load %[[B]]
 //     CHECK-DAG:   %[[LC:.*]] = flow.dispatch.tensor.load %[[C]]
@@ -58,11 +56,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 4]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -73,8 +69,8 @@
   func.func @reduction() attributes {translation_info = #translation} {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %2 = affine.apply #map()[%workgroup_id_x]
     %3 = flow.dispatch.tensor.load %1, offsets = [%2], sizes = [64], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<128xf32>> -> tensor<64xf32>
@@ -116,11 +112,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 64, 4, 4]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -131,8 +125,8 @@
   func.func @reduction_broadcast() attributes {translation_info = #translation} {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x32x10x4096xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32x10x4096xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x32x10x4096xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32x10x4096xf32>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_id_y = hal.interface.workgroup.id[1] : index
     %2 = affine.apply #map()[%workgroup_id_x]
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile.mlir
index 9cc4a19..6b15d52 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt -split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-tile))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @innermost_reduction() {
   %c1 = arith.constant 1 : index
@@ -16,9 +14,9 @@
   %3 = arith.index_cast %0 {stream.alignment = 512 : index, stream.values = [0 : index, 394752 : index, 984064 : index]} : i32 to index
   %4 = arith.index_cast %1 {stream.alignment = 512 : index, stream.values = [0 : index, 196608 : index, 197120 : index]} : i32 to index
   %5 = arith.index_cast %2 {stream.alignment = 512 : index, stream.values = [512 : index, 197120 : index, 197632 : index]} : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%3) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%3) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %9 = affine.apply affine_map<()[s0] -> (s0 * 128)>()[%workgroup_id_x]
@@ -61,11 +59,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @has_scf_if() {
   %c49152 = arith.constant 49152 : index
@@ -74,8 +70,8 @@
   %c1023_i32 = arith.constant 1023 : i32
   %c2_i32 = arith.constant 2 : i32
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<49152xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<49152xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<49152xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<49152xi32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %2 = affine.apply affine_map<()[s0] -> (s0 * 256)>()[%workgroup_id_x]
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile_reduction.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile_reduction.mlir
index 2cea3e2..ee1cadd 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile_reduction.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_tile_reduction.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-gpu-tile-reduction),canonicalize,cse)" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @warp_reduction_dispatch() {
   %cst = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %2 = flow.dispatch.tensor.load %1, offsets = [%workgroup_id_x], sizes = [1], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<512xf32>> -> tensor<1xf32>
   %3 = flow.dispatch.tensor.load %0, offsets = [%workgroup_id_x, 0], sizes = [1, 10240], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>> -> tensor<1x10240xf32>
@@ -49,18 +47,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @warp_reduction_batch_matmul() {
   %cst = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<11x512x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<11x512x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<11x512x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<11x512x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<11x512x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<11x512x512xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %workgroup_id_z = hal.interface.workgroup.id[2] : index
@@ -95,16 +91,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @warp_reduction_broadcast_dispatch() {
   %cst = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x10240xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x10240xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %2 = flow.dispatch.tensor.load %1, offsets = [%workgroup_id_x, 0], sizes = [1, 10240], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<512x10240xf32>> -> tensor<1x10240xf32>
   %3 = flow.dispatch.tensor.load %0, offsets = [%workgroup_id_x, 0], sizes = [1, 10240], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>> -> tensor<1x10240xf32>
@@ -163,22 +157,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @warp_reduction_multi_reduction() {
   %cst = arith.constant 0.000000e+00 : f32
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %15 = flow.dispatch.tensor.load %14, offsets = [%workgroup_id_x], sizes = [1], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<4096xf32>> -> tensor<1xf32>
   %16 = flow.dispatch.tensor.load %13, offsets = [0, 0], sizes = [86, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<86x128xf32>> -> tensor<86x128xf32>
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/transform_gpu_workgroup_swizzle.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/transform_gpu_workgroup_swizzle.mlir
index bb3565f..a60528d 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/transform_gpu_workgroup_swizzle.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/transform_gpu_workgroup_swizzle.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt %s --iree-transform-dialect-interpreter -transform-dialect-drop-schedule --split-input-file | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul() {
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   %c96 = arith.constant 96 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x4096xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x4096xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96xf32>>
   %3 = tensor.empty() : tensor<128x96xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/GPU/test/vector_reduction_to_gpu.mlir b/compiler/src/iree/compiler/Codegen/Common/GPU/test/vector_reduction_to_gpu.mlir
index a50b6ec..e365a2b 100644
--- a/compiler/src/iree/compiler/Codegen/Common/GPU/test/vector_reduction_to_gpu.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/GPU/test/vector_reduction_to_gpu.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --pass-pipeline='builtin.module(func.func(iree-codegen-vector-reduction-to-gpu, cse))' %s | FileCheck %s
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --pass-pipeline='builtin.module(func.func(iree-codegen-vector-reduction-to-gpu, cse))' %s | FileCheck %s --check-prefix=CDNA3
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0, s1] -> (s1 * 2 + s0 floordiv 32)>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [32, 1, 1] subgroup_size = 32>
@@ -17,8 +15,8 @@
     %cst_1 = arith.constant dense<3.840000e+02> : vector<1xf32>
     %c32 = arith.constant 32 : index
     %c384 = arith.constant 384 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x384xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x384xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %thread_id_x = gpu.thread_id  x
     %2 = affine.apply #map()[%thread_id_x, %workgroup_id_x]
@@ -75,12 +73,10 @@
 
 // Make sure memref.load from uniform buffers are hoisted out as uniform code.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [32, 1, 1] subgroup_size = 32>
 #map = affine_map<()[s0, s1] -> (s1 * 2 + s0 floordiv 32)>
@@ -93,14 +89,14 @@
     %cst_1 = arith.constant dense<3.840000e+02> : vector<1xf32>
     %c32 = arith.constant 32 : index
     %c384 = arith.constant 384 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) offset(%c0) : memref<1xvector<4xi32>, #hal.descriptor_type<uniform_buffer>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) offset(%c0) : memref<1xvector<4xi32>, #hal.descriptor_type<uniform_buffer>>
     %1 = memref.load %0[%c0] : memref<1xvector<4xi32>, #hal.descriptor_type<uniform_buffer>>
     %2 = vector.extractelement %1[%c0 : index] : vector<4xi32>
     %3 = vector.extractelement %1[%c1 : index] : vector<4xi32>
     %4 = arith.index_castui %2 : i32 to index
     %5 = arith.index_castui %3 : i32 to index
-    %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) : memref<128x384xf32>
-    %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : memref<128xf32>
+    %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) : memref<128x384xf32>
+    %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : memref<128xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %thread_id_x = gpu.thread_id  x
     %8 = affine.apply #map()[%thread_id_x, %workgroup_id_x]
@@ -119,14 +115,14 @@
 //   CHECK-LABEL: func.func @reduce_uniform_buffer_offset()
 //     CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //     CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
-//         CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//         CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //         CHECK:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[C0]]]
 //         CHECK:   %[[EXT0:.+]] = vector.extractelement %[[LOAD]][%[[C0]] : index] : vector<4xi32>
 //         CHECK:   %[[EXT1:.+]] = vector.extractelement %[[LOAD]][%[[C1]] : index] : vector<4xi32>
 //         CHECK:   %[[OFFSET0:.+]] = arith.index_castui %[[EXT0]] : i32 to index
 //         CHECK:   %[[OFFSET1:.+]] = arith.index_castui %[[EXT1]] : i32 to index
-//         CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[OFFSET0]])
-//         CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[OFFSET1]])
+//         CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[OFFSET0]])
+//         CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[OFFSET1]])
 //         CHECK:   scf.for
 // CHECK-COUNT-5:     gpu.shuffle
 //         CHECK:     arith.addf
@@ -136,12 +132,10 @@
 
 // Make sure memref.load ops from readonly storage buffers are hoisted out as uniform code.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0, s1] -> (s1 * 2 + s0 floordiv 32)>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [32, 1, 1] subgroup_size = 32>
@@ -154,14 +148,14 @@
     %cst_1 = arith.constant dense<3.840000e+02> : vector<1xf32>
     %c32 = arith.constant 32 : index
     %c384 = arith.constant 384 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<1xvector<4xi32>, #hal.descriptor_type<storage_buffer>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<1xvector<4xi32>, #hal.descriptor_type<storage_buffer>>
     %1 = memref.load %0[%c0] : memref<1xvector<4xi32>, #hal.descriptor_type<storage_buffer>>
     %2 = vector.extractelement %1[%c0 : index] : vector<4xi32>
     %3 = vector.extractelement %1[%c1 : index] : vector<4xi32>
     %4 = arith.index_castui %2 : i32 to index
     %5 = arith.index_castui %3 : i32 to index
-    %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) : memref<128x384xf32>
-    %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : memref<128xf32>
+    %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) : memref<128x384xf32>
+    %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : memref<128xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %thread_id_x = gpu.thread_id  x
     %8 = affine.apply #map()[%thread_id_x, %workgroup_id_x]
@@ -180,14 +174,14 @@
 //   CHECK-LABEL: func.func @reduce_storage_buffer_offset()
 //     CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //     CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
-//         CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//         CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //         CHECK:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[C0]]]
 //         CHECK:   %[[EXT0:.+]] = vector.extractelement %[[LOAD]][%[[C0]] : index] : vector<4xi32>
 //         CHECK:   %[[EXT1:.+]] = vector.extractelement %[[LOAD]][%[[C1]] : index] : vector<4xi32>
 //         CHECK:   %[[OFFSET0:.+]] = arith.index_castui %[[EXT0]] : i32 to index
 //         CHECK:   %[[OFFSET1:.+]] = arith.index_castui %[[EXT1]] : i32 to index
-//         CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[OFFSET0]])
-//         CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[OFFSET1]])
+//         CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[OFFSET0]])
+//         CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[OFFSET1]])
 //         CHECK:   scf.for
 // CHECK-COUNT-5:     gpu.shuffle
 //         CHECK:     arith.addf
@@ -195,11 +189,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [32, 1, 1] subgroup_size = 32>
 module {
@@ -208,8 +200,8 @@
     %cst = arith.constant dense<0.000000e+00> : vector<1xf32>
     %cst_0 = arith.constant 0.000000e+00 : f32
     %c32 = arith.constant 32 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x32xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128x32xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x32xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128x32xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %alloc = memref.alloc() {alignment = 64 : i64} : memref<32xf32, #gpu.address_space<workgroup>>
     %2 = vector.transfer_read %0[%workgroup_id_x, %c0], %cst_0 {in_bounds = [true]} : memref<128x32xf32>, vector<32xf32>
@@ -234,12 +226,10 @@
 
 // Check that the multi-row matvec gets distributed across subgroup threads.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [64, 1, 1] subgroup_size = 64>
 #map = affine_map<()[s0] -> (s0 * 4)>
@@ -253,11 +243,11 @@
     %c512 = arith.constant 512 : index
     %cst_1 = arith.constant 0.000000e+00 : f16
     %thread_id_x = gpu.thread_id  x
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x4096xf16, #hal.descriptor_type<storage_buffer>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x4096xf16, #hal.descriptor_type<storage_buffer>>
     memref.assume_alignment %0, 64 : memref<1x4096xf16, #hal.descriptor_type<storage_buffer>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<32000x4096xf16, #hal.descriptor_type<storage_buffer>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<32000x4096xf16, #hal.descriptor_type<storage_buffer>>
     memref.assume_alignment %1, 64 : memref<32000x4096xf16, #hal.descriptor_type<storage_buffer>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<1x32000xf16, #hal.descriptor_type<storage_buffer>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<1x32000xf16, #hal.descriptor_type<storage_buffer>>
     memref.assume_alignment %2, 64 : memref<1x32000xf16, #hal.descriptor_type<storage_buffer>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %3 = affine.apply #map()[%workgroup_id_x]
@@ -291,19 +281,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [32, 1, 1] subgroup_size = 32>
 module {
   func.func @simple_nd_write() attributes {translation_info = #translation_info} {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<4x1024xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<4x1024xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<4x1024xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<4x1024xf32>
     %2 = vector.transfer_read %0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<4x1024xf32>, vector<4x1024xf32>
     vector.transfer_write %2, %1[%c0, %c0] {in_bounds = [true, true]} : vector<4x1024xf32>, memref<4x1024xf32>
     return
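
The test updates in this file all follow the same mechanical migration. As a summary (attribute shapes taken directly from the hunks above; the constant count `2` is illustrative), the nested descriptor-set layout collapses into a flat binding list and subspan ops drop the now-redundant `set(0)` segment:

```mlir
// Before: explicit descriptor sets with per-binding ordinals.
#pipeline_layout_old = #hal.pipeline.layout<push_constants = 2, sets = [
  #hal.descriptor_set.layout<0, bindings = [
    #hal.descriptor_set.binding<0, storage_buffer>,
    #hal.descriptor_set.binding<1, storage_buffer>
  ]>
]>

// After: flat binding list; binding ordinals are implicit in list order
// and push_constants is renamed to constants.
#pipeline_layout_new = #hal.pipeline.layout<constants = 2, bindings = [
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>
]>

// Subspan ops keep their binding(N) ordinal but lose set(0).
%0 = hal.interface.binding.subspan layout(#pipeline_layout_new) binding(0)
       alignment(64) offset(%c0) : memref<128xf32>
```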
diff --git a/compiler/src/iree/compiler/Codegen/Common/IREEExpandStridedMetadata.cpp b/compiler/src/iree/compiler/Codegen/Common/IREEExpandStridedMetadata.cpp
index 7ae89bd..c45c62f 100644
--- a/compiler/src/iree/compiler/Codegen/Common/IREEExpandStridedMetadata.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/IREEExpandStridedMetadata.cpp
@@ -156,7 +156,7 @@
     // `hal.interface.binding.subspan` is
     //
     // ```mlir
-    //  hal.interface.binding.subspan layout(#pipeline_layout) set(0)
+    //  hal.interface.binding.subspan layout(#pipeline_layout)
     //  binding(1) offset(%offset)
     //      : memref<?x?xf32, strided<[?, 1], offset: 64]>>{%s0, %s1}
     // ```
@@ -167,7 +167,7 @@
     //  #map = affine_map<()[s0, s1, s2] -> (s0 + s1 * s2)>
     //  %linearSize = affine.apply #map()[%offset, %s0, %s1]
     //  %c0 = arith.constant 0 : index
-    //  hal.interface.binding.subspan layout(#pipeline_layout) set(0)
+    //  hal.interface.binding.subspan layout(#pipeline_layout)
     //  binding(1) offset(%c0)
     //      : memref<?xf32>{%linearSize}
     // ```
@@ -197,7 +197,7 @@
     Value zero = rewriter.create<arith::ConstantIndexOp>(loc, 0);
     auto linearInterfaceBinding =
         rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
-            loc, newBufferType, binding.getLayoutAttr(), binding.getSetAttr(),
+            loc, newBufferType, binding.getLayoutAttr(),
             binding.getBindingAttr(), zero, dynamicLinearShape,
             binding.getAlignmentAttr(), binding.getDescriptorFlagsAttr());
 
diff --git a/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp b/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp
index e966749..2de3dc3 100644
--- a/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp
+++ b/compiler/src/iree/compiler/Codegen/Common/MaterializeEncodingIntoPackUnPack.cpp
@@ -616,9 +616,9 @@
     auto newResultType = IREE::Flow::DispatchTensorType::get(
         resultType.getAccess(), convertedBoundType);
     rewriter.replaceOpWithNewOp<IREE::HAL::InterfaceBindingSubspanOp>(
-        subspanOp, newResultType, subspanOp.getLayout(), subspanOp.getSet(),
-        subspanOp.getBinding(), subspanOp.getByteOffset(), newDynamicDims,
-        subspanOp.getAlignmentAttr(), subspanOp.getDescriptorFlagsAttr());
+        subspanOp, newResultType, subspanOp.getLayout(), subspanOp.getBinding(),
+        subspanOp.getByteOffset(), newDynamicDims, subspanOp.getAlignmentAttr(),
+        subspanOp.getDescriptorFlagsAttr());
     return success();
   }
 };
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/bufferize_copy_only_dispatches.mlir b/compiler/src/iree/compiler/Codegen/Common/test/bufferize_copy_only_dispatches.mlir
index cbbcde4..ab2a5bd 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/bufferize_copy_only_dispatches.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/bufferize_copy_only_dispatches.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-bufferize-copy-only-dispatches))" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 13, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 13, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_insert_slice() {
   %slice_size = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -20,9 +18,9 @@
   %dest_binding_size_x = hal.interface.constant.load layout(#pipeline_layout) ordinal(10) : index
   %source_binding_size_y = hal.interface.constant.load layout(#pipeline_layout) ordinal(11) : index
   %source_binding_size_x = hal.interface.constant.load layout(#pipeline_layout) ordinal(12) : index
-  %source = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+  %source = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
       : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%source_binding_size_y, %source_binding_size_x}
-  %dest = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+  %dest = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
       : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%dest_binding_size_y, %dest_binding_size_x}
   %source_load = flow.dispatch.tensor.load %source, offsets = [%source_offset_y, %source_offset_x],
       sizes = [1, %slice_size], strides = [%source_stride_y, %source_stride_x]
@@ -43,8 +41,8 @@
 //  CHECK-DAG:   %[[SOURCE_OFFSET_X:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(6)
 //  CHECK-DAG:   %[[SOURCE_STRIDE_Y:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(7)
 //  CHECK-DAG:   %[[SOURCE_STRIDE_X:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(8)
-//  CHECK-DAG:   %[[SOURCE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[SOURCE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-DAG:   %[[SOURCE_SUBVIEW:.+]] = memref.subview %[[SOURCE]][%[[SOURCE_OFFSET_Y]], %[[SOURCE_OFFSET_X]]] [1, %[[SLICE_SIZE]]] [%[[SOURCE_STRIDE_Y]], %[[SOURCE_STRIDE_X]]]
 //  CHECK-DAG:   %[[DEST_SUBVIEW:.+]] = memref.subview %[[DEST]][%[[DEST_OFFSET_Y]], %[[DEST_OFFSET_X]]] [%[[SLICE_SIZE]], 1] [%[[DEST_STRIDE_Y]], %[[DEST_STRIDE_X]]]
 //      CHECK:   linalg.generic
@@ -53,24 +51,22 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @UpSampling1D() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x16x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x8x3xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x16x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x8x3xf32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [2, 1, 3], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x8x3xf32>> -> tensor<2x3xf32>
   flow.dispatch.tensor.store %2, %0, offsets = [0, 0, 0], sizes = [2, 1, 3], strides = [1, 1, 1] : tensor<2x3xf32> -> !flow.dispatch.tensor<readwrite:tensor<2x16x3xf32>>
   return
 }
 
 // CHECK-LABEL: func.func @UpSampling1D()
-//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[SOURCE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[SOURCE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[SOURCE_SUBVIEW:.+]] = memref.subview %[[SOURCE]][0, 0, 0] [2, 1, 3]
 //   CHECK-DAG:   %[[DEST_SUBVIEW:.+]] = memref.subview %[[DEST]][0, 0, 0] [2, 1, 3]
 //       CHECK:   linalg.generic
@@ -79,15 +75,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @concatenate_cst() {
   %cst = arith.constant dense<0> : tensor<2x3xi32>
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x5xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x5xi32>>
   flow.dispatch.tensor.store %cst, %0, offsets = [0, 2], sizes = [2, 3], strides = [1, 1] : tensor<2x3xi32> -> !flow.dispatch.tensor<readwrite:tensor<2x5xi32>>
   return
 }
@@ -103,15 +97,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @already_bufferized() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<1001xf32, #hal.descriptor_type<storage_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<1001xf32, #hal.descriptor_type<storage_buffer>>
   memref.assume_alignment %0, 64 : memref<1001xf32, #hal.descriptor_type<storage_buffer>>
   %alloc = memref.alloc() : memref<1001xf32>
   linalg.fill ins(%cst : f32) outs(%alloc : memref<1001xf32>)
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/canonicalize_interface_load_store.mlir b/compiler/src/iree/compiler/Codegen/Common/test/canonicalize_interface_load_store.mlir
index 023b33a..54819ce 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/canonicalize_interface_load_store.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/canonicalize_interface_load_store.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-cleanup-buffer-alloc-view))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK-LABEL: func.func @fold_reshape_load()
 func.func @fold_reshape_load() {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.0 : f32
-  // CHECK: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
+  // CHECK: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
   // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[ARG]], {{.*}} : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>> -> tensor<3x3x96xf32>
   %3 = flow.dispatch.tensor.load %1, offsets=[0, 0, 0, 0], sizes =[3, 3, 1, 96], strides=[1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>> -> tensor<3x3x1x96xf32>
   %4 = tensor.collapse_shape %3 [[0, 1, 2, 3]] : tensor<3x3x1x96xf32> into tensor<864xf32>
@@ -26,19 +24,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK-LABEL: func.func @fold_reshape_store()
 func.func @fold_reshape_store() {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.0 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
-  // CHECK: %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x1x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
+  // CHECK: %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x1x96xf32>>
   // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %{{.*}}, {{.*}}
   %3 = flow.dispatch.tensor.load %1, offsets=[0, 0, 0, 0], sizes =[3, 3, 1, 96], strides=[1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>> -> tensor<3x3x1x96xf32>
   //  CHECK: %[[FILL:.+]] = linalg.fill ins(%{{.+}}) outs(%[[LOAD]] : tensor<3x3x1x96xf32>)
@@ -52,10 +48,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK-LABEL: func.func @dont_fold_reshape_with_not_full_load()
 func.func @dont_fold_reshape_with_not_full_load() {
@@ -63,8 +57,8 @@
   %c1 = arith.constant 1 : index
   %c3 = arith.constant 3 : index
   %c96 = arith.constant 96 : index
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<6x3x1x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<6x3x1x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x3x96xf32>>
   %3 = flow.dispatch.tensor.load %1, offsets = [%c3, %c0, %c0, %c0], sizes = [%c3, %c3, %c1, %c96], strides = [%c1, %c1, %c1, %c1] : !flow.dispatch.tensor<readonly:tensor<6x3x1x96xf32>> -> tensor<3x3x1x96xf32>
   // CHECK: tensor.collapse_shape
   // CHECK: tensor.expand_shape
@@ -76,10 +70,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK-LABEL: func.func @dont_fold_dynamic_reshape()
 func.func @dont_fold_dynamic_reshape() {
@@ -88,8 +80,8 @@
   %dim0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %dim1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %dim2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x96xf32>>{%dim0, %dim1}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<?x12x8xf32>>{%dim2}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x96xf32>>{%dim0, %dim1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<?x12x8xf32>>{%dim2}
   %3 = flow.dispatch.tensor.load %1, offsets=[0, 0, 0], sizes =[%dim0, %dim1, 96], strides=[1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x96xf32>>{%dim0, %dim1} -> tensor<?x?x96xf32>
   // CHECK: tensor.collapse_shape
   // CHECK: tensor.expand_shape
@@ -102,10 +94,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK: #[[$MAP:.+]] = affine_map<()[s0] -> (s0 ceildiv 288)>
 // CHECK-LABEL: func.func @fold_reshape_slice_store
@@ -114,9 +104,9 @@
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.0 : f32
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<1728xf32>>
-  // CHECK: %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<6x3x1x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<1728xf32>>
+  // CHECK: %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : !flow.dispatch.tensor<writeonly:tensor<6x3x1x96xf32>>
   // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %{{.*}}, {{.*}}
   %3 = flow.dispatch.tensor.load %1, offsets=[0, 0, 0, 0], sizes =[3, 3, 1, 96], strides=[1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x1x96xf32>> -> tensor<3x3x1x96xf32>
   //  CHECK: %[[FILL:.+]] = linalg.fill ins(%{{.+}}) outs(%[[LOAD]] : tensor<3x3x1x96xf32>)
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/convert_bf16_to_uint16_buffers.mlir b/compiler/src/iree/compiler/Codegen/Common/test/convert_bf16_to_uint16_buffers.mlir
index 1f99752..7a4c471 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/convert_bf16_to_uint16_buffers.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/convert_bf16_to_uint16_buffers.mlir
@@ -1,27 +1,25 @@
 // RUN: iree-opt --split-input-file \
 // RUN:   --iree-codegen-convert-bf16-to-uint16-buffers %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 // CHECK-LABEL: @bf16_conversion
 func.func @bf16_conversion() {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
 
-  // CHECK-DAG: %[[BUF0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
-  // CHECK-DAG: %[[BUF1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
-  // CHECK-DAG: %[[BUF2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c0) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
+  // CHECK-DAG: %[[BUF0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
+  // CHECK-DAG: %[[BUF1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
+  // CHECK-DAG: %[[BUF2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c0) : memref<?xi16, #spirv.storage_class<StorageBuffer>>{%c8}
   // CHECK-DAG: %[[LOAD0:.+]] = memref.load %[[BUF0]][%arg0] : memref<?xi16, #spirv.storage_class<StorageBuffer>>
   // CHECK-DAG: %[[LOAD1:.+]] = memref.load %[[BUF1]][%arg0] : memref<?xi16, #spirv.storage_class<StorageBuffer>>
   // CHECK: memref.store %{{.+}}, %[[BUF2]][%arg0] : memref<?xi16, #spirv.storage_class<StorageBuffer>>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<?xbf16, #spirv.storage_class<StorageBuffer>>{%c8}
   %3 = gpu.thread_id  x
   %4 = gpu.block_dim  x
   scf.for %arg0 = %3 to %c8 step %4 {
@@ -48,11 +46,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: @iree_uk_mmt4d
@@ -77,11 +73,11 @@
   %c0 = arith.constant 0 : index
   %c64 = arith.constant 64 : index
   %c128 = arith.constant 128 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x3x8x1xbf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<1x3x8x1xbf16>
   memref.assume_alignment %0, 64 : memref<1x3x8x1xbf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : memref<1x3x8x1xbf16, strided<[24, 8, 1, 1], offset: 32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : memref<1x3x8x1xbf16, strided<[24, 8, 1, 1], offset: 32>>
   memref.assume_alignment %1, 64 : memref<1x3x8x1xbf16, strided<[24, 8, 1, 1], offset: 32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c128) : memref<1x1x8x8xf32, strided<[64, 64, 8, 1], offset: 32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c128) : memref<1x1x8x8xf32, strided<[64, 64, 8, 1], offset: 32>>
   memref.assume_alignment %2, 64 : memref<1x1x8x8xf32, strided<[64, 64, 8, 1], offset: 32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -131,10 +127,8 @@
 // is rewritten correctly, along with any following ops.
 // See issue https://github.com/iree-org/iree/issues/17177
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: module @extract_strided_metadata
@@ -144,7 +138,7 @@
   func.func @external_func_entry_point() attributes {translation_info = #iree_codegen.translation_info<CPUDefault>} {
     %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
     %1 = arith.index_castui %0 : i32 to index
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%1) flags(ReadOnly) : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%1) flags(ReadOnly) : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>>
     // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan {{.*}} : memref<1x8x768xi16,
     %base_buffer, %offset, %sizes:3, %strides:3 = iree_codegen.extract_strided_metadata %2 : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>> -> memref<bf16>, index, index, index, index, index, index, index
     // CHECK: {{.+}} = iree_codegen.extract_strided_metadata %[[SUBSPAN]] : memref<1x8x768xi16,
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/convert_to_destination_passing_style.mlir b/compiler/src/iree/compiler/Codegen/Common/test/convert_to_destination_passing_style.mlir
index 1101469..a185567 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/convert_to_destination_passing_style.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/convert_to_destination_passing_style.mlir
@@ -1,21 +1,19 @@
 // RUN: iree-opt %s --pass-pipeline="builtin.module(func.func(iree-codegen-convert-to-destination-passing-style),canonicalize,cse)" --split-input-file | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul() {
   %m = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %n = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %init = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %init = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -40,10 +38,10 @@
   return
 }
 //      CHECK: func.func @matmul()
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     scf.for %[[IV1:.+]] =
 //  CHECK-DAG:       %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]]
@@ -56,12 +54,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_fill() {
   %cst = arith.constant 0.0 : f32
@@ -69,9 +65,9 @@
   %m = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %n = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -97,9 +93,9 @@
   return
 }
 //      CHECK: func.func @matmul_fill()
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     scf.for %[[IV1:.+]] =
 //  CHECK-DAG:       %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]]
@@ -114,21 +110,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_inplace() {
   %c0 = arith.constant 0 : index
   %m = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %n = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -153,9 +147,9 @@
   return
 }
 //      CHECK: func.func @matmul_inplace()
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     scf.for %[[IV1:.+]] =
 //  CHECK-DAG:       %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]]
@@ -168,11 +162,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reshape_simple() {
   %c0 = arith.constant 0 : index
@@ -180,27 +172,25 @@
   %c3 = arith.constant 3 : index
   %c4 = arith.constant 4 : index
   %c12 = arith.constant 12 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
   %3 = tensor.expand_shape %2 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : tensor<3x4xi32> -> !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   return
 }
 //      CHECK: func.func @reshape_simple()
-//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //      CHECK:   %[[SOURCE:.+]] = flow.dispatch.tensor.load %[[ARG0]]
 //      CHECK:   %[[RESHAPE:.+]] = tensor.expand_shape %[[SOURCE]]
 //      CHECK:   flow.dispatch.tensor.store %[[RESHAPE]], %[[RET0]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reshape_fused_source() {
   %c0 = arith.constant 0 : index
@@ -208,8 +198,8 @@
   %c3 = arith.constant 3 : index
   %c4 = arith.constant 4 : index
   %c12 = arith.constant 12 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
   %3 = tensor.expand_shape %2 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   %4 = tensor.empty() : tensor<3x4xi32>
@@ -225,8 +215,8 @@
   return
 }
 //      CHECK: func.func @reshape_fused_source()
-//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //      CHECK:   %[[TARGET:.+]] = flow.dispatch.tensor.load %[[RET0]]
 //      CHECK:   %[[SOURCE:.+]] = flow.dispatch.tensor.load %[[ARG0]]
 //      CHECK:   %[[RESHAPE:.+]] = tensor.expand_shape %[[SOURCE]]
@@ -237,12 +227,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reshape_fused_source_and_copyout() {
   %c0 = arith.constant 0 : index
@@ -250,9 +238,9 @@
   %c3 = arith.constant 3 : index
   %c4 = arith.constant 4 : index
   %c12 = arith.constant 12 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
   %4 = tensor.expand_shape %3 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   %5 = tensor.empty() : tensor<3x4xi32>
@@ -269,9 +257,9 @@
   return
 }
 //      CHECK: func.func @reshape_fused_source_and_copyout()
-//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-DAG:   %[[TARGET:.+]] = flow.dispatch.tensor.load %[[RET0]]
 //      CHECK:   %[[SOURCE:.+]] = flow.dispatch.tensor.load %[[ARG0]]
 //      CHECK:   %[[RESHAPE:.+]] = tensor.expand_shape %[[SOURCE]]
@@ -283,11 +271,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reshape_fused_target() {
   %c0 = arith.constant 0 : index
@@ -295,8 +281,8 @@
   %c3 = arith.constant 3 : index
   %c4 = arith.constant 4 : index
   %c12 = arith.constant 12 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<12xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<12xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3x4xi32>> -> tensor<3x4xi32>
   %3 = tensor.empty() : tensor<3x4xi32>
   %4 = linalg.generic {
@@ -312,8 +298,8 @@
   return
 }
 //      CHECK: func.func @reshape_fused_target()
-//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-DAG:   %[[SOURCE:.+]] = flow.dispatch.tensor.load %[[ARG0]]
 //  CHECK-DAG:   %[[TARGET:.+]] = flow.dispatch.tensor.load %[[RET0]]
 //  CHECK-DAG:   %[[RESHAPE_EXPAND:.+]] = tensor.expand_shape %[[TARGET]] {{\[}}[0, 1]{{\]}}
@@ -325,12 +311,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @cast_followed_by_store() {
   %c0 = arith.constant 0 : index
@@ -339,9 +323,9 @@
   %c64 = arith.constant 64 : index
   %c1 = arith.constant 1 : index
   %c32 = arith.constant 32 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x64xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x32x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x64xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x32x64xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -368,9 +352,9 @@
   return
 }
 //      CHECK: func.func @cast_followed_by_store()
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     scf.for %[[IV1:.+]] =
 //  CHECK-DAG:       %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]]
@@ -385,13 +369,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 12, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 12, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @multi_result() {
@@ -408,10 +390,10 @@
   %dim5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
   %dim6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : index
   %dim7 = hal.interface.constant.load layout(#pipeline_layout) ordinal(7) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%dim0, %dim1}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%dim2, %dim3}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%dim4, %dim5}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%dim6, %dim7}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%dim0, %dim1}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%dim2, %dim3}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%dim4, %dim5}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%dim6, %dim7}
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(8) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(9) : index
   %6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(10) : index
@@ -448,10 +430,10 @@
   return
 }
 //      CHECK: func.func @multi_result()
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[RESULT0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//  CHECK-DAG:   %[[RESULT1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[RESULT0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//  CHECK-DAG:   %[[RESULT1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     scf.for %[[IV1:.+]] =
 //  CHECK-DAG:       %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]]
@@ -466,12 +448,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unused_ins_operand() {
   %c64 = arith.constant 64 : index
@@ -489,9 +469,9 @@
   %9 = arith.index_cast %3 : i32 to index
   %10 = arith.index_cast %4 : i32 to index
   %11 = arith.index_cast %5 : i32 to index
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%6, %7, %8}
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c64) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%9, %10, %11}
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xi32>>{%9, %10, %8}
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%6, %7, %8}
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c64) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%9, %10, %11}
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xi32>>{%9, %10, %8}
   %15 = flow.dispatch.tensor.load %13, offsets = [0, 0, 0], sizes = [%9, %10, %11], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%9, %10, %11} -> tensor<?x?x?xi32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -531,8 +511,8 @@
   return
 }
 // CHECK-LABEL: func.func @unused_ins_operand()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   %[[IN_VIEW:.+]] = flow.dispatch.tensor.load %[[IN]]
 //  CHECK-DAG:    %[[OUT_VIEW:.+]] = flow.dispatch.tensor.load %[[OUT]]
 //       CHECK:   linalg.generic
@@ -541,17 +521,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @cumsum__2x2x2x2x2x2x2() {
   %cst = arith.constant dense<0.000000e+00> : tensor<2x2x2x2x2x2x2xf32>
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x2x2x2x2x2x2xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x2x2x2x2x2xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x2x2x2x2x2x2xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x2x2x2x2x2xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0, 0, 0], sizes = [3, 2, 2, 2, 2, 2, 2], strides = [1, 1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x2x2x2x2x2x2xf32>> -> tensor<3x2x2x2x2x2x2xf32>
   %3 = tensor.empty() : tensor<2xf32>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6, d7) -> (d0 + d7, d1, d2, d3, d4, d5, d6)>,
@@ -577,17 +555,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reduce_window_max_4x6xf32() {
   %cst = arith.constant dense<0xFF800000> : tensor<2x2xf32>
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x6xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x6xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 4, 6], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4x6xf32>> -> tensor<2x4x6xf32>
   %3 = tensor.empty() : tensor<2x2x3xf32>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4) -> (d2, d0 * 2 + d3, d1 * 3 + d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1)>], iterator_types = ["parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%2, %3 : tensor<2x4x6xf32>, tensor<2x2x3xf32>) outs(%cst : tensor<2x2xf32>) {
@@ -607,14 +583,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sort1D() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<4xi32>>
   %1 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<4xi32>> -> tensor<4xi32>
   %2 = iree_linalg_ext.sort dimension(0) outs(%1 : tensor<4xi32>) {
   ^bb0(%arg0: i32, %arg1: i32):
@@ -625,7 +599,7 @@
   return
 }
 //      CHECK: func.func @sort1D()
-//  CHECK-DAG:   %[[BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//  CHECK-DAG:   %[[BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-DAG:   %[[IN:.+]] = flow.dispatch.tensor.load %[[BUF]]
 //      CHECK:   %[[SORT:.+]] = iree_linalg_ext.sort
 // CHECK-SAME:       outs(%[[IN]] : tensor<4xi32>)
@@ -633,18 +607,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @clone_index_computations() {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %1 = arith.index_castui %0 : i32 to index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %4 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_x]
@@ -664,31 +636,29 @@
   return
 }
 // CHECK-LABEL: func @clone_index_computations()
-//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   scf.for
 //       CHECK:     %[[TILESIZE:.+]] = affine.min
 //       CHECK:     %[[LOAD:.+]] = flow.dispatch.tensor.load %[[OUTPUT]], offsets = [{{.+}}], sizes = [%[[TILESIZE]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @gemm_gather() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<128xi32>>
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(5) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<128xi32>>
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 512], strides = [1, 1]
@@ -724,19 +694,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reduce_broadcast_generic() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<10x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<10x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<10x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<10x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets= [0, 0], sizes = [10, 1024], strides= [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<10x1024xf32>> -> tensor<10x1024xf32>
   %4 = flow.dispatch.tensor.load %1, offsets= [0], sizes = [10], strides= [1]
@@ -767,7 +735,7 @@
   return
 }
 // CHECK-LABEL: func @reduce_broadcast_generic
-//       CHECK:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]]
 //       CHECK:   %[[RESULT:.+]]:2 = linalg.generic
 //       CHECK:   linalg.generic
@@ -776,16 +744,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pack() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x2x2xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x2x2xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %3 = tensor.empty() : tensor<2x2x2x2xi32>
   %pack = tensor.pack %2 inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %3 : tensor<4x4xi32> -> tensor<2x2x2x2xi32>
@@ -793,24 +759,22 @@
   return
 }
 // CHECK-LABEL: func.func @pack
-// CHECK-DAG:     %[[IN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-// CHECK-DAG:     %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+// CHECK-DAG:     %[[IN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+// CHECK-DAG:     %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-DAG:     %[[IN:.+]] = flow.dispatch.tensor.load %[[IN_BINDING]]
 // CHECK-DAG:     %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]]
 // CHECK:         tensor.pack %[[IN]] inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unpack() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 2, 2, 2], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>> -> tensor<2x2x2x2xi32>
   %3 = tensor.empty() : tensor<4x4xi32>
   %4 = tensor.unpack %2 inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %3 : tensor<2x2x2x2xi32> -> tensor<4x4xi32>
@@ -818,19 +782,17 @@
   return
 }
 // CHECK-LABEL: func.func @unpack
-// CHECK-DAG:     %[[IN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-// CHECK-DAG:     %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+// CHECK-DAG:     %[[IN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+// CHECK-DAG:     %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-DAG:     %[[IN:.+]] = flow.dispatch.tensor.load %[[IN_BINDING]]
 // CHECK-DAG:     %[[OUT:.+]] = flow.dispatch.tensor.load %[[OUT_BINDING]]
 // CHECK:         tensor.unpack %[[IN]] inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -843,8 +805,8 @@
   %0:2 = iree_codegen.query_tile_sizes tensor<16x16xi32, #iree_encoding.encoding<operand_index = 2, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2]>> -> index, index
   %1 = affine.apply affine_map<()[s0] -> (16 ceildiv s0)>()[%0#0]
   %2 = affine.apply affine_map<()[s0] -> (16 ceildiv s0)>()[%0#1]
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c512) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%1, %2, %0#0, %0#1}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xi32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c512) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%1, %2, %0#0, %0#1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xi32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -877,26 +839,24 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @multi_result_dispatches() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<120x240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<240x360xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<120xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0)
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<writeonly:tensor<120x360xf32>>
-  %30 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0)
+  %30 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<writeonly:tensor<120x360xf32>>
   %4 = tensor.empty() : tensor<120x360xf32>
   %cst = arith.constant 0.0 : f32
@@ -925,12 +885,12 @@
   return
 }
 // CHECK-LABEL: func @multi_result_dispatches()
-//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[BIAS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//   CHECK-DAG:   %[[RESULT_BINDING0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[BIAS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//   CHECK-DAG:   %[[RESULT_BINDING0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //   CHECK-DAG:   %[[RESULT0:.+]] = flow.dispatch.tensor.load %[[RESULT_BINDING0]]
-//   CHECK-DAG:   %[[RESULT_BINDING1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(4)
+//   CHECK-DAG:   %[[RESULT_BINDING1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(4)
 //   CHECK-DAG:   %[[RESULT1:.+]] = flow.dispatch.tensor.load %[[RESULT_BINDING1]]
 //       CHECK:   %[[FILL:.+]] = linalg.fill
 //  CHECK-SAME:       outs(%[[RESULT1]] :
@@ -948,12 +908,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @if_conversion() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -961,11 +919,11 @@
   %size = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %cond = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : i1
   %result_offset = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
-  %then = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+  %then = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
     : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0}
-  %else = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+  %else = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
     : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
     : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%0}
   %then_value = flow.dispatch.tensor.load %then, offsets = [%offset], sizes = [%size], strides = [1]
     : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0} -> tensor<?xf32>
@@ -985,9 +943,9 @@
 //   CHECK-DAG:   %[[S1:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(2)
 //   CHECK-DAG:   %[[COND:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(3)
 //   CHECK-DAG:   %[[OFFSET:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(4)
-//   CHECK-DAG:   %[[THEN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[ELSE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RESULT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[THEN_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[ELSE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RESULT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   %[[THEN:.+]] = flow.dispatch.tensor.load %[[THEN_BINDING]]
 //   CHECK-DAG:   %[[ELSE:.+]] = flow.dispatch.tensor.load %[[ELSE_BINDING]]
 //       CHECK:   scf.if %[[COND]] {
@@ -1003,11 +961,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @if_conversion_clone_offsets() {
   %cst = arith.constant 0.000000e+00 : f32
@@ -1022,8 +978,8 @@
   %7 = arith.index_castui %2 : i32 to index
   %8 = arith.index_castui %3 : i32 to index
   %9 = arith.index_castui %4 : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%6, %7}
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%8, %9}
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%6, %7}
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%8, %9}
   %12 = affine.apply affine_map<()[s0, s1] -> (-s0 + s1 + (s0 ceildiv 16) * 16)>()[%6, %6]
   %13 = affine.apply affine_map<()[s0, s1] -> (-s0 + s1 + (s0 ceildiv 16) * 16)>()[%7, %7]
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/convolution_to_igemm.mlir b/compiler/src/iree/compiler/Codegen/Common/test/convolution_to_igemm.mlir
index 46f30fe..771ef0a 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/convolution_to_igemm.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/convolution_to_igemm.mlir
@@ -31,21 +31,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{thread = [2, 16], subgroup = [2, 16]}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 module {
   func.func @fold_with_interface_tensor() {
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x16x16x4xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x4x16xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x14x14x16xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x16x16x4xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x4x16xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x14x14x16xf32>>
     %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 16, 16, 4], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x16x16x4xf32>> -> tensor<1x16x16x4xf32>
     %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 4, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x4x16xf32>> -> tensor<3x3x4x16xf32>
     %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [1, 14, 14, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<1x14x14x16xf32>> -> tensor<1x14x14x16xf32>
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/decompose_conv2d.mlir b/compiler/src/iree/compiler/Codegen/Common/test/decompose_conv2d.mlir
index 38ee86b..d39c341 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/decompose_conv2d.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/decompose_conv2d.mlir
@@ -3,19 +3,17 @@
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 4, 0, 0], [0, 0, 0, 0, 1, 4], [0, 0, 0, 0, 0, 0]]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 #translation = #iree_codegen.translation_info<CPUConvTileAndDecomposeExpert>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 module {
   func.func @restrict_num_workgroups() attributes {hal.executable.target = #executable_target_system_elf_arm_64_, translation_info = #translation} {
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x4xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x4xf32>>
     %input = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 1, 4, 4], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>> -> tensor<1x1x4x4xf32>
     %filter = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1, 4, 4], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>> -> tensor<1x4x4xf32>
     %5 = tensor.empty() : tensor<1x1x1x4xf32>
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/eliminate_empty_tensors.mlir b/compiler/src/iree/compiler/Codegen/Common/test/eliminate_empty_tensors.mlir
index 45301a4..f603d39 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/eliminate_empty_tensors.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/eliminate_empty_tensors.mlir
@@ -2,17 +2,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @eliminate_empty_tensors_with_store_op() {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
   %c32 = arith.constant 32 : index
   %c128 = arith.constant 128 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
   %1 = tensor.empty() : tensor<32x384xf32>
   scf.for %arg0 = %c0 to %c128 step %c32 {
     %2 = scf.for %arg1 = %c0 to %c32 step %c8 iter_args(%arg2 = %1) -> (tensor<32x384xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/emulate_narrow_type.mlir b/compiler/src/iree/compiler/Codegen/Common/test/emulate_narrow_type.mlir
index 84f84cc..f6b2ee2 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/emulate_narrow_type.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/emulate_narrow_type.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-codegen-emulate-narrow-type %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @memref_i4_to_i8() -> i4 {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<3x15xi4>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<3x15xi4>
   %1 = memref.load %0[%c0, %c0] : memref<3x15xi4>
   return %1 : i4
 }
@@ -16,14 +14,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @memref_i4_to_i8_dynamic(%arg0 : index, %arg1 : index, %arg2 : index) -> i4 {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%arg0) flags(ReadOnly) : memref<?x?xi4, strided<[?, 1], offset: ?>>{%arg1, %arg2}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%arg0) flags(ReadOnly) : memref<?x?xi4, strided<[?, 1], offset: ?>>{%arg1, %arg2}
   %1 = memref.load %0[%c0, %c0] : memref<?x?xi4, strided<[?, 1], offset: ?>>
   return %1 : i4
 }
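
Every hunk in these test files applies the same mechanical rewrite. As a minimal sketch distilled from the diffs above (the function and attribute names here are illustrative, not from any one test): the old layout nested bindings inside a numbered descriptor set and `push_constants`, while the new layout is a flat binding list with a `constants` count, and `hal.interface.binding.subspan` drops its `set(...)` clause since the binding ordinal alone now identifies the resource:

```mlir
// Before: push constants plus an explicit descriptor set wrapping the bindings.
#old_layout = #hal.pipeline.layout<push_constants = 1, sets = [
  #hal.descriptor_set.layout<0, bindings = [
    #hal.descriptor_set.binding<0, storage_buffer>,
    #hal.descriptor_set.binding<1, storage_buffer>
  ]>
]>

// After: a constant count plus a flat binding list; binding ordinals are
// implied by position in the list.
#new_layout = #hal.pipeline.layout<constants = 1, bindings = [
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>
]>

func.func @example() {
  %c0 = arith.constant 0 : index
  // The subspan op references the binding ordinal directly; no set(0) clause.
  %0 = hal.interface.binding.subspan layout(#new_layout) binding(0)
         alignment(64) offset(%c0) : memref<16xf32>
  return
}
```

This is why the test churn is so uniform: only the layout attribute and the subspan/constant-load op syntax change, while the loaded types, offsets, and FileCheck expectations are otherwise preserved.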
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/flatten_memref_subspan.mlir b/compiler/src/iree/compiler/Codegen/Common/test/flatten_memref_subspan.mlir
index 48c1922..bb3e4bd 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/flatten_memref_subspan.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/flatten_memref_subspan.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --iree-codegen-flatten-memref-subspan --canonicalize --allow-unregistered-dialect %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_subspan_with_offset(%offset : index, %i0: index, %i1: index, %i2: index) -> f32 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<6x7x8xf32, strided<[56, 8, 1], offset: ?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<6x7x8xf32, strided<[56, 8, 1], offset: ?>>
   %val = memref.load %subspan[%i0, %i1, %i2] : memref<6x7x8xf32, strided<[56, 8, 1], offset: ?>>
   return %val: f32
 }
@@ -17,20 +15,18 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[I2:.+]]: index)
 //  CHECK-DAG:   %[[ZERO:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[ZERO]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[ZERO]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //      CHECK:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[INDEX]]]
 //      CHECK:   return %[[LOAD]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @store_subspan_with_offset(%value: f32, %offset : index, %i0: index, %i1: index, %i2: index) {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<2x3x4xf32, strided<[12, 4, 1], offset: ?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<2x3x4xf32, strided<[12, 4, 1], offset: ?>>
   memref.store %value, %subspan[%i0, %i1, %i2] : memref<2x3x4xf32, strided<[12, 4, 1], offset: ?>>
   return
 }
@@ -41,19 +37,17 @@
 // CHECK-SAME: (%[[VALUE:.+]]: f32, %[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[I2:.+]]: index)
 //  CHECK-DAG:   %[[ZERO:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[ZERO]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[ZERO]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //      CHECK:   memref.store %[[VALUE]], %[[SUBSPAN]][%[[INDEX]]] : memref<?xf32>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_subspan_with_vector_element(%offset : index, %i0: index, %i1: index, %i2: index) -> vector<4xf32> {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<6x7x8xvector<4xf32>, strided<[56, 8, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<6x7x8xvector<4xf32>, strided<[56, 8, 1], offset:?>>
   %val = memref.load %subspan[%i0, %i1, %i2] : memref<6x7x8xvector<4xf32>, strided<[56, 8, 1], offset:?>>
   return %val: vector<4xf32>
 }
@@ -64,13 +58,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_subspan_with_16bit_element(%offset : index, %i0: index, %i1: index, %i2: index) -> f16 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<6x7x8xf16, strided<[56, 8, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<6x7x8xf16, strided<[56, 8, 1], offset:?>>
   %val = memref.load %subspan[%i0, %i1, %i2] : memref<6x7x8xf16, strided<[56, 8, 1], offset:?>>
   return %val: f16
 }
@@ -81,15 +73,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @store_subspan_with_leading_dynamic_dim(%value: f32, %offset : index, %i0: index, %i1: index, %i2: index) {
   %dim = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>{%dim}
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>{%dim}
   memref.store %value, %subspan[%i0, %i1, %i2] : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>
   return
 }
@@ -101,23 +91,21 @@
 //      CHECK:   %[[C0:.+]] = arith.constant 0 : index
 //      CHECK:   %[[DIM:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0) : index
 //      CHECK:   %[[SIZE:.+]] = affine.apply #[[$SIZE_MAP]]()[%[[DIM]], %[[OFFSET]]]
-//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //      CHECK:   memref.store %[[VALUE]], %[[DST]][%[[INDEX]]] : memref<?xf32>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @store_subspan_with_all_dynamic_dim(%value: f32, %offset : index, %i0: index, %i1: index, %i2: index, %i3: index) {
   %dim0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %dim1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %dim2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %dim3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<?x?x?x?xf32, strided<[?, ?, ?, 1], offset: ?>>{%dim0, %dim1, %dim2, %dim3}
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<?x?x?x?xf32, strided<[?, ?, ?, 1], offset: ?>>{%dim0, %dim1, %dim2, %dim3}
   memref.store %value, %subspan[%i0, %i1, %i2, %i3] : memref<?x?x?x?xf32, strided<[?, ?, ?, 1], offset: ?>>
   return
 }
@@ -132,21 +120,19 @@
 //      CHECK:   %[[DIM2:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
 //      CHECK:   %[[DIM3:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(3) : index
 //      CHECK:   %[[SIZE:.+]] = affine.apply #[[$SIZE_MAP]]()[%[[DIM0]], %[[DIM1]], %[[DIM2]], %[[DIM3]], %[[OFFSET]]]
-//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[DIM3]], %[[I3]], %[[DIM2]], %[[I2]], %[[I0]], %[[DIM1]], %[[I1]]]
 //      CHECK:   memref.store %[[VALUE]], %[[DST]][%[[INDEX]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @store_subspan_with_mixed_dynamic_dim(%value: f32, %offset : index, %i0: index, %i1: index, %i2: index, %i3: index) {
   %dim0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %dim1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<?x4x?x8xf32, strided<[?, ?, 8, 1], offset: ?>>{%dim0, %dim1}
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<?x4x?x8xf32, strided<[?, ?, 8, 1], offset: ?>>{%dim0, %dim1}
   memref.store %value, %subspan[%i0, %i1, %i2, %i3] : memref<?x4x?x8xf32, strided<[?, ?, 8, 1], offset: ?>>
   return
 }
@@ -159,20 +145,18 @@
 //      CHECK:   %[[DIM0:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0) : index
 //      CHECK:   %[[DIM2:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1) : index
 //      CHECK:   %[[SIZE:.+]] = affine.apply #[[$SIZE_MAP]]()[%[[DIM0]], %[[DIM2]], %[[OFFSET]]]
-//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[I3]], %[[DIM2]], %[[I2]], %[[I0]], %[[I1]]]
 //      CHECK:   memref.store %[[VALUE]], %[[DST]][%[[INDEX]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @store_subspan_with_flow_control(%value: f32, %offset : index, %i0: index, %i1: index, %i2: index) {
   %dim = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>{%dim}
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>{%dim}
   scf.for %i = %i0 to %i1 step %i2 {
     memref.store %value, %subspan[%i0, %i1, %i2] : memref<?x3x4xf32, strided<[12, 4, 1], offset:?>>
   }
@@ -186,7 +170,7 @@
 //      CHECK:   %[[C0:.+]] = arith.constant 0 : index
 //      CHECK:   %[[DIM:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0) : index
 //      CHECK:   %[[SIZE:.+]] = affine.apply #[[$SIZE_MAP]]()[%[[DIM]], %[[OFFSET]]]
-//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK: scf.for
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$OFFSET_MAP]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //      CHECK:   memref.store %[[VALUE]], %[[DST]][%[[INDEX]]] : memref<?xf32>
@@ -256,14 +240,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @transfer_read_subspan_with_offset(
     %arg0 : index, %arg1: index, %arg2: index, %arg3: index) -> vector<4xf32> {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%arg0) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%arg0) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
   %cst = arith.constant 0.0 : f32
   %val = vector.transfer_read %subspan[%arg1, %arg2, %arg3], %cst {in_bounds = [true]} : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>, vector<4xf32>
   return %val: vector<4xf32>
@@ -278,21 +260,19 @@
 // CHECK-SAME:   %[[ARG3:[a-zA-Z0-9_]+]]: index
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[ARG0]]]
-//      CHECK:   %[[MEMREF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[MEMREF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[ARG0]], %[[ARG1]], %[[ARG2]], %[[ARG3]]]
 //      CHECK:   %[[VEC:.+]] = vector.transfer_read %[[MEMREF]][%[[INDEX]]]
 //      CHECK:   return %[[VEC]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @transfer_write_subspan_with_offset(
     %arg0 : index, %arg1: index, %arg2: index, %arg3: index, %arg4 : vector<4xf32>) {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%arg0) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%arg0) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
   vector.transfer_write %arg4, %subspan[%arg1, %arg2, %arg3] {in_bounds = [true]} :  vector<4xf32>, memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
   return
 }
@@ -307,21 +287,19 @@
 // CHECK-SAME:   %[[ARG4:[a-zA-Z0-9_]+]]: vector<4xf32>
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[ARG0]]]
-//      CHECK:   %[[MEMREF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[MEMREF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[ARG0]], %[[ARG1]], %[[ARG2]], %[[ARG3]]]
 //      CHECK:   vector.transfer_write %[[ARG4]], %[[MEMREF]][%[[INDEX]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_store_subspan_with_zero_offset(%arg0 : index, %arg1 : index, %arg2 : index, %arg3 : index) {
-  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xf32>{%arg0, %arg1}
-  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xf32>{%arg0, %arg1}
+  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xf32>{%arg0, %arg1}
+  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xf32>{%arg0, %arg1}
   %val = memref.load %subspan0[%arg2, %arg3] : memref<?x?xf32>
   memref.store %val, %subspan1[%arg2, %arg3] : memref<?x?xf32>
   return
@@ -335,9 +313,9 @@
 //  CHECK-SAME:     %[[ARG3:[a-zA-Z0-9]+]]: index
 //       CHECK:  %[[C0:.+]] = arith.constant 0 : index
 //       CHECK:  %[[D0:.+]] = affine.apply #[[$MAP0]]()[%[[ARG0]], %[[ARG1]]]
-//       CHECK:  %[[BINDING0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[D0]]}
+//       CHECK:  %[[BINDING0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[D0]]}
 //       CHECK:  %[[D1:.+]] = affine.apply #[[$MAP0]]()[%[[ARG0]], %[[ARG1]]]
-//       CHECK:  %[[BINDING1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) offset(%[[C0]]) : memref<?xf32>{%[[D1]]}
+//       CHECK:  %[[BINDING1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) offset(%[[C0]]) : memref<?xf32>{%[[D1]]}
 //       CHECK:  %[[OFFSET0:.+]] = affine.apply #[[$MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
 //       CHECK:  %[[VAL:.+]] = memref.load %[[BINDING0]][%[[OFFSET0]]]
 //       CHECK:  %[[OFFSET1:.+]] = affine.apply #[[$MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
@@ -345,16 +323,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_store_rank_zero_subspan_with_zero_offset() {
   %zero = arith.constant 0 : index
-  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%zero) : memref<f32>
-  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%zero) : memref<f32>
+  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%zero) : memref<f32>
+  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%zero) : memref<f32>
   %val = memref.load %subspan0[] : memref<f32>
   memref.store %val, %subspan1[] : memref<f32>
   return
@@ -362,20 +338,18 @@
 
 //CHECK-LABEL: func.func @load_store_rank_zero_subspan_with_zero_offset
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
-//      CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<f32>
-//      CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) offset(%[[C0]]) : memref<f32>
+//      CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<f32>
+//      CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) offset(%[[C0]]) : memref<f32>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_store_rank_zero_subspan_with_offset(%offset : index) {
-  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<f32, strided<[], offset:?>>
-  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%offset) : memref<f32, strided<[], offset:?>>
+  %subspan0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<f32, strided<[], offset:?>>
+  %subspan1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%offset) : memref<f32, strided<[], offset:?>>
   %val = memref.load %subspan0[] : memref<f32, strided<[], offset:?>>
   memref.store %val, %subspan1[] : memref<f32, strided<[], offset:?>>
   return
@@ -387,9 +361,9 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index)
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE0:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE0]]}
+//      CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE0]]}
 //  CHECK-DAG:   %[[SIZE1:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) offset(%[[C0]]) : memref<?xf32>{%[[SIZE1]]}
+//      CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) offset(%[[C0]]) : memref<?xf32>{%[[SIZE1]]}
 //      CHECK:   %[[INDEX0:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
 //      CHECK:   %[[LOAD:.+]] = memref.load %[[SPAN0]][%[[INDEX0]]] : memref<?xf32>
 //      CHECK:   %[[INDEX1:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
@@ -397,13 +371,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @collapse_shape(%offset : index, %i0 : index, %i1 : index) -> f32 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<4x5x6x7xf32, strided<[210, 42, 7, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<4x5x6x7xf32, strided<[210, 42, 7, 1], offset:?>>
   %collapse = memref.collapse_shape %subspan[[0, 1], [2, 3]] : memref<4x5x6x7xf32, strided<[210, 42, 7, 1], offset:?>> into memref<20x42xf32, strided<[42, 1], offset:?>>
   %value = memref.load %collapse[%i0, %i1] : memref<20x42xf32, strided<[42, 1], offset:?>>
   return %value : f32
@@ -415,19 +387,17 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index)
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]]]
 //      CHECK:   memref.load %[[SUBSPAN]][%[[INDEX]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @expand_shape(%offset : index, %i0: index, %i1: index, %i2: index, %i3: index) -> f32 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<20x42xf32, strided<[42, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<20x42xf32, strided<[42, 1], offset:?>>
   %expand = memref.expand_shape %subspan[[0, 1], [2, 3]] output_shape [4, 5, 6, 7] : memref<20x42xf32, strided<[42, 1], offset:?>> into memref<4x5x6x7xf32, strided<[210, 42, 7, 1], offset:?>>
   %value = memref.load %expand[%i0, %i1, %i2, %i3] : memref<4x5x6x7xf32, strided<[210, 42, 7, 1], offset:?>>
   return %value : f32
@@ -439,19 +409,17 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[I2:.+]]: index, %[[I3:.+]]: index)
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]], %[[I3]]]
 //      CHECK:   memref.load %[[SUBSPAN]][%[[INDEX]]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @expand_shape2(%offset : index, %i0: index, %i1: index) -> f32 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<128xf32, strided<[1], offset: ?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<128xf32, strided<[1], offset: ?>>
   %expand = memref.expand_shape %subspan [[0, 1]] output_shape [1, 128] : memref<128xf32, strided<[1], offset: ?>> into memref<1x128xf32, strided<[128, 1], offset: ?>>
   %value = memref.load %expand[%i0, %i1] : memref<1x128xf32, strided<[128, 1], offset: ?>>
   return %value : f32
@@ -463,7 +431,7 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index)
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]]]
 //      CHECK:   memref.load %[[SUBSPAN]][%[[INDEX]]]
 
@@ -473,13 +441,11 @@
 // be able to do so (a memref cast is inserted to move between unknown and
 // known dim).
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_collapse_shape_to_1d_static(%offset : index, %i: index) {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>>
   %collapse = memref.collapse_shape %subspan [[0, 1, 2]] : memref<6x7x8xf32, strided<[56, 8, 1], offset:?>> into memref<336xf32, strided<[1], offset: ?>>
   "unregistered.opaque"(%collapse) : (memref<336xf32, strided<[1], offset: ?>>) -> ()
 }
@@ -491,20 +457,18 @@
 //   CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[OFFSET:.+]] = affine.apply #[[$MAP0]]()[%[[ARG0]]
 //   CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP1]]()[%[[ARG0]]
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[SUBSPAN]][%[[OFFSET]]] [336] [1] : memref<?xf32> to memref<336xf32, strided<[1], offset: ?>>
 //       CHECK:   "unregistered.opaque"(%[[SUBVIEW]])
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @subview(%offset : index, %i0: index, %i1: index) -> f32 {
   %c0 = arith.constant 0 : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<32x128xf32, strided<[128, 1], offset: ?>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<32x128xf32, strided<[128, 1], offset: ?>>
   %expand = memref.subview %subspan[%i0, %i1][16, 8][1, 1] : memref<32x128xf32, strided<[128, 1], offset: ?>> to memref<16x8xf32, strided<[128, 1], offset: ?>>
   %value = memref.load %expand[%c0, %c0] : memref<16x8xf32, strided<[128, 1], offset: ?>>
   return %value : f32
@@ -516,7 +480,7 @@
 // CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index)
 //  CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 //  CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
+//      CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xf32>{%[[SIZE]]}
 //      CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]]]
 //      CHECK:   memref.load %[[SUBSPAN]][%[[INDEX]]]
 
@@ -553,13 +517,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @subgroup_mma_load_with_offset(%offset : index, %i0: index, %i1: index) -> !gpu.mma_matrix<16x16xf16, "AOp"> {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<32x32xf16, strided<[32, 1], offset: ?>, 3>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<32x32xf16, strided<[32, 1], offset: ?>, 3>
   %0 = gpu.subgroup_mma_load_matrix %subspan[%i0, %i1] {leadDimension = 32 : index} : memref<32x32xf16, strided<[32, 1], offset: ?>, 3> -> !gpu.mma_matrix<16x16xf16, "AOp">
   return %0 : !gpu.mma_matrix<16x16xf16, "AOp">
 }
@@ -570,20 +532,18 @@
 //  CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index)
 //   CHECK-DAG:   %[[ZERO:.+]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[ZERO]]) : memref<?xf16, 3>{%[[SIZE]]}
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[ZERO]]) : memref<?xf16, 3>{%[[SIZE]]}
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP2]]()[%[[OFFSET]], %[[I0]], %[[I1]]]
 //       CHECK:   %[[LD:.+]] = gpu.subgroup_mma_load_matrix %[[SUBSPAN]][%[[INDEX]]] {leadDimension = 32 : index}
 //       CHECK:   return %[[LD]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @subgroup_mma_store_with_offset(%offset : index, %i0: index, %i1: index, %val: !gpu.mma_matrix<16x16xf16, "COp">) {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<32x32xf16, strided<[32, 1], offset: ?>, 3>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<32x32xf16, strided<[32, 1], offset: ?>, 3>
   gpu.subgroup_mma_store_matrix %val, %subspan[%i0, %i1] {leadDimension = 128 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<32x32xf16, strided<[32, 1], offset: ?>, 3>
   return
 }
@@ -594,19 +554,17 @@
 //  CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[VAL:.+]]: !gpu.mma_matrix<16x16xf16, "COp">
 //   CHECK-DAG:   %[[ZERO:.+]] = arith.constant 0 : index
 //   CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[ZERO]]) : memref<?xf16, 3>{%[[SIZE]]}
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[ZERO]]) : memref<?xf16, 3>{%[[SIZE]]}
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP2]]()[%[[OFFSET]], %[[I0]], %[[I1]]]
 //       CHECK:   gpu.subgroup_mma_store_matrix %[[VAL]], %[[SUBSPAN]][%[[INDEX]]] {leadDimension = 128 : index}
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @load_uniform_buffer(%offset: index, %i0: index, %i1 : index, %i2: index) -> i32 {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
   %val = memref.load %subspan[%i0, %i1, %i2] : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
   return %val: i32
 }
@@ -615,7 +573,7 @@
 // CHECK-LABEL: func.func @load_uniform_buffer
 //  CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[I2:.+]]: index)
 //       CHECK:   %[[C0:.+]] = arith.constant 0 : index
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xi32, #hal.descriptor_type<uniform_buffer>>
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xi32, #hal.descriptor_type<uniform_buffer>>
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //       CHECK:   %[[LD:.+]] = memref.load %[[SUBSPAN]][%[[INDEX]]] : memref<?xi32, #hal.descriptor_type<uniform_buffer>>
 //       CHECK:   return %[[LD]] : i32
@@ -623,13 +581,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @store_uniform_buffer(%value : i32, %offset: index, %i0: index, %i1 : index, %i2: index) {
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
   memref.store %value, %subspan[%i0, %i1, %i2] : memref<2x3x4xi32, strided<[12, 4, 1], offset:?>, #hal.descriptor_type<uniform_buffer>>
   return
 }
@@ -640,21 +596,19 @@
 //  CHECK-SAME: (%[[VAL:.+]]: i32, %[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index, %[[I2:.+]]: index)
 //       CHECK:   %[[C0:.+]] = arith.constant 0 : index
 //       CHECK:   %[[SIZE:.+]] = affine.apply #[[$MAP0]]()[%[[OFFSET]]]
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%[[C0]]) : memref<?xi32, #hal.descriptor_type<uniform_buffer>>{%[[SIZE]]}
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%[[C0]]) : memref<?xi32, #hal.descriptor_type<uniform_buffer>>{%[[SIZE]]}
 //       CHECK:   %[[INDEX:.+]] = affine.apply #[[$MAP1]]()[%[[OFFSET]], %[[I0]], %[[I1]], %[[I2]]]
 //       CHECK:   memref.store %[[VAL]], %[[SUBSPAN]][%[[INDEX]]] : memref<?xi32, #hal.descriptor_type<uniform_buffer>>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reinterpret_cast_lowering_static_zero_offset() -> f32 {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xf32>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xf32>{%0, %1}
   %3 = memref.reinterpret_cast %2 to offset: [0], sizes: [], strides: [] : memref<?x?xf32> to memref<f32>
   %4 = memref.load %3[] : memref<f32>
   return %4 : f32
@@ -664,16 +618,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reinterpret_cast_lowering_dynamic_zero_offset() -> f32 {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xf32>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xf32>{%0, %1}
   %3 = memref.reinterpret_cast %2 to offset: [%c0], sizes: [], strides: [] : memref<?x?xf32> to memref<f32>
   %4 = memref.load %3[] : memref<f32>
   return %4 : f32
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/fold_affine_min_of_block_id.mlir b/compiler/src/iree/compiler/Codegen/Common/test/fold_affine_min_of_block_id.mlir
index b2529d6..6139480 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/fold_affine_min_of_block_id.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/fold_affine_min_of_block_id.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-fold-affinemin-in-distributed-loops, canonicalize)))))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @generic_static {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -23,8 +21,8 @@
 //       CHECK:} -> tensor<32x32xf32>
 //       CHECK: flow.dispatch.tensor.store {{.*}} sizes = [32, 32], strides = [1, 1] : tensor<32x32xf32> -> !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
         %2 = affine.min affine_map<()[s0] -> (32, s0 * -32 + 4096)>()[%workgroup_id_y]
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/hoist_unrolled_vector_extract_insert_slice.mlir b/compiler/src/iree/compiler/Codegen/Common/test/hoist_unrolled_vector_extract_insert_slice.mlir
index 2e1ab03..594b8ee 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/hoist_unrolled_vector_extract_insert_slice.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/hoist_unrolled_vector_extract_insert_slice.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-hoist-vector-extract-insert-slice))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @hoist_unrolled_vector_for_mma() {
   %c0 = arith.constant 0 : index
@@ -13,11 +11,11 @@
   %cst_0 = arith.constant dense<0.000000e+00> : vector<32x32xf32>
   %c64 = arith.constant 64 : index
   %c2048 = arith.constant 2048 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<3456x2048xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<3456x2048xf16>
   memref.assume_alignment %0, 64 : memref<3456x2048xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<2048x1024xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<2048x1024xf16>
   memref.assume_alignment %1, 64 : memref<2048x1024xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf32>
   memref.assume_alignment %2, 64 : memref<3456x1024xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %3 = gpu.thread_id  x
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/iree_comprehensive_bufferize.mlir b/compiler/src/iree/compiler/Codegen/Common/test/iree_comprehensive_bufferize.mlir
index c96a6db..fbe6b45 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/iree_comprehensive_bufferize.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/iree_comprehensive_bufferize.mlir
@@ -1,22 +1,20 @@
 // RUN: iree-opt %s --pass-pipeline="builtin.module(func.func(iree-codegen-iree-comprehensive-bufferize, canonicalize, cse, canonicalize))" --split-input-file | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul() {
   %c0 = arith.constant 0 : index
   %m = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %n = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %init = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %init = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -47,10 +45,10 @@
 //  CHECK-DAG:   %[[M:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //  CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //  CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //  CHECK-DAG:   %[[WG_ID_Y:.+]] = hal.interface.workgroup.id[1]
 //  CHECK-DAG:   %[[WG_COUNT_Y:.+]] = hal.interface.workgroup.count[1]
 //  CHECK-DAG:   %[[WG_SIZE_Y:.+]] = hal.interface.workgroup.size[1]
@@ -78,12 +76,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_fill() {
   %cst = arith.constant 0.0 : f32
@@ -94,9 +90,9 @@
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %base_offset_i32 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) alignment(8) : i32
   %base_offset = arith.index_castui %base_offset_i32 : i32 to index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%base_offset) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c1024) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%base_offset) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c1024) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -131,11 +127,11 @@
 //  CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //  CHECK-DAG:   %[[BASE_OFFSET_I32:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(3)
 //  CHECK-DAG:   %[[BASE_OFFSET:.+]] = arith.index_castui %[[BASE_OFFSET_I32]]
-//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(32)
+//  CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(32)
 //  CHECK-DAG:   memref.assume_alignment %[[LHS]], 32
-//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[BASE_OFFSET]])
+//  CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[BASE_OFFSET]])
 //  CHECK-DAG:   memref.assume_alignment %[[RHS]], 8
-//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c1024)
+//  CHECK-DAG:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c1024)
 //  CHECK-DAG:   memref.assume_alignment %[[RESULT]], 64
 //  CHECK-DAG:   %[[WG_ID_Y:.+]] = hal.interface.workgroup.id[1]
 //  CHECK-DAG:   %[[WG_COUNT_Y:.+]] = hal.interface.workgroup.count[1]
@@ -164,11 +160,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @elementwise() {
   %c4 = arith.constant 4 : index
@@ -177,8 +171,8 @@
   %c512 = arith.constant 512 : index
   %c64 = arith.constant 64 : index
   %c10 = arith.constant 10 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c512) : !flow.dispatch.tensor<readonly:tensor<1x10xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c64) : !flow.dispatch.tensor<writeonly:tensor<1x10xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c512) : !flow.dispatch.tensor<readonly:tensor<1x10xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c64) : !flow.dispatch.tensor<writeonly:tensor<1x10xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %2 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_id_x]
@@ -213,8 +207,8 @@
 //      CHECK: func.func @elementwise()
 //  CHECK-DAG:   %[[CST_TENSOR:.+]] = arith.constant dense_resource<__elided__> : tensor<1x10xf32>
 //  CHECK-DAG:   %[[CST_BUF:.+]] = bufferization.to_memref %[[CST_TENSOR]]
-//  CHECK-DAG:   %[[IN_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0)  binding(0) {{.+}} : memref<1x10xf32, strided<[10, 1], offset: 128>, #hal.descriptor_type<storage_buffer>>
-//  CHECK-DAG:   %[[OUT_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0)  binding(1) {{.+}} : memref<1x10xf32, strided<[10, 1], offset: 16>, #hal.descriptor_type<storage_buffer>>
+//  CHECK-DAG:   %[[IN_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.+}} : memref<1x10xf32, strided<[10, 1], offset: 128>, #hal.descriptor_type<storage_buffer>>
+//  CHECK-DAG:   %[[OUT_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.+}} : memref<1x10xf32, strided<[10, 1], offset: 16>, #hal.descriptor_type<storage_buffer>>
 //      CHECK:   scf.for
 //  CHECK-DAG:     %[[SUB_IN1:.+]] = memref.subview %[[IN_BUF]]
 //  CHECK-DAG:     %[[SUB_OUT1:.+]] = memref.subview %[[OUT_BUF]]
@@ -228,18 +222,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 2)>
 #map1 = affine_map<(d0) -> (d0)>
 func.func @rank_reduced_slice() {
   %c10 = arith.constant 10 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x40xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x40xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %3 = affine.apply #map0()[%workgroup_id_x]
@@ -257,8 +249,8 @@
   return
 }
 //      CHECK: func.func @rank_reduced_slice()
-//  CHECK-DAG:   %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<1x40xf32, #hal.descriptor_type<storage_buffer>>
-//  CHECK-DAG:   %[[DST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<10xf32, #hal.descriptor_type<storage_buffer>>
+//  CHECK-DAG:   %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<1x40xf32, #hal.descriptor_type<storage_buffer>>
+//  CHECK-DAG:   %[[DST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<10xf32, #hal.descriptor_type<storage_buffer>>
 //      CHECK:   scf.for %[[IV0:.+]] =
 //  CHECK-DAG:     %[[SRC_SUBVIEW:.+]] = memref.subview %[[SRC_BINDING]][0, %[[IV0]]] [1, 2] [1, 1] : memref<1x40xf32{{.+}}> to memref<2xf32
 //  CHECK-DAG:     %[[DST_SUBVIEW:.+]] = memref.subview %[[DST_BINDING]][%[[IV0]]] [2] [1] : memref<10xf32{{.+}}> to memref<2xf32
@@ -271,11 +263,9 @@
 // Checks that there are no errors in early bufferized copy ops. The
 // bufferization pass should make it as it is.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @early_bufferized_copy_cst_ops() {
   %c0 = arith.constant 0 : index
@@ -283,9 +273,9 @@
   %c2 = arith.constant 2 : index
   %cst = arith.constant dense<0> : tensor<2x3xi32>
   %0 = bufferization.to_memref %cst : memref<2x3xi32, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2x5xi32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2x5xi32>
   memref.assume_alignment %1, 64 : memref<2x5xi32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x5xi32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x5xi32>>
   %3 = memref.subview %1[%c0, %c2] [2, 3] [%c1, %c1] : memref<2x5xi32> to memref<2x3xi32, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>
   linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%0 : memref<2x3xi32, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>) outs(%3 : memref<2x3xi32, affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>>) {
   ^bb0(%arg0: i32, %arg1: i32):
@@ -299,12 +289,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tile_from_tensor_load_inplace() {
   %c2 = arith.constant 2 : index
@@ -313,9 +301,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -331,9 +319,9 @@
 }
 
 // CHECK-LABEL: func.func @tile_from_tensor_load_inplace()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[LHS:.+]] = memref.subview %[[TENSOR_LHS]][%[[IV0]], 0] [1, 3] [1, 1]
@@ -345,13 +333,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tile_from_tensor_load_inplace_and_copy() {
   %c2 = arith.constant 2 : index
@@ -360,10 +346,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -380,10 +366,10 @@
 }
 
 // CHECK-LABEL: func.func @tile_from_tensor_load_inplace_and_copy()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RETURN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//   CHECK-DAG:   %[[RETURN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RETURN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//   CHECK-DAG:   %[[RETURN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[LHS:.+]] = memref.subview %[[TENSOR_LHS]][%[[IV0]], 0] [1, 3] [1, 1]
@@ -397,12 +383,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @tile_from_pointwise_lhs_inplace() {
@@ -412,9 +396,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -436,9 +420,9 @@
 }
 
 // CHECK-LABEL: func.func @tile_from_pointwise_lhs_inplace()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[LHS:.+]] = memref.subview %[[TENSOR_LHS]][%[[IV0]], 0] [1, 3] [1, 1]
@@ -454,13 +438,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @tile_from_pointwise_outs() {
@@ -470,10 +452,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -494,10 +476,10 @@
   return
 }
 // CHECK-LABEL: func.func @tile_from_pointwise_outs()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[TENSOR_INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[TENSOR_INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[RESULT:.+]] = memref.subview %[[RETURN]][%[[IV0]], %[[IV1]]] [1, 1] [1, 1]
@@ -513,12 +495,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @tile_from_pointwise_outs_inplace() {
@@ -529,9 +509,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -552,9 +532,9 @@
 }
 
 // CHECK-LABEL: func.func @tile_from_pointwise_outs_inplace()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[RESULT:.+]] = memref.subview %[[RETURN]][%[[IV0]], %[[IV1]]] [1, 1] [1, 1]
@@ -568,12 +548,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tile_from_matmul_outs_inplace() {
   %c2 = arith.constant 2 : index
@@ -582,9 +560,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%0, %1}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   scf.for %arg0 = %workgroup_id_y to %c2 step %c2 {
@@ -601,9 +579,9 @@
 }
 
 // CHECK-LABEL: func.func @tile_from_matmul_outs_inplace()
-//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[TENSOR_LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[TENSOR_RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[RESULT:.+]] = memref.subview %[[RETURN]][%[[IV0]], %[[IV1]]] [1, 1] [1, 1]
@@ -616,12 +594,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0)[s0, s1] -> (-d0 + s0, s1)>
 #map1 = affine_map<(d0)[s0, s1] -> (-d0 + s1, s0)>
@@ -633,9 +609,9 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%4, %5}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?xf32>>{%4, %5}
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_size_y = hal.interface.workgroup.size[1] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -671,9 +647,9 @@
 //       CHECK:   %[[DIM3:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(3) : index
 //       CHECK:   %[[DIM4:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(4) : index
 //       CHECK:   %[[DIM5:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(5) : index
-//       CHECK:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM0]], %[[DIM1]]}
-//       CHECK:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM2]], %[[DIM3]]}
-//       CHECK:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM4]], %[[DIM5]]}
+//       CHECK:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM0]], %[[DIM1]]}
+//       CHECK:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM2]], %[[DIM3]]}
+//       CHECK:   %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[DIM4]], %[[DIM5]]}
 //   CHECK-DAG:   %[[WGSIZE_X:.+]] = hal.interface.workgroup.size[0]
 //   CHECK-DAG:   %[[WGSIZE_Y:.+]] = hal.interface.workgroup.size[1]
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
@@ -691,39 +667,35 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reshape_simple() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
   %3 = tensor.expand_shape %2 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : tensor<3x4xi32> -> !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   return
 }
 // CHECK-LABEL: func.func @reshape_simple()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[RESHAPE:.+]] = memref.expand_shape %[[ARG0]] {{\[}}[0, 1]]
 //       CHECK:   linalg.generic {{.*}} ins(%[[RESHAPE]] {{.*}} outs(%[[RET0]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 module {
   func.func @reshape_fused_source() {
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
     %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>> -> tensor<3x4xi32>
     %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
     %4 = tensor.expand_shape %3 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
@@ -737,8 +709,8 @@
   }
 }
 // CHECK-LABEL: func.func @reshape_fused_source()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   %[[RESHAPE:.+]] = memref.expand_shape %[[ARG0]] {{\[}}[0, 1]]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:     ins(%[[RESHAPE]] : memref<3x4xi32
@@ -746,19 +718,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @reshape_fused_source_and_copyout() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<12xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>> -> tensor<3x4xi32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12xi32>> -> tensor<12xi32>
   %5 = tensor.expand_shape %4 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   %6 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%5 : tensor<3x4xi32>) outs(%2 : tensor<3x4xi32>) {
@@ -771,9 +741,9 @@
   return
 }
 // CHECK-LABEL: func.func @reshape_fused_source_and_copyout()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   %[[RESHAPE:.+]] = memref.expand_shape %[[ARG0]] {{\[}}[0, 1]]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:     ins(%[[RESHAPE]] : memref<3x4xi32
@@ -782,16 +752,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @reshape_fused_target() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<12xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<12xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [12], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<12xi32>> -> tensor<12xi32>
   %3 = tensor.expand_shape %2 [[0, 1]] output_shape [3, 4] : tensor<12xi32> into tensor<3x4xi32>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3x4xi32>> -> tensor<3x4xi32>
@@ -805,8 +773,8 @@
   return
 }
 // CHECK-LABEL: func.func @reshape_fused_target()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<12xi32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   %[[RESHAPE:.+]] = memref.expand_shape %[[RET0]] {{\[}}[0, 1]]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:     ins(%[[ARG0]] : memref<3x4xi32
@@ -814,12 +782,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0)[s0] -> (-d0 + 1, s0)>
 #map1 = affine_map<(d0)[s0] -> (-d0 + 3, s0)>
@@ -827,9 +793,9 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c3 = arith.constant 3 : index
   %c1 = arith.constant 1 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x2xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<2x3xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x3xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x2xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<2x3xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x3xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1, 1, 2], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x2xf32>> -> tensor<1x1x2xf32>
   %4 = tensor.collapse_shape %3 [[0, 1], [2]] : tensor<1x1x2xf32> into tensor<1x2xf32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
@@ -857,10 +823,10 @@
   return
 }
 // CHECK-LABEL: func.func @dot_general_lowering()
-//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[RESHAPE_LHS:.+]] = memref.collapse_shape %[[LHS]]
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   scf.for %[[IV0:.+]] = {{.+}} {
 //       CHECK:     scf.for %[[IV1:.+]] = {{.+}} {
 //   CHECK-DAG:       %[[LHS_TILE:.+]] = memref.subview %[[RESHAPE_LHS]][%[[IV0]], 0]
@@ -874,37 +840,33 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @slice() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   %6 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
   %7 = tensor.extract_slice %6[%0, %1] [%2, %3] [1, 1] : tensor<?x?xi32> to tensor<?x?xi32>
   flow.dispatch.tensor.store %7, %5, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : tensor<?x?xi32> -> !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   return
 }
 // CHECK-LABEL: func.func @slice()
-//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG: %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG: %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG: %[[SUBVIEW:.+]] = memref.subview %[[ARG]]
 //       CHECK: linalg.generic {{.*}} ins(%[[SUBVIEW]] {{.*}} outs(%[[RETURN]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @slice_rank_reducing() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -912,27 +874,25 @@
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%4, %4, %4}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%4, %4, %4}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   %7 = flow.dispatch.tensor.load %5, offsets = [0, 0, 0], sizes = [%4, %4, %4], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%4, %4, %4} -> tensor<?x?x?xi32>
   %8 = tensor.extract_slice %7[%0, %0, %1] [%2, 1, %3] [1, 1, 1] : tensor<?x?x?xi32> to tensor<?x?xi32>
   flow.dispatch.tensor.store %8, %6, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : tensor<?x?xi32> -> !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   return
 }
 // CHECK-LABEL: func.func @slice_rank_reducing()
-//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG: %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG: %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG: %[[SUBVIEW:.+]] = memref.subview %[[ARG]]
 //       CHECK: linalg.generic {{.*}} ins(%[[SUBVIEW]] {{.*}} outs(%[[RETURN]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 7, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 7, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @slice_multiple_copy() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -942,9 +902,9 @@
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
   %6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%6, %6, %6}
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xi32>>{%3, %4, %5}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%3, %5}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%6, %6, %6}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xi32>>{%3, %4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%3, %5}
   %10 = flow.dispatch.tensor.load %7, offsets = [0, 0, 0], sizes = [%6, %6, %6], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xi32>>{%6, %6, %6} -> tensor<?x?x?xi32>
   %11 = tensor.extract_slice %10[%0, %1, %2] [%3, %4, %5] [1, 1, 1] : tensor<?x?x?xi32> to tensor<?x?x?xi32>
   %12 = tensor.extract_slice %10[%0, %1, %2] [%3, 1, %5] [1, 1, 1] : tensor<?x?x?xi32> to tensor<?x?xi32>
@@ -953,9 +913,9 @@
   return
 }
 // CHECK-LABEL: func.func @slice_multiple_copy()
-//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG: %[[RETURN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG: %[[RETURN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG: %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG: %[[RETURN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG: %[[RETURN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG: %[[SIZE1:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
 //   CHECK-DAG: %[[SIZE2:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
 //   CHECK-DAG: %[[SIZE3:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
@@ -966,15 +926,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @slice_in_place() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1}
   %3 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
   flow.dispatch.tensor.store %3, %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : tensor<?x?xi32> -> !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1}
   return
@@ -984,39 +942,35 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @slice_whole_stride_dispatch_0() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   %6 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
   %7 = tensor.extract_slice %6[1, 0] [1, 4] [1, 1] : tensor<?x?xi32> to tensor<1x4xi32>
   flow.dispatch.tensor.store %7, %5, offsets = [0, 0], sizes = [1, 4], strides = [1, 1] : tensor<1x4xi32> -> !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%2, %3}
   return
 }
 // CHECK-LABEL: func.func @slice_whole_stride_dispatch_0()
-//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[SUB_IN_FIXED:.+]] = memref.subview %[[INPUT]][1, 0] [1, 4] [1, 1]
 //   CHECK-DAG:   %[[SUB_OUT_FIXED:.+]] = memref.subview %[[OUTPUT]][0, 0] [1, 4] [1, 1]
 //       CHECK:   linalg.generic {{.*}} ins(%[[SUB_IN_FIXED]] {{.*}} outs(%[[SUB_OUT_FIXED]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @subtensor_insert() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -1025,9 +979,9 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%2, %3}
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%4, %5}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%2, %3}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%4, %5}
   %9 = flow.dispatch.tensor.load %6, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
   %10 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%2, %3} -> tensor<?x?xi32>
   %11 = tensor.insert_slice %9 into %10[3, 4] [%0, %1] [1, 1] : tensor<?x?xi32> into tensor<?x?xi32>
@@ -1035,9 +989,9 @@
   return
 }
 // CHECK-LABEL: func.func @subtensor_insert()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   %[[D0:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0) : index
 //   CHECK-DAG:   %[[D1:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1) : index
 //   CHECK-DAG:   %[[D2:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
@@ -1052,15 +1006,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_extract() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<i32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x9xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<i32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<3x9xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [3, 9], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<3x9xi32>> -> tensor<3x9xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i32>> -> tensor<i32>
   %4 = tensor.extract %3[] : tensor<i32>
@@ -1069,8 +1021,8 @@
   return
 }
 // CHECK-LABEL: func.func @tensor_extract()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[LOAD:.+]] = memref.load %[[ARG0]]
 //       CHECK:   linalg.fill
 //  CHECK-SAME:       ins(%[[LOAD]] :
@@ -1078,31 +1030,27 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @load_to_store() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3x4xi32>> -> tensor<3x4xi32>
   flow.dispatch.tensor.store %2, %0, offsets = [0, 0], sizes = [3, 4], strides = [1, 1] : tensor<3x4xi32> -> !flow.dispatch.tensor<writeonly:tensor<3x4xi32>>
   return
 }
 // CHECK-LABEL: func.func @load_to_store()
-//       CHECK:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<3x4xi32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   linalg.generic {{.*}} ins(%[[IN]] {{.*}} outs(%[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map1 = affine_map<(d0)[s0] -> (-d0 + 5, s0)>
@@ -1111,8 +1059,8 @@
   %cst_0 = arith.constant 0.000000e+00 : f32
   %c5 = arith.constant 5 : index
   %c1 = arith.constant 1 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x5x3x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<5x5xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x5x3x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<5x5xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 5, 3, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x5x3x1xf32>> -> tensor<1x5x3x1xf32>
   %3 = tensor.collapse_shape %2 [[0, 1], [2, 3]] : tensor<1x5x3x1xf32> into tensor<5x3xf32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
@@ -1142,8 +1090,8 @@
 // CHECK-LABEL: func.func @rhs_non_splat_constant
 //   CHECK-DAG:   %[[CONSTANT:.+]] = arith.constant {{.+}} : tensor<3x5xf32>
 //   CHECK-DAG:   %[[RHS:.+]] = bufferization.to_memref %[[CONSTANT]]
-//   CHECK-DAG:   %[[LHS_INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<1x5x3x1xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<5x5xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[LHS_INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<1x5x3x1xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RETURN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<5x5xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   %[[LHS:.+]] = memref.collapse_shape %[[LHS_INPUT]]
 //       CHECK:   scf.for %[[IV0:.+]] =
 //       CHECK:     scf.for %[[IV1:.+]] =
@@ -1158,12 +1106,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0, d1) -> (d0)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
@@ -1173,9 +1119,9 @@
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xi32>>{%2}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%3, %4}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xi32>>{%2}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%3, %4}
   %8 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [%3, %4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%3, %4} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %10 = flow.dispatch.tensor.load %6, offsets = [0], sizes = [%2], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xi32>>{%2} -> tensor<?xi32>
@@ -1190,26 +1136,24 @@
   return
 }
 // CHECK-LABEL: func.func @gather()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK:   linalg.generic
 //       CHECK:     %[[VAL:.+]] = memref.load %[[ARG0]]
 //       CHECK:     linalg.yield %[[VAL]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pooling_nhwc_sum() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<f32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x6x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<f32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x6x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x1xf32>>
   %3 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [1, 2, 2, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<1x2x2x1xf32>> -> tensor<1x2x2x1xf32>
   %4 = bufferization.alloc_tensor() : tensor<2x3xf32>
   %5 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
@@ -1222,9 +1166,9 @@
 }
 // CHECK-LABEL: func.func @pooling_nhwc_sum
 //   CHECK-DAG:   %[[WINDOW:.+]] = memref.alloc() : memref<2x3xf32>
-//   CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<f32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<1x4x6x1xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<1x2x2x1xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[INIT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<f32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<1x4x6x1xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<1x2x2x1xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   %[[INIT_VAL:.+]] = memref.load %[[INIT]][] : memref<f32{{.+}}>
 //       CHECK:   linalg.fill
 //  CHECK-SAME:       ins(%[[INIT_VAL]] :
@@ -1237,12 +1181,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map1 = affine_map<(d0)[s0, s1] -> (-d0 + s1, s0)>
@@ -1255,10 +1197,10 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
   %8 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %5}
   %10 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [%4, %5], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %5} -> tensor<?x?xf32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_size_y = hal.interface.workgroup.size[1] : index
@@ -1291,9 +1233,9 @@
   return
 }
 // CHECK-LABEL: func.func @read_only_subtensor
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   scf.for
 //       CHECK:     scf.for
 //   CHECK-DAG:       %[[SV1:.+]] = memref.subview %[[ARG0]]
@@ -1305,19 +1247,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0) -> (d0)>
 func.func @reshape_read_only() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%2}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%2}
   %5 = flow.dispatch.tensor.load %4, offsets = [0], sizes = [%2], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%2} -> tensor<?xf32>
   %6 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %7 = tensor.collapse_shape %6 [[0, 1]] : tensor<?x?xf32> into tensor<?xf32>
@@ -1332,8 +1272,8 @@
   return
 }
 // CHECK-LABEL: func.func @reshape_read_only
-//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[RESHAPE:.+]] = memref.collapse_shape %[[INPUT]]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:     ins(%[[RESHAPE]] : memref<?xf32
@@ -1341,22 +1281,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0, d1, d2, d3) -> (d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @use_buffer_for_operand_when_output_tensor_not_used() {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %4 = flow.dispatch.tensor.load %3, offsets = [0, 0, 0, 0], sizes = [1, 112, 112, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>> -> tensor<1x112x112x32xf32>
  %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>> -> tensor<1x225x225x16xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 16, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>> -> tensor<3x3x16x32xf32>
@@ -1374,7 +1312,7 @@
 // CHECK: func.func @use_buffer_for_operand_when_output_tensor_not_used()
 
 //  CHECK-NOT: memref.alloc
-//      CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//      CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //      CHECK: linalg.fill
 // CHECK-SAME:     outs(%[[OUTPUT]] :
 // CHECK-NEXT: linalg.conv_2d_nhwc_hwcf
@@ -1385,23 +1323,21 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d3)>
 func.func @dont_use_buffer_for_operand_when_output_tensor_used() {
   %cst = arith.constant 1.000000e+00 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %4 = flow.dispatch.tensor.load %3, offsets = [0, 0, 0, 0], sizes = [1, 112, 112, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>> -> tensor<1x112x112x32xf32>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x16xf32>> -> tensor<1x225x225x16xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 16, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x16x32xf32>> -> tensor<3x3x16x32xf32>
@@ -1421,7 +1357,7 @@
 }
 // CHECK-LABEL: func.func @dont_use_buffer_for_operand_when_output_tensor_used()
 //   CHECK-DAG:   %[[ALLOC:.+]] = memref.alloc
-//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //       CHECK:   linalg.fill
 //  CHECK-SAME:       outs(%[[ALLOC]] :
 //  CHECK-NEXT:   linalg.conv_2d_nhwc_hwcf
@@ -1434,11 +1370,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0) -> (-d0 + 4)>
 #map1 = affine_map<(d0) -> (d0)>
@@ -1447,8 +1381,8 @@
   %c-2147483648_i32 = arith.constant -2147483648 : i32
   %cst = arith.constant 0.000000e+00 : f32
   %cst_0 = arith.constant dense<[1, 2, 3, 4, 5]> : tensor<5xi32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<5xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<i32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<5xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<i32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<writeonly:tensor<i32>> -> tensor<i32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [5], strides = [1] : !flow.dispatch.tensor<readonly:tensor<5xf32>> -> tensor<5xf32>
   %4 = linalg.fill ins(%c-2147483648_i32 : i32) outs(%2 : tensor<i32>) -> tensor<i32>
@@ -1469,20 +1403,18 @@
 //       CHECK-DAG: %[[CST1:.+]] = arith.constant -2147483648 : i32
 //       CHECK-DAG: %[[CST5:.+]] = arith.constant dense<[1, 2, 3, 4, 5]> : tensor<5xi32>
 //       CHECK: %[[CAST5:.+]] = bufferization.to_memref %[[CST5]] : memref<5xi32>
-//       CHECK: %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<5xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<i32, #hal.descriptor_type<storage_buffer>>
+//       CHECK: %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<5xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<i32, #hal.descriptor_type<storage_buffer>>
 //       CHECK: linalg.fill ins(%[[CST1]] : i32) outs(%[[OUTPUT]] : memref<i32{{.+}}>)
 //       CHECK: linalg.generic
 //  CHECK-SAME:   ins(%[[INPUT]], %[[CAST5]] : {{.*}}) outs(%[[OUTPUT]] : memref<i32{{.+}}>)
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 32)>
 func.func @cast_follwed_by_store() {
@@ -1491,9 +1423,9 @@
   %c64 = arith.constant 64 : index
   %c1 = arith.constant 1 : index
   %c32 = arith.constant 32 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x64xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x32x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x64xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x32x64xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -1520,9 +1452,9 @@
 }
 // CHECK-LABEL: func.func @cast_follwed_by_store()
 //   CHECK-DAG: %[[ZERO:.+]] = arith.constant 0.000000e+00 : f32
-//   CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<4x32x1024xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4x1024x64xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG: %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<4x32x64xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<4x32x1024xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4x1024x64xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG: %[[RESULT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<4x32x64xf32, #hal.descriptor_type<storage_buffer>>
 //   CHECK-DAG: %[[LHSV:.+]] = memref.subview %[[LHS]]
 //   CHECK-DAG: %[[RHSV:.+]] = memref.subview %[[RHS]]
 //   CHECK-DAG: %[[RESULTV:.+]] = memref.subview %[[RESULT]]
@@ -1533,11 +1465,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @rank_reduced_subtensor_insert() {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
@@ -1545,8 +1475,8 @@
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<?x?x?xf32>>{%2, %3, %4}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<?x?x?xf32>>{%2, %3, %4}
   %7 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %6, offsets = [0, 0, 0], sizes = [%2, %3, %4], strides = [1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x?x?xf32>>{%2, %3, %4} -> tensor<?x?x?xf32>
   %9 = tensor.insert_slice %7 into %8[0, 0, 0] [1, %3, %4] [1, 1, 1] : tensor<?x?xf32> into tensor<?x?x?xf32>
@@ -1554,19 +1484,17 @@
   return
 }
 // CHECK-LABEL: func.func @rank_reduced_subtensor_insert()
-//   CHECK-DAG:   %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[RET:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[ARG:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[RET:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[RET]]
 //       CHECK:   linalg.generic {{.*}} ins(%[[ARG]] {{.*}} outs(%[[SUBVIEW]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -1577,9 +1505,9 @@
   %c0 = arith.constant 0 : index
   %c2 = arith.constant 2 : index
   %c1 = arith.constant 1 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x4xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<2x4xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x4xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<2x4xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [%c0, %c0], sizes = [2, 3], strides = [%c1, %c1] : !flow.dispatch.tensor<readonly:tensor<2x3xf32>> -> tensor<2x3xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [%c0, %c0], sizes = [3, 1], strides = [%c1, %c1] : !flow.dispatch.tensor<readonly:tensor<3x4xf32>> -> tensor<3x1xf32>
   %6 = flow.dispatch.tensor.load %3, offsets = [%c0, %c0], sizes = [2, 1], strides = [%c1, %c1] : !flow.dispatch.tensor<readwrite:tensor<2x4xf32>> -> tensor<2x1xf32>
@@ -1607,9 +1535,9 @@
 }
 
 //   CHECK-LABEL: func.func @bufferize_transfer_op_inplace()
-//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //     CHECK-DAG:   %[[ARG1V:.+]] = memref.subview %[[ARG1]]
 //     CHECK-DAG:   %[[RET0V:.+]] = memref.subview %[[RET0]]
 // CHECK-COUNT-6:   vector.transfer_read %[[ARG0]]
@@ -1621,13 +1549,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 10, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 10, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0)[s0, s1] -> (-d0 + s0, s1)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
@@ -1641,10 +1567,10 @@
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
   %6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : index
   %7 = hal.interface.constant.load layout(#pipeline_layout) ordinal(7) : index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%6, %7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%6, %7}
   %12 = hal.interface.constant.load layout(#pipeline_layout) ordinal(8) : index
   %13 = hal.interface.constant.load layout(#pipeline_layout) ordinal(9) : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -1678,10 +1604,10 @@
   return
 }
 // CHECK-LABEL: func.func @multi_result()
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //   CHECK-DAG:   %[[ARG0V:.+]] = memref.subview %[[ARG0]]
 //   CHECK-DAG:   %[[ARG1V:.+]] = memref.subview %[[ARG1]]
 //   CHECK-DAG:   %[[RET0V:.+]] = memref.subview %[[RET0]]
@@ -1692,13 +1618,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 128)>
 #map1 = affine_map<(d0)[s0] -> (-d0 + s0, 128)>
@@ -1710,10 +1634,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?xi32>>{%2}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?xi32>>{%2}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?xi32>>{%2}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?xi32>>{%2}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %7 = affine.apply #map0()[%workgroup_id_x]
@@ -1743,10 +1667,10 @@
   return
 }
 // CHECK-LABEL: func.func @multi_result_reduce
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//   CHECK-DAG:   %[[RET1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //       CHECK:   scf.for
 //   CHECK-DAG:     %[[ARG0_SV:.+]] = memref.subview %[[ARG0]]
 //   CHECK-DAG:     %[[ARG1_SV:.+]] = memref.subview %[[ARG1]]
@@ -1762,12 +1686,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<(d0) -> (-d0 + 250, 64)>
@@ -1784,9 +1706,9 @@
   %c1 = arith.constant 1 : index
   %c250 = arith.constant 250 : index
   %c370 = arith.constant 370 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<250x144xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<144x370xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<250x370xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<250x144xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<144x370xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<250x370xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -1834,9 +1756,9 @@
 //    CHECK-DAG: %[[K:.+]] = arith.constant 144 : index
 //    CHECK-DAG: %[[L1_MN_SIZE:.+]] = arith.constant 32 : index
 //    CHECK-DAG: %[[L1_K_SIZE:.+]] = arith.constant 24 : index
-//    CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<250x144xf32, #hal.descriptor_type<storage_buffer>>
-//    CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<144x370xf32, #hal.descriptor_type<storage_buffer>>
-//    CHECK-DAG: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<250x370xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<250x144xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<144x370xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<250x370xf32, #hal.descriptor_type<storage_buffer>>
 //        CHECK: scf.for %[[WORKGROUP_I:.+]] = %{{.*}} to %[[M]] step %{{.*}} {
 //        CHECK:    scf.for %[[WORKGROUP_J:.+]] = %{{.*}} to %[[N]] step %{{.*}} {
 //    CHECK-DAG:        %[[WORKGROUP_I_SIZE:.+]] = affine.min #{{.*}}(%[[WORKGROUP_I]])
@@ -1858,12 +1780,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<(d0) -> (-d0 + 250, 64)>
@@ -1881,9 +1801,9 @@
   %c1 = arith.constant 1 : index
   %c250 = arith.constant 250 : index
   %c370 = arith.constant 370 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<250x144xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<144x370xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<250x370xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<250x144xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<144x370xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<250x370xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -1932,9 +1852,9 @@
 //    CHECK-DAG: %[[K:.+]] = arith.constant 144 : index
 //    CHECK-DAG: %[[L1_MN_SIZE:.+]] = arith.constant 32 : index
 //    CHECK-DAG: %[[L1_K_SIZE:.+]] = arith.constant 24 : index
-//    CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<250x144xf32, #hal.descriptor_type<storage_buffer>>
-//    CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<144x370xf32, #hal.descriptor_type<storage_buffer>>
-//    CHECK-DAG: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<250x370xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<250x144xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<144x370xf32, #hal.descriptor_type<storage_buffer>>
+//    CHECK-DAG: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<250x370xf32, #hal.descriptor_type<storage_buffer>>
 //        CHECK: scf.for %[[WORKGROUP_I:.+]] = %{{.*}} to %[[M]] step %{{.*}} {
 //        CHECK:    scf.for %[[WORKGROUP_J:.+]] = %{{.*}} to %[[N]] step %{{.*}} {
 //    CHECK-DAG:        %[[WORKGROUP_I_SIZE:.+]] = affine.min #{{.*}}(%[[WORKGROUP_I]])
@@ -1956,11 +1876,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0, s1] -> (s1 * s0)>
 #map1 = affine_map<(d0)[s0, s1] -> (-d0 + s1, s0)>
@@ -1972,8 +1890,8 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%2, %3}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%4, %5}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%2, %3}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%4, %5}
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_size_y = hal.interface.workgroup.size[1] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -1998,8 +1916,8 @@
 }
 //       CHECK: #[[MAP:.+]] = affine_map<(d0)[s0] -> (d0 + s0)>
 //       CHECK: func.func @tensor_insert_slice()
-//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
 //   CHECK-DAG:   %[[OFFSET_Y:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(0)
 //   CHECK-DAG:   %[[OFFSET_X:.+]] = hal.interface.constant.load layout(#pipeline_layout) ordinal(1)
 //       CHECK:   scf.for %[[IV0:.+]] =
@@ -2012,12 +1930,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<(d0)[s0] -> (-d0 + s0, 64)>
@@ -2026,9 +1942,9 @@
   %c0_i32 = arith.constant 0 : i32
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi32>>{%0}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<i32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%1, %0}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi32>>{%0}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<i32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%1, %0}
   %5 = flow.dispatch.tensor.load %3, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i32>> -> tensor<i32>
   %6 = tensor.extract %5[] : tensor<i32>
   %7 = arith.cmpi slt, %6, %c0_i32 : i32
@@ -2049,8 +1965,8 @@
   return
 }
 // CHECK-LABEL: func.func @dynamic_update_slice()
-//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?xi32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?xi32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<?x?xi32, #hal.descriptor_type<storage_buffer>>
 //   CHECK-DAG:   %[[OFFSET_Y:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //   CHECK-DAG:   %[[OFFSET_X:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //       CHECK:   scf.for %[[IV0:.+]] =
@@ -2062,13 +1978,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map1 = affine_map<(d0)[s0, s1] -> (-d0 + s1, s0)>
@@ -2083,10 +1997,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<f32>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<f32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %5, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -2146,10 +2060,10 @@
 //   CHECK-DAG:   %[[M:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
-//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[K]]}
-//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[K]], %[[N]]}
-//   CHECK-DAG:   %[[SCALAR:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<f32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[N]]}
+//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[K]]}
+//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[K]], %[[N]]}
+//   CHECK-DAG:   %[[SCALAR:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<f32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[N]]}
 //       CHECK:   scf.for
 //       CHECK:     scf.for
 //   CHECK-DAG:       %[[LHS_SUBVIEW1:.+]] = memref.subview %[[LHS]]
@@ -2174,13 +2088,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 4)>
 #map1 = affine_map<(d0) -> (-d0 + 2, 4)>
@@ -2194,10 +2106,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<f32>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<f32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %5, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -2230,10 +2142,10 @@
 //   CHECK-DAG:   %[[M:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //   CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //   CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
-//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[K]]}
-//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[K]], %[[N]]}
-//   CHECK-DAG:   %[[SCALAR:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<f32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[N]]}
+//   CHECK-DAG:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[K]]}
+//   CHECK-DAG:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[K]], %[[N]]}
+//   CHECK-DAG:   %[[SCALAR:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<f32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>{%[[M]], %[[N]]}
 //       CHECK:   scf.for
 //       CHECK:     scf.for
 //   CHECK-DAG:       %[[LHS_SUBVIEW1:.+]] = memref.subview %[[LHS]]
@@ -2250,11 +2162,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 4)>
 #map1 = affine_map<()[s0] -> (s0 * 2)>
@@ -2270,9 +2180,9 @@
   %cst_0 = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c64 = arith.constant 64 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x6x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c64) : !flow.dispatch.tensor<readonly:tensor<2x1x2xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x6x2xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x6x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c64) : !flow.dispatch.tensor<readonly:tensor<2x1x2xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x6x2xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -2312,26 +2222,24 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @no_op_subview() {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %4 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %5 = tensor.extract_slice %4[0, 0] [%0, %1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
   flow.dispatch.tensor.store %5, %3, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : tensor<?x?xf32> -> !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   return
 }
 // CHECK-LABEL: func.func @no_op_subview()
-//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[SRC_DUP:.+]] = memref.subview %[[SRC]]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:       ins(%[[SRC_DUP]] :
@@ -2339,17 +2247,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @rank_reducing_no_op_subview() {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%0}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%0}
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, %0], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0} -> tensor<1x?xf32>
   %4 = tensor.extract_slice %3[0, 0] [1, %0] [1, 1] : tensor<1x?xf32> to tensor<?xf32>
   flow.dispatch.tensor.store %4, %2, offsets = [0], sizes = [%0], strides = [1] : tensor<?xf32> -> !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%0}
@@ -2357,8 +2263,8 @@
 }
 
 // CHECK-LABEL: func.func @rank_reducing_no_op_subview()
-//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[SRC:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[DEST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   %[[SUBVIEW:.+]] = memref.subview %[[SRC]][0, 0] [1, %{{.+}}]
 //       CHECK:   linalg.generic
 //  CHECK-SAME:       ins(%[[SUBVIEW]] :
@@ -2382,18 +2288,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @scan_1d_dim0_inclusive_sum() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<6xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<f32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<6xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<6xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<f32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<6xf32>>
   %3 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<6xf32>> -> tensor<6xf32>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<6xf32>> -> tensor<6xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readwrite:tensor<f32>> -> tensor<f32>
@@ -2414,14 +2318,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sort1D() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<4xi32>>
   %1 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<4xi32>> -> tensor<4xi32>
   %2 = iree_linalg_ext.sort dimension(0) outs(%1 : tensor<4xi32>) {
   ^bb0(%arg0: i32, %arg1: i32):
@@ -2432,25 +2334,23 @@
   return
 }
 // CHECK-LABEL: func.func @sort1D
-// CHECK:        %[[BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK:        %[[BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<4xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:        iree_linalg_ext.sort
 // CHECK-SAME:     outs(%[[BUF]] : memref<4xi32{{.+}}>)
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @scatter_update_scalar_1D() {
   %c4 = arith.constant 4 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x1xi32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<8xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x1xi32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<8xi32>>
   %3 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [8], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<8xi32>> -> tensor<8xi32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -2468,9 +2368,9 @@
   return
 }
 // CHECK:      func.func @scatter_update_scalar_1D
-// CHECK-DAG:    %[[UPDATE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<4xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:    %[[INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<4x1xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:    %[[ORIGINAL:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c0) : memref<8xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:    %[[UPDATE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:    %[[INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<4x1xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:    %[[ORIGINAL:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c0) : memref<8xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:        scf.for %[[I:.+]] = %{{.+}} to %{{.+}} step %{{.+}}
 // CHECK:          iree_linalg_ext.scatter
 // CHECK-SAME:     ins(%[[UPDATE]], %[[INDICES]]
@@ -2478,17 +2378,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @topk() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<200x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<200x8xf32>>
   %input_values = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [200, 8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<200x8xf32>> -> tensor<200x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<200x8xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<200x8xi32>>
   %input_indices = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [200, 8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<200x8xi32>> -> tensor<200x8xi32>
   %out_values = bufferization.alloc_tensor() : tensor<200x3xf32>
   %out_indices = bufferization.alloc_tensor() : tensor<200x3xi32>
@@ -2503,8 +2401,8 @@
   return
 }
 // CHECK:      func.func @topk
-// CHECK-DAG:    %[[INPUT_VALUES:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<200x8xf32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:    %[[INPUT_INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<200x8xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:    %[[INPUT_VALUES:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<200x8xf32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:    %[[INPUT_INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<200x8xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK-DAG:    %[[OUTPUT_VALUES:.+]] = memref.alloc() : memref<200x3xf32>
 // CHECK-DAG:    %[[OUTPUT_INDICES:.+]] = memref.alloc() : memref<200x3xi32>
 // CHECK:        iree_linalg_ext.topk
@@ -2513,17 +2411,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @iree_linalg_ext_pack() {
   %c0 = arith.constant 0 : index
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [2, 2, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>> -> tensor<2x2x3x3xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %4 = iree_linalg_ext.pack %3 padding_value(%c0_i32 : i32) inner_dims_pos = [0, 1] inner_tiles = [3, 3] into %2 : (tensor<4x4xi32> tensor<2x2x3x3xi32>) -> tensor<2x2x3x3xi32>
@@ -2532,24 +2428,22 @@
 }
 // CHECK: func.func @iree_linalg_ext_pack
 // CHECK-DAG:  %[[PAD:.+]] = arith.constant 0 : i32
-// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<2x2x3x3xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<2x2x3x3xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:      iree_linalg_ext.pack %[[IN]]
 // CHECK-SAME:   padding_value(%[[PAD]] : i32)
 // CHECK-SAME:   inner_dims_pos = [0, 1] inner_tiles = [3, 3] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @iree_linalg_ext_unpack() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 2, 2, 2], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>> -> tensor<2x2x2x2xi32>
   %4 = iree_linalg_ext.unpack %3 inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %2 : (tensor<2x2x2x2xi32> tensor<4x4xi32>) -> tensor<4x4xi32>
@@ -2557,24 +2451,22 @@
   return
 }
 // CHECK: func.func @iree_linalg_ext_unpack
-// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<2x2x2x2xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<2x2x2x2xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:      iree_linalg_ext.unpack %[[IN]]
 // CHECK-SAME:   inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @iree_linalg_ext_unpack_fully_dynamic() {
   %c0 = arith.constant 0 : index
   %inner_d0 = util.unfoldable_constant 2 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 2, %inner_d0, %inner_d0], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>> -> tensor<2x2x?x?xi32>
   %4 = iree_linalg_ext.unpack %3 inner_dims_pos = [0, 1] inner_tiles = [%inner_d0, %inner_d0] into %2 : (tensor<2x2x?x?xi32> tensor<4x4xi32>) -> tensor<4x4xi32>
@@ -2589,17 +2481,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_pack() {
   %c0 = arith.constant 0 : index
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x4xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [2, 2, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<2x2x3x3xi32>> -> tensor<2x2x3x3xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %4 = tensor.pack %3 padding_value(%c0_i32 : i32) inner_dims_pos = [0, 1] inner_tiles = [3, 3] into %2 : tensor<4x4xi32> -> tensor<2x2x3x3xi32>
@@ -2608,24 +2498,22 @@
 }
 // CHECK: func.func @tensor_pack
 // CHECK-DAG:  %[[PAD:.+]] = arith.constant 0 : i32
-// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<2x2x3x3xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<2x2x3x3xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:      iree_linalg_ext.pack %[[IN]]
 // CHECK-SAME:   padding_value(%[[PAD]] : i32)
 // CHECK-SAME:   inner_dims_pos = [0, 1] inner_tiles = [3, 3] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_unpack() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 2, 2, 2], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>> -> tensor<2x2x2x2xi32>
   %4 = tensor.unpack %3 inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %2 : tensor<2x2x2x2xi32> -> tensor<4x4xi32>
@@ -2633,24 +2521,22 @@
   return
 }
 // CHECK: func.func @tensor_unpack
-// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) : memref<2x2x2x2xi32, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) : memref<2x2x2x2xi32, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:  %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<4x4xi32, #hal.descriptor_type<storage_buffer>>
 // CHECK:      iree_linalg_ext.unpack %[[IN]]
 // CHECK-SAME:   inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_unpack_fully_dynamic() {
   %c0 = arith.constant 0 : index
   %inner_d0 = util.unfoldable_constant 2 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>>
   %2 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4, 4], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<4x4xi32>> -> tensor<4x4xi32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 2, %inner_d0, %inner_d0], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2x2x2xi32>> -> tensor<2x2x?x?xi32>
   %4 = tensor.unpack %3 inner_dims_pos = [0, 1] inner_tiles = [%inner_d0, %inner_d0] into %2 : tensor<2x2x?x?xi32> -> tensor<4x4xi32>
@@ -2665,20 +2551,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @reduction_ew() {
   %c5120 = arith.constant 5120 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
   %cst_0 = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c5120) : !flow.dispatch.tensor<readonly:tensor<1001xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c5120) : !flow.dispatch.tensor<readonly:tensor<1x1001xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1001xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c5120) : !flow.dispatch.tensor<readonly:tensor<1001xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c5120) : !flow.dispatch.tensor<readonly:tensor<1x1001xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1001xf32>>
   %3 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 1001], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1x1001xf32>> -> tensor<1x1001xf32>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [1001], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1001xf32>> -> tensor<1001xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, 1001], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1001xf32>> -> tensor<1x1001xf32>
@@ -2700,29 +2584,27 @@
 }
 
 // CHECK: func.func @reduction_ew
-// CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c5120) : memref<1001xf32, strided<[1], offset: 1280>, #hal.descriptor_type<storage_buffer>>
-// CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c5120) : memref<1x1001xf32, strided<[1001, 1], offset: 1280>, #hal.descriptor_type<storage_buffer>>
-// CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) : memref<1x1001xf32, #hal.descriptor_type<storage_buffer>>
+// CHECK: hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c5120) : memref<1001xf32, strided<[1], offset: 1280>, #hal.descriptor_type<storage_buffer>>
+// CHECK: hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c5120) : memref<1x1001xf32, strided<[1001, 1], offset: 1280>, #hal.descriptor_type<storage_buffer>>
+// CHECK: hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) : memref<1x1001xf32, #hal.descriptor_type<storage_buffer>>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>,
-    #hal.descriptor_set.binding<1, uniform_buffer>,
-    #hal.descriptor_set.binding<2, uniform_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<uniform_buffer>,
+  #hal.pipeline.binding<uniform_buffer>,
+  #hal.pipeline.binding<uniform_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @uniform_storage_buffer() {
   %c0 = arith.constant 0 : index
   %m = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %n = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %k = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
-  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
-  %init = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
-  %result = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
+  %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %k}
+  %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%k, %n}
+  %init = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%m, %n}
+  %result = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%m, %n}
   %wg_id_y = hal.interface.workgroup.id[1] : index
   %wg_count_y = hal.interface.workgroup.count[1] : index
   %wg_size_y = hal.interface.workgroup.size[1] : index
@@ -2748,30 +2630,28 @@
 }
 
 // CHECK-LABEL: func.func @uniform_storage_buffer()
-//       CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
-//       CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
-//       CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
-//       CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK: hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
+//       CHECK: hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
+//       CHECK: hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
+//       CHECK: hal.interface.binding.subspan layout({{.+}}) binding(3) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<uniform_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @micro_kernel_op() {
   %d0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %d1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %s0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : f32
   %s1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : i64
-  %arg0_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1}
-  %arg1_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%d0, %d1}
-  %arg2_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%d0, %d1}
-  %arg3_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1}
+  %arg0_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1}
+  %arg1_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%d0, %d1}
+  %arg2_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%d0, %d1}
+  %arg3_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1}
   %arg0 = flow.dispatch.tensor.load %arg0_binding, offsets = [0, 0], sizes = [%d0, %d1], strides = [1, 1]
       : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%d0, %d1} -> tensor<?x?xf32>
   %arg1 = flow.dispatch.tensor.load %arg1_binding, offsets = [0, 0], sizes = [%d0, %d1], strides = [1, 1]
@@ -2792,10 +2672,10 @@
 // CHECK-LABEL: func @micro_kernel_op()
 //   CHECK-DAG:   %[[S0:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
 //   CHECK-DAG:   %[[S1:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(3)
-//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
-//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[ARG2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
-//   CHECK-DAG:   %[[ARG3:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
+//   CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
+//   CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<?x?xf32, #hal.descriptor_type<storage_buffer>>
+//   CHECK-DAG:   %[[ARG3:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3) : memref<?x?xf32, #hal.descriptor_type<uniform_buffer>>
 //       CHECK:   iree_codegen.ukernel.generic "foo"
 //  CHECK-SAME:       ins(%[[ARG0]] :
 //  CHECK-SAME:       outs(%[[ARG1]], %[[ARG2]] :
@@ -2804,17 +2684,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sub_byte_bufferize_with_offset() {
   %c64 = arith.constant 64 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %2 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_x]
   %3 = flow.dispatch.tensor.load %1, offsets = [%2], sizes = [64], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<64xf32>> -> tensor<64xf32>
@@ -2830,7 +2708,7 @@
 }
 // CHECK-LABEL: func.func @sub_byte_bufferize_with_offset()
 //       CHECK:   %[[C64:.+]] = arith.constant 64 : index
-//       CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       memref<64xi4, strided<[1], offset: 128>
 
 // -----
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/iree_expand_strided_metadata.mlir b/compiler/src/iree/compiler/Codegen/Common/test/iree_expand_strided_metadata.mlir
index ae71018..5b47372 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/iree_expand_strided_metadata.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/iree_expand_strided_metadata.mlir
@@ -64,14 +64,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @resolve_binding_subspan_zero_offset_memref() -> (memref<f32>, index, index, index, index, index) {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<512x384xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<512x384xf32>
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %0 : memref<512x384xf32> -> memref<f32>, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : memref<f32>, index, index, index, index, index
 }
@@ -80,19 +78,17 @@
 // CHECK-DAG:   %[[C384:.+]] = arith.constant 384 : index
 // CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
 // CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
-//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<196608xf32>
+//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<196608xf32>
 //     CHECK:   %[[BASE_PTR:.+]] = memref.reinterpret_cast %[[BINDING]] to offset: [0], sizes: [], strides: []
 //     CHECK:   return %[[BASE_PTR]], %[[C0]], %[[C512]], %[[C384]], %[[C384]], %[[C1]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @resolve_binding_subspan_offset_index_memref(%arg0 : index) -> (memref<index>, index, index, index, index, index) {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%arg0) : memref<512x384xindex, strided<[384, 1], offset:?>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%arg0) : memref<512x384xindex, strided<[384, 1], offset:?>>
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %0 : memref<512x384xindex, strided<[384, 1], offset:?>> -> memref<index>, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : memref<index>, index, index, index, index, index
 }
@@ -106,20 +102,18 @@
 //     CHECK:   %[[SIZEOF:.+]] = util.sizeof index
 //     CHECK:   %[[OFFSET:.+]] = affine.apply #[[MAP0]]()[%arg0, %[[SIZEOF]]]
 //     CHECK:   %[[SUBSPAN_SIZE:.+]] = affine.apply #[[MAP1]]()[%arg0, %[[SIZEOF]]]
-//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<?xindex>{%[[SUBSPAN_SIZE]]}
+//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<?xindex>{%[[SUBSPAN_SIZE]]}
 //     CHECK:   %[[BASE_PTR:.+]] = memref.reinterpret_cast %[[BINDING]] to offset: [0], sizes: [], strides: []
 //     CHECK:   return %[[BASE_PTR]], %[[OFFSET]], %[[C512]], %[[C384]], %[[C384]], %[[C1]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @resolve_binding_subspan_dyn_dims_memref(%arg0 : index, %arg1 : index) -> (memref<index>, index, index, index, index, index) {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?x?xindex>{%arg0, %arg1}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<?x?xindex>{%arg0, %arg1}
   %base_buffer, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %0 : memref<?x?xindex> -> memref<index>, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : memref<index>, index, index, index, index, index
 }
@@ -128,7 +122,7 @@
 // CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
 // CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
 // CHECK-DAG:   %[[SIZE:.+]] = affine.apply #[[MAP]]()[%arg0, %arg1]
-//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<?xindex>{%[[SIZE]]}
+//     CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<?xindex>{%[[SIZE]]}
 //     CHECK:   %[[BASE_PTR:.+]] = memref.reinterpret_cast %[[BINDING]] to offset: [0], sizes: [], strides: []
 //     CHECK:   return %[[BASE_PTR]], %[[C0]], %arg0, %arg1, %arg1, %[[C1]]
 
@@ -186,15 +180,13 @@
 
 // Tests for the part of the pass that converts iree_codegen to memref.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @external_func_entry_point() -> (memref<bf16>, index) {
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %1 = arith.index_castui %0 : i32 to index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%1) flags(ReadOnly) : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%1) flags(ReadOnly) : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>>
   %base_buffer, %offset, %sizes:3, %strides:3 = iree_codegen.extract_strided_metadata %2 : memref<1x8x768xbf16, strided<[6144, 768, 1], offset: ?>> -> memref<bf16>, index, index, index, index, index, index, index
   return %base_buffer, %offset : memref<bf16>, index
 }
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/materialize_encoding_into_nop.mlir b/compiler/src/iree/compiler/Codegen/Common/test/materialize_encoding_into_nop.mlir
index 83c7dc7..3ca8c27 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/materialize_encoding_into_nop.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/materialize_encoding_into_nop.mlir
@@ -175,35 +175,31 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding_lhs = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<1x1xf32>, matmul_narrow_M = 1 : index, matmul_narrow_N = 1 : index, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>]>
 func.func @drop_encoding_for_hal_flow_ops_static() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xf32, #encoding_lhs>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xf32, #encoding_lhs>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1xf32>> -> tensor<1x1xf32>
   %3 = iree_encoding.set_encoding %2 : tensor<1x1xf32> -> tensor<1x1xf32, #encoding_lhs>
   flow.dispatch.tensor.store %3, %1, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : tensor<1x1xf32, #encoding_lhs> -> !flow.dispatch.tensor<writeonly:tensor<1x1xf32, #encoding_lhs>>
   return
 }
 // CHECK-LABEL: func.func @drop_encoding_for_hal_flow_ops_static
-// CHECK-DAG:     %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.+}} : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-// CHECK-DAG:     %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.+}} : !flow.dispatch.tensor<writeonly:tensor<1x1xf32>>
+// CHECK-DAG:     %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.+}} : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+// CHECK-DAG:     %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.+}} : !flow.dispatch.tensor<writeonly:tensor<1x1xf32>>
 // CHECK:         %[[LOAD:.+]] = flow.dispatch.tensor.load %[[IN]]
 // CHECK:         flow.dispatch.tensor.store %[[LOAD]], %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #encoding_lhs = #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [bf16, bf16, bf16], original_type = tensor<?x?xbf16>, user_indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>]>
 func.func @drop_encoding_for_hal_flow_ops_dynamic() {
@@ -225,15 +221,15 @@
   %13 = arith.index_castui %12 : i64 to index
   %14 = flow.dispatch.workload.ordinal %8, 0 : index
   %15 = flow.dispatch.workload.ordinal %13, 1 : index
-  %16 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xbf16>>{%14, %15}
-  %17 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xbf16, #encoding_lhs>>{%14, %15}
+  %16 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xbf16>>{%14, %15}
+  %17 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xbf16, #encoding_lhs>>{%14, %15}
   %18 = flow.dispatch.tensor.load %16, offsets = [0, 0], sizes = [%14, %15], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xbf16>>{%14, %15} -> tensor<?x?xbf16>
   %19 = iree_encoding.set_encoding %18 : tensor<?x?xbf16> -> tensor<?x?xbf16, #encoding_lhs>
   flow.dispatch.tensor.store %19, %17, offsets = [0, 0], sizes = [%14, %15], strides = [1, 1] : tensor<?x?xbf16, #encoding_lhs> -> !flow.dispatch.tensor<writeonly:tensor<?x?xbf16, #encoding_lhs>>{%14, %15}
   return
 }
 // CHECK-LABEL: func.func @drop_encoding_for_hal_flow_ops_dynamic
-// CHECK-DAG:     %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.+}} : !flow.dispatch.tensor<readonly:tensor<?x?xbf16>>
-// CHECK-DAG:     %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.+}} : !flow.dispatch.tensor<writeonly:tensor<?x?xbf16>>
+// CHECK-DAG:     %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.+}} : !flow.dispatch.tensor<readonly:tensor<?x?xbf16>>
+// CHECK-DAG:     %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.+}} : !flow.dispatch.tensor<writeonly:tensor<?x?xbf16>>
 // CHECK:         %[[LOAD:.+]] = flow.dispatch.tensor.load %[[IN]]
 // CHECK:         flow.dispatch.tensor.store %[[LOAD]], %[[OUT]]
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/materialize_user_configs.mlir b/compiler/src/iree/compiler/Codegen/Common/test/materialize_user_configs.mlir
index 8e80747..6bc67b8 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/materialize_user_configs.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/materialize_user_configs.mlir
@@ -3,20 +3,18 @@
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [32, 32, 0], [0, 0, 32], [0, 0, 0]]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {target_triple = "x86_64-xyz-xyz"}>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 module {
   func.func @preset_config() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
     %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
     %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x512xf32>> -> tensor<256x512xf32>
     %5 = tensor.empty() : tensor<128x512xf32>
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/optimize_tensor_insert_extract_slices.mlir b/compiler/src/iree/compiler/Codegen/Common/test/optimize_tensor_insert_extract_slices.mlir
index 1255453..5b072cb 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/optimize_tensor_insert_extract_slices.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/optimize_tensor_insert_extract_slices.mlir
@@ -172,12 +172,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -192,7 +190,7 @@
   %c1 = arith.constant 1 : index
   %cst_0 = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1280xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1280xf16>>
   %workgroup_id_z = hal.interface.workgroup.id[2] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %1 = affine.apply #map()[%workgroup_id_y]
@@ -229,19 +227,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @_batch_matmul_narrow_n_2_dispatch_4_unpack_i32() attributes {translation_info = #iree_codegen.translation_info<CPUDataTiling>} {
   %c0_i32 = arith.constant 0 : i32
   %c2 = arith.constant 2 : index
   %c128 = arith.constant 128 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1x1x2x8xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x3x2xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1x1x2x8xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x3x2xi32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   scf.for %arg0 = %workgroup_id_x to %c2 step %workgroup_count_x {
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/reconcile_translation_info.mlir b/compiler/src/iree/compiler/Codegen/Common/test/reconcile_translation_info.mlir
index 50a28e3..18c1677 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/reconcile_translation_info.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/reconcile_translation_info.mlir
@@ -1,6 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(iree-codegen-reconcile-translation-info)))" %s --verify-diagnostics | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @err_multiple_entry_point {
   // expected-error @+1 {{reconciliation for multiple export ops unsupported}}
   hal.executable.variant public @reconcile_workgroup_size target(#hal.executable.target<"", "", {}>) {
@@ -11,7 +13,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @reconcile_workgroup_size {
   hal.executable.variant public @reconcile_workgroup_size target(#hal.executable.target<"", "", {}>) {
     hal.executable.export public @entry_point layout(#pipeline_layout)
@@ -31,7 +35,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @single_translation_info {
   hal.executable.variant public @single_translation_info target(#hal.executable.target<"", "", {}>) {
     hal.executable.export public @entry_point layout(#pipeline_layout)
@@ -51,7 +57,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @err_mistmatched_workgroup_size {
   hal.executable.variant public @err_mismatched_workgroup_size target(#hal.executable.target<"", "", {}>) {
     // expected-error @+1 {{failed to reconcile workgroup sizes}}
@@ -69,7 +77,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @err_mistmatched_workgroup_size2 {
   hal.executable.variant public @err_mismatched_workgroup_size2 target(#hal.executable.target<"", "", {}>) {
     // expected-error @+1 {{failed to reconcile workgroup sizes}}
@@ -87,7 +97,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @reconcile_subgroup_size {
   hal.executable.variant public @reconcile_subgroup_size target(#hal.executable.target<"", "", {}>) {
     hal.executable.export public @entry_point layout(#pipeline_layout)
@@ -107,7 +119,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @err_reconcile_subgroup_size {
   hal.executable.variant public @err_reconcile_subgroup_size target(#hal.executable.target<"", "", {}>) {
     hal.executable.export public @entry_point layout(#pipeline_layout)
@@ -127,7 +141,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @llvm_func_attrs {
   hal.executable.variant public @llvm_func_attrs target(#hal.executable.target<"", "", {}>) {
     hal.executable.export public @entry_point layout(#pipeline_layout)
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/remove_dead_allocs.mlir b/compiler/src/iree/compiler/Codegen/Common/test/remove_dead_allocs.mlir
index 6607225..0ec9ceb 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/remove_dead_allocs.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/remove_dead_allocs.mlir
@@ -19,13 +19,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @cleanup_only_assume_alignment_uses() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<42xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<42xf32>
   memref.assume_alignment %0, 64 : memref<42xf32>
   return
 }
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/remove_trivial_loops.mlir b/compiler/src/iree/compiler/Codegen/Common/test/remove_trivial_loops.mlir
index 4fa8f0a..dc7a356 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/remove_trivial_loops.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/remove_trivial_loops.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-remove-single-iteration-loop)))))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [64, 1, 1]>
 // CHECK-LABEL: func.func @dispatch_0()
@@ -48,11 +46,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @workgroup_tile_loop()
@@ -85,11 +81,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @workgroup_tile_loop_negative()
@@ -122,11 +116,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @both_workgroup_and_workitem()
@@ -187,7 +179,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>, #hal.descriptor_set.binding<2, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
+]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 #map0 = affine_map<()[s0] -> (s0 ceildiv 4)>
 #map1 = affine_map<()[s0] -> (s0 * 4)>
@@ -206,11 +202,11 @@
         %cst = arith.constant 0.000000e+00 : f32
         %c4 = arith.constant 4 : index
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<4xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<4xf32>
         memref.assume_alignment %0, 64 : memref<4xf32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<4xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<4xf32>
         memref.assume_alignment %1, 64 : memref<4xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<4xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<4xf32>
         memref.assume_alignment %2, 64 : memref<4xf32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/tile_and_distribute_to_workgroups.mlir b/compiler/src/iree/compiler/Codegen/Common/test/tile_and_distribute_to_workgroups.mlir
index 89ac360..408885d 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/tile_and_distribute_to_workgroups.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/tile_and_distribute_to_workgroups.mlir
@@ -2,13 +2,11 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-tile-and-distribute-to-workgroups{max-workgroup-parallel-dims=1}, canonicalize)), cse)))' --split-input-file %s | FileCheck %s -check-prefix=CHECKW
 // RUN: iree-opt --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-tile-and-distribute-to-workgroups{distribution-method=2})), canonicalize, cse)))' --split-input-file %s | FileCheck %s -check-prefix=NO-LOOP
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [16, 4, 0], [0, 0, 64]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -31,13 +29,13 @@
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3)
             : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
         %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
@@ -72,10 +70,10 @@
 //  CHECK-DAG:   %[[M:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(0)
 //  CHECK-DAG:   %[[N:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(1)
 //  CHECK-DAG:   %[[K:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(2)
-//  CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//  CHECK-DAG:   %[[INIT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//  CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//  CHECK-DAG:   %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//  CHECK-DAG:   %[[INIT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//  CHECK-DAG:   %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //  CHECK-DAG:   %[[WG_ID_X:.+]] = hal.interface.workgroup.id[0]
 //  CHECK-DAG:   %[[WG_COUNT_X:.+]] = hal.interface.workgroup.count[0]
 //  CHECK-DAG:   %[[WG_ID_Y:.+]] = hal.interface.workgroup.id[1]
@@ -100,12 +98,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [1, 4], [0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -128,11 +124,11 @@
         %cl_1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
         %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -175,12 +171,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 64, 64, 64], [1, 1, 1, 4], [0, 0, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -205,11 +199,11 @@
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
         %3 = flow.dispatch.workload.ordinal %cl_3, 3  : index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32)
             : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
         %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
@@ -254,12 +248,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[2, 64, 64, 64], [1, 1, 1, 4], [0, 0, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -284,11 +276,11 @@
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
         %3 = flow.dispatch.workload.ordinal %cl_3, 3 : index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32)
             : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
         %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
@@ -337,9 +329,9 @@
 //  CHECK-DAG:      %[[D1:.*]] = hal.interface.constant.load layout({{.+}}) ordinal(1) : index
 //  CHECK-DAG:      %[[D2:.*]] = hal.interface.constant.load layout({{.+}}) ordinal(2) : index
 //  CHECK-DAG:      %[[D3:.*]] = hal.interface.constant.load layout({{.+}}) ordinal(3) : index
-//  CHECK-DAG:      %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
-//  CHECK-DAG:      %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
-//  CHECK-DAG:      %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
+//  CHECK-DAG:      %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
+//  CHECK-DAG:      %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
+//  CHECK-DAG:      %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%[[D0]], %[[D1]], %[[D2]], %[[D3]]}
 //      CHECK:      %[[WORKGROUP_ID_X:.*]] = hal.interface.workgroup.id[0] : index
 //      CHECK:      %[[WORKGROUP_COUNT_X:.*]] = hal.interface.workgroup.count[0] : index
 //      CHECK:      %[[WORKGROUP_ID_Y:.*]] = hal.interface.workgroup.id[1] : index
@@ -374,12 +366,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[2, 64, 0, 64], [1, 1, 1, 4], [0, 0, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -404,11 +394,11 @@
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
         %3 = flow.dispatch.workload.ordinal %cl_3, 3 : index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32)
             : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
         %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
@@ -449,12 +439,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 64, 64, 0], [1, 16, 4, 0], [0, 0, 0, 64]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -479,11 +467,11 @@
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
         %3 = flow.dispatch.workload.ordinal %cl_3, 3 : index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %1, %3}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %3, %2}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32)
             : !flow.dispatch.tensor<writeonly:tensor<?x?x?xf32>>{%0, %1, %2}
         %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0], sizes = [%0, %1, %3], strides = [1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %1, %3} -> tensor<?x?x?xf32>
@@ -523,12 +511,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 16, 0], [16, 8, 0], [0, 0, 2]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
@@ -542,11 +528,11 @@
     builtin.module {
       func.func @preset_config() attributes {translation_info = #translation} {
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
@@ -587,11 +573,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 10, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 10, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUBufferOpsTileAndVectorize>
@@ -624,8 +608,8 @@
         %dest_offset_x = flow.dispatch.workload.ordinal %cl_7, 7: index
         %slice_size_y = flow.dispatch.workload.ordinal %cl_8, 8: index
         %slice_size_x = flow.dispatch.workload.ordinal %cl_9, 9: index
-        %source = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xi32>{%source_size_y, %source_size_x}
-        %dest = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xi32>{%dest_size_y, %dest_size_x}
+        %source = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xi32>{%source_size_y, %source_size_x}
+        %dest = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xi32>{%dest_size_y, %dest_size_x}
         %source_subview = memref.subview %source[%source_offset_y, %source_offset_x] [%slice_size_y, %slice_size_x] [1, 1] : memref<?x?xi32> to memref<?x?xi32, strided<[?, 1], offset : ?>>
         %dest_subview = memref.subview %dest[%dest_offset_y, %dest_offset_x] [%slice_size_y, %slice_size_x] [1, 1] : memref<?x?xi32> to memref<?x?xi32, strided<[?, 1], offset : ?>>
         linalg.generic {
@@ -663,8 +647,8 @@
 //  CHECK-DAG:   %[[DEST_OFFSET_X:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(7) : index
 //  CHECK-DAG:   %[[SLICE_SIZE_Y:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(8) : index
 //  CHECK-DAG:   %[[SLICE_SIZE_X:.+]] = hal.interface.constant.load layout({{.+}}) ordinal(9) : index
-//  CHECK-DAG:   %[[SOURCE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//  CHECK-DAG:   %[[DEST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[SOURCE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//  CHECK-DAG:   %[[DEST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-DAG:   %[[SOURCE:.+]] = memref.subview %[[SOURCE_BINDING]][%[[SOURCE_OFFSET_Y]], %[[SOURCE_OFFSET_X]]]
 //  CHECK-DAG:   %[[DEST:.+]] = memref.subview %[[DEST_BINDING]][%[[DEST_OFFSET_Y]], %[[DEST_OFFSET_X]]]
 //  CHECK-DAG:   %[[WG_ID_X:.+]] = hal.interface.workgroup.id[0]
@@ -688,11 +672,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[15]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUDefault>
@@ -708,9 +690,9 @@
         %c2 = arith.constant 2 : index
         %cst = arith.constant dense<[1.000000e+00, 6.12323426E-17]> : tensor<2xf32>
         %cst_0 = arith.constant dense<[-0.000000e+00, -1.000000e+00]> : tensor<2xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [32], strides = [1]
             : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
@@ -743,11 +725,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 64]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUDefault>
@@ -765,8 +745,8 @@
         %cst_0 = arith.constant dense<[-0.000000e+00, -0.707106769, -1.000000e+00, -0.707106769]> : tensor<4xf32>
         %0 = bufferization.to_memref %cst_0 : memref<4xf32>
         %1 = bufferization.to_memref %cst : memref<4xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x128x32xf32>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<64x128x32xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x128x32xf32>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<64x128x32xf32>
         iree_linalg_ext.fft {lowering_config = #config}
             ins(%c3, %1, %0 : index, memref<4xf32>, memref<4xf32>) outs(%2, %3 : memref<64x128x32xf32>, memref<64x128x32xf32>)
         return
@@ -794,12 +774,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [1, 4, 0], [0, 0, 4]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #map0 = affine_map<(d0, d1) -> (d0, d1)>
@@ -823,11 +801,11 @@
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
         %6 = tensor.empty(%0, %1) : tensor<?x?xf32>
         %7 = linalg.generic {
@@ -879,12 +857,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 64, 64, 64, 0, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -918,11 +894,11 @@
         %6 = flow.dispatch.workload.ordinal %cl_6, 6 : index
         %7 = flow.dispatch.workload.ordinal %cl_7, 7 : index
         %8 = flow.dispatch.workload.ordinal %cl_8, 8 : index
-        %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-        %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%4, %5, %3, %6}
-        %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xf32>>{%0, %7, %8, %6}
         %12 = flow.dispatch.tensor.load %9, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
@@ -969,12 +945,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 20, 40, 48, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -991,11 +965,11 @@
     builtin.module {
       func.func @conv_static() attributes {translation_info = #translation} {
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<1x161x161x96xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<1x80x80x96xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 161, 161, 96], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1x161x161x96xf32>> -> tensor<1x161x161x96xf32>
@@ -1043,11 +1017,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[16, 32], [16, 16], [0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1065,9 +1037,9 @@
     }
     builtin.module {
       func.func @generic_static() attributes {translation_info = #translation} {
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<96x16xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<writeonly:tensor<16x96xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [96, 16], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<96x16xf32>> -> tensor<96x16xf32>
@@ -1110,12 +1082,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[28, 8, 0], [4, 4, 0], [0, 0, 60]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {
   data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128",
@@ -1132,11 +1102,11 @@
     builtin.module {
       func.func @matmul_static() attributes {translation_info = #translation} {
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<196x240xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<240x40xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<196x40xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [196, 240], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<196x240xf32>> -> tensor<196x240xf32>
@@ -1165,12 +1135,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 7, 64, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {
   data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128",
@@ -1187,11 +1155,11 @@
     builtin.module {
       func.func @restrict_num_workgroups() attributes {translation_info = #translation} {
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<1x11x11x576xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<5x5x576xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<1x7x7x576xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 11, 11, 576], strides = [1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1x11x11x576xf32>> -> tensor<1x11x11x576xf32>
@@ -1219,12 +1187,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[4, 0, 0], [4, 0, 0], [0, 1, 4]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1292,12 +1258,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 0, 0], [8, 0, 0], [0, 0, 16]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1318,11 +1282,11 @@
         %cl_1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%1}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readwrite:tensor<?x1xf32>>{%0}
         %5 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%1, 1], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%1} -> tensor<?x1xf32>
@@ -1364,12 +1328,10 @@
 
 // -----
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 0], [0, 0, 0], [0, 0, 16]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1387,11 +1349,11 @@
       func.func @gemm_unit_M_unit_N() attributes {translation_info = #translation} {
         %c0 = arith.constant 0 : index
         %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0}
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%0}
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0)
             : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, %0], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0} -> tensor<1x?xf32>
@@ -1423,11 +1385,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 0, 0, 64, 64, 0, 64], [0, 1, 0, 0, 1, 1, 0, 4], [0, 0, 0, 0, 0, 0, 0, 0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1452,9 +1412,9 @@
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
         %3 = flow.dispatch.workload.ordinal %cl_3, 3 : index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<writeonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
         %6 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0, 0, 0, 0, 0], sizes = [1, %0, 1, 1, %1, %2, 1, %3], strides = [1, 1, 1, 1, 1, 1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3} -> tensor<1x?x1x1x?x?x1x?xf32>
@@ -1497,11 +1457,9 @@
 
 // -----
 #config = #iree_codegen.lowering_config<tile_sizes = [[0], [0], [4]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1521,9 +1479,9 @@
       func.func @reduce_to_scalar() attributes {translation_info = #translation} {
         %cl_0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0}
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readwrite:tensor<f32>>
         %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [%0], strides = [1]
             : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0} -> tensor<?xf32>
@@ -1557,11 +1515,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1578,9 +1534,9 @@
     }
     builtin.module {
       func.func @scalar() attributes {translation_info = #translation} {
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<f32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<writeonly:tensor<f32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = []
             : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
@@ -1614,11 +1570,9 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[2], [2], [0]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1635,9 +1589,9 @@
     }
     builtin.module {
       func.func @rank_reduced_slice() attributes {translation_info = #translation} {
-        %in_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %in_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<5x40xf32>>
-        %out_binding = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %out_binding = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
         %in = flow.dispatch.tensor.load %in_binding, offsets = [3, 10], sizes = [1, 10], strides = [2, 1]
             : !flow.dispatch.tensor<readonly:tensor<5x40xf32>> -> tensor<10xf32>
@@ -1667,9 +1621,9 @@
 //      CHECK:   hal.return %[[C5]], %[[C1]], %[[C1]]
 //      CHECK: func.func @rank_reduced_slice()
 // CHECK-SAME:     translation_info = #[[TRANSLATION]]
-//  CHECK-DAG:   %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//  CHECK-DAG:   %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 // CHECK-SAME:       : !flow.dispatch.tensor<readonly:tensor<5x40xf32>>
-//  CHECK-DAG:   %[[DST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//  CHECK-DAG:   %[[DST_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK-SAME:       : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
 //      CHECK:   scf.for %[[IV0:.+]] =
 //      CHECK:     %[[OFFSET:.+]] = affine.apply #[[MAP]]()[%[[IV0]]]
@@ -1681,13 +1635,11 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [{sizes=[32, 64, 0], interchange=[1, 0, 2]}, [8, 32, 0], [0, 0, 16]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -1709,13 +1661,13 @@
         %0 = flow.dispatch.workload.ordinal %cl_0, 0 : index
         %1 = flow.dispatch.workload.ordinal %cl_1, 1 : index
         %2 = flow.dispatch.workload.ordinal %cl_2, 2 : index
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3)
             : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
         %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
@@ -1753,11 +1705,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @no_compute {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
@@ -1784,9 +1734,9 @@
         %7 = flow.dispatch.workload.ordinal %2, 2 : index
         %8 = flow.dispatch.workload.ordinal %3, 3 : index
         %9 = flow.dispatch.workload.ordinal %4, 4 : index
-        %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?x?x?xf32>{%5, %6, %7}
+        %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<?x?x?xf32>{%5, %6, %7}
         memref.assume_alignment %10, 64 : memref<?x?x?xf32>
-        %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<1x?x?xf32>{%8, %9}
+        %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<1x?x?xf32>{%8, %9}
         memref.assume_alignment %11, 64 : memref<1x?x?xf32>
         return
       }
@@ -1800,13 +1750,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @tile_multiuse_producer {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf_x86_64", {}>) {
@@ -1820,13 +1768,13 @@
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
         %cst_0 = arith.constant 1.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<12x128x128xf32>>
-        %s0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+        %s0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<writeonly:tensor<12x128x128xf32>>
-        %s1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+        %s1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<writeonly:tensor<12x128xf32>>
-        %s2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0)
+        %s2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<writeonly:tensor<12x128xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 128], strides = [1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<12x128x128xf32>> -> tensor<12x128x128xf32>
@@ -1877,10 +1825,10 @@
   }
 }
 // CHECK-LABEL: func @tile_multiuse_producer()
-//   CHECK-DAG:     %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
-//   CHECK-DAG:     %[[RESULT_BINDING0:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
-//   CHECK-DAG:     %[[RESULT_BINDING1:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
-//   CHECK-DAG:     %[[RESULT_BINDING2:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3)
+//   CHECK-DAG:     %[[SRC_BINDING:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
+//   CHECK-DAG:     %[[RESULT_BINDING0:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
+//   CHECK-DAG:     %[[RESULT_BINDING1:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
+//   CHECK-DAG:     %[[RESULT_BINDING2:.+]] = hal.interface.binding.subspan layout(#pipeline_layout) binding(3)
 //       CHECK:     scf.for %[[IV0:.+]] =
 //       CHECK:       scf.for %[[IV1:.+]] =
 //       CHECK:         %[[SRC:.+]] = flow.dispatch.tensor.load %[[SRC_BINDING]], offsets = [%[[IV0]], %[[IV1]], 0]
@@ -1902,13 +1850,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @no_tile {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
@@ -1921,10 +1867,10 @@
       func.func @no_tile() attributes {translation_info = #iree_codegen.translation_info<CPUDefault>} {
         %c0 = arith.constant 0 : index
         %c64 = arith.constant 64 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10xi32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<3xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c64) : !flow.dispatch.tensor<readwrite:tensor<3xi32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10xi32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<3xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c64) : !flow.dispatch.tensor<readwrite:tensor<3xi32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [10], strides = [1] : !flow.dispatch.tensor<readonly:tensor<10xf32>> -> tensor<10xf32>
         %5 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [10], strides = [1] : !flow.dispatch.tensor<readonly:tensor<10xi32>> -> tensor<10xi32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [3], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<3xf32>> -> tensor<3xf32>
@@ -1947,11 +1893,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @pack_lowering {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
@@ -1964,9 +1908,9 @@
       func.func @gemm_lhs_pack() attributes {translation_info = #iree_codegen.translation_info<CPUDataTiling>} {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<100x250xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<writeonly:tensor<14x64x8x4xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [100, 250], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<100x250xf32>> -> tensor<100x250xf32>
@@ -1990,11 +1934,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @pack_lowering {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
@@ -2008,9 +1950,9 @@
         %c0 = arith.constant 0 : index
         %c114688 = arith.constant 114688 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c114688)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c114688)
             : !flow.dispatch.tensor<writeonly:tensor<64x64x8x4xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [250, 500], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<250x500xf32>> -> tensor<250x500xf32>
@@ -2033,18 +1975,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @clone_index_computations {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
-    hal.executable.export public @clone_index_computations ordinal(0) layout(
-        #hal.pipeline.layout<push_constants = 4, sets = [
-            <0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>)
-        {
+    hal.executable.export public @clone_index_computations ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3 : index, %arg4 : index):
       %x, %y, %z = flow.dispatch.workgroup_count_from_slice %arg1, %arg2, %arg3, %arg4
       hal.return %x, %y, %z : index, index, index
@@ -2065,11 +2002,11 @@
         %5 = flow.dispatch.workload.ordinal %1, 1 : index
         %6 = flow.dispatch.workload.ordinal %2, 2 : index
         %7 = flow.dispatch.workload.ordinal %3, 3 : index
-        %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+        %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %5}
         %9 = affine.apply affine_map<()[s0] -> (s0 ceildiv 8)>()[%6]
         %10 = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%7]
-        %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+        %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
             : !flow.dispatch.tensor<writeonly:tensor<?x?x8x4xf32>>{%9, %10}
         %12 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [%4, %5], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %5} -> tensor<?x?xf32>
@@ -2102,11 +2039,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @dynamic_unpack {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
@@ -2131,8 +2068,8 @@
         %5 = flow.dispatch.workload.ordinal %1, 1 : index
         %6 = flow.dispatch.workload.ordinal %2, 2 : index
         %7 = flow.dispatch.workload.ordinal %3, 3 : index
-        %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
-        %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
+        %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
+        %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
         %10 = flow.dispatch.tensor.load %8, offsets = [0, 0, 0, 0], sizes = [%4, %5, 32, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5} -> tensor<?x?x32x16xi32>
         %11 = tensor.empty(%6, %7) : tensor<?x?xi32>
         %12 = tensor.unpack %10 inner_dims_pos = [0, 1] inner_tiles = [32, 16] into %11
@@ -2151,18 +2088,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @dynamic_unpack_dynamic_tile {
   hal.executable.variant public @embedded_elf_x86_64 target(<"llvm-cpu", "embedded-elf-x86_64", {}>) {
-    hal.executable.export public @dynamic_unpack_dynamic_tile ordinal(0) layout(
-        #hal.pipeline.layout<push_constants = 4, sets = [
-            <0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>)
-        {
+    hal.executable.export public @dynamic_unpack_dynamic_tile ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index, %arg4: index):
       %x, %y, %z = flow.dispatch.workgroup_count_from_slice %arg1, %arg2, %arg3, %arg4
       hal.return %x, %y, %z : index, index, index
@@ -2185,8 +2119,8 @@
         %5 = flow.dispatch.workload.ordinal %1, 1 : index
         %6 = flow.dispatch.workload.ordinal %2, 2 : index
         %7 = flow.dispatch.workload.ordinal %3, 3 : index
-        %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%4, %5, %c32, %c16}
-        %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
+        %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%4, %5, %c32, %c16}
+        %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
         %10 = flow.dispatch.tensor.load %8, offsets = [0, 0, 0, 0], sizes = [%4, %5, %c32, %c16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%4, %5, %c32, %c16} -> tensor<?x?x?x?xi32>
         %11 = tensor.empty(%6, %7) : tensor<?x?xi32>
         %12 = tensor.unpack %10 inner_dims_pos = [0, 1] inner_tiles = [%c32, %c16] into %11
@@ -2205,11 +2139,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @unpack_elem {
   hal.executable.variant public @embedded_elf_arm_64 target(<"llvm-cpu", "embedded-elf-arm_64", {}>) {
@@ -2221,8 +2153,8 @@
     builtin.module {
       func.func @unpack_elem() attributes {translation_info = #iree_codegen.translation_info<CPUDataTiling>} {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x48x8x8xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x48x8x8xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [16, 48, 8, 8], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x48x8x8xf32>> -> tensor<16x48x8x8xf32>
         %3 = tensor.empty() : tensor<128x384xf32>
         %4 = tensor.unpack %2 inner_dims_pos = [0, 1] inner_tiles = [8, 8] into %3 {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[64, 64]]>} : tensor<16x48x8x8xf32> -> tensor<128x384xf32>
@@ -2246,11 +2178,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
@@ -2277,10 +2207,10 @@
         %0:2 = iree_codegen.query_tile_sizes tensor<12544x16xi32, #iree_encoding.encoding<operand_index = 2, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2]>> -> index, index
         %1 = affine.apply affine_map<()[s0] -> (12544 ceildiv s0)>()[%0#0]
         %2 = affine.apply affine_map<()[s0] -> (16 ceildiv s0)>()[%0#1]
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c200960) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%1, %2, %0#0, %0#1}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1003776) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<12544xi32>>
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1053952) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16xi32>>
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12544x16xi32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c200960) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%1, %2, %0#0, %0#1}
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1003776) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<12544xi32>>
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1053952) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16xi32>>
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12544x16xi32>>
         %10 = flow.dispatch.tensor.load %3, offsets = [0, 0, 0, 0], sizes = [%1, %2, %0#0, %0#1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi32>>{%1, %2, %0#0, %0#1} -> tensor<?x?x?x?xi32>
         %11 = flow.dispatch.tensor.load %4, offsets = [0], sizes = [12544], strides = [1] : !flow.dispatch.tensor<readonly:tensor<12544xi32>> -> tensor<12544xi32>
         %12 = flow.dispatch.tensor.load %5, offsets = [0], sizes = [16], strides = [1] : !flow.dispatch.tensor<readonly:tensor<16xi32>> -> tensor<16xi32>
@@ -2311,15 +2241,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>,
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @elem_pack {
   hal.executable.variant public @embedded_elf_arm_64 target(<"llvm-cpu", "embedded-elf-arm_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>) {
@@ -2338,15 +2266,15 @@
         %c1572864 = arith.constant 1572864 : index
         %c2359296 = arith.constant 2359296 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1339392) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x2x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c786432) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xi32>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c823296) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c825344) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<48x512x8x1xf32>>
-        %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c1572864) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
-        %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(5) alignment(64) offset(%c2359296) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1339392) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x2x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c786432) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xi32>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c823296) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c825344) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<48x512x8x1xf32>>
+        %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c1572864) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
+        %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(5) alignment(64) offset(%c2359296) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
         %9 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1, 2, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x2x512xf32>> -> tensor<1x2x512xf32>
         %10 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
         %11 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
@@ -2385,16 +2313,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @scatter {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
     hal.executable.export public @scatter ordinal(0)
-    layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>)
+    layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer, ReadOnly>,
+      #hal.pipeline.binding<storage_buffer>
+    ]>)
     {
     ^bb0(%arg0: !hal.device):
       %x, %y, %z = flow.dispatch.workgroup_count_from_slice
@@ -2406,9 +2335,9 @@
         %c251668480 = arith.constant 251668480 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c228075520) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5898240xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c251668480) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5898240x4xi32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x640x48x48xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c228075520) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5898240xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c251668480) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5898240x4xi32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x640x48x48xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [5898240], strides = [1] : !flow.dispatch.tensor<readonly:tensor<5898240xf32>> -> tensor<5898240xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [5898240, 4], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<5898240x4xi32>> -> tensor<5898240x4xi32>
         %5 = tensor.empty() : tensor<1x640x48x48xf32>
@@ -2429,11 +2358,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @collapse_workgroups_dispatch_dispatch_0 {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -2445,8 +2372,8 @@
     builtin.module {
       func.func @collapse_workgroups_dispatch_dispatch_0_generic_1024x128x16x64() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x16x128x64xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x128x16x64xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x16x128x64xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x128x16x64xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1024, 16, 128, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x16x128x64xf32>> -> tensor<1024x16x128x64xf32>
         %3 = tensor.empty() : tensor<1024x128x16x64xf32>
         %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d2, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<1024x16x128x64xf32>) outs(%3 : tensor<1024x128x16x64xf32>) attrs = {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[1, 1, 1, 64]]>} {
@@ -2472,13 +2399,11 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [16, 4, 0], [0, 0, 64]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {
   data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128",
@@ -2498,13 +2423,13 @@
         %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
         %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
         %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3)
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3)
             : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
         %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
@@ -2539,13 +2464,11 @@
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [16, 4, 0], [0, 0, 64]]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 #map = affine_map<()[s0] -> (s0 ceildiv 64)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 module {
@@ -2563,10 +2486,10 @@
           %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
           %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
           %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-          %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-          %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-          %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-          %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+          %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+          %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+          %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+          %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
           %workgroup_id_x = hal.interface.workgroup.id[0] : index
           %workgroup_count_x = hal.interface.workgroup.count[0] : index
           %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -2584,9 +2507,9 @@
 }
 
 // CHECK-LABEL: func.func @matmul_already_distributed
-// CHECK:         %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-// CHECK:         %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-// CHECK:         %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+// CHECK:         %[[LHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+// CHECK:         %[[RHS_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+// CHECK:         %[[OUT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 // CHECK-NOT:     scf.for
 // CHECK:         %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_BINDING]], offsets = [%workgroup_id_y, 0]
 // CHECK:         %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_BINDING]], offsets = [0, %workgroup_id_x]
@@ -2597,11 +2520,9 @@
 
 // Check that the distribution avoids distributing unit-trip count loops.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @avoid_unit_range_distribute {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -2638,9 +2559,9 @@
         %20 = arith.index_castui %19 : i64 to index
         %21 = flow.dispatch.workload.ordinal %15, 0 : index
         %22 = flow.dispatch.workload.ordinal %20, 1 : index
-        %23 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x?x16x16xf16>>{%21, %22}
-        %24 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x8x16x16xf16>>{%22}
-        %25 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?x16x8x16xf16>>{%22}
+        %23 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x?x16x16xf16>>{%21, %22}
+        %24 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x8x16x16xf16>>{%22}
+        %25 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?x16x8x16xf16>>{%22}
         %26 = flow.dispatch.tensor.load %23, offsets = [0, 0, 0, 0, 0], sizes = [32, %21, %22, 16, 16], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?x?x16x16xf16>>{%21, %22} -> tensor<32x?x?x16x16xf16>
         %27 = flow.dispatch.tensor.load %24, offsets = [0, 0, 0, 0, 0], sizes = [32, %22, 8, 16, 16], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?x8x16x16xf16>>{%22} -> tensor<32x?x8x16x16xf16>
         %28 = tensor.empty(%22) : tensor<32x?x16x8x16xf16>
@@ -2674,12 +2595,10 @@
 
 // Check that the distribution avoids distributing unit-trip count loops.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @set_size_to_tilesize_when_divisible {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -2715,10 +2634,10 @@
         %19 = arith.ori %16, %18 : i64
         %20 = arith.index_castui %19 : i64 to index
         %21 = flow.dispatch.workload.ordinal %20, 1 : index
-        %22 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xf16>>
+        %22 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xf16>>
         %23 = flow.dispatch.workload.ordinal %21, 2 : index
-        %24 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x16x32x128xf16>>{%21}
-        %25 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%10) : !flow.dispatch.tensor<writeonly:tensor<?x16x4096xf16>>{%23}
+        %24 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x16x32x128xf16>>{%21}
+        %25 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%10) : !flow.dispatch.tensor<writeonly:tensor<?x16x4096xf16>>{%23}
         %26 = flow.dispatch.workload.ordinal %15, 0 : index
         %27 = flow.dispatch.tensor.load %24, offsets = [0, 0, 0, 0], sizes = [%21, 16, 32, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x16x32x128xf16>>{%21} -> tensor<?x16x32x128xf16>
         %28 = flow.dispatch.tensor.load %22, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xf16>> -> tensor<4096x32x128xf16>
@@ -2751,12 +2670,10 @@
 // -----
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 16, 0], [16, 8, 0], [0, 0, 2]]>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
@@ -2770,11 +2687,11 @@
     builtin.module {
       func.func @reshape_matmul() attributes {translation_info = #translation} {
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
             : !flow.dispatch.tensor<readonly:tensor<64x2x256xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1)
             : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2)
             : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 2, 256], strides = [1, 1, 1]
             : !flow.dispatch.tensor<readonly:tensor<64x2x256xf32>> -> tensor<64x2x256xf32>
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/type_propagation.mlir b/compiler/src/iree/compiler/Codegen/Common/test/type_propagation.mlir
index 982c6ab..55b6ffd 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/type_propagation.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/type_propagation.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-type-propagation))" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @generic_op_illegal_operand() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %3 = arith.trunci %2 : tensor<?xi8> to tensor<?xi1>
   %4 = tensor.empty(%d) : tensor<?xi8>
@@ -25,8 +23,8 @@
   return
 }
 // CHECK-LABEL: func.func @generic_op_illegal_operand()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty(%{{.+}}) : tensor<?xi8>
 //       CHECK:   %[[GENERIC:.+]] = linalg.generic
@@ -40,16 +38,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @generic_op_illegal_operand_i7() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %3 = arith.trunci %2 : tensor<?xi8> to tensor<?xi7>
   %4 = tensor.empty(%d) : tensor<?xi8>
@@ -65,8 +61,8 @@
   return
 }
 // CHECK-LABEL: func.func @generic_op_illegal_operand_i7()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty(%{{.+}}) : tensor<?xi8>
 //       CHECK:   %[[GENERIC:.+]] = linalg.generic
@@ -80,16 +76,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @generic_op_illegal_operand_i33() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi64>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi64>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi64>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi64>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi64>>{%d} -> tensor<?xi64>
   %3 = arith.trunci %2 : tensor<?xi64> to tensor<?xi33>
   %4 = tensor.empty(%d) : tensor<?xi64>
@@ -105,8 +99,8 @@
   return
 }
 // CHECK-LABEL: func.func @generic_op_illegal_operand_i33()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty(%{{.+}}) : tensor<?xi64>
 //       CHECK:   %[[GENERIC:.+]] = linalg.generic
@@ -120,16 +114,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @generic_op_illegal_result() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %3 = tensor.empty(%d) : tensor<?xi1>
   %4 = linalg.generic {
@@ -145,8 +137,8 @@
   return
 }
 // CHECK-LABEL: func.func @generic_op_illegal_result()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty(%{{.+}}) : tensor<?xi8>
 //       CHECK:   %[[GENERIC:.+]] = linalg.generic
@@ -160,18 +152,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_extract() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %offset = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %size = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %3 = tensor.extract_slice %2[%offset] [%size] [1] : tensor<?xi8> to tensor<?xi8>
   %4 = arith.trunci %3 : tensor<?xi8> to tensor<?xi1>
@@ -180,28 +170,26 @@
   return
 }
 // CHECK-LABEL: func.func @tensor_extract()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //       CHECK:   %[[EXTRACT:.+]] = tensor.extract_slice %[[INTENSOR]]
 //       CHECK:   flow.dispatch.tensor.store %[[EXTRACT]], %[[OUT]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_insert() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %offset = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %size = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %3 = flow.dispatch.tensor.load %0, offsets = [%offset], sizes=[%size], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %5 = arith.trunci %3 : tensor<?xi8> to tensor<?xi1>
@@ -212,9 +200,9 @@
   return
 }
 // CHECK-LABEL: func.func @tensor_insert()
-//   CHECK-DAG:   %[[IN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[IN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[IN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[IN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   %[[IN1TENSOR:.+]] = flow.dispatch.tensor.load %[[IN1]]
 //   CHECK-DAG:   %[[IN2TENSOR:.+]] = flow.dispatch.tensor.load %[[IN2]]
 //       CHECK:   %[[INSERT:.+]] = tensor.insert_slice %[[IN1TENSOR]] into %[[IN2TENSOR]]
@@ -222,18 +210,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @for_loop() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %lb = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %step = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets=[0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %3 = flow.dispatch.tensor.load %1, offsets=[0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d} -> tensor<?xi8>
   %4 = arith.trunci %2 : tensor<?xi8> to tensor<?xi1>
@@ -249,8 +235,8 @@
   return
 }
 // CHECK-LABEL: func.func @for_loop()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]
 //   CHECK-DAG:   %[[OUTTENSOR:.+]] = flow.dispatch.tensor.load %[[OUT]]
 //       CHECK:   %[[FOR:.+]] = scf.for
@@ -262,14 +248,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @fill_op() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<?xi8>>{%d}
   %1 = tensor.empty(%d) : tensor<?xi1>
   %false = arith.constant false
   %2 = linalg.fill ins(%false : i1) outs(%1 : tensor<?xi1>) -> tensor<?xi1>
@@ -278,7 +262,7 @@
   return
 }
 // CHECK-LABEL: func.func @fill_op()
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty
 //   CHECK-DAG:   %[[FALSE:.+]] = arith.constant false
 //   CHECK-DAG:   %[[EXT_SCALAR:.+]] = arith.extui %[[FALSE]]
@@ -289,16 +273,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0) -> (d0)>
 func.func @constant_op() {
-  %a = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
-  %b = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
-  %c = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<4xi32>>
+  %a = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
+  %b = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
+  %c = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<4xi32>>
   %at = flow.dispatch.tensor.load %a, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4xi32>> -> tensor<4xi32>
   %bt = flow.dispatch.tensor.load %b, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4xi32>> -> tensor<4xi32>
   %select = arith.constant dense<[true, false, true, false]> : tensor<4xi1>
@@ -326,16 +308,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0) -> (d0)>
 func.func @constant_splat_op() {
-  %a = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
-  %b = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
-  %c = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<writeonly:tensor<4xi32>>
+  %a = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
+  %b = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4xi32>>
+  %c = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<writeonly:tensor<4xi32>>
   %at = flow.dispatch.tensor.load %a, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4xi32>> -> tensor<4xi32>
   %bt = flow.dispatch.tensor.load %b, offsets = [0], sizes = [4], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4xi32>> -> tensor<4xi32>
   %select = arith.constant dense<true> : tensor<4xi1>
@@ -357,18 +337,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @tensor_extract() {
   %c0 = arith.constant 0 : index
   %c13 = arith.constant 13 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<readonly:tensor<14xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
       : !flow.dispatch.tensor<writeonly:tensor<14xi8>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [14], strides = [1]
       : !flow.dispatch.tensor<readonly:tensor<14xi8>> -> tensor<14xi8>
@@ -389,7 +367,7 @@
   return
 }
 // CHECK-LABEL: func @tensor_extract()
-//       CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:   %[[BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:       !flow.dispatch.tensor<readonly:tensor<14xi8>>
 //       CHECK:   %[[LOAD:.+]] = flow.dispatch.tensor.load %[[BINDING]]
 //       CHECK:   %[[EXTRACTED:.+]] = tensor.extract %[[LOAD]]
@@ -425,18 +403,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @scatter() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x1xi32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<3xi8>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x1xi32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<3xi8>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [8], strides = [1] : !flow.dispatch.tensor<readonly:tensor<8xi8>> -> tensor<8xi8>
   %4 = arith.trunci %3 : tensor<8xi8> to tensor<8xi1>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [8, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x1xi32>> -> tensor<8x1xi32>
@@ -453,9 +429,9 @@
 }
 
 // CHECK-LABEL: func.func @scatter()
-//   CHECK-DAG:   %[[UPDATES:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[UPDATES:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[INDICES:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   %[[UPDATES_TENSOR:.+]] = flow.dispatch.tensor.load %[[UPDATES]]
 //   CHECK-DAG:   %[[INDICES_TENSOR:.+]] = flow.dispatch.tensor.load %[[INDICES]]
 //   CHECK-DAG:   %[[OUT_TENSOR:.+]] = flow.dispatch.tensor.load %[[OUT]]
@@ -472,16 +448,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sort() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xi8>> -> tensor<1xi8>
   %3 = arith.trunci %2 : tensor<1xi8> to tensor<1xi1>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<1xi32>> -> tensor<1xi32>
@@ -496,8 +470,8 @@
 
 // CHECK-LABEL: func.func @sort()
 //   CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
-//   CHECK-DAG:   %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[A_TENSOR:.+]] = flow.dispatch.tensor.load %[[A]]
 //   CHECK-DAG:   %[[B_TENSOR:.+]] = flow.dispatch.tensor.load %[[B]]
 //       CHECK:   %[[SORT:.+]]:2 = iree_linalg_ext.sort dimension(0)
@@ -511,16 +485,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sort_secondary() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xi8>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xi8>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xi32>> -> tensor<1xi32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<1xi8>> -> tensor<1xi8>
   %4 = arith.trunci %3 : tensor<1xi8> to tensor<1xi1>
@@ -536,8 +508,8 @@
 
 // CHECK-LABEL: func.func @sort_secondary()
 //   CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
-//   CHECK-DAG:   %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[A_TENSOR:.+]] = flow.dispatch.tensor.load %[[A]]
 //   CHECK-DAG:   %[[B_TENSOR:.+]] = flow.dispatch.tensor.load %[[B]]
 //       CHECK:   %[[SORT:.+]]:2 = iree_linalg_ext.sort dimension(0)
@@ -549,16 +521,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @branch_op() {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<i8>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<i8>>
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i8
   %4 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i8>> -> tensor<i8>
   %5 = flow.dispatch.tensor.load %1, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i8>> -> tensor<i8>
diff --git a/compiler/src/iree/compiler/Codegen/Common/test/type_propagation_packing.mlir b/compiler/src/iree/compiler/Codegen/Common/test/type_propagation_packing.mlir
index 367a1fc..d095933 100644
--- a/compiler/src/iree/compiler/Codegen/Common/test/type_propagation_packing.mlir
+++ b/compiler/src/iree/compiler/Codegen/Common/test/type_propagation_packing.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-codegen-type-propagation))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @generic_op_i4() {
   %d = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi4>>{%d}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi4>>{%d}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xi4>>{%d}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xi4>>{%d}
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes=[%d], strides=[1] : !flow.dispatch.tensor<readonly:tensor<?xi4>>{%d} -> tensor<?xi4>
   %4 = tensor.empty(%d) : tensor<?xi4>
   %5 = linalg.generic {
@@ -25,8 +23,8 @@
 }
 
 // CHECK-LABEL: func.func @generic_op_i4()
-//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//   CHECK-DAG:   %[[IN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[OUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //   CHECK-DAG:   %[[INTENSOR:.+]] = flow.dispatch.tensor.load %[[IN]]{{.+}} -> tensor<?xi4>
 //   CHECK-DAG:   %[[INIT:.+]] = tensor.empty(%{{.+}}) : tensor<?xi4>
 //       CHECK:   %[[GENERIC:.+]] = linalg.generic
diff --git a/compiler/src/iree/compiler/Codegen/Interfaces/BufferizationInterfaces.cpp b/compiler/src/iree/compiler/Codegen/Interfaces/BufferizationInterfaces.cpp
index 1143c67..dcab0b7 100644
--- a/compiler/src/iree/compiler/Codegen/Interfaces/BufferizationInterfaces.cpp
+++ b/compiler/src/iree/compiler/Codegen/Interfaces/BufferizationInterfaces.cpp
@@ -121,8 +121,7 @@
     if (!bufferMemrefType)
       continue;
 
-    if (bufferSubspanOp.getSet() != subspanOp.getSet() ||
-        bufferSubspanOp.getBinding() != subspanOp.getBinding() ||
+    if (bufferSubspanOp.getBinding() != subspanOp.getBinding() ||
         bufferSubspanOp.getDescriptorType() != subspanOp.getDescriptorType() ||
         bufferSubspanOp.getByteOffset() != subspanOp.getByteOffset() ||
         !llvm::equal(bufferSubspanOp.getDynamicDims(),
@@ -139,7 +138,7 @@
   // Just change the result type of the InterfaceBindingSubspanOp.
   Value buffer = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
       subspanOp->getLoc(), memRefType, subspanOp.getLayout(),
-      subspanOp.getSet(), subspanOp.getBinding(), subspanOp.getByteOffset(),
+      subspanOp.getBinding(), subspanOp.getByteOffset(),
       subspanOp.getDynamicDims(), subspanOp.getAlignmentAttr(),
       subspanOp.getDescriptorFlagsAttr());
   rewriter.create<memref::AssumeAlignmentOp>(
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMCPU/ConvertToLLVM.cpp
index 21847b4..e006e1d 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/ConvertToLLVM.cpp
@@ -309,9 +309,10 @@
           subspanOp,
           "failed to convert interface.binding.subspan result to memref type");
     }
-    auto memRefDesc = abi.loadBinding(
-        subspanOp, subspanOp.getFlatBindingIndex(), operands.getByteOffset(),
-        memRefType, operands.getDynamicDims(), rewriter);
+    auto memRefDesc =
+        abi.loadBinding(subspanOp, subspanOp.getBinding().getSExtValue(),
+                        operands.getByteOffset(), memRefType,
+                        operands.getDynamicDims(), rewriter);
     rewriter.replaceOp(subspanOp, {memRefDesc});
     return success();
   }
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.cpp b/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.cpp
index 6c9af29..154ab37 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.cpp
@@ -277,13 +277,13 @@
               getMemberOf("workgroup_size_x", getUint32T(), &offsetInBits),
               getMemberOf("workgroup_size_y", getUint32T(), &offsetInBits),
               getMemberOf("workgroup_size_z", getUint16T(), &offsetInBits),
-              getMemberOf("push_constant_count", getUint16T(), &offsetInBits),
+              getMemberOf("constant_count", getUint16T(), &offsetInBits),
               getMemberOf("workgroup_count_x", getUint32T(), &offsetInBits),
               getMemberOf("workgroup_count_y", getUint32T(), &offsetInBits),
               getMemberOf("workgroup_count_z", getUint16T(), &offsetInBits),
               getMemberOf("max_concurrency", getUint8T(), &offsetInBits),
               getMemberOf("binding_count", getUint8T(), &offsetInBits),
-              getMemberOf("push_constants",
+              getMemberOf("constants",
                           getPtrOf(getConstOf(getArrayOf(getUint32T(), 64))),
                           &offsetInBits),
               getMemberOf(
@@ -412,7 +412,7 @@
   fieldTypes.push_back(uint32Type);
   fieldTypes.push_back(uint16Type);
 
-  // uint16_t push_constant_count;
+  // uint16_t constant_count;
   fieldTypes.push_back(uint16Type);
 
   // uint32_t workgroup_count_x;
@@ -428,7 +428,7 @@
   // uint8_t binding_count;
   fieldTypes.push_back(uint8Type);
 
-  // const uint32_t * push_constants;
+  // const uint32_t * constants;
   // void *const * binding_ptrs;
   // const size_t * binding_lengths;
   fieldTypes.push_back(opaquePtrType);
@@ -698,11 +698,11 @@
 Value HALDispatchABI::loadPushConstantCount(Operation *forOp,
                                             OpBuilder &builder) {
   auto countValue =
-      loadFieldValue(forOp, DispatchStateField::push_constant_count, builder);
+      loadFieldValue(forOp, DispatchStateField::constant_count, builder);
   auto resultValue = castValueToType(
       forOp->getLoc(), countValue,
       typeConverter->convertType(builder.getIndexType()), builder);
-  return buildValueDI(forOp, resultValue, "push_constant_count", di.getSizeT(),
+  return buildValueDI(forOp, resultValue, "constant_count", di.getSizeT(),
                       builder);
 }
 
@@ -710,7 +710,7 @@
                                        Type resultType, OpBuilder &builder) {
   auto loc = forOp->getLoc();
   auto constantsPtrValue =
-      loadFieldValue(forOp, DispatchStateField::push_constants, builder);
+      loadFieldValue(forOp, DispatchStateField::constants, builder);
   auto pushConstantType = IntegerType::get(context, 32);
   Value constantPtrValue = builder.create<LLVM::GEPOp>(
       loc, constantsPtrValue.getType(), pushConstantType, constantsPtrValue,
@@ -719,8 +719,7 @@
       builder.create<LLVM::LoadOp>(loc, pushConstantType, constantPtrValue);
   auto resultValue = castValueToType(loc, constantValue, resultType, builder);
   return buildValueDI(forOp, resultValue,
-                      StringRef("push_constant[") + std::to_string(offset) +
-                          "]",
+                      StringRef("constant[") + std::to_string(offset) + "]",
                       di.getBasicType(resultType), builder);
 }
 
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.h b/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.h
index d87b063..ab341cd 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.h
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/DispatchABI.h
@@ -156,13 +156,13 @@
     /*uint32_t*/ workgroup_size_x,
     /*uint32_t*/ workgroup_size_y,
     /*uint16_t*/ workgroup_size_z,
-    /*uint16_t*/ push_constant_count,
+    /*uint16_t*/ constant_count,
     /*uint32_t*/ workgroup_count_x,
     /*uint32_t*/ workgroup_count_y,
     /*uint16_t*/ workgroup_count_z,
     /*uint8_t*/ max_concurrency,
     /*uint8_t*/ binding_count,
-    /*intptr_t*/ push_constants,
+    /*intptr_t*/ constants,
     /*intptr_t*/ binding_ptrs,
     /*intptr_t*/ binding_lengths,
   };
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_dotprod_vector_lowering.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_dotprod_vector_lowering.mlir
index 327ef62..839e38b 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_dotprod_vector_lowering.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_dotprod_vector_lowering.mlir
@@ -5,11 +5,9 @@
     data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128",
     native_vector_size = 16 : index,
     target_triple = "aarch64-none-linux-android29"}>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @mmt4d_kernel_dispatch() attributes {hal.executable.target = #target} {
   %c0_i8 = arith.constant 0 : i8
@@ -19,11 +17,11 @@
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   %c64 = arith.constant 64 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<1x2x8x4xi8>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<1x2x8x4xi8>
   memref.assume_alignment %0, 64 : memref<1x2x8x4xi8>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c64) : memref<1x2x8x4xi8>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c64) : memref<1x2x8x4xi8>
   memref.assume_alignment %1, 64 : memref<1x2x8x4xi8>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c128) : memref<1x1x8x8xi32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c128) : memref<1x1x8x8xi32>
   memref.assume_alignment %2, 64 : memref<1x1x8x8xi32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_vector_lowering.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_vector_lowering.mlir
index 814138a..38dd757 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_vector_lowering.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/aarch64_vector_lowering.mlir
@@ -2,12 +2,10 @@
 // RUN: iree-opt %s --pass-pipeline="builtin.module(func.func(iree-llvmcpu-mmt4d-vector-lowering{vector-contract-custom-kernels=false}))" --split-input-file | FileCheck %s -check-prefix=CHECK-KERNEL-OFF
 // RUN: iree-opt %s --pass-pipeline="builtin.module(func.func(iree-llvmcpu-mmt4d-vector-lowering{vector-contract-custom-kernels=true}))" --split-input-file | FileCheck %s -check-prefix=CHECK-KERNEL-ON
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
@@ -24,9 +22,9 @@
     %cst_0 = arith.constant 0.000000e+00 : f32
     %c384 = arith.constant 384 : index
     %c128 = arith.constant 128 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
     %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -71,9 +69,9 @@
 //  CHECK-DAG: %[[C16:.+]] = arith.constant 16 : index
 //  CHECK-DAG: %[[C32:.+]] = arith.constant 32 : index
 //  CHECK-DAG: %[[C64:.+]] = arith.constant 64 : index
-//      CHECK: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-//      CHECK: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-//      CHECK: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+//      CHECK: %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+//      CHECK: %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+//      CHECK: %[[DST:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
 //      CHECK: %[[DST_TILE_INIT:.+]] = tensor.empty()
 //      CHECK: scf.for %[[I_IDX:.+]] = {{.*}} to %[[C384]] step %{{[0-9]*}} {
 //      CHECK:   %[[LHS_TILE:.+]] = flow.dispatch.tensor.load %[[LHS]], {{.*}} -> tensor<64x512xf32>
@@ -95,15 +93,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>,
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
@@ -125,12 +121,12 @@
     %c1835008 = arith.constant 1835008 : index
     %c0 = arith.constant 0 : index
     %c64 = arith.constant 64 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384xi32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
-    %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-    %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) offset(%c1835008) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
-    %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(5) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384xi32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
+    %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+    %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) offset(%c1835008) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
+    %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(5) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
     %6 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [2, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x512xf32>> -> tensor<2x512xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/apply_scale_lowering.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/apply_scale_lowering.mlir
index 20c5933..4b3902c 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/apply_scale_lowering.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/apply_scale_lowering.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-llvm))))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_64", {
   cpu_features = "+m,+a,+f,+d,+c",
@@ -27,8 +25,8 @@
         %cst = arith.constant dense<19689> : vector<2xi32>
         %cst_0 = arith.constant dense<15> : vector<2xi8>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2xi32>
         %2 = vector.load %0[%c0] : memref<2xi32>, vector<2xi32>
         %3 = tosa.apply_scale %2, %cst, %cst_0 {double_round = false} : (vector<2xi32>, vector<2xi32>, vector<2xi8>) -> vector<2xi32>
         vector.store %3, %1[%c0] : memref<2xi32>, vector<2xi32>
@@ -48,11 +46,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_64", {
   cpu_features = "+m,+a,+f,+d,+c,+zvl512b,+v",
@@ -75,8 +71,8 @@
         %cst = arith.constant dense<19689> : vector<2xi32>
         %cst_0 = arith.constant dense<15> : vector<2xi8>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2xi32>
         %2 = vector.load %0[%c0] : memref<2xi32>, vector<2xi32>
         %3 = tosa.apply_scale %2, %cst, %cst_0 {double_round = false} : (vector<2xi32>, vector<2xi32>, vector<2xi8>) -> vector<2xi32>
         vector.store %3, %1[%c0] : memref<2xi32>, vector<2xi32>
@@ -94,11 +90,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_64", {
   cpu_features = "+m,+a,+f,+d,+c,+zvl512b,+zve64x",
@@ -121,8 +115,8 @@
         %cst = arith.constant dense<19689> : vector<2xi32>
         %cst_0 = arith.constant dense<15> : vector<2xi8>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2xi32>
         %2 = vector.load %0[%c0] : memref<2xi32>, vector<2xi32>
         %3 = tosa.apply_scale %2, %cst, %cst_0 {double_round = false} : (vector<2xi32>, vector<2xi32>, vector<2xi8>) -> vector<2xi32>
         vector.store %3, %1[%c0] : memref<2xi32>, vector<2xi32>
@@ -140,11 +134,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_64", {
   cpu_features = "+m,+a,+f,+d,+c,+zvl512b,+zve32x",
@@ -167,8 +159,8 @@
         %cst = arith.constant dense<19689> : vector<2xi32>
         %cst_0 = arith.constant dense<15> : vector<2xi8>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2xi32>
         %2 = vector.load %0[%c0] : memref<2xi32>, vector<2xi32>
         %3 = tosa.apply_scale %2, %cst, %cst_0 {double_round = false} : (vector<2xi32>, vector<2xi32>, vector<2xi8>) -> vector<2xi32>
         vector.store %3, %1[%c0] : memref<2xi32>, vector<2xi32>
@@ -193,11 +185,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_64", {
   cpu_features = "+m,+a,+f,+d,+c,+zvl512b,+zve32f",
@@ -220,8 +210,8 @@
         %cst = arith.constant dense<19689> : vector<2xi32>
         %cst_0 = arith.constant dense<15> : vector<2xi8>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<2xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2xi32>
         %2 = vector.load %0[%c0] : memref<2xi32>, vector<2xi32>
         %3 = tosa.apply_scale %2, %cst, %cst_0 {double_round = false} : (vector<2xi32>, vector<2xi32>, vector<2xi8>) -> vector<2xi32>
         vector.store %3, %1[%c0] : memref<2xi32>, vector<2xi32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/convert_to_llvm.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/convert_to_llvm.mlir
index 6268c4f..f9189db 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/convert_to_llvm.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/convert_to_llvm.mlir
@@ -43,11 +43,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @interleave_and_bitcast_lowering() {
   %cst = arith.constant dense<4> : vector<4x2xi8>
@@ -58,8 +56,8 @@
   %c3 = arith.constant 3 : index
   %c4096 = arith.constant 4096 : index
   %c8192 = arith.constant 8192 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c4096) flags(ReadOnly) : memref<128xi8, strided<[1], offset: 4096>>
-  %out_buffer = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c8192) : memref<256x64xi4, strided<[64, 1], offset: 8192>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c4096) flags(ReadOnly) : memref<128xi8, strided<[1], offset: 4096>>
+  %out_buffer = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c8192) : memref<256x64xi4, strided<[64, 1], offset: 8192>>
   %2 = vector.load %0[%c0] : memref<128xi8, strided<[1], offset: 4096>>, vector<2xi8>
   %3 = vector.bitcast %2 : vector<2xi8> to vector<4xi4>
   %4 = vector.insert %3, %cst_0 [3] : vector<4xi4> into vector<4x4xi4>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_bindings.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_bindings.mlir
index 8675111..bda5e45 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_bindings.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_bindings.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --iree-convert-to-llvm --split-input-file %s | FileCheck %s --dump-input=always
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: llvm.func @binding_ptrs(
@@ -19,7 +17,7 @@
   // CHECK: %[[BASE_PTR:.+]] = llvm.load %[[ARRAY_PTR]] : !llvm.ptr -> !llvm.ptr
   %c72 = arith.constant 72 : index
   %c128 = arith.constant 128 : index
-  %memref = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%c72) : memref<?x2xf32, strided<[2, 1], offset: 18>>{%c128}
+  %memref = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%c72) : memref<?x2xf32, strided<[2, 1], offset: 18>>{%c128}
 
   // CHECK: %[[OFFSET_PTR0:.+]] = llvm.getelementptr %[[BASE_PTR]][18]
   // CHECK: %[[OFFSET_D0:.+]] = llvm.mul %[[C5]], %[[C2]]
@@ -40,11 +38,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: llvm.func @binding_ptrs_dynamic(
@@ -80,7 +76,7 @@
   // CHECK: %[[BINDING_PTRS:.+]] = llvm.extractvalue %[[STATE3]][10]
   // CHECK: %[[ARRAY_PTR:.+]] = llvm.getelementptr %[[BINDING_PTRS]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
   // CHECK: %[[BASE_PTR:.+]] = llvm.load %[[ARRAY_PTR]] : !llvm.ptr -> !llvm.ptr
-  %memref = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%offset) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>{%dim0, %dim1, %dim2}
+  %memref = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%offset) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>{%dim0, %dim1, %dim2}
 
   // CHECK: %[[BASE_BIT_OFFSET:.+]] = llvm.mul %[[OFFSET_ZEXT]], %[[C8]]
   // CHECK: %[[BASE_OFFSET:.+]] = llvm.udiv %[[BASE_BIT_OFFSET]], %[[C32]]
@@ -108,11 +104,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: llvm.func @binding_ptrs_sub_byte_dynamic(
@@ -131,7 +125,7 @@
   // CHECK: %[[BINDING_PTRS:.+]] = llvm.extractvalue %[[STATE3]][10]
   // CHECK: %[[ARRAY_PTR:.+]] = llvm.getelementptr %[[BINDING_PTRS]][1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
   // CHECK: %[[BASE_PTR:.+]] = llvm.load %[[ARRAY_PTR]] : !llvm.ptr -> !llvm.ptr
-  %memref = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%offset) : memref<?xi4, strided<[1], offset: ?>>{%dim0}
+  %memref = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%offset) : memref<?xi4, strided<[1], offset: ?>>{%dim0}
 
   // CHECK: %[[BASE_BIT_OFFSET:.+]] = llvm.mul %[[OFFSET_ZEXT]], %[[C8]]
   // CHECK: %[[BASE_OFFSET:.+]] = llvm.udiv %[[BASE_BIT_OFFSET]], %[[C4]]
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_constants.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_constants.mlir
index cccef04..9be75c9 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_constants.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/hal_interface_constants.mlir
@@ -1,9 +1,7 @@
 // RUN: iree-opt --iree-convert-to-llvm --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: llvm.func @constant_values
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/illegal_configuration.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/illegal_configuration.mlir
index 7601ed0..db93959 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/illegal_configuration.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/illegal_configuration.mlir
@@ -1,20 +1,18 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --verify-diagnostics --split-input-file %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = []>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{expected four tiling levels, got 0}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -22,21 +20,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[4, 8], [8, 8, 0], [0, 0, 8], [0, 0, 0]], native_vector_size = [0, 0, 4]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{native_vector_size must be empty}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -44,12 +40,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [8, 32, 16], [0, 0, 16], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
@@ -57,9 +51,9 @@
 module {
   func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
     // expected-error @+1 {{expected only parallel dims to be set in the second tiling level, got 2-th tile size set}}
     linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
     return
@@ -68,21 +62,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [8, 0, 0], [0, 16, 16], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{only reduction dims to be set in the third tiling level, got 1-th tile size set}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -90,21 +82,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [{sizes = [4, 8], interchange = [1]}, [8, 8, 0], [0, 0, 8], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{expected [0, 2) to be set exactly once in interchange #0}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -112,21 +102,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 7, 7, 64, 0, 0, 0], [6, 1, 7, 32, 0, 0, 0], [0, 0, 0, 0, 3, 3, 4], [0, 0, 0, 0, 0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUConvTileAndDecomposeExpert>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<36x9x9x512xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<3x3x512x512xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<36x7x7x512xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<36x9x9x512xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<3x3x512x512xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<36x7x7x512xf32>
   // expected-error @+1 {{can't decompose the conv op}}
   linalg.conv_2d_nhwc_hwcf {lowering_config = #config} ins(%0, %1 : memref<36x9x9x512xf32>, memref<3x3x512x512xf32>) outs(%2 : memref<36x7x7x512xf32>)
   return
@@ -134,12 +122,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 7, 64, 0, 0], [1, 1, 7, 8, 0, 0], [0, 0, 0, 0, 5, 5], [0, 0, 0, 0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUConvTileAndDecomposeExpert>
@@ -147,9 +133,9 @@
 module {
   func.func @illegal() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_, translation_info = #translation} {
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1x11x11x576xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5x5x576xf32>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1x7x7x576xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1x11x11x576xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5x5x576xf32>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1x7x7x576xf32>
     // expected-error @+1 {{can't decompose the conv op}}
     linalg.depthwise_conv_2d_nhwc_hwc {dilations = dense<1> : tensor<2xi64>, lowering_config = #config, strides = dense<1> : tensor<2xi64>} ins(%0, %1 : memref<1x11x11x576xf32>, memref<5x5x576xf32>) outs(%2 : memref<1x7x7x576xf32>)
     return
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/peel.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/peel.mlir
index 10aa93a..a44a370 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/peel.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/peel.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-llvmcpu-peel))" -split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @peel_static_matmul() {
   %c16 = arith.constant 16 : index
@@ -16,9 +14,9 @@
   %c512 = arith.constant 512 : index
   %c128 = arith.constant 128 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x49xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<49x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x49xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<49x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_arm_sme_streaming_mode_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_arm_sme_streaming_mode_tests.mlir
index f9eae56..7769fc6 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_arm_sme_streaming_mode_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_arm_sme_streaming_mode_tests.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --iree-codegen-linalg-to-llvm-pipeline=enable-arm-sme --split-input-file %s | FileCheck %s
 // RUN: iree-opt --iree-codegen-linalg-to-llvm-pipeline=enable-arm-sme --iree-llvmcpu-force-arm-streaming --split-input-file %s | FileCheck %s -check-prefixes=FORCE-ARM-STREAMING
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 module {
 module {
@@ -16,7 +14,7 @@
     %c1 = arith.constant 1 : index
     %cst = arith.constant 0.000000e+00 : f32
     %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xf32>>
     %2 = tensor.empty() : tensor<1xf32>
     %3 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0], [1], [0], [0]]>}
         ins(%cst : f32) outs(%2 : tensor<1xf32>) -> tensor<1xf32>
@@ -39,12 +37,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 module {
 module {
@@ -54,7 +50,7 @@
     %c1 = arith.constant 1 : index
     %cst = arith.constant 0.000000e+00 : f32
     %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1xf32>>
     %2 = tensor.empty() : tensor<1xf32>
     %3 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0], [[1]], [0], [0]]>}
         ins(%cst : f32) outs(%2 : tensor<1xf32>) -> tensor<1xf32>
@@ -78,12 +74,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 module {
 module {
@@ -93,7 +87,7 @@
     %c1 = arith.constant 1 : index
     %cst = arith.constant 0.000000e+00 : f32
     %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<100x100xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<100x100xf32>>
     %2 = tensor.empty() : tensor<100x100xf32>
     %3 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[0, 0], [[4], [4]], [0, 0], [0, 0]]>}
         ins(%cst : f32) outs(%2 : tensor<100x100xf32>) -> tensor<100x100xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pack_unpack_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pack_unpack_tests.mlir
index e26a427..faca283 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pack_unpack_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pack_unpack_tests.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1)>
@@ -15,9 +13,9 @@
     %c0 = arith.constant 0 : index
     %cst = arith.constant 3.40282347E+38 : f32
     %cst_0 = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
     %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [512], strides = [1] : !flow.dispatch.tensor<readonly:tensor<512xf32>> -> tensor<512xf32>
     %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
     %5 = tensor.empty() : tensor<24x512x16x1xf32>
@@ -47,12 +45,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1)>
@@ -62,9 +58,9 @@
     %c0 = arith.constant 0 : index
     %cst = arith.constant 3.40282347E+38 : f32
     %cst_0 = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<384x512xf32>>
     %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [24, 32, 16, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>> -> tensor<24x32x16x16xf32>
     %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [512], strides = [1] : !flow.dispatch.tensor<readonly:tensor<512xf32>> -> tensor<512xf32>
     %5 = tensor.empty() : tensor<384x512xf32>
@@ -93,19 +89,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 module {
   func.func @unaligned_pack() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<383x512xf32>>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<383x512xf32>>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
     %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [383, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<383x512xf32>> -> tensor<383x512xf32>
     %3 = tensor.empty() : tensor<24x512x16x1xf32>
     %pack = tensor.pack %2 padding_value(%cst : f32) inner_dims_pos = [0, 1] inner_tiles = [16, 1] into %3 : tensor<383x512xf32> -> tensor<24x512x16x1xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_conv_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_conv_tests.mlir
index b619559..71c88e4 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_conv_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_conv_tests.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 func.func @pad_conv_2d_nchw_fchw_1x320x64x64x320x3x3() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
@@ -23,12 +21,12 @@
   %7 = arith.index_castui %2 {stream.alignment = 256 : index, stream.values = [10507520 : index, 21488640 : index]} : i32 to index
   %8 = arith.index_castui %3 {stream.alignment = 256 : index, stream.values = [10508800 : index, 21489920 : index]} : i32 to index
   %9 = arith.index_castui %4 {stream.alignment = 128 : index, stream.values = [10486400 : index, 10487680 : index]} : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c5243520) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320x64x64xf32>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf32>>
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
-  %15 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<1x320x64x64xf32>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c5243520) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320x64x64xf32>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf32>>
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x320xf32>>
+  %15 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<1x320x64x64xf32>>
   %16 = flow.dispatch.tensor.load %10, offsets = [0, 0, 0, 0], sizes = [1, 320, 64, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x320x64x64xf32>> -> tensor<1x320x64x64xf32>
   %17 = flow.dispatch.tensor.load %11, offsets = [0, 0, 0, 0], sizes = [320, 320, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf32>> -> tensor<320x320x3x3xf32>
   %18 = flow.dispatch.tensor.load %12, offsets = [0, 0], sizes = [1, 320], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x320xf32>> -> tensor<1x320xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_tests.mlir
index ab8b261..899c422 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_pad_tests.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @pad_only_dispatch() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c634816 = arith.constant 634816 : index
   %c3846080 = arith.constant 3846080 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c634816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c3846080) : !flow.dispatch.tensor<writeonly:tensor<1x114x114x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c634816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c3846080) : !flow.dispatch.tensor<writeonly:tensor<1x114x114x64xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 112, 112, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>> -> tensor<1x112x112x64xf32>
   %padded = tensor.pad %2 low[0, 1, 1, 0] high[0, 1, 1, 0] {
   ^bb0(%arg0: index, %arg1: index, %arg2: index, %arg3: index):
@@ -47,12 +45,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
@@ -64,10 +60,10 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 1.001000e-05 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c802816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x56x56x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c72545728) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x256x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c72676800) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x30x30x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c802816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x56x56x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c72545728) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x256x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c72676800) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x30x30x128xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 56, 56, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x56x56x256xf32>> -> tensor<1x56x56x256xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [1, 1, 256, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x256x128xf32>> -> tensor<1x1x256x128xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [128], strides = [1] : !flow.dispatch.tensor<readonly:tensor<128xf32>> -> tensor<128xf32>
@@ -128,20 +124,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @pad_consumer_fusion_dispatch() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x14x14x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x256x256xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x14x14x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x14x14x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x256x256xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x14x14x256xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 14, 14, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x14x14x256xf32>> -> tensor<1x14x14x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 256, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x256x256xf32>> -> tensor<3x3x256x256xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [1, 14, 14, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<1x14x14x256xf32>> -> tensor<1x14x14x256xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_peel_and_vectorize_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_peel_and_vectorize_tests.mlir
index e8602dc..d3c411e 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_peel_and_vectorize_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_peel_and_vectorize_tests.mlir
@@ -1,20 +1,18 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(func.func(iree-llvmcpu-lower-executable-target))' -split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [8, 32, 0], [0, 0, 16], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling = true}>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {native_vector_size = 64}>
 func.func @no_peel_static_matmul() attributes {hal.executable.target = #executable_target_system_elf_x86_64_, translation_info = #translation} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x64xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<64x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x64xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<64x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x64xf32>> -> tensor<128x64xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [64, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x512xf32>> -> tensor<64x512xf32>
   %5 = tensor.empty() : tensor<128x512xf32>
@@ -34,21 +32,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[65, 65, 0], [8, 32, 0], [0, 0, 16], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling = true}>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {native_vector_size = 64}>
 func.func @peel_static_matmul() attributes {hal.executable.target = #executable_target_system_elf_x86_64_, translation_info = #translation} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x49xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<49x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x49xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<49x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 49], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x49xf32>> -> tensor<128x49xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [49, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<49x512xf32>> -> tensor<49x512xf32>
   %5 = tensor.empty() : tensor<128x512xf32>
@@ -80,12 +76,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [8, 32, 0], [0, 0, 16], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling = true}>
@@ -98,9 +92,9 @@
   %3 = arith.index_cast %0 : i32 to index
   %4 = arith.index_cast %1 : i32 to index
   %5 = arith.index_cast %2 : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5}
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
   %9 = flow.dispatch.tensor.load %6, offsets = [0, 0], sizes = [%4, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3} -> tensor<?x?xf32>
   %10 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [%3, %5], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5} -> tensor<?x?xf32>
   %11 = tensor.empty(%4, %5) : tensor<?x?xf32>
@@ -140,12 +134,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 0], [8, [32], 0], [0, 0, 1], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling = true}>
@@ -159,9 +151,9 @@
     %3 = arith.index_cast %0 : i32 to index
     %4 = arith.index_cast %1 : i32 to index
     %5 = arith.index_cast %2 : i32 to index
-    %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3}
-    %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5}
-    %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
+    %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3}
+    %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5}
+    %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%4, %5}
     %9 = flow.dispatch.tensor.load %6, offsets = [0, 0], sizes = [%4, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%4, %3} -> tensor<?x?xf32>
     %10 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [%3, %5], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %5} -> tensor<?x?xf32>
     %11 = tensor.empty(%4, %5) : tensor<?x?xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_split_reduction_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_split_reduction_tests.mlir
index 3d4f9c7..639f813 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_split_reduction_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_split_reduction_tests.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --iree-llvmcpu-reassociate-fp-reductions=false --split-input-file %s | FileCheck %s
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --iree-llvmcpu-reassociate-fp-reductions=true --split-input-file %s | FileCheck %s --check-prefix=REORDERCHECK
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -15,8 +13,8 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0> : tensor<1024x512xi32>
   %c1_i32 = arith.constant 1 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>> -> tensor<1024x512x256xi32>
   %3 = tensor.empty() : tensor<1024x512xi32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%2 : tensor<1024x512x256xi32>) outs(%cst : tensor<1024x512xi32>) {
@@ -48,11 +46,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -62,8 +58,8 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0.000000e+00> : tensor<1024x512xf32>
   %cst_0 = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x256xf32>> -> tensor<1024x512x256xf32>
   %3 = tensor.empty() : tensor<1024x512xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%2 : tensor<1024x512x256xf32>) outs(%cst : tensor<1024x512xf32>) {
@@ -98,11 +94,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -112,8 +106,8 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %1 = arith.index_castui %0 : i32 to index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x?x256xi32>>{%1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x?xi32>>{%1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x?x256xi32>>{%1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x?xi32>>{%1}
   %4 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [1024, %1, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x?x256xi32>>{%1} -> tensor<1024x?x256xi32>
   %5 = tensor.empty(%1) : tensor<1024x?xi32>
   %6 = linalg.fill ins(%c0_i32 : i32) outs(%5 : tensor<1024x?xi32>) -> tensor<1024x?xi32>
@@ -140,11 +134,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -152,8 +144,8 @@
 func.func @split_reduction_innermost_reduction_next_imperfect_tiling_supported() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0> : tensor<1024x513xi32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x513x256xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x513xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x513x256xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x513xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 513, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x513x256xi32>> -> tensor<1024x513x256xi32>
   %3 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%2 : tensor<1024x513x256xi32>) outs(%cst : tensor<1024x513xi32>) {
   ^bb0(%in: i32, %out: i32):
@@ -178,11 +170,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -192,8 +182,8 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %1 = arith.index_castui %0 : i32 to index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x?xi32>>{%1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x?xi32>>{%1}
   %4 = flow.dispatch.tensor.load %3, offsets = [0, 0, 0], sizes = [1024, 512, %1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x?xi32>>{%1} -> tensor<1024x512x?xi32>
   %5 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%4 : tensor<1024x512x?xi32>) outs(%cst : tensor<1024x512xi32>) {
   ^bb0(%in: i32, %out: i32):
@@ -209,11 +199,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -221,8 +209,8 @@
 func.func @split_reduction_innermost_imperfect_reduction_unsupported() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0> : tensor<1024x512xi32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x257xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x257xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 512, 257], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x257xi32>> -> tensor<1024x512x257xi32>
   %3 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%2 : tensor<1024x512x257xi32>) outs(%cst : tensor<1024x512xi32>) {
   ^bb0(%in: i32, %out: i32):
@@ -238,11 +226,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d2, d1)>
@@ -250,8 +236,8 @@
 func.func @split_reduction_not_innermost_reduction_unsupported() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0> : tensor<1024x256xi32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x256xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x256xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>> -> tensor<1024x512x256xi32>
   %3 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"]} ins(%2 : tensor<1024x512x256xi32>) outs(%cst : tensor<1024x256xi32>) {
   ^bb0(%in: i32, %out: i32):
@@ -268,11 +254,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -280,8 +264,8 @@
 func.func @split_reduction_double_reduction_unsupported() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant dense<0> : tensor<1024xi32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1024, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512x256xi32>> -> tensor<1024x512x256xi32>
   %3 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction", "reduction"]} ins(%2 : tensor<1024x512x256xi32>) outs(%cst : tensor<1024xi32>) {
   ^bb0(%in: i32, %out: i32):
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_tests.mlir
index f646cee..4d91d67 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_tests.mlir
@@ -6,11 +6,9 @@
 // and the conversion to destination passing style. Running CSE
 // before hoists the fill and the empty out of the loop causing
 // issues with the conversion.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -23,8 +21,8 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 {stream.alignment = 512 : index, stream.values = [0 : index, 10752 : index]} : i32 to index
   %3 = arith.index_cast %1 {stream.alignment = 512 : index, stream.values = [10752 : index, 21504 : index]} : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%2) : !flow.dispatch.tensor<readonly:tensor<7x384xf32>>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%3) : !flow.dispatch.tensor<writeonly:tensor<7xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%2) : !flow.dispatch.tensor<readonly:tensor<7x384xf32>>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%3) : !flow.dispatch.tensor<writeonly:tensor<7xf32>>
   %6 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [7, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<7x384xf32>> -> tensor<7x384xf32>
   %7 = tensor.empty() : tensor<7xf32>
   %8 = linalg.fill ins(%cst_0 : f32) outs(%7 : tensor<7xf32>) -> tensor<7xf32>
@@ -51,12 +49,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -70,10 +66,10 @@
   %5 = arith.index_castui %1 {stream.alignment = 64 : index, stream.values = [576704 : index, 1763072 : index]} : i32 to index
   %6 = arith.index_castui %2 {stream.alignment = 64 : index, stream.values = [908480 : index, 2094848 : index]} : i32 to index
   %7 = arith.index_castui %3 {stream.alignment = 128 : index, stream.values = [2304 : index, 134016 : index]} : i32 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x576xf32>>
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<576x144xf32>>
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x144xf32>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<1x144xf32>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x576xf32>>
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<576x144xf32>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x144xf32>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<1x144xf32>>
   %12 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [1, 576], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x576xf32>> -> tensor<1x576xf32>
   %13 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [576, 144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<576x144xf32>> -> tensor<576x144xf32>
   %14 = flow.dispatch.tensor.load %10, offsets = [0, 0], sizes = [1, 144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x144xf32>> -> tensor<1x144xf32>
@@ -103,12 +99,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @batch_matmul_dynamic() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
@@ -126,9 +120,9 @@
   %9 = arith.index_cast %3 : i32 to index
   %10 = arith.index_cast %4 : i32 to index
   %11 = arith.index_cast %5 : i32 to index
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%6, %7, %9}
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%10, %11, %8}
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xf32>>{%6, %7, %8}
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%6, %7, %9}
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%10, %11, %8}
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xf32>>{%6, %7, %8}
   %15 = flow.dispatch.tensor.load %12, offsets = [0, 0, 0], sizes = [%6, %7, %9], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%6, %7, %9} -> tensor<?x?x?xf32>
   %16 = flow.dispatch.tensor.load %13, offsets = [0, 0, 0], sizes = [%10, %11, %8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%10, %11, %8} -> tensor<?x?x?xf32>
   %17 = tensor.empty(%6, %7, %8) : tensor<?x?x?xf32>
@@ -142,20 +136,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0 * 1536 + d1)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
 func.func @check_buffer_ops_vectorization() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x1024xi32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x1024xi32>
   memref.assume_alignment %0, 64 : memref<128x1024xi32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128x1536xi32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128x1536xi32>
   memref.assume_alignment %1, 64 : memref<128x1536xi32>
   %subview = memref.subview %1[0, 0] [128, 1024] [1, 1] : memref<128x1536xi32> to memref<128x1024xi32, #map>
   linalg.generic {indexing_maps = [#map1, #map1], iterator_types = ["parallel", "parallel"]} ins(%0 : memref<128x1024xi32>) outs(%subview : memref<128x1024xi32, #map>) {
@@ -171,12 +163,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2, d3) -> (d3)>
@@ -188,9 +178,9 @@
   %cst_2 = arith.constant 0.166666672 : f32
   %cst_3 = arith.constant dense<0.000000e+00> : tensor<16xf32>
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>> -> tensor<3x3x3x16xf32>
   %5 = tensor.empty() : tensor<1x112x112x16xf32>
@@ -221,15 +211,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>,
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -238,12 +226,12 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
   %cst_0 = arith.constant 1.000000e-03 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x256xf32>>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(5) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x256xf32>>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(5) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x256xf32>>
   %6 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x128xf32>> -> tensor<64x128xf32>
   %7 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %8 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [256], strides = [1] : !flow.dispatch.tensor<readonly:tensor<256xf32>> -> tensor<256xf32>
@@ -268,19 +256,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf", ukernels = "mmt4d"}>
 func.func @ukernel_dispatch() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4x8x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x4x16x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x16x8x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4x8x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x4x16x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x16x8x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 4, 8, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4x8x32xf32>> -> tensor<2x4x8x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [16, 4, 16, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4x16x32xf32>> -> tensor<16x4x16x32xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [2, 16, 8, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<2x16x8x16xf32>> -> tensor<2x16x8x16xf32>
@@ -301,12 +287,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf", ukernels = "all"}>
 #map = affine_map<()[s0, s1, s2] -> (s0 - s1 * (s0 ceildiv s2), s0 ceildiv s2)>
@@ -317,9 +301,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_castui %0 : i32 to index
   %3 = arith.index_castui %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%2}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?xf32>>{%2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%2}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?xf32>>{%2}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %7 = affine.min #map()[%2, %workgroup_id_x, %workgroup_count_x]
@@ -332,11 +316,11 @@
   return
 }
 //       CHECK:   func @dispatch
-//       CHECK:     %[[INPUT0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK:     %[[INPUT0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //  CHECK-SAME:         memref<?xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:     %[[INPUT1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:     %[[INPUT1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //  CHECK-SAME:         memref<?xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:     %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK:     %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //  CHECK-SAME:         memref<?xf32, #hal.descriptor_type<storage_buffer>>
 //   CHECK-DAG:     %[[OFFSET:.+]] = affine.apply
 //   CHECK-DAG:     %[[SIZE:.+]] = affine.min
@@ -349,12 +333,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 2, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]>
 #config1 = #iree_codegen.lowering_config<tile_sizes = [[1, 2, 0, 0, 0, 0], [1, 1, 0, 1, 128, 0], [0, 0, 1, 0, 0, 1]]>
@@ -364,9 +346,9 @@
   %c1024 = arith.constant 1024 : index
   %c132096 = arith.constant 132096 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x256x1x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1024) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4x256x128x1xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c132096) : !flow.dispatch.tensor<writeonly:tensor<1x4x1x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x256x1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1024) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4x256x128x1xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c132096) : !flow.dispatch.tensor<writeonly:tensor<1x4x1x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 256, 1, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x256x1x1xf32>> -> tensor<1x256x1x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [4, 256, 128, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x256x128x1xi8>> -> tensor<4x256x128x1xi8>
   %5 = tensor.empty() : tensor<1x4x1x128xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_transpose_avx2_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_transpose_avx2_tests.mlir
index 97ef7fc..77b4078 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_transpose_avx2_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_transpose_avx2_tests.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
@@ -12,8 +10,8 @@
 func.func @transpose_10_8x8_pattern() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<512x1024xf32>) outs(%3 : tensor<1024x512xf32>) {
@@ -37,11 +35,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d2, d1)>
@@ -49,8 +45,8 @@
 func.func @transpose_021_8x8_pattern() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x128x96xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x128x96xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 96, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>> -> tensor<64x96x128xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [64, 128, 96], strides = [1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<64x128x96xf32>> -> tensor<64x128x96xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<64x96x128xf32>) outs(%3 : tensor<64x128x96xf32>) {
@@ -74,11 +70,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d1, d2, d0)>
@@ -86,8 +80,8 @@
 func.func @transpose_201_8x8_pattern() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x64x96xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x64x96xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 96, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>> -> tensor<64x96x128xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [128, 64, 96], strides = [1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<128x64x96xf32>> -> tensor<128x64x96xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<64x96x128xf32>) outs(%3 : tensor<128x64x96xf32>) {
@@ -111,11 +105,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d2, d1, d0)>
@@ -123,8 +115,8 @@
 func.func @transpose_210_8x8_pattern() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x96x64xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 96, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>> -> tensor<64x96x128xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [128, 96, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<128x96x64xf32>> -> tensor<128x96x64xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<64x96x128xf32>) outs(%3 : tensor<128x96x64xf32>) {
@@ -148,11 +140,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
@@ -160,8 +150,8 @@
 func.func @transpose_120_8x8_pattern() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<96x128x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<96x128x64xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 96, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>> -> tensor<64x96x128xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [96, 128, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<96x128x64xf32>> -> tensor<96x128x64xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<64x96x128xf32>) outs(%3 : tensor<96x128x64xf32>) {
@@ -185,11 +175,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d1, d0, d2)>
@@ -197,8 +185,8 @@
 func.func @transpose_102() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<96x64x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<96x64x128xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 96, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x96x128xf32>> -> tensor<64x96x128xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [96, 64, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<96x64x128xf32>> -> tensor<96x64x128xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<64x96x128xf32>) outs(%3 : tensor<96x64x128xf32>) {
@@ -215,11 +203,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
@@ -227,8 +213,8 @@
 func.func @test_no_avx2_feature() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<512x1024xf32>) outs(%3 : tensor<1024x512xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vector_masking_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vector_masking_tests.mlir
index c93ab70..005492f 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vector_masking_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vector_masking_tests.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' -split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -15,9 +13,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %9 = tensor.empty(%2, %3) : tensor<?x?xf32>
@@ -43,11 +41,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -58,8 +54,8 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%2}
   %6 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %7 = tensor.empty(%2) : tensor<?xf32>
   %8 = linalg.fill ins(%cst : f32) outs(%7 : tensor<?xf32>) -> tensor<?xf32>
@@ -78,12 +74,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_32_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_32", {data_layout = "e-m:e-p:32:32-i64:64-n32-S128", native_vector_size = 32 : index, target_triple = "riscv32-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -93,9 +87,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %9 = tensor.empty(%2, %3) : tensor<?x?xf32>
@@ -121,12 +115,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -136,9 +128,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %9 = tensor.empty(%2, %3) : tensor<?x?xf32>
@@ -159,13 +151,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @mask_matmul_sve() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -174,10 +164,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -193,12 +183,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -208,9 +196,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%2, %3}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%2, %3], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %3} -> tensor<?x?xf32>
   %9 = tensor.empty(%2, %3) : tensor<?x?xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vectorize_nd_extract_tests.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vectorize_nd_extract_tests.mlir
index 1e2e60d..f6782a5 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vectorize_nd_extract_tests.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/pipeline_vectorize_nd_extract_tests.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy, func.func(iree-llvmcpu-lower-executable-target))' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_riscv_64_ = #hal.executable.target<"llvm-cpu", "system-elf-riscv_64", {cpu = "generic-rv64", cpu_features = "+m,+a,+f,+d,+v", data_layout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128", native_vector_size = 64 : index, target_triple = "riscv64"}>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
@@ -19,8 +17,8 @@
   %c32_i32 = arith.constant 32 : i32
   %cst_2 = arith.constant 1.000000e+00 : f32
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1115136) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x33x33x21xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x257x257x21xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1115136) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x33x33x21xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x257x257x21xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 33, 33, 21], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x33x33x21xf32>> -> tensor<1x33x33x21xf32>
   %3 = tensor.empty() : tensor<1x257x257x21xf32>
   %4 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} outs(%3 : tensor<1x257x257x21xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_lowering_strategy.mlir
index 410d801..1903cab 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_lowering_strategy.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @matmul_tensors_default() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -15,10 +13,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -35,13 +33,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @i4_i4_i32_matmul() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -50,10 +46,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%0, %2} -> tensor<?x?xi4>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi4>>{%2, %1} -> tensor<?x?xi4>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
@@ -71,12 +67,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @batch_matmul_tensors() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -85,9 +79,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %1, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %3, %2}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xf32>>{%0, %1, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %1, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %3, %2}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?xf32>>{%0, %1, %2}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0], sizes = [%0, %1, %3], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %1, %3} -> tensor<?x?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0, 0], sizes = [%0, %3, %2], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?xf32>>{%0, %3, %2} -> tensor<?x?x?xf32>
   %9 = tensor.empty(%0, %1, %2) : tensor<?x?x?xf32>
@@ -105,19 +99,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @matmul_static() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<240x40xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x40xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x240xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<240x40xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x40xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [196, 240], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<196x240xf32>> -> tensor<196x240xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [240, 40], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<240x40xf32>> -> tensor<240x40xf32>
   %5 = tensor.empty() : tensor<196x40xf32>
@@ -135,21 +127,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @conv_static() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c607520 = arith.constant 607520 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x51x41x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c607520) : !flow.dispatch.tensor<readonly:tensor<3x3x512x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x25x20x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x51x41x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c607520) : !flow.dispatch.tensor<readonly:tensor<3x3x512x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x25x20x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 51, 41, 512], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x51x41x512xf32>> -> tensor<1x51x41x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 512, 512], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x512x512xf32>> -> tensor<3x3x512x512xf32>
   %5 = tensor.empty() : tensor<1x25x20x512xf32>
@@ -166,19 +156,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @restrict_num_workgroups() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x11x11x576xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<5x5x576xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x7x7x576xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x11x11x576xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<5x5x576xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x7x7x576xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 11, 11, 576], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x11x11x576xf32>> -> tensor<1x11x11x576xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [5, 5, 576], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<5x5x576xf32>> -> tensor<5x5x576xf32>
   %5 = tensor.empty() : tensor<1x7x7x576xf32>
@@ -196,20 +184,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @matmul_aarch_i8_i8_i32_static() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %c0_i32 = arith.constant 0 : i32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x1536xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x1536xi32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384xi8>> -> tensor<128x384xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [384, 1536], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>> -> tensor<384x1536xi8>
   %5 = tensor.empty() : tensor<128x1536xi32>
@@ -227,12 +213,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @matmul_aarch_i8_i8_i32_dynamic() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
@@ -240,9 +224,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1}
   %6 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%0, %2} -> tensor<?x?xi8>
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi8>>{%2, %1} -> tensor<?x?xi8>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x?xi32>>{%0, %1} -> tensor<?x?xi32>
@@ -259,18 +243,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @pack() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x48x8x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x48x8x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [20, 40], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<20x40xf32>> -> tensor<20x40xf32>
   %3 = tensor.empty() : tensor<4x48x8x1xf32>
   %pack = tensor.pack %2 padding_value(%cst : f32) inner_dims_pos = [0, 1] inner_tiles = [8, 1] into %3 : tensor<20x40xf32> -> tensor<4x48x8x1xf32>
@@ -286,11 +268,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @unpack_outer_dynamic() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
@@ -304,8 +284,8 @@
   %5 = arith.index_castui %1 : i32 to index
   %6 = arith.index_castui %2 : i32 to index
   %7 = arith.index_castui %3 : i32 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
   %10 = flow.dispatch.tensor.load %8, offsets = [0, 0, 0, 0], sizes = [%4, %5, 32, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5} -> tensor<?x?x32x16xi32>
   %11 = tensor.empty(%6, %7) : tensor<?x?xi32>
   %unpack = tensor.unpack %10 inner_dims_pos = [0, 1] inner_tiles = [32, 16] into %11 : tensor<?x?x32x16xi32> -> tensor<?x?xi32>
@@ -321,21 +301,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @mmt4d_384x384x512_4x1x4_dispatch_0() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
   %c96 = arith.constant 96 : index
   %c128 = arith.constant 128 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<96x384x4x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<128x384x4x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<96x128x4x4xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<96x384x4x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<128x384x4x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<96x128x4x4xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [96, 384, 4, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<96x384x4x1xf32>> -> tensor<96x384x4x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [128, 384, 4, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384x4x1xf32>> -> tensor<128x384x4x1xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [96, 384, 4, 4], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<96x128x4x4xf32>> -> tensor<96x128x4x4xf32>
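The hunks above all apply the same mechanical rewrite: unwrap the single `#hal.descriptor_set.layout<0, ...>` set into a flat `bindings` list, rename `push_constants` to `constants` (dropping it when zero), convert `#hal.descriptor_set.binding<N, type>` to `#hal.pipeline.binding<type>`, and delete the now-redundant `set(0)` clause from `hal.interface.binding.subspan` ops. That transformation can be sketched as a small textual migration script. This is a hypothetical helper, not part of the PR; it assumes the single-set, set-0 layouts shown in these tests rather than being a general MLIR parser:

```python
import re

def migrate_layout(mlir_text: str) -> str:
    """Rewrite legacy descriptor-set pipeline layouts into the flat form.

    Best-effort textual sketch: assumes every layout has exactly one
    descriptor set (set 0), as in the tests above.
    """
    # #hal.descriptor_set.binding<N, type> -> #hal.pipeline.binding<type>
    text = re.sub(r'#hal\.descriptor_set\.binding<\d+,\s*([^>]+)>',
                  r'#hal.pipeline.binding<\1>', mlir_text)
    # Unwrap the single-set wrapper:
    #   sets = [#hal.descriptor_set.layout<0, bindings = [...]>]
    # becomes just: bindings = [...]
    text = re.sub(
        r'sets\s*=\s*\[\s*#hal\.descriptor_set\.layout<\d+,'
        r'\s*bindings\s*=\s*\[(.*?)\]\s*>\s*\]',
        r'bindings = [\1]', text, flags=re.DOTALL)
    # push_constants = N -> constants = N; drop the field when N == 0.
    text = re.sub(r'push_constants\s*=\s*0,\s*', '', text)
    text = text.replace('push_constants', 'constants')
    # Remove the set(0) clause from interface ops.
    text = text.replace(' set(0)', '')
    return text
```

Running such a script over each test file and then re-checking with `iree-opt` would reproduce most of the churn in this diff, which is why the hunks are so uniform across files.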
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sme_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sme_lowering_strategy.mlir
index 11dc420..eb84c70 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sme_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sme_lowering_strategy.mlir
@@ -1,16 +1,14 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --iree-llvmcpu-enable-scalable-vectorization=true --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve,+sme", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @transpose_f32() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf32>> -> tensor<32x32xf32>
   %3 = tensor.empty() : tensor<32x32xf32>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<32x32xf32>) outs(%3 : tensor<32x32xf32>) {
@@ -30,17 +28,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve,+sme", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @transpose_f64() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf64>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf64>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf64>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf64>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf64>> -> tensor<32x32xf64>
   %3 = tensor.empty() : tensor<32x32xf64>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<32x32xf64>) outs(%3 : tensor<32x32xf64>) {
@@ -60,17 +56,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve,+sme", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @transpose_unsupported_not_rank_2() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x8x4xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x8x4xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 4, 8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4x8xf32>> -> tensor<2x4x8xf32>
   %3 = tensor.empty() : tensor<2x8x4xf32>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel"]} ins(%2 : tensor<2x4x8xf32>) outs(%3 : tensor<2x8x4xf32>) {
@@ -90,17 +84,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve,+sme", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @transpose_unsupported_not_simple_transpose() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf32>> -> tensor<32x32xf32>
   %3 = tensor.empty() : tensor<32x32xf32>
   %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<32x32xf32>) outs(%3 : tensor<32x32xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy.mlir
index 139f196..757a039 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy.mlir
@@ -3,13 +3,11 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' \
 // RUN:   --iree-llvmcpu-enable-scalable-vectorization=true --split-input-file --iree-llvmcpu-disable-arm-sme-tiling %s | FileCheck %s --check-prefixes=CHECK,DISABLE-ARM-SME
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @matmul_tensors() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -18,10 +16,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -39,19 +37,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @static_tensors_non_pow_two_sizes() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<15x14xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<14x7xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<15x14xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<14x7xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [15, 14], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<15x14xf32>> -> tensor<15x14xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [14, 7], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<14x7xf32>> -> tensor<14x7xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [15, 7], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>> -> tensor<15x7xf32>
@@ -69,19 +65,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @static_tensors_1x1() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1xf32>> -> tensor<1x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1xf32>> -> tensor<1x1xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>> -> tensor<1x1xf32>
@@ -99,13 +93,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve,+sme", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @matmul_tensors() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -114,10 +106,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -142,12 +134,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {cpu = "", cpu_features = "+v9a,+sve", data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", link_embedded = false, native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android34"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -165,11 +155,11 @@
   %7 = arith.index_castui %2 : i32 to index
   %8 = arith.index_castui %3 : i32 to index
   %9 = arith.index_castui %4 : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x256xi8>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf32>>
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<1024x256xf32>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x256xi8>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256xf32>>
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<1024x256xf32>>
   %15 = flow.dispatch.tensor.load %10, offsets = [0, 0], sizes = [1024, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x256xi8>> -> tensor<1024x256xi8>
   %16 = flow.dispatch.tensor.load %11, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xi8>> -> tensor<256x256xi8>
   %17 = flow.dispatch.tensor.load %12, offsets = [0], sizes = [1024], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1024xf32>> -> tensor<1024xf32>
@@ -201,19 +191,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {cpu = "", cpu_features = "+v9a,+sve", data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", link_embedded = false, native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android34"}>
 func.func @depthwise_conv() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 161, 161, 240], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>> -> tensor<1x57x57x72xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 240], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>> -> tensor<3x3x72xf32>
   %5 = tensor.empty() : tensor<1x28x28x72xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy_peeling.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy_peeling.mlir
index 02ad9c9..ec9241b 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy_peeling.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_aarch64_sve_lowering_strategy_peeling.mlir
@@ -2,13 +2,11 @@
 // RUN:   --iree-llvmcpu-enable-scalable-vectorization=true --iree-llvmcpu-vector-pproc-strategy=peel \
 // RUN:   --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @matmul_tensors() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
@@ -17,10 +15,10 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %7 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%0, %2], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2} -> tensor<?x?xf32>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%2, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1} -> tensor<?x?xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
@@ -38,19 +36,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @static_tensors_non_pow_two_sizes() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<15x14xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<14x7xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<15x14xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<14x7xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [15, 14], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<15x14xf32>> -> tensor<15x14xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [14, 7], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<14x7xf32>> -> tensor<14x7xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [15, 7], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<15x7xf32>> -> tensor<15x7xf32>
@@ -68,19 +64,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu_features = "+sve", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-elf"}>
 func.func @static_tensors_1x1() attributes {hal.executable.target = #executable_target_embedded_elf_arm_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1xf32>> -> tensor<1x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1xf32>> -> tensor<1x1xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>> -> tensor<1x1xf32>
@@ -99,19 +93,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_arm_64_ = #hal.executable.target<"llvm-cpu", "system-elf-arm_64", {data_layout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128", native_vector_size = 16 : index, target_triple = "aarch64-none-linux-android30"}>
 func.func @depthwise_conv() attributes {hal.executable.target = #executable_target_system_elf_arm_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x4xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x4xf32>>
   %input = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 1, 4, 4], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x4x4xf32>> -> tensor<1x1x4x4xf32>
   %filter = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1, 4, 4], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4x4xf32>> -> tensor<1x4x4xf32>
   %5 = tensor.empty() : tensor<1x1x1x4xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_lowering_strategy_without_distribution.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_lowering_strategy_without_distribution.mlir
index fe4c802..3042c00 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_lowering_strategy_without_distribution.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_lowering_strategy_without_distribution.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --iree-llvmcpu-disable-distribution --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @matmul_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x128xf32>> -> tensor<512x128xf32>
   %5 = tensor.empty() : tensor<384x128xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_riscv_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_riscv_lowering_strategy.mlir
index 07769ef..02095fc 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_riscv_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_riscv_lowering_strategy.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_32_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_32", {cpu_features = "+m,+f", data_layout = "e-m:e-p:32:32-i64:64-n32-S128", native_vector_size = 16 : index, target_triple = "riscv32-none-elf"}>
 func.func @matmul_riscv() attributes {hal.executable.target = #executable_target_embedded_elf_riscv_32_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x128xf32>> -> tensor<512x128xf32>
   %5 = tensor.empty() : tensor<384x128xf32>
@@ -32,19 +30,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_riscv_32_ = #hal.executable.target<"llvm-cpu", "embedded-elf-riscv_32", {cpu_features = "+m,+f", data_layout = "e-m:e-p:32:32-i64:64-n32-S128", native_vector_size = 16 : index, target_triple = "riscv32-none-elf"}>
 func.func @thin_depthwise_conv_static() attributes {hal.executable.target = #executable_target_embedded_elf_riscv_32_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 161, 161, 240], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>> -> tensor<1x57x57x72xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 240], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>> -> tensor<3x3x72xf32>
   %5 = tensor.empty() : tensor<1x28x28x72xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir
index 2e21413..e67e7af 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/select_x86_64_lowering_strategy.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt --pass-pipeline='builtin.module(iree-llvmcpu-select-lowering-strategy)' --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @matvec_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384xf32>> -> tensor<128x384xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<384xf32>> -> tensor<384xf32>
   %5 = tensor.empty() : tensor<128xf32>
@@ -32,12 +30,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @matvec_dynamic() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
@@ -49,9 +45,9 @@
   %3 = arith.index_cast %0 : i32 to index
   %4 = arith.index_cast %1 : i32 to index
   %5 = arith.index_cast %2 : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %4}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%5}
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%3, %4}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%5}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%3}
   %9 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %10 = arith.index_cast %9 : i32 to index
   %11 = flow.dispatch.tensor.load %8, offsets = [0], sizes = [%10], strides = [1] : !flow.dispatch.tensor<writeonly:tensor<?xf32>>{%3} -> tensor<?xf32>
@@ -72,20 +68,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @dot_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<384xf32>> -> tensor<384xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<384xf32>> -> tensor<384xf32>
   %5 = tensor.empty() : tensor<f32>
@@ -104,12 +98,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @dot_dynamic() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
@@ -119,10 +111,10 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_cast %0 : i32 to index
   %3 = arith.index_cast %1 : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
   %5 = flow.dispatch.tensor.load %4, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<writeonly:tensor<f32>> -> tensor<f32>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%2}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%2}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%3}
   %8 = flow.dispatch.tensor.load %6, offsets = [0], sizes = [%2], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%2} -> tensor<?xf32>
   %9 = flow.dispatch.tensor.load %7, offsets = [0], sizes = [%3], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%3} -> tensor<?xf32>
   %10 = linalg.fill ins(%cst : f32) outs(%5 : tensor<f32>) -> tensor<f32>
@@ -140,12 +132,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -154,9 +144,9 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %6 = flow.dispatch.tensor.load %3, offsets = [0], sizes = [%1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%1} -> tensor<?xf32>
   %7 = tensor.empty(%0, %1) : tensor<?x?xf32>
@@ -177,12 +167,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
@@ -191,9 +179,9 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
   %8 = flow.dispatch.tensor.load %5, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
   %9 = tensor.empty(%0, %1, %2, %3) : tensor<?x?x?x?xf32>
@@ -215,18 +203,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @add_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x16x32x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x16x32x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x16x32x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x16x32x128xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [64, 16, 32, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x16x32x128xf32>> -> tensor<64x16x32x128xf32>
   %3 = tensor.empty() : tensor<64x16x32x128xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<64x16x32x128xf32>) outs(%3 : tensor<64x16x32x128xf32>) {
@@ -247,12 +233,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 0], [64, 64, 0], [0, 0, 0], [32, 32, 0], [0, 0, 32], [0, 0, 0]]>
 #translation = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling}>
@@ -262,9 +246,9 @@
     translation_info = #translation
   } {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x512xf32>> -> tensor<256x512xf32>
   %5 = tensor.empty() : tensor<128x512xf32>
@@ -283,19 +267,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @matmul_partially_peel() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16641x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16641x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16641x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16641x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [16641, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16641x16xf32>> -> tensor<16641x16xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [16, 8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16x8xf32>> -> tensor<16x8xf32>
   %5 = tensor.empty() : tensor<16641x8xf32>
@@ -314,11 +296,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -329,8 +309,8 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xi32>{%0, %1}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xi32>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xi32>{%0, %1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xi32>{%2, %3}
   %subview = memref.subview %7[%4, %5] [%0, %1] [1, 1] : memref<?x?xi32> to memref<?x?xi32, strided<[?, 1], offset: ?>>
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%6 : memref<?x?xi32>) outs(%subview : memref<?x?xi32, strided<[?, 1], offset: ?>>) {
   ^bb0(%in: i32, %out: i32):
@@ -348,11 +328,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @static_1d_fft_stage2() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
@@ -360,8 +338,8 @@
   %c2 = arith.constant 2 : index
   %cst = arith.constant dense<[1.000000e+00, 6.12323426E-17]> : tensor<2xf32>
   %cst_0 = arith.constant dense<[-0.000000e+00, -1.000000e+00]> : tensor<2xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %4:2 = iree_linalg_ext.fft ins(%c2, %cst, %cst_0 : index, tensor<2xf32>, tensor<2xf32>) outs(%2, %3 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
@@ -379,11 +357,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @static_3d_fft_stage3() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
@@ -392,8 +368,8 @@
   %cst_0 = arith.constant dense<[-0.000000e+00, -0.707106769, -1.000000e+00, -0.707106769]> : tensor<4xf32>
   %0 = bufferization.to_memref %cst_0 : memref<4xf32>
   %1 = bufferization.to_memref %cst : memref<4xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x128x32xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<64x128x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x128x32xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<64x128x32xf32>
   iree_linalg_ext.fft ins(%c3, %1, %0 : index, memref<4xf32>, memref<4xf32>) outs(%2, %3 : memref<64x128x32xf32>, memref<64x128x32xf32>)
   return
 }
@@ -407,12 +383,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -424,9 +398,9 @@
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %2}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%2, %1}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<?x?xf32>>{%0, %1}
   %6 = tensor.empty(%0, %1) : tensor<?x?xf32>
   %7 = linalg.generic {indexing_maps = [#map], iterator_types = ["parallel", "parallel"]} outs(%6 : tensor<?x?xf32>) {
   ^bb0(%out: f32):
@@ -456,12 +430,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @conv_dynamic() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
@@ -474,9 +446,9 @@
   %6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : index
   %7 = hal.interface.constant.load layout(#pipeline_layout) ordinal(7) : index
   %8 = hal.interface.constant.load layout(#pipeline_layout) ordinal(8) : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%4, %5, %3, %6}
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xf32>>{%0, %7, %8, %6}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3}
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%4, %5, %3, %6}
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xf32>>{%0, %7, %8, %6}
   %12 = flow.dispatch.tensor.load %9, offsets = [0, 0, 0, 0], sizes = [%0, %1, %2, %3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%0, %1, %2, %3} -> tensor<?x?x?x?xf32>
   %13 = flow.dispatch.tensor.load %10, offsets = [0, 0, 0, 0], sizes = [%4, %5, %3, %6], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xf32>>{%4, %5, %3, %6} -> tensor<?x?x?x?xf32>
   %14 = flow.dispatch.tensor.load %11, offsets = [0, 0, 0, 0], sizes = [%0, %7, %8, %6], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xf32>>{%0, %7, %8, %6} -> tensor<?x?x?x?xf32>
@@ -494,20 +466,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @conv_static() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c607520 = arith.constant 607520 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c607520) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c607520) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>> -> tensor<3x3x3x16xf32>
   %5 = tensor.empty() : tensor<1x112x112x16xf32>
@@ -525,20 +495,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @conv_nchw_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x128x30x30xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x128x3x3xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x128x28x28xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x128x30x30xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x128x3x3xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x128x28x28xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 128, 30, 30], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x128x30x30xf32>> -> tensor<1x128x30x30xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [128, 128, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x128x3x3xf32>> -> tensor<128x128x3x3xf32>
   %5 = tensor.empty() : tensor<1x128x28x28xf32>
@@ -556,19 +524,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @depthwise_conv_static() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x161x161x240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x240xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x80x80x240xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x161x161x240xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x240xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x80x80x240xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 161, 161, 240], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x161x161x240xf32>> -> tensor<1x161x161x240xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 240], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x240xf32>> -> tensor<3x3x240xf32>
   %5 = tensor.empty() : tensor<1x80x80x240xf32>
@@ -587,19 +553,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 func.func @thin_depthwise_conv_static() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x72xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 161, 161, 240], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x57x57x72xf32>> -> tensor<1x57x57x72xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 240], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x72xf32>> -> tensor<3x3x72xf32>
   %5 = tensor.empty() : tensor<1x28x28x72xf32>
@@ -618,19 +582,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "cascadelake", cpu_features = "+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vnni,+adx,+clflushopt,+clwb,+cx16,+cx8,+crc32,+f16c,+fsgsbase,+fxsr,+invpcid,+lzcnt,+movbe,+pku,+prfchw,+rdrnd,+rdseed,+sahf,+x87,+xsave,+xsavec,+xsaveopt,+xsaves", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-none-elf", ukernels = false}>
 func.func @pooling_nchw_max() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c3846080 = arith.constant 3846080 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant -3.40282347E+38 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c3846080) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x64x114x114xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x64x56x56xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c3846080) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x64x114x114xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x64x56x56xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 64, 114, 114], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x64x114x114xf32>> -> tensor<1x64x114x114xf32>
   %3 = tensor.empty() : tensor<1x64x56x56xf32>
   %4 = tensor.empty() : tensor<3x3xf32>
@@ -649,18 +611,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-pc-linux-gnu"}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
 func.func @generic_static() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<96x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16x96xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<96x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16x96xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [96, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<96x16xf32>> -> tensor<96x16xf32>
   %3 = tensor.empty() : tensor<16x96xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<96x16xf32>) outs(%3 : tensor<16x96xf32>) {
@@ -679,19 +639,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @matmul_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x128xf32>> -> tensor<512x128xf32>
   %5 = tensor.empty() : tensor<384x128xf32>
@@ -745,20 +703,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @matmul_i8_i8_i32_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0_i32 = arith.constant 0 : i32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x1536xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x1536xi32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384xi8>> -> tensor<128x384xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [384, 1536], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x1536xi8>> -> tensor<384x1536xi8>
   %5 = tensor.empty() : tensor<128x1536xi32>
@@ -777,21 +733,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @gemm_unit_N() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%1}
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?x1xf32>>{%0}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<?x1xf32>>{%0}
   %5 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [%1, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%1} -> tensor<?x1xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, %1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xf32>>{%0, %1} -> tensor<?x?xf32>
   %7 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [%0, 1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x1xf32>>{%0} -> tensor<?x1xf32>
@@ -809,20 +763,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @gemm_unit_M_unit_N() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%0}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%0}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, %0], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x?xf32>>{%0} -> tensor<1x?xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [%0, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x1xf32>>{%0} -> tensor<?x1xf32>
   %6 = flow.dispatch.tensor.load %3, offsets = [0, 0], sizes = [1, 1], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<1x1xf32>> -> tensor<1x1xf32>
@@ -840,22 +792,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @matmul_odd() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x49xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x49xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<33x49xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x49xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x49xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<33x49xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [33, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<33x16xf32>> -> tensor<33x16xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [16, 49], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16x49xf32>> -> tensor<16x49xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [33, 49], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<33x49xf32>> -> tensor<33x49xf32>
@@ -875,11 +825,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0, d1, d2, d3, d4, d5, d6, d7) -> (d0, d1, d2, d3, d4, d5, d6, d7)>
@@ -889,8 +837,8 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
   %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3}
   %6 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0, 0, 0, 0, 0, 0], sizes = [1, %0, 1, 1, %1, %2, 1, %3], strides = [1, 1, 1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x?x1x1x?x?x1x?xf32>>{%0, %1, %2, %3} -> tensor<1x?x1x1x?x?x1x?xf32>
   %7 = tensor.empty(%0, %1, %2, %3) : tensor<1x?x1x1x?x?x1x?xf32>
   %8 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "parallel", "parallel", "parallel"]} ins(%6 : tensor<1x?x1x1x?x?x1x?xf32>) outs(%7 : tensor<1x?x1x1x?x?x1x?xf32>) {
@@ -910,11 +858,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0) -> (d0)>
@@ -922,8 +868,8 @@
 func.func @reduce_to_scalar_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [128], strides = [1] : !flow.dispatch.tensor<readonly:tensor<128xf32>> -> tensor<128xf32>
   %3 = tensor.empty() : tensor<f32>
   %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<f32>) -> tensor<f32>
@@ -945,11 +891,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<(d0) -> (d0)>
@@ -957,8 +901,8 @@
 func.func @reduce_to_scalar_dynamic() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<f32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<f32>>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [%0], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf32>>{%0} -> tensor<?xf32>
   %4 = flow.dispatch.tensor.load %2, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readwrite:tensor<f32>> -> tensor<f32>
   %5 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction"]} ins(%3 : tensor<?xf32>) outs(%4 : tensor<f32>) {
@@ -978,18 +922,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-linux-gnu"}>
 #map = affine_map<() -> ()>
 func.func @scalar() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<f32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<f32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<f32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
   %3 = flow.dispatch.tensor.load %1, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<writeonly:tensor<f32>> -> tensor<f32>
   %4 = linalg.generic {indexing_maps = [#map, #map], iterator_types = []} ins(%2 : tensor<f32>) outs(%3 : tensor<f32>) {
@@ -1006,11 +948,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
@@ -1018,8 +958,8 @@
 func.func @transpose_8x8() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<512x1024xf32>) outs(%3 : tensor<1024x512xf32>) {
@@ -1035,11 +975,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx2,+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
@@ -1047,8 +985,8 @@
 func.func @transpose_16x16() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<512x1024xf32>) outs(%3 : tensor<1024x512xf32>) {
@@ -1064,12 +1002,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -1079,9 +1015,9 @@
   %c6144 = arith.constant 6144 : index
   %c792576 = arith.constant 792576 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c792576) : !flow.dispatch.tensor<writeonly:tensor<12x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c792576) : !flow.dispatch.tensor<writeonly:tensor<12x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128x128xf32>> -> tensor<12x128x128xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [12, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128xf32>> -> tensor<12x128xf32>
   %5 = tensor.empty() : tensor<12x128xf32>
@@ -1116,18 +1052,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @pack() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x48x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x48x16x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [20, 40], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<20x40xf32>> -> tensor<20x40xf32>
   %3 = tensor.empty() : tensor<2x48x16x1xf32>
   %pack = tensor.pack %2 padding_value(%cst : f32) inner_dims_pos = [0, 1] inner_tiles = [16, 1] into %3 : tensor<20x40xf32> -> tensor<2x48x16x1xf32>
@@ -1144,18 +1078,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @pack_f16() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x48x16x1xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<20x40xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x48x16x1xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [20, 40], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<20x40xf16>> -> tensor<20x40xf16>
   %3 = tensor.empty() : tensor<2x48x16x1xf16>
   %pack = tensor.pack %2 padding_value(%cst : f16) inner_dims_pos = [0, 1] inner_tiles = [16, 1] into %3 : tensor<20x40xf16> -> tensor<2x48x16x1xf16>
@@ -1172,17 +1104,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @pack_many_elements() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1200x500000xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<31250x1200x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1200x500000xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<31250x1200x16x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1200, 500000], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1200x500000xf32>> -> tensor<1200x500000xf32>
   %3 = tensor.empty() : tensor<31250x1200x16x1xf32>
   %pack = tensor.pack %2 outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 1] into %3 : tensor<1200x500000xf32> -> tensor<31250x1200x16x1xf32>
@@ -1199,12 +1129,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1)>
@@ -1213,9 +1141,9 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 3.40282347E+38 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x512x16x1xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [24, 32, 16, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<24x32x16x16xf32>> -> tensor<24x32x16x16xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [512], strides = [1] : !flow.dispatch.tensor<readonly:tensor<512xf32>> -> tensor<512xf32>
   %5 = tensor.empty() : tensor<24x512x16x1xf32>
@@ -1247,18 +1175,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @elem_pack() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x384x8x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x384x8x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384xf32>> -> tensor<128x384xf32>
   %3 = tensor.empty() : tensor<128x384xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<128x384xf32>) outs(%3 : tensor<128x384xf32>) {
@@ -1284,11 +1210,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf", ukernels = false}>
 #map = affine_map<(d0, d1) -> (d1, d0)>
@@ -1297,8 +1221,8 @@
   %c1579008 = arith.constant 1579008 : index
   %c3147776 = arith.constant 3147776 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c1579008) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<30522x768xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c3147776) : !flow.dispatch.tensor<writeonly:tensor<1908x768x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c1579008) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<30522x768xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c3147776) : !flow.dispatch.tensor<writeonly:tensor<1908x768x16x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [30522, 768], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<30522x768xf32>> -> tensor<30522x768xf32>
   %3 = tensor.empty() : tensor<768x30522xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<30522x768xf32>) outs(%3 : tensor<768x30522xf32>) {
@@ -1323,14 +1247,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf", ukernels = false}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -1341,11 +1263,11 @@
   %cst = arith.constant -0.000000e+00 : f32
   %cst_0 = arith.constant 1.024000e+03 : f32
   %cst_1 = arith.constant 9.99999996E-13 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x1024x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<24x1024x16x1xf32>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [384, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>> -> tensor<384x1024xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<384xf32>> -> tensor<384xf32>
   %7 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [1024], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1024xf32>> -> tensor<1024xf32>
@@ -1396,12 +1318,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "+avx2", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-none-elf", ukernels = false}>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -1409,9 +1329,9 @@
 func.func @reduction_pack() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant -0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x24x16x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x24x16x1xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [384, 1024, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<384x1024x32xf32>> -> tensor<384x1024x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [384, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x1024xf32>> -> tensor<384x1024xf32>
   %5 = tensor.empty() : tensor<1024x24x16x1xf32>
@@ -1445,18 +1365,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @unpack_static() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c41943040 = arith.constant 41943040 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c41943040) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x256x16x16xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c41943040) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x256x16x16xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x4096xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [64, 256, 16, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x256x16x16xf32>> -> tensor<64x256x16x16xf32>
   %3 = tensor.empty() : tensor<1024x4096xf32>
   %unpack = tensor.unpack %2 inner_dims_pos = [0, 1] inner_tiles = [16, 16] into %3 : tensor<64x256x16x16xf32> -> tensor<1024x4096xf32>
@@ -1473,12 +1391,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d0)>
@@ -1486,9 +1402,9 @@
 #map2 = affine_map<(d0, d1) -> (d0, d1)>
 func.func @unpack_elem() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<48x64x8x2xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<48x64x8x2xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [48, 64, 8, 2], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<48x64x8x2xf32>> -> tensor<48x64x8x2xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [128], strides = [1] : !flow.dispatch.tensor<readonly:tensor<128xf32>> -> tensor<128xf32>
   %5 = tensor.empty() : tensor<128x384xf32>
@@ -1512,13 +1428,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 #map = affine_map<(d0, d1) -> (d1)>
@@ -1529,10 +1443,10 @@
   %c-128_i32 = arith.constant -128 : i32
   %c127_i32 = arith.constant 127 : i32
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2304x24xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<24x144xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<144xi32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2304x144xi8>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2304x24xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<24x144xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<144xi32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2304x144xi8>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2304, 24], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2304x24xi8>> -> tensor<2304x24xi8>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [24, 144], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<24x144xi8>> -> tensor<24x144xi8>
   %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [144], strides = [1] : !flow.dispatch.tensor<readonly:tensor<144xi32>> -> tensor<144xi32>
@@ -1560,19 +1474,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-unknown-unknown-eabi-elf", ukernels = false}>
 func.func @test() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %c6364136223846793005_i64 = arith.constant 6364136223846793005 : i64
   %c1442695040888963407_i64 = arith.constant 1442695040888963407 : i64
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
   %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i64>> -> tensor<i64>
   %extracted = tensor.extract %2[] : tensor<i64>
   %3 = arith.muli %extracted, %c6364136223846793005_i64 : i64
@@ -1588,12 +1500,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_system_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "system-elf-x86_64", {cpu = "cascadelake", cpu_features = "+avx512f", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", link_embedded = false, native_vector_size = 64 : index, target_triple = "x86_64-unknown-linux-gnu", ukernels = false}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -1601,9 +1511,9 @@
 func.func @non_trivial_program() attributes {hal.executable.target = #executable_target_system_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x1x128x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x1x128x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [128, 1, 128, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x1x128x1xf32>> -> tensor<128x1x128x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [128, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x1xf32>> -> tensor<128x1xf32>
   %5 = tensor.empty() : tensor<1x1xf32>
@@ -1631,12 +1541,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "cascadelake", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 32 : index, target_triple = "x86_64-unknown-unknown-eabi-elf", ukernels = true}>
 func.func @batch_mmt4d() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
@@ -1657,9 +1565,9 @@
   %11 = arith.shli %10, %c32_i64 : i64
   %12 = arith.ori %9, %11 : i64
   %13 = arith.index_castui %12 : i64 to index
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x10x32x8x1xf32>>
-  %15 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32x4x1xf32>>
-  %16 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%13) : !flow.dispatch.tensor<writeonly:tensor<128x10x80x8x4xf32>>
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x10x32x8x1xf32>>
+  %15 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32x4x1xf32>>
+  %16 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%13) : !flow.dispatch.tensor<writeonly:tensor<128x10x80x8x4xf32>>
   %17 = flow.dispatch.tensor.load %14, offsets = [0, 0, 0, 0, 0], sizes = [128, 10, 32, 8, 1], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x10x32x8x1xf32>> -> tensor<128x10x32x8x1xf32>
   %18 = flow.dispatch.tensor.load %15, offsets = [0, 0, 0, 0, 0], sizes = [128, 80, 32, 4, 1], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x80x32x4x1xf32>> -> tensor<128x80x32x4x1xf32>
   %19 = tensor.empty() : tensor<128x10x80x8x4xf32>
@@ -1676,20 +1584,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "cascadelake", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-unknown-eabi-elf"}>
 func.func @mmt4d_with_large_reduction() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x18176x16x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<284x18176x16x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<7x284x16x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x18176x16x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<284x18176x16x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<7x284x16x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [7, 18176, 16, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<7x18176x16x1xf32>> -> tensor<7x18176x16x1xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [284, 18176, 16, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<284x18176x16x1xf32>> -> tensor<284x18176x16x1xf32>
   %5 = tensor.empty() : tensor<7x284x16x16xf32>
@@ -1706,19 +1612,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "generic", cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @pad_only() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c634816 = arith.constant 634816 : index
   %c3846080 = arith.constant 3846080 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c634816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c3846080) : !flow.dispatch.tensor<writeonly:tensor<1x114x114x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c634816) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c3846080) : !flow.dispatch.tensor<writeonly:tensor<1x114x114x64xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 112, 112, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x112x112x64xf32>> -> tensor<1x112x112x64xf32>
   %padded = tensor.pad %2 low[0, 1, 1, 0] high[0, 1, 1, 0] {
   ^bb0(%arg0: index, %arg1: index, %arg2: index, %arg3: index):
@@ -1738,11 +1642,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
       cpu = "generic", cpu_features = "",
@@ -1750,8 +1652,8 @@
       native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @winograd_output_transform() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0, 0], sizes = [8, 8, 2, 6, 6, 128], strides = [1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>> -> tensor<8x8x2x6x6x128xf16>
   %3 = tensor.empty() : tensor<2x36x36x128xf16>
   %4 = iree_linalg_ext.winograd.output_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<8x8x2x6x6x128xf16>) outs(%3 : tensor<2x36x36x128xf16>) -> tensor<2x36x36x128xf16>
@@ -1767,11 +1669,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
       cpu = "generic", cpu_features = "",
@@ -1779,8 +1679,8 @@
       native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @winograd_input_transform() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 34, 34, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>> -> tensor<2x34x34x128xf16>
   %3 = tensor.empty() : tensor<8x8x2x6x6x128xf16>
   %4 = iree_linalg_ext.winograd.input_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<2x34x34x128xf16>) outs(%3 : tensor<8x8x2x6x6x128xf16>) -> tensor<8x8x2x6x6x128xf16>
@@ -1796,11 +1696,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
       cpu = "generic", cpu_features = "",
@@ -1808,8 +1706,8 @@
       native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @winograd_filter_transform() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [3, 3, 64, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>> -> tensor<3x3x64x128xf32>
   %3 = tensor.empty() : tensor<8x8x64x128xf32>
   %4 = iree_linalg_ext.winograd.filter_transform output_tile_size(6) kernel_size(3) kernel_dimensions([0, 1]) ins(%2 : tensor<3x3x64x128xf32>) outs(%3 : tensor<8x8x64x128xf32>) -> tensor<8x8x64x128xf32>
@@ -1825,13 +1723,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
       cpu = "generic", cpu_features = "",
@@ -1840,10 +1736,10 @@
 func.func @attention() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
   %scale = arith.constant 0.125 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
@@ -1866,13 +1762,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {
       cpu = "generic", cpu_features = "",
@@ -1880,10 +1774,10 @@
       native_vector_size = 64 : index, target_triple = "x86_64-none-elf"}>
 func.func @elementwise_output_transposed() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<768xi64>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32xi64>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32x768xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<768xi64>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32xi64>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32x768xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i64>> -> tensor<i64>
   %5 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [768], strides = [1] : !flow.dispatch.tensor<readonly:tensor<768xi64>> -> tensor<768xi64>
   %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readonly:tensor<32xi64>> -> tensor<32xi64>
@@ -1916,9 +1810,9 @@
 module {
   func.func @test_mod_vectorizing_strategy_peeling() attributes {hal.executable.target = #executable_target_system_elf_x86_64_}{
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>], flags = Indirect>]>) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<6xi32>>
-    %1 = hal.interface.binding.subspan layout(<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>], flags = Indirect>]>) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<6xi32>>
-    %2 = hal.interface.binding.subspan layout(<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>], flags = Indirect>]>) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<6xi32>>
+    %0 = hal.interface.binding.subspan layout(#hal.pipeline.layout<bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer>]>) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<6xi32>>
+    %1 = hal.interface.binding.subspan layout(#hal.pipeline.layout<bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer>]>) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<6xi32>>
+    %2 = hal.interface.binding.subspan layout(#hal.pipeline.layout<bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer>]>) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<6xi32>>
     %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<6xi32>> -> tensor<6xi32>
     %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [6], strides = [1] : !flow.dispatch.tensor<readonly:tensor<6xi32>> -> tensor<6xi32>
     %5 = tensor.empty() : tensor<6xi32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/tile_and_fuse.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/tile_and_fuse.mlir
index 284dd8c..44c0292 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/tile_and_fuse.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/tile_and_fuse.mlir
@@ -101,12 +101,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @shared_out_operand() {
   %cst = arith.constant 0.000000e+00 : f32
@@ -117,10 +115,10 @@
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
   %2 = arith.index_castui %0 {stream.alignment = 1024 : index, stream.values = [205824 : index, 795648 : index, 1385472 : index, 1975296 : index, 2565120 : index, 3154944 : index, 3744768 : index]} : i32 to index
   %3 = arith.index_castui %1 {stream.alignment = 1024 : index, stream.values = [0 : index, 3072 : index, 6144 : index, 9216 : index, 12288 : index, 15360 : index, 18432 : index]} : i32 to index
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<391x384xf32>>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%2) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c600576) : !flow.dispatch.tensor<writeonly:tensor<391x384xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<391x384xf32>>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%2) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384x384xf32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<384xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c600576) : !flow.dispatch.tensor<writeonly:tensor<391x384xf32>>
   %8 = flow.dispatch.tensor.load %4, offsets = [0, 0], sizes = [391, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<391x384xf32>> -> tensor<391x384xf32>
   %9 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [384, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x384xf32>> -> tensor<384x384xf32>
   %10 = flow.dispatch.tensor.load %6, offsets = [0], sizes = [384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<384xf32>> -> tensor<384xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/transform_dialect_bufferize.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/transform_dialect_bufferize.mlir
index 2c8388b..a2944d9 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/transform_dialect_bufferize.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/transform_dialect_bufferize.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt %s --iree-transform-dialect-interpreter --transform-dialect-drop-schedule | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu_features = "", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", native_vector_size = 16 : index, target_triple = "x86_64-none-elf"}>
 func.func @pad_matmul_static_dispatch_0() attributes {hal.executable.target = #executable_target_embedded_elf_x86_64_} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [250, 500], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<250x500xf32>> -> tensor<250x500xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [500, 1020], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>> -> tensor<500x1020xf32>
   %5 = tensor.empty() : tensor<250x1020xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vector_lowering.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vector_lowering.mlir
index 71adee3..6f6ab10 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vector_lowering.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vector_lowering.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-llvmcpu-vector-lowering-pipeline))" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_391x384x384_f32() {
   %cst = arith.constant 0.000000e+00 : f32
@@ -19,13 +17,13 @@
   %cst_0 = arith.constant dense<0.000000e+00> : vector<8x32xf32>
   %cst_1 = arith.constant dense<6.000000e+00> : vector<8x32xf32>
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<8x32xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<391x384xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<391x384xf32>
   memref.assume_alignment %0, 64 : memref<391x384xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<384x384xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<384x384xf32>
   memref.assume_alignment %1, 64 : memref<384x384xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<384xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<384xf32>
   memref.assume_alignment %2, 64 : memref<384xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<391x384xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<391x384xf32>
   memref.assume_alignment %3, 64 : memref<391x384xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -83,13 +81,11 @@
 // Check that vector.loads whose elements are extracted and
 // consumed in a scalar fashion are scalarized.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_scalar_loads() {
   %cst = arith.constant 0.000000e+00 : f32
@@ -102,13 +98,13 @@
   %cst_0 = arith.constant dense<0.000000e+00> : vector<8x32xf32>
   %cst_1 = arith.constant dense<6.000000e+00> : vector<8x32xf32>
   %alloca = memref.alloca() {alignment = 64 : i64} : memref<8x32xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<391x384xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<391x384xf32>
   memref.assume_alignment %0, 64 : memref<391x384xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<384x384xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<384x384xf32>
   memref.assume_alignment %1, 64 : memref<384x384xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<384xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : memref<384xf32>
   memref.assume_alignment %2, 64 : memref<384xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<391x384xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<391x384xf32>
   memref.assume_alignment %3, 64 : memref<391x384xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -146,16 +142,14 @@
 
 // Make sure we don't transpose a mask but create a transposed mask instead.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @transpose_mask() {
   %a = arith.constant 4 : index
   %b = arith.constant 8 : index
   %c0 = arith.constant 0 : index
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<4x2xi1>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<4x2xi1>
   %mask = vector.create_mask %a, %b : vector<2x4xi1>
   %transpose_mask = vector.transpose %mask, [1, 0] : vector<2x4xi1> to vector<4x2xi1>
   vector.transfer_write %transpose_mask, %3[%c0, %c0] {in_bounds = [true, true]} : vector<4x2xi1>, memref<4x2xi1>
@@ -174,12 +168,10 @@
 // Make sure that the gather patterns get rid of vector.gather over strided
 // memref.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @gather_strided_memref() {
   %cst = arith.constant dense<0.000000e+00> : vector<4xf32>
@@ -187,11 +179,11 @@
   %c0_i32 = arith.constant 0 : i32
   %c4 = arith.constant 4 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<2592000x3xf32, #hal.descriptor_type<storage_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<2592000x3xf32, #hal.descriptor_type<storage_buffer>>
   memref.assume_alignment %0, 64 : memref<2592000x3xf32, #hal.descriptor_type<storage_buffer>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<518400xi32, #hal.descriptor_type<storage_buffer>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<518400xi32, #hal.descriptor_type<storage_buffer>>
   memref.assume_alignment %1, 64 : memref<518400xi32, #hal.descriptor_type<storage_buffer>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<518400xf32, #hal.descriptor_type<storage_buffer>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<518400xf32, #hal.descriptor_type<storage_buffer>>
   memref.assume_alignment %2, 64 : memref<518400xf32, #hal.descriptor_type<storage_buffer>>
   %subview = memref.subview %0[0, 0] [2592000, 1] [1, 1] : memref<2592000x3xf32, #hal.descriptor_type<storage_buffer>> to memref<2592000xf32, strided<[3]>, #hal.descriptor_type<storage_buffer>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vectorize_with_masking_and_hoist.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vectorize_with_masking_and_hoist.mlir
index 2e9b3c4..6d5e65a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vectorize_with_masking_and_hoist.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/vectorize_with_masking_and_hoist.mlir
@@ -34,12 +34,10 @@
 // CHECK-NEXT:  %[[INSERT_SLICE:.*]] = tensor.insert_slice %[[OUT_WRITE]] into %[[OUT_SLICE]]{{.*}} : tensor<8x?xf32> into tensor<8x?xf32>
 // CHECK-NEXT:  tensor.insert_slice %[[INSERT_SLICE]] into %[[OUT_TENSOR_1]]{{.*}} : tensor<8x?xf32> into tensor<1024x1024xf32>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pipeline() {
   %c1 = arith.constant 1 : index
@@ -47,9 +45,9 @@
   %c16 = arith.constant 16 : index
   %c8 = arith.constant 8 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1024x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<1024x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/verify_linalg_transform_legality.mlir b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/verify_linalg_transform_legality.mlir
index 5d7e055..9aa61c2 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMCPU/test/verify_linalg_transform_legality.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMCPU/test/verify_linalg_transform_legality.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-llvmcpu-verify-linalg-transform-legality))" %s --verify-diagnostics -split-input-file
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_123x456xf32_times_456x789xf32_into_123x789xf32_dispatch_0() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<123x4x114xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x114x789xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x123x789xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<123x4x114xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x114x789xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x123x789xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [123, 4, 114], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<123x4x114xf32>> -> tensor<123x4x114xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 114, 789], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x114x789xf32>> -> tensor<4x114x789xf32>
   %5 = tensor.empty() : tensor<4x123x789xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
index a111245..e5f1149 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp
@@ -202,8 +202,6 @@
   }
 };
 
-using SetBinding = std::pair<APInt, APInt>;
-
 /// Convention with the HAL side to pass kernel arguments.
-/// The bindings are ordered based on binding set and binding index then
-/// compressed and mapped to dense set of arguments.
+/// The bindings are ordered by binding index and then compressed and
+/// mapped to a dense set of arguments.
@@ -211,21 +209,16 @@
 /// InterfaceBindingOp and kernel argument index.
-/// For instance if the kernel has (set, bindings) A(0, 1), B(1, 5), C(0, 6) it
-/// will return the mapping [A, 0], [C, 1], [B, 2]
+/// For instance if the kernel has bindings A(1), B(5), C(6) it will
+/// return the mapping [A, 0], [B, 1], [C, 2]
-static llvm::SmallDenseMap<SetBinding, size_t>
+static llvm::SmallDenseMap<APInt, size_t>
 getKernelArgMapping(Operation *funcOp) {
-  llvm::SetVector<SetBinding> usedBindingSet;
+  llvm::SetVector<APInt> usedBindingSet;
   funcOp->walk([&](IREE::HAL::InterfaceBindingSubspanOp subspanOp) {
-    usedBindingSet.insert(
-        SetBinding(subspanOp.getSet(), subspanOp.getBinding()));
+    usedBindingSet.insert(subspanOp.getBinding());
   });
   auto sparseBindings = usedBindingSet.takeVector();
   std::sort(sparseBindings.begin(), sparseBindings.end(),
-            [](SetBinding lhs, SetBinding rhs) {
-              if (lhs.first == rhs.first)
-                return lhs.second.ult(rhs.second);
-              return lhs.first.ult(rhs.first);
-            });
-  llvm::SmallDenseMap<SetBinding, size_t> mapBindingArgIndex;
+            [](APInt lhs, APInt rhs) { return lhs.ult(rhs); });
+  llvm::SmallDenseMap<APInt, size_t> mapBindingArgIndex;
   for (auto [index, binding] : llvm::enumerate(sparseBindings)) {
     mapBindingArgIndex[binding] = index;
   }
@@ -263,8 +256,7 @@
       } else {
         llvmType = LLVM::LLVMPointerType::get(rewriter.getContext());
       }
-      llvmInputTypes[argMapping[SetBinding(subspanOp.getSet(),
-                                           subspanOp.getBinding())]] = llvmType;
+      llvmInputTypes[argMapping[subspanOp.getBinding()]] = llvmType;
     });
     // As a convention with HAL, push constants are appended as kernel arguments
     // after all the binding inputs.
@@ -353,8 +345,8 @@
         operands, op->getAttrDictionary());
     MemRefType memrefType =
         llvm::dyn_cast<MemRefType>(subspanOp.getResult().getType());
-    mlir::BlockArgument llvmBufferArg = llvmFuncOp.getArgument(
-        argMapping[SetBinding(subspanOp.getSet(), subspanOp.getBinding())]);
+    mlir::BlockArgument llvmBufferArg =
+        llvmFuncOp.getArgument(argMapping[subspanOp.getBinding()]);
     // As a convention with HAL all the kernel argument pointers are 16Bytes
    // aligned.
     llvmFuncOp.setArgAttr(llvmBufferArg.getArgNumber(),
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/TransformExtensions/LLVMGPUExtensionsOps.td b/compiler/src/iree/compiler/Codegen/LLVMGPU/TransformExtensions/LLVMGPUExtensionsOps.td
index 0d361fe..ac3e7ee 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/TransformExtensions/LLVMGPUExtensionsOps.td
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/TransformExtensions/LLVMGPUExtensionsOps.td
@@ -165,7 +165,7 @@
       func.func @foo() {
         %c0 = arith.constant 0 : index
         %c1 = arith.constant 1 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128xf32>
         %1 = gpu.thread_id  x
         %2 = arith.cmpi ult, %1, %c1 : index
         scf.if %2 {
@@ -186,7 +186,7 @@
         %c4 = arith.constant 4 : index
         %c32 = arith.constant 32 : index
         %cst = arith.constant dense<1.000000e+00> : vector<128xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128xf32>
         %1 = gpu.thread_id  x
         %2 = arith.cmpi ult, %1, %c32 : index
         // Single-warp guard filters out threads 32-63.
@@ -266,7 +266,7 @@
         %c4 = arith.constant 4 : index
         %c32 = arith.constant 32 : index
         %cst = arith.constant dense<1.000000e+00> : vector<128xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128xf32>
         %1 = gpu.thread_id  x
         %2 = arith.cmpi ult, %1, %c32 : index
         // Single-warp guard filters out threads 32-63.
@@ -290,7 +290,7 @@
         %c4 = arith.constant 4 : index
         %c32 = arith.constant 32 : index
         %cst = arith.constant dense<1.000000e+00> : vector<128xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128xf32>
         %1 = gpu.thread_id  x
         %2 = arith.cmpi ult, %1, %c32 : index
         // Single-warp guard filters out threads 32-63.
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_user_vector_distribute.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_user_vector_distribute.mlir
index b7ca495..8cb1b2e 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_user_vector_distribute.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_user_vector_distribute.mlir
@@ -16,12 +16,10 @@
 // OPT-IN:       #[[$TRANSLATION:.+]] = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [128, 2, 1] subgroup_size = 64
 // OPT-IN-SAME:    mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
 // OPT-IN-SAME:    no_reduce_shared_memory_bank_conflicts
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @main_0_dispatch_0 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -54,9 +52,9 @@
         }>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
@@ -92,12 +90,10 @@
 // OPT-IN:       #[[$TRANSLATION:.+]] = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [128, 2, 1] subgroup_size = 64
 // OPT-IN-SAME:    mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
 // OPT-IN-SAME:    reorder_workgroups = "transpose"
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @main_0_dispatch_0 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -131,9 +127,9 @@
         }>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
@@ -164,12 +160,10 @@
 // OPT-OUT:       #[[$TRANSLATION:.+]] = #iree_codegen.translation_info<LLVMGPUVectorDistribute workgroup_size = [128, 2, 1] subgroup_size = 64
 // OPT-OUT-SAME:    mma_schedule = #iree_gpu.mma_schedule<intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>,
 // OPT-OUT-SAME:    reorder_workgroups = "none"
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @main_0_dispatch_0 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -192,9 +186,9 @@
         }>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_vector_distribute.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_vector_distribute.mlir
index 3450851..710168c 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_vector_distribute.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/config_vector_distribute.mlir
@@ -13,12 +13,10 @@
 // CHECK-SAME:   intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 1, subgroup_n_count = 4
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d4)>
 #map1 = affine_map<(d0, d1, d2, d3, d4) -> (d1, d3, d4)>
@@ -26,9 +24,9 @@
 func.func @expanded_matmul_transpose_b() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x2048xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x64x2048xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x10x64x64xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x2048xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x64x2048xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x10x64x64xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 64, 2048], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64x2048xf16>> -> tensor<2x64x2048xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [10, 64, 2048], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x64x2048xf16>> -> tensor<10x64x2048xf16>
   %5 = tensor.empty() : tensor<2x10x64x64xf16>
@@ -54,19 +52,17 @@
 // CHECK-SAME:   intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 2, subgroup_n_count = 2
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_nhwc() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x256x512x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x256x512x256xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 258, 514, 768], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>> -> tensor<2x258x514x768xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 768, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>> -> tensor<3x3x768x256xf16>
   %5 = tensor.empty() : tensor<2x256x512x256xf32>
@@ -81,12 +77,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #target = #iree_gpu.target<arch = "gfx940", features = "", wgp = <
   compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8,
@@ -99,9 +93,9 @@
 func.func @matmul_256x256x256() attributes {hal.executable.target = #executable_target_rocm_hsaco_fb} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
   %5 = tensor.empty() : tensor<256x256xf32>
@@ -123,19 +117,17 @@
 // CHECK-SAME:   intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 2, subgroup_n_count = 2
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @mfma_matmul_1024x1024x1024() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>> -> tensor<1024x1024xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>> -> tensor<1024x1024xf16>
   %5 = tensor.empty() : tensor<1024x1024xf32>
@@ -156,12 +148,10 @@
 // CHECK-SAME:   intrinsic = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 2, subgroup_n_count = 2
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 1, 1, 32, 0, 1, 1, 1, 0]]>
 #map = affine_map<(d0, d1, d2, d3, d4, d5, d6, d7, d8) -> (d0, d5, d2 + d6, d3 + d7, d8)>
@@ -171,9 +161,9 @@
 func.func @conv_nchwc() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x20x34x34x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x20x3x3x160x64xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x8x32x32x160xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x20x34x34x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x20x3x3x160x64xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x8x32x32x160xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0], sizes = [2, 20, 34, 34, 64], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x20x34x34x64xf16>> -> tensor<2x20x34x34x64xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0, 0, 0], sizes = [8, 20, 3, 3, 160, 64], strides = [1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x20x3x3x160x64xf16>> -> tensor<8x20x3x3x160x64xf16>
   %5 = tensor.empty() : tensor<2x8x32x32x160xf32>
@@ -207,19 +197,17 @@
 // WMMA-SAME:   intrinsic = #iree_gpu.mma_layout<WMMA_F32_16x16x16_F16>
 // WMMA-SAME:   subgroup_m_count = 2, subgroup_n_count = 2
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @wmma_matmul_1024x1024x1024() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>> -> tensor<1024x1024xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x1024xf16>> -> tensor<1024x1024xf16>
   %5 = tensor.empty() : tensor<1024x1024xf32>
@@ -240,19 +228,17 @@
 // CHECK-SAME:   intrinsic =  #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 1, subgroup_n_count = 1
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unaligned_mk_batch_matmul() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 968, 1281], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>> -> tensor<64x968x1281xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [64, 1281, 1281], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>> -> tensor<64x1281x1281xf16>
   %5 = tensor.empty() : tensor<64x968x1281xf16>
@@ -273,19 +259,17 @@
 // CHECK-SAME:   intrinsic =  #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>
 // CHECK-SAME:   subgroup_m_count = 1, subgroup_n_count = 4
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unaligned_m_batch_matmul_64x72x1280x1280() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x72x1280xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1280x1280xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x72x1280xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x72x1280xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1280x1280xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x72x1280xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 72, 1280], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x72x1280xf16>> -> tensor<64x72x1280xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [64, 1280, 1280], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x1280x1280xf16>> -> tensor<64x1280x1280xf16>
   %5 = tensor.empty() : tensor<64x72x1280xf16>
@@ -300,19 +284,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @narrow_n_batch_matmul_64x968x4x320_f16() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x320xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x320x4xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x4xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x320xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x320x4xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x4xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 968, 320], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x968x320xf16>> -> tensor<64x968x320xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [64, 320, 4], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x320x4xf16>> -> tensor<64x320x4xf16>
   %5 = tensor.empty() : tensor<64x968x4xf16>
@@ -327,12 +309,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_dynamic_dim() {
   %c0 = arith.constant 0 : index
@@ -345,10 +325,10 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x256xf16>>{%8}
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x256xf32>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x256xf16>>{%8}
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x256xf32>>{%8}
   %11 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [%8, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x256xf16>>{%8} -> tensor<?x256xf16>
   %12 = flow.dispatch.tensor.load %7, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
   %13 = tensor.empty(%8) : tensor<?x256xf32>
@@ -369,21 +349,19 @@
 
 // CHECK-LABEL: func.func @attention_20x4096x64x4096x64()
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @attention_20x4096x64x4096x64() {
   %cst = arith.constant 1.250000e-01 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/lowering_scalar_dispatch.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/lowering_scalar_dispatch.mlir
index 5c0ba99..41253b9 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/lowering_scalar_dispatch.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/lowering_scalar_dispatch.mlir
@@ -2,7 +2,10 @@
 
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb">
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
 
 hal.executable @scalar_dispatch {
   hal.executable.variant public @rocm_hsaco_fb target(#executable_target_rocm_hsaco_fb) {
@@ -16,8 +19,8 @@
         %c0 = arith.constant 0 : index
         %c6364136223846793005_i64 = arith.constant 6364136223846793005 : i64
         %c1442695040888963407_i64 = arith.constant 1442695040888963407 : i64
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
         %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i64>> -> tensor<i64>
         %extracted = tensor.extract %2[] : tensor<i64>
         %3 = arith.muli %extracted, %c6364136223846793005_i64 : i64
@@ -32,8 +35,8 @@
 
 // CHECK-LABEL: func.func @scalar_dispatch()
 //  CHECK-SAME: translation_info = #iree_codegen.translation_info<LLVMGPUBaseLowering workgroup_size = [1, 1, 1]>
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   memref.load %[[SPAN0]][] : memref<i64, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   arith.muli {{.+}} : i64
 //       CHECK:   arith.addi {{.+}} : i64
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir
index c7d242b..f4a7238 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx940 \
 // RUN:   --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-llvmgpu-lower-executable-target)))))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{workgroup = [64, 64, 0], reduction = [0, 0, 4], thread = [8, 4]}>
 hal.executable public @main {
@@ -21,9 +19,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 1, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
@@ -42,9 +40,9 @@
 // analysis should be able to simplify the below to just two barriers.
 
 // CHECK-LABEL: func @matmul_transpose_b
-//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   memref.alloc() : memref<64x8xf16, #gpu.address_space<workgroup>>
 //   CHECK-DAG:   memref.alloc() : memref<64x8xf16, #gpu.address_space<workgroup>>
 //       CHECK:   %[[LOOP:.+]] = scf.for %[[IV:.+]] = %c0 to %c1280 step %c4 {{.*}} -> (vector<8x4xf32>)
@@ -65,12 +63,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{workgroup = [64, 64, 0], reduction = [0, 0, 2], subgroup = [2, 2], mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>}>
 hal.executable public @main {
@@ -85,9 +81,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 2, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
@@ -103,9 +99,9 @@
 }
 
 // CHECK-LABEL: func @matmul_transpose_b_mfma
-//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   memref.alloc() : memref<64x36xf16, #gpu.address_space<workgroup>>
 //   CHECK-DAG:   memref.alloc() : memref<64x36xf16, #gpu.address_space<workgroup>>
 //       CHECK:   %[[LOOP:.+]] = scf.for %[[IV:.+]] = %c0 to %c80 step %c2 {{.*}} -> (vector<2x2x4x1xf32>)
@@ -129,12 +125,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer, ReadOnly>,
-    #hal.descriptor_set.binding<1, storage_buffer, ReadOnly>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{workgroup = [1, 64, 64, 0], reduction = [0, 0, 0, 2], subgroup = [1, 2, 2], mma_kind = #iree_gpu.mma_layout<MFMA_F32_16x16x16_F16>}>
 hal.executable private @main {
@@ -148,9 +142,9 @@
       func.func @conv_igemm_im2col() attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 2, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x1280x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x16x16x1280xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x1280x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x16x16x1280xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 34, 34, 1280], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x34x34x1280xf16>> -> tensor<2x34x34x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 1280, 1280], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x1280x1280xf16>> -> tensor<3x3x1280x1280xf16>
         %5 = tensor.empty() : tensor<2x16x16x1280xf32>
@@ -187,9 +181,9 @@
 }
 
 //   CHECK-LABEL: func @conv_igemm_im2col
-//     CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//     CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//     CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//     CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //     CHECK-DAG:   memref.alloc() : memref<1x64x36xf16, #gpu.address_space<workgroup>>
 //     CHECK-DAG:   memref.alloc() : memref<32x68xf16, #gpu.address_space<workgroup>>
 //     CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
@@ -217,12 +211,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{
   workgroup = [64, 64, 0],
@@ -241,9 +233,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [64, 2, 1] subgroup_size = 32>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280xf16>> -> tensor<2048x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280xf16>> -> tensor<10240x1280xf16>
         %5 = tensor.empty() : tensor<2048x10240xf32>
@@ -259,9 +251,9 @@
 }
 
 // CHECK-LABEL: func @matmul_transpose_b_wmma
-//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//   CHECK-DAG:   %[[B0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//   CHECK-DAG:   %[[B1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//   CHECK-DAG:   %[[B2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //   CHECK-DAG:   memref.alloc() : memref<64x36xf16, #gpu.address_space<workgroup>>
 //   CHECK-DAG:   memref.alloc() : memref<64x36xf16, #gpu.address_space<workgroup>>
 //       CHECK:   %[[LOOP:.+]] = scf.for %[[IV:.+]] = %c0 to %c80 step %c2 {{.*}} -> (vector<2x2x8x1x1xf32>)
@@ -285,12 +277,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{
   workgroup = [64, 64, 0],
@@ -313,9 +303,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 2, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>> -> tensor<2048x1280x!eltype>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>> -> tensor<10240x1280x!eltype>
         %5 = tensor.empty() : tensor<2048x10240x!aeltype>
@@ -339,12 +329,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{
   workgroup = [64, 64, 0],
@@ -367,9 +355,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 2, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>> -> tensor<2048x1280x!eltype>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>> -> tensor<10240x1280x!eltype>
         %5 = tensor.empty() : tensor<2048x10240x!aeltype>
@@ -393,12 +381,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{
   workgroup = [64, 64, 0],
@@ -421,9 +407,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [128, 2, 1] subgroup_size = 64>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>> -> tensor<2048x1280x!eltype>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>> -> tensor<10240x1280x!eltype>
         %5 = tensor.empty() : tensor<2048x10240x!aeltype>
@@ -447,12 +433,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_gpu.lowering_config<{
   workgroup = [64, 64, 0],
@@ -475,9 +459,9 @@
         attributes {translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [64, 2, 1] subgroup_size = 32>} {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x10240x!aeltype>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x1280x!eltype>> -> tensor<2048x1280x!eltype>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [10240, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<10240x1280x!eltype>> -> tensor<10240x1280x!eltype>
         %5 = tensor.empty() : tensor<2048x10240x!aeltype>
@@ -509,19 +493,15 @@
 
 #translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [8, 4, 1] subgroup_size = 32>
 
-#pipeline_layout = #hal.pipeline.layout<
-  push_constants = 0,
-  sets = [
-    <0, bindings = [
-      <0, storage_buffer, "ReadOnly|Indirect">,
-      <1, storage_buffer, ReadOnly>,
-      <2, storage_buffer, Indirect>
-    ], flags = Indirect>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, Indirect>
+]>
 
 hal.executable public @main {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
-    hal.executable.export public @conv_nchw_fused ordinal(0) layout(#pipeline_layout) attributes {hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>, #hal.interface.binding<0, 2>]} {
+    hal.executable.export public @conv_nchw_fused ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %x, %y, %z = flow.dispatch.workgroup_count_from_slice
       hal.return %x, %y, %z : index, index, index
@@ -531,9 +511,9 @@
         %cst = arith.constant 0.000000e+00 : f32
         %cst_0 = arith.constant dense<1.0> : tensor<1x64xf32>
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<1x64x58x58xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x64x3x3xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<1x64x56x56xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<1x64x58x58xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x64x3x3xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<1x64x56x56xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 64, 58, 58], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x64x58x58xf32>> -> tensor<1x64x58x58xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [64, 64, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x64x3x3xf32>> -> tensor<64x64x3x3xf32>
         %5 = tensor.empty() : tensor<1x64x56x56xf32>
@@ -574,19 +554,15 @@
 
 #translation_info = #iree_codegen.translation_info<LLVMGPUTileAndFuse workgroup_size = [8, 4, 1] subgroup_size = 32>
 
-#pipeline_layout = #hal.pipeline.layout<
-  push_constants = 0,
-  sets = [
-    <0, bindings = [
-      <0, storage_buffer, ReadOnly>,
-      <1, storage_buffer, "ReadOnly|Indirect">,
-      <2, storage_buffer, Indirect>
-    ], flags = Indirect>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">,
+  #hal.pipeline.binding<storage_buffer, Indirect>
+]>
 
 hal.executable public @main {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
-    hal.executable.export public @skinny_matmul_config ordinal(0) layout(#pipeline_layout) attributes {hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>, #hal.interface.binding<0, 2>]} {
+    hal.executable.export public @skinny_matmul_config ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %x, %y, %z = flow.dispatch.workgroup_count_from_slice
       hal.return %x, %y, %z : index, index, index
@@ -598,10 +574,10 @@
         %c111444672 = arith.constant 111444672 : index
         %c4014080 = arith.constant 4014080 : index
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c102227904) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c4014080) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<256x3136xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c111444672) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<128x3136xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c102227904) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c4014080) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<256x3136xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c111444672) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<128x3136xf32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 3136], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x3136xf32>> -> tensor<256x3136xf32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [128], strides = [1] : !flow.dispatch.tensor<readonly:tensor<128xf32>> -> tensor<128xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute.mlir
index d8f9673..1b7c818 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_vector_distribute.mlir
@@ -12,12 +12,10 @@
 // to be migrated to the rocdl heuristics, but for now is just physically
 // located here.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_f16_f32 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -30,9 +28,9 @@
     func.func @matmul_256x256x256_f16_f32() {
       %cst = arith.constant 0.000000e+00 : f32
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %5 = tensor.empty() : tensor<256x256xf32>
@@ -63,12 +61,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_f16_f16 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -81,9 +77,9 @@
     func.func @matmul_256x256x256_f16_f16() {
       %cst = arith.constant 0.000000e+00 : f16
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf16>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %5 = tensor.empty() : tensor<256x256xf16>
@@ -112,12 +108,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @expanded_matmul_transpose_b_executable {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -130,11 +124,11 @@
       func.func @expanded_matmul_transpose_b() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0)
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0)
           : !flow.dispatch.tensor<readonly:tensor<2x64x2048xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0)
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0)
           : !flow.dispatch.tensor<readonly:tensor<10x64x2048xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0)
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0)
           : !flow.dispatch.tensor<writeonly:tensor<2x10x64x64xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 64, 2048], strides = [1, 1, 1]
           : !flow.dispatch.tensor<readonly:tensor<2x64x2048xf16>> -> tensor<2x64x2048xf16>
@@ -184,12 +178,10 @@
 
 // Basic f8, f8 -> f32 matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_f8_f32 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -202,9 +194,9 @@
     func.func @matmul_256x256x256_f8_f32() {
       %cst = arith.constant 0.000000e+00 : f32
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>> -> tensor<256x256xf8E4M3FNUZ>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf8E4M3FNUZ>> -> tensor<256x256xf8E4M3FNUZ>
       %5 = tensor.empty() : tensor<256x256xf32>
@@ -235,12 +227,10 @@
 
 // Basic i8, i8 -> i32 matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_i8_i32 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -253,9 +243,9 @@
     func.func @matmul_256x256x256_i8_i32() {
       %cst = arith.constant 0 : i32
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xi32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xi32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xi8>> -> tensor<256x256xi8>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xi8>> -> tensor<256x256xi8>
       %5 = tensor.empty() : tensor<256x256xi32>
@@ -286,12 +276,10 @@
 
 // Basic i8, i8 -> i32 matmul_transpose_b.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_transpose_b_256x256x256_i8_i32 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -304,9 +292,9 @@
     func.func @matmul_transpose_b_256x256x256_i8_i32() {
       %cst = arith.constant 0 : i32
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xi32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xi8>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xi32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xi8>> -> tensor<256x256xi8>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xi8>> -> tensor<256x256xi8>
       %5 = tensor.empty() : tensor<256x256xi32>
@@ -335,12 +323,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @conv_nhwc_dispatch_0 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -353,9 +339,9 @@
       func.func @conv_nhwc() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x256x512x256xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x256x512x256xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 258, 514, 768], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x258x514x768xf16>> -> tensor<2x258x514x768xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 768, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x768x256xf16>> -> tensor<3x3x768x256xf16>
         %5 = tensor.empty() : tensor<2x256x512x256xf32>
@@ -377,12 +363,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb">
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d4)>
@@ -403,9 +387,9 @@
         %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
         %2 = arith.index_castui %0 : i32 to index
         %3 = arith.index_castui %1 : i32 to index
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%2) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1024x1280xf16>>
-        %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x64x1280xf16>>
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%3) : !flow.dispatch.tensor<writeonly:tensor<2x1024x20x64xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%2) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1024x1280xf16>>
+        %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x64x1280xf16>>
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%3) : !flow.dispatch.tensor<writeonly:tensor<2x1024x20x64xf16>>
         %7 = flow.dispatch.tensor.load %4, offsets = [0, 0, 0], sizes = [2, 1024, 1280], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x1024x1280xf16>> -> tensor<2x1024x1280xf16>
         %8 = flow.dispatch.tensor.load %5, offsets = [0, 0, 0], sizes = [20, 64, 1280], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x64x1280xf16>> -> tensor<20x64x1280xf16>
         %9 = tensor.empty() : tensor<2x1024x20x64xf16>
@@ -446,12 +430,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_f16_f32 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -464,9 +446,9 @@
     func.func @matmul_256x256x256_f16_f32() {
       %cst = arith.constant 0.000000e+00 : f32
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %5 = tensor.empty() : tensor<256x256xf32>
@@ -498,12 +480,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @matmul_256x256x256_f16_f16 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -516,9 +496,9 @@
     func.func @matmul_256x256x256_f16_f16() {
       %cst = arith.constant 0.000000e+00 : f16
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x256xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<256x256xf16>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x256xf16>> -> tensor<256x256xf16>
       %5 = tensor.empty() : tensor<256x256xf16>
@@ -550,12 +530,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @unaligned_mk_batch_matmul_64x978x1281x1281_f16_f16 {
 hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -568,9 +546,9 @@
     func.func @unaligned_nk_batch_matmul() {
       %cst = arith.constant 0.000000e+00 : f16
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [64, 968, 1281], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>> -> tensor<64x968x1281xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [64, 1281, 1281], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>> -> tensor<64x1281x1281xf16>
       %5 = tensor.empty() : tensor<64x968x1281xf16>
@@ -595,9 +573,9 @@
 // CHECK-DAG:     %[[RHS_SHARED_SUB:.+]] =  memref.subview %[[RHS_SHARED]][0, 0, 0] [1, 16, 16] [1, 1, 1]
 // CHECK-DAG:     %[[LHS_SHARED:.+]] = memref.alloc() : memref<1x16x20xf16, #gpu.address_space<workgroup>>
 // CHECK-DAG:     %[[LHS_SHARED_SUB:.+]] =  memref.subview %[[LHS_SHARED]][0, 0, 0] [1, 16, 16] [1, 1, 1]
-// CHECK-DAG:     %[[LHS_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64x968x1281xf16, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:     %[[RHS_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<64x1281x1281xf16, #hal.descriptor_type<storage_buffer>>
-// CHECK-DAG:     %[[OUT_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c0) : memref<64x968x1281xf16, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:     %[[LHS_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64x968x1281xf16, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:     %[[RHS_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : memref<64x1281x1281xf16, #hal.descriptor_type<storage_buffer>>
+// CHECK-DAG:     %[[OUT_GLOBAL:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c0) : memref<64x968x1281xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK-DAG:     %[[LHS_GLOBAL_SUB:.+]] = memref.subview %[[LHS_GLOBAL]]
 // CHECK-DAG:     %[[RHS_GLOBAL_SUB:.+]] = memref.subview %[[RHS_GLOBAL]]
 // CHECK:         %[[LHS_LOAD:.+]] = vector.transfer_read %[[LHS_GLOBAL_SUB]]{{.+}} {in_bounds = [true, false, false]}
@@ -634,11 +612,9 @@
 // NOTE: This test is not exhaustive of all possible ways the above condition is breaking,
 //       but rather is an example of a matmul shape from a model that broke our compilation heuristic.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @contract_schedule_considering_read_layout {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -656,9 +632,9 @@
         %3 = arith.index_castui %0 : i32 to index
         %4 = arith.index_castui %1 : i32 to index
         %5 = arith.index_castui %2 : i32 to index
-        %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x160x1536xf16>>
-        %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1536x1536xf16>>
-        %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<2x160x1536xf16>>
+        %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x160x1536xf16>>
+        %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x1536x1536xf16>>
+        %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<2x160x1536xf16>>
         %9 = flow.dispatch.tensor.load %6, offsets = [0, 0, 0], sizes = [2, 160, 1536], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x160x1536xf16>> -> tensor<2x160x1536xf16>
         %10 = flow.dispatch.tensor.load %7, offsets = [0, 0, 0], sizes = [2, 1536, 1536], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x1536x1536xf16>> -> tensor<2x1536x1536xf16>
         %11 = tensor.empty() : tensor<2x160x1536xf16>
@@ -690,13 +666,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @attention_20x4096x64x4096x64 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -709,10 +683,10 @@
       func.func @attention_20x4096x64x4096x64() {
         %cst = arith.constant 1.250000e-01 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<20x4096x64xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [20, 4096, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<20x4096x64xf16>> -> tensor<20x4096x64xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_warp_reduction.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_warp_reduction.mlir
index 846b30e..4128e4f 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_warp_reduction.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_warp_reduction.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-rocdl-configuration-pipeline), iree-codegen-linalg-to-rocdl-pipeline2)))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @warp_reduction {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -17,8 +15,8 @@
       func.func @warp_reduction() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x512xf32>> -> tensor<2x512xf32>
         %3 = tensor.empty() : tensor<2xf32>
         %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2xf32>) -> tensor<2xf32>
@@ -42,12 +40,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @main_dispatch_517 {
   hal.executable.variant public @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -62,9 +58,9 @@
         %c128 = arith.constant 128 : index
         %c0 = arith.constant 0 : index
         %c394240 = arith.constant 394240 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c394240) : !flow.dispatch.tensor<writeonly:tensor<1x1280xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c394240) : !flow.dispatch.tensor<writeonly:tensor<1x1280xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>> -> tensor<1x1280xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1280, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>> -> tensor<1280x1280xf32>
         %5 = tensor.empty() : tensor<1x1280xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention.mlir
index e029174..649cbd9 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention.mlir
@@ -2,21 +2,19 @@
 // RUN:   --iree-gpu-test-target=sm_60 | \
 // RUN: FileCheck --check-prefix=CHECK %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @_attention_dispatch_0() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 1.250000e-01 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<192x1024x64xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<192x1024x64xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [192, 1024, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>> -> tensor<192x1024x64xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [192, 1024, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>> -> tensor<192x1024x64xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [192, 1024, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<192x1024x64xf16>> -> tensor<192x1024x64xf16>
@@ -52,16 +50,16 @@
 // CHECK-DAG:    %[[C1024:.+]] = arith.constant 1024 : index
 // CHECK-DAG:    %[[CST_5:.+]] = arith.constant 0.000000e+00 : f32
 // CHECK-dAG:    %[[CST_6:.+]] = arith.constant dense<1.802980e-01> : vector<128x64xf16>
-// CHECK:        %[[D0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64)
+// CHECK:        %[[D0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64)
 // CHECK-SAME:     offset(%[[C0]]) flags(ReadOnly) : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK:        memref.assume_alignment %[[D0]], 64 : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
-// CHECK:        %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64)
+// CHECK:        %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64)
 // CHECK-SAME:     offset(%[[C0]]) flags(ReadOnly) : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK:        memref.assume_alignment %[[D1]], 64 : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
-// CHECK:        %[[D2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64)
+// CHECK:        %[[D2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64)
 // CHECK-SAME:     offset(%[[C0]]) flags(ReadOnly) : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK:        memref.assume_alignment %[[D2]], 64 : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
-// CHECK:        %[[D3:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) alignment(64)
+// CHECK:        %[[D3:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3) alignment(64)
 // CHECK-SAME:     offset(%[[C0]]) : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK:        memref.assume_alignment %[[D3]], 64 : memref<192x1024x64xf16, #hal.descriptor_type<storage_buffer>>
 // CHECK:        %[[WORKGROUP_ID_X:.+]] = hal.interface.workgroup.id[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention_mfma.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention_mfma.mlir
index 7e69786..13a84ec 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention_mfma.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/attention_mfma.mlir
@@ -2,21 +2,19 @@
 // RUN:   --iree-gpu-test-target=gfx908 | \
 // RUN: FileCheck --check-prefix=CHECK %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @attention_dispatch_0_attention_16x16384x128xf16() {
   %c0 = arith.constant 0 : index
   %scale = arith.constant 0.08838834764 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x16384x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x16384x128xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 16384, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>> -> tensor<16x16384x128xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 16384, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>> -> tensor<16x16384x128xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [16, 16384, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x16384x128xf16>> -> tensor<16x16384x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_matvec.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_matvec.mlir
index 2782383..1a60072 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_matvec.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_matvec.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --pass-pipeline='builtin.module(iree-llvmgpu-select-lowering-strategy)' %s | FileCheck %s
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx1100 --pass-pipeline='builtin.module(iree-llvmgpu-select-lowering-strategy)' %s | FileCheck %s --check-prefix=CDNA3
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dynamic_batch_matvec() {
   %c32_i64 = arith.constant 32 : i64
@@ -21,11 +19,11 @@
   %7 = arith.index_castui %2 : i32 to index
   %8 = arith.index_castui %3 : i32 to index
   %9 = arith.index_castui %4 : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<32x1x128xf16>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<32x1x128xf16>>
   %11 = flow.dispatch.workload.ordinal %8, 0 : index
   %12 = flow.dispatch.workload.ordinal %9, 1 : index
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11}
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12}
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11}
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12}
   %15 = flow.dispatch.tensor.load %13, offsets = [0, 0, 0], sizes = [32, 1, %11], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11} -> tensor<32x1x?xf16>
   %16 = flow.dispatch.tensor.load %14, offsets = [0, 0, 0], sizes = [32, %12, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12} -> tensor<32x?x128xf16>
   %17 = tensor.empty() : tensor<32x1x128xf16>
@@ -46,12 +44,10 @@
 
 // This test uses special heuristics that need to check the backend in the #hal.executable.target.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb">
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
@@ -60,9 +56,9 @@
 func.func @vmt1() attributes {hal.executable.target = #executable_target_rocm_hsaco_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>> -> tensor<1x4096xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
   %5 = tensor.empty() : tensor<1x32000xf16>
@@ -88,12 +84,10 @@
 
 // This test uses special heuristics that need to check the backend in the #hal.executable.target.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb">
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
@@ -102,9 +96,9 @@
 func.func @vmt2() attributes {hal.executable.target = #executable_target_rocm_hsaco_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>> -> tensor<1x4096xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
   %5 = tensor.empty() : tensor<1x32000xf16>
@@ -128,14 +122,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -144,11 +136,11 @@
 func.func @i4_dequant_matvec() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>> -> tensor<4096x32x128xi4>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
@@ -187,19 +179,17 @@
 
 // Send 2xNxK mmt to the warp reduction pipeline.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @skinny_mmt() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32000xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x32000xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>> -> tensor<2x4096xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
   %5 = tensor.empty() : tensor<2x32000xf16>
@@ -220,19 +210,17 @@
 
 // Send Mx2xK mmt to the warp reduction pipeline.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @skinny_mmt() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32000x2xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32000x2xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4096xf16>> -> tensor<2x4096xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
   %5 = tensor.empty() : tensor<32000x2xf16>
@@ -251,12 +239,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d1, d2)>
@@ -264,9 +250,9 @@
 func.func @not_vmt() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<5x32000xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<5x32000xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [5, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<5x4096xf16>> -> tensor<5x4096xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
   %5 = tensor.empty() : tensor<5x32000xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_winograd.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_winograd.mlir
index 8db080c..e5d6b2a 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_winograd.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/config_winograd.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx1100 --pass-pipeline='builtin.module(iree-llvmgpu-select-lowering-strategy)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_filter_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [3, 3, 64, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>> -> tensor<3x3x64x128xf32>
   %3 = tensor.empty() : tensor<8x8x64x128xf32>
   %4 = iree_linalg_ext.winograd.filter_transform output_tile_size(6) kernel_size(3) kernel_dimensions([0, 1]) ins(%2 : tensor<3x3x64x128xf32>) outs(%3 : tensor<8x8x64x128xf32>) -> tensor<8x8x64x128xf32>
@@ -26,16 +24,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_input_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 34, 34, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>> -> tensor<2x34x34x128xf16>
   %3 = tensor.empty() : tensor<8x8x2x6x6x128xf16>
   %4 = iree_linalg_ext.winograd.input_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<2x34x34x128xf16>) outs(%3 : tensor<8x8x2x6x6x128xf16>) -> tensor<8x8x2x6x6x128xf16>
@@ -52,16 +48,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_output_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0, 0], sizes = [8, 8, 2, 6, 6, 128], strides = [1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>> -> tensor<8x8x2x6x6x128xf16>
   %3 = tensor.empty() : tensor<2x36x36x128xf16>
   %4 = iree_linalg_ext.winograd.output_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<8x8x2x6x6x128xf16>) outs(%3 : tensor<2x36x36x128xf16>) -> tensor<2x36x36x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_cuda.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_cuda.mlir
index de7bbb4..d129117 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_cuda.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_cuda.mlir
@@ -3,12 +3,10 @@
 // RUN:   %s | FileCheck %s
 
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv2d_1x230x230x3_7x7x3x64_dispatch_0 {
   hal.executable.variant public @cuda_nvptx_fb target(#executable_target_cuda_nvptx_fb) {
@@ -21,9 +19,9 @@
       func.func @conv2d_1x230x230x3_7x7x3x64() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x230x230x3xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x64xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x230x230x3xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x64xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 230, 230, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x230x230x3xf32>> -> tensor<1x230x230x3xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [7, 7, 3, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>> -> tensor<7x7x3x64xf32>
         %5 = tensor.empty() : tensor<1x112x112x64xf32>
@@ -50,12 +48,10 @@
 // -----
 
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv_nchw_dispatch_0 {
   hal.executable.variant public @cuda_nvptx_fb target(#executable_target_cuda_nvptx_fb) {
@@ -68,9 +64,9 @@
       func.func @conv_nchw() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x66x66xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<320x4x3x3xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x66x66xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<320x4x3x3xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf32>>
        %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 4, 66, 66], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4x66x66xf32>> -> tensor<2x4x66x66xf32>
        %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [320, 4, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<320x4x3x3xf32>> -> tensor<320x4x3x3xf32>
         %5 = tensor.empty() : tensor<2x320x64x64xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_rocm.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_rocm.mlir
index 6035b4b..ec67064 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_rocm.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/conv_pipeline_test_rocm.mlir
@@ -2,13 +2,11 @@
 // RUN:   --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target,canonicalize)))))' \
 // RUN:   %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv_nchw_dispatch_1 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -23,10 +21,10 @@
       func.func @conv_2d_nchw_fchw_2x320x64x64x320x3x3_f16() {
         %cst = arith.constant 0.000000e+00 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x320x130x130xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x320x130x130xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 320, 130, 130], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x320x130x130xf16>> -> tensor<2x320x130x130xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [320, 320, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<320x320x3x3xf16>> -> tensor<320x320x3x3xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [320], strides = [1] : !flow.dispatch.tensor<readonly:tensor<320xf16>> -> tensor<320xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm.mlir
index ea5f33f..f20c2c3 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm.mlir
@@ -1,14 +1,10 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-nvvm))))" --iree-gpu-test-target=sm_60 --split-input-file %s | FileCheck %s
 
 // Test that standard and GPU ops are converted to LLVM and NVVM.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @abs_ex_dispatch_0 {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -17,9 +13,9 @@
       func.func @abs_ex_dispatch_0() {
         %c0 = arith.constant 0 : index
         %c128 = arith.constant 128 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<16xi32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<16xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<16xi32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -44,14 +40,10 @@
 //  CHECK: llvm.store %[[FADD]], %[[ADDR]] : f32, !llvm.ptr
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @abs_dynamic {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -66,9 +58,9 @@
         %d0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
         %d1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
         %d2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) offset(%o) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>{%d0, %d1, %d2}
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?x?xi32>{%d0, %d1, %d2}
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<?x?x?xf32>{%d0, %d1, %d2}
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%o) : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>{%d0, %d1, %d2}
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?x?xi32>{%d0, %d1, %d2}
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<?x?x?xf32>{%d0, %d1, %d2}
         %9 = memref.load %0[%c3, %c5, %c7] : memref<?x?x?xf32, strided<[?, ?, 1], offset: ?>>
         %10 = memref.load %1[%c3, %c5, %c7] : memref<?x?x?xi32>
         %11 = arith.sitofp %10 : i32 to f32
@@ -106,13 +98,9 @@
 
 // Test that we correctly handle the case where bindings are sparse (set 0
 // binding 0 is not used).
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @dead_symbol {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -121,8 +109,8 @@
       func.func @dead_symbol() {
         %c0 = arith.constant 0 : index
         %c128 = arith.constant 128 : index
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16xi32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<16xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<16xi32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -146,11 +134,9 @@
 
 // A single binding may contain different data types.
 // Test that we cast pointers correctly.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @mixed_type {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -159,9 +145,9 @@
       func.func @mixed_type() {
         %c0 = arith.constant 0 : index
         %c128 = arith.constant 128 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c128) : memref<16xf32, strided<[1], offset: 4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c0) : memref<16xi32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c128) : memref<16xf32, strided<[1], offset: 4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c0) : memref<16xi32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -187,10 +173,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @shared_memory_lowering {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -227,10 +211,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @shared_memory_dealloc_elision {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -253,10 +235,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @shared_memory_lowering_aligned_alloc {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -288,15 +268,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @check_not_readonly {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -305,13 +281,13 @@
       func.func @check_not_readonly() {
         %c0 = arith.constant 0 : index
         %c128 = arith.constant 128 : index
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<16xi32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
-        %b11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) flags(ReadOnly) : memref<16xi32>
-        %b12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) offset(%c128) : memref<16xf32, strided<[1], offset: 32>>
-        %b21 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) flags(ReadOnly) : memref<16xi32>
-        %b22 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(3) : memref<16xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<16xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
+        %b11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) flags(ReadOnly) : memref<16xi32>
+        %b12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) offset(%c128) : memref<16xf32, strided<[1], offset: 32>>
+        %b21 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) flags(ReadOnly) : memref<16xi32>
+        %b22 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) offset(%c128) flags(ReadOnly) : memref<16xf32, strided<[1], offset: 32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -332,13 +308,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @complex {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -347,8 +319,8 @@
       func.func @complex() {
         %c0 = arith.constant 0 : index
         %c128 = arith.constant 128 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c128) flags(ReadOnly) : memref<16xcomplex<f32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<16xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c128) flags(ReadOnly) : memref<16xcomplex<f32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -371,10 +343,8 @@
 // -----
 
 // Check that we don't choke on memref of index.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @shared_memory_lowering_index {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -398,11 +368,9 @@
 //  CHECK-NEXT: %{{.*}} = llvm.getelementptr %{{.*}} : (!llvm.ptr<3>, i64, i64) -> !llvm.ptr<3>
 
 // -----
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @masked_load_store {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -412,8 +380,8 @@
         %c0 = arith.constant 0 : index
         %idx = gpu.thread_id x
         %pass_thru = arith.constant dense<0.000000e+00> : vector<1xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64xf32, #gpu.address_space<global>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<64xf32, #gpu.address_space<global>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64xf32, #gpu.address_space<global>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<64xf32, #gpu.address_space<global>>
         %mask = vector.create_mask %idx : vector<1xi1>
         %ld = vector.maskedload %0[%idx], %mask, %pass_thru : memref<64xf32, #gpu.address_space<global>>, vector<1xi1>, vector<1xf32> into vector<1xf32>
         vector.maskedstore %1[%idx], %mask, %ld : memref<64xf32, #gpu.address_space<global>>, vector<1xi1>, vector<1xf32>
@@ -429,12 +397,10 @@
 
 // -----
 // Test workgroup size lowering
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_wg_size {
   hal.executable.variant @rocm target(<"cuda", "cuda-nvptx-fb">) {
@@ -446,7 +412,7 @@
         %c0 = arith.constant 0.0 : f32
         %workgroup_size_x = hal.interface.workgroup.size[0] : index
         %workgroup_size_y = hal.interface.workgroup.size[1] : index
-        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x64xf32>
+        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x64xf32>
         memref.store %c0, %subspan[%workgroup_size_x, %workgroup_size_y] : memref<64x64xf32>
         return
       }
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_rocdl.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_rocdl.mlir
index a88ec61..3e158e1 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_rocdl.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_rocdl.mlir
@@ -2,15 +2,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx908 --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-rocdl))))" --iree-hip-index-bits=32 %s | FileCheck %s --check-prefix=INDEX32
 
 // Test that standard and GPU ops are converted to LLVM and NVVM.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @abs_ex_dispatch_0 {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -18,9 +14,9 @@
     builtin.module {
       func.func @abs_ex_dispatch_0() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) flags(ReadOnly) : memref<16xf32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<16xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) flags(ReadOnly) : memref<16xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<16xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_dim x
         %5 = gpu.thread_id x
@@ -48,14 +44,10 @@
 
 // -----
 // Test that maximum and minimum are converted to max and min on ROCm
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @abs_ex_dispatch_0 {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -63,9 +55,9 @@
     builtin.module {
       func.func @reduction_maximum() {
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) :
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) :
             memref<32x64x64xf32, strided<[4096, 64, 1], offset: ?>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x64x64xf32,
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x64x64xf32,
             strided<[4096, 64, 1], offset: ?>>
       %2 = vector.load %0[%c0, %c0, %c0] : memref<32x64x64xf32, strided<[4096, 64, 1], offset: ?>>, vector<2xf32>
       %3 = vector.reduction <maximumf>, %2 : vector<2xf32> into f32
@@ -81,10 +73,8 @@
 
 // -----
 // Test that gpu barriers are lowered to `s_waitcnt lgkmcnt(0)\0As_barrier` on ROCm
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @simple_barrier {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -101,11 +91,9 @@
 // CHECK: llvm.inline_asm has_side_effects asm_dialect = att ";;;WARNING: BREAKS DEBUG WATCHES\0As_waitcnt lgkmcnt(0)\0As_barrier", ""  : () -> ()
 
 // -----
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @masked_load_store {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -115,8 +103,8 @@
         %c0 = arith.constant 0 : index
         %idx = gpu.thread_id x
         %pass_thru = arith.constant dense<0.000000e+00> : vector<1xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64xf32, #gpu.address_space<global>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<64xf32, #gpu.address_space<global>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<64xf32, #gpu.address_space<global>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<64xf32, #gpu.address_space<global>>
         %mask = vector.create_mask %idx : vector<1xi1>
         %ld = vector.maskedload %0[%idx], %mask, %pass_thru : memref<64xf32, #gpu.address_space<global>>, vector<1xi1>, vector<1xf32> into vector<1xf32>
         vector.maskedstore %1[%idx], %mask, %ld : memref<64xf32, #gpu.address_space<global>>, vector<1xi1>, vector<1xf32>
@@ -132,12 +120,10 @@
 
 // -----
 // Test workgroup size lowering
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_wg_size {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -149,7 +135,7 @@
         %c0 = arith.constant 0.0 : f32
         %workgroup_size_x = hal.interface.workgroup.size[0] : index
         %workgroup_size_y = hal.interface.workgroup.size[1] : index
-        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x64xf32>
+        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x64xf32>
         memref.store %c0, %subspan[%workgroup_size_x, %workgroup_size_y] : memref<64x64xf32>
         return
       }
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/distribute_to_thread.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/distribute_to_thread.mlir
index c419526..14259a9 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/distribute_to_thread.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/distribute_to_thread.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --pass-pipeline="builtin.module(func.func(iree-llvmgpu-tile-and-distribute))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[2, 256, 4]]>
 #map = affine_map<()[s0] -> (s0 * 2)>
@@ -16,9 +14,9 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c1024 = arith.constant 1024 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<1024x1024xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x1024xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<1024x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x1024xf32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_size_y = hal.interface.workgroup.size[1] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -70,12 +68,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 8, 32, 32]]>
 #map = affine_map<()[s0] -> (s0 * 8)>
@@ -90,11 +86,11 @@
   %c4 = arith.constant 4 : index
   %c32 = arith.constant 32 : index
   %c64 = arith.constant 64 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : memref<4x32x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : memref<4x32x1024xf32>
   memref.assume_alignment %0, 32 : memref<4x32x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : memref<4x1024x64xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : memref<4x1024x64xf32>
   memref.assume_alignment %1, 32 : memref<4x1024x64xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : memref<4x32x64xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : memref<4x32x64xf32>
   memref.assume_alignment %2, 32 : memref<4x32x64xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -143,12 +139,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[2, 32, 4]]>
 #map = affine_map<()[s0] -> (s0 * 2)>
@@ -159,9 +153,9 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %c1024 = arith.constant 1024 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<1024x1024xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x1024xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<1024x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x1024xf32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_size_y = hal.interface.workgroup.size[1] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
@@ -215,11 +209,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[]]>
 #map = affine_map<(d0) -> (d0)>
@@ -229,8 +221,8 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0x7FC00000 : f32
   %cst_0 = arith.constant 0xFF800000 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1000xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<f32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1000xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<f32>
   linalg.fill {lowering_config = #config} ins(%cst_0 : f32) outs(%1 : memref<f32>)
   linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction"]} ins(%0 : memref<1000xf32>) outs(%1 : memref<f32>) attrs =  {lowering_config = #config} {
   ^bb0(%in: f32, %out: f32):
@@ -253,12 +245,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 1, 256, 4, 4, 4]]>
 #map = affine_map<()[s0] -> (s0 * 256)>
@@ -274,11 +264,11 @@
     %c41664 = arith.constant 41664 : index
     %c0 = arith.constant 0 : index
     %cst = arith.constant 0.000000e+00 : f32
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<1x64x56x56xf32>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<1x64x56x56xf32>
     memref.assume_alignment %0, 64 : memref<1x64x56x56xf32>
-    %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c41664) : memref<64x64x1x1xf32>
+    %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c41664) : memref<64x64x1x1xf32>
     memref.assume_alignment %1, 64 : memref<64x64x1x1xf32>
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c802816) : memref<1x64x56x56xf32>
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c802816) : memref<1x64x56x56xf32>
     memref.assume_alignment %2, 64 : memref<1x64x56x56xf32>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -316,12 +306,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 2, 256, 4]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulSimt workgroup_size = [64, 8, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
@@ -342,9 +330,9 @@
     %cst_0 = arith.constant 0.000000e+00 : f32
     %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
     %1 = arith.index_cast %0 : i32 to index
-    %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%1) : memref<?x?x12x64xf32>{%1, %1}
-    %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%1) : memref<?x?x12x64xf32>{%1, %1}
-    %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<?x12x?x?xf32>{%1, %1, %1}
+    %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%1) : memref<?x?x12x64xf32>{%1, %1}
+    %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%1) : memref<?x?x12x64xf32>{%1, %1}
+    %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<?x12x?x?xf32>{%1, %1, %1}
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
     %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/elementwise_pipeline.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/elementwise_pipeline.mlir
index 30e5278..7f50d1d 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/elementwise_pipeline.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/elementwise_pipeline.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d2, d1, d0, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @forward_dispatch_0_generic_320x320x3x3() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x320x320x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<320x320x3x3xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x320x320x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<320x320x3x3xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [3, 320, 320, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x320x320x3xf32>> -> tensor<3x320x320x3xf32>
   %3 = tensor.empty() : tensor<320x320x3x3xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<3x320x320x3xf32>) outs(%3 : tensor<320x320x3x3xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_pipeline_generalize_named_ops.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_pipeline_generalize_named_ops.mlir
index 74b075c..2c171b2 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_pipeline_generalize_named_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_pipeline_generalize_named_ops.mlir
@@ -10,21 +10,19 @@
 // CHECK-NEXT: linalg.generic
 // CHECK-NOT:  linalg.matmul_transpose_b
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @warp_reduction_large_vector() {
   %cst = arith.constant 0.000000e+00 : f32
   %c128 = arith.constant 128 : index
   %c0 = arith.constant 0 : index
   %c394240 = arith.constant 394240 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c394240) : !flow.dispatch.tensor<writeonly:tensor<1x1280xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c128) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c394240) : !flow.dispatch.tensor<writeonly:tensor<1x1280xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1280xf32>> -> tensor<1x1280xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1280, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1280x1280xf32>> -> tensor<1280x1280xf32>
   %5 = tensor.empty() : tensor<1x1280xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_set_num_workgroups.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_set_num_workgroups.mlir
index 3c3932c..50d9895 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_set_num_workgroups.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/gpu_set_num_workgroups.mlir
@@ -5,19 +5,17 @@
 
 // Transform dialect attributes are tested separately.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0) -> (d0)>
 func.func @add_dispatch_0() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
   %3 = tensor.empty() : tensor<16384xf32>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [16384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<16384xf32>> -> tensor<16384xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [16384], strides = [1] : !flow.dispatch.tensor<readonly:tensor<16384xf32>> -> tensor<16384xf32>
@@ -39,21 +37,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dot_dispatch_1() {
   %c0 = arith.constant 0 : index
   %c4 = arith.constant 4 : index
   %c2 = arith.constant 2 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2x3xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<3x4xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<2x4xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2x3xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<3x4xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<2x4xf32>
   linalg.fill ins(%cst : f32) outs(%2 : memref<2x4xf32>)
   linalg.matmul ins(%0, %1 : memref<2x3xf32>, memref<3x4xf32>) outs(%2 : memref<2x4xf32>)
   return
@@ -70,21 +66,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unaligned_k() {
   %c0 = arith.constant 0 : index
   %c4 = arith.constant 4 : index
   %c2 = arith.constant 2 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<128x258xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<258x64xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<128x64xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<128x258xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<258x64xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<128x64xf32>
   linalg.fill ins(%cst : f32) outs(%2 : memref<128x64xf32>)
   linalg.matmul ins(%0, %1 : memref<128x258xf32>, memref<258x64xf32>) outs(%2 : memref<128x64xf32>)
   return
@@ -101,11 +95,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0) -> (d0)>
 #map1 = affine_map<(d0) -> ()>
@@ -113,8 +105,8 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0x7FC00000 : f32
   %cst_0 = arith.constant 0xFF800000 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1000xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<f32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1000xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<f32>
   linalg.fill ins(%cst_0 : f32) outs(%1 : memref<f32>)
   linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction"]} ins(%0 : memref<1000xf32>) outs(%1 : memref<f32>) {
   ^bb0(%in: f32, %out: f32):
@@ -138,19 +130,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
 func.func @reduction_aligned2() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x128x384xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x128x384xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x384xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 128, 384], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x128x384xf32>> -> tensor<4x128x384xf32>
   %3 = tensor.empty() : tensor<128x384xf32>
   %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<128x384xf32>) -> tensor<128x384xf32>
@@ -174,19 +164,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @copy_as_generic() {
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xi32>{%0, %1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xi32>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xi32>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xi32>{%0, %1}
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%2 : memref<?x?xi32>) outs(%3 : memref<?x?xi32>) {
   ^bb0(%in: i32, %out: i32):
     linalg.yield %in : i32
@@ -203,19 +191,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_1d_fft_stage2() {
   %c0 = arith.constant 0 : index
   %c2 = arith.constant 2 : index
   %cst = arith.constant dense<[1.000000e+00, 6.12323426E-17]> : tensor<2xf32>
   %cst_0 = arith.constant dense<[-0.000000e+00, -1.000000e+00]> : tensor<2xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %4:2 = iree_linalg_ext.fft {__internal_linalg_transform__ = "workgroup"} ins(%c2, %cst, %cst_0 : index, tensor<2xf32>, tensor<2xf32>) outs(%2, %3 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
@@ -233,11 +219,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_3d_fft_stage3() {
   %c0 = arith.constant 0 : index
@@ -249,8 +233,8 @@
   %cst_0 = arith.constant dense<[-0.000000e+00, -0.707106769, -1.000000e+00, -0.707106769]> : tensor<4xf32>
   %0 = bufferization.to_memref %cst_0 : memref<4xf32>
   %1 = bufferization.to_memref %cst : memref<4xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x128x32xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<64x128x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x128x32xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<64x128x32xf32>
   iree_linalg_ext.fft {__internal_linalg_transform__ = "workgroup"} ins(%c3, %1, %0 : index, memref<4xf32>, memref<4xf32>) outs(%2, %3 : memref<64x128x32xf32>, memref<64x128x32xf32>)
   return
 }
@@ -264,12 +248,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 128, 64]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulSimt workgroup_size = [16, 8, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
@@ -279,9 +261,9 @@
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>> -> tensor<256x1024xf32>
   %5 = tensor.empty() : tensor<128x1024xf32>
@@ -302,22 +284,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @sort_op() {
   %c1 = arith.constant 1 : index
   %c0 = arith.constant 0 : index
   %c2304000 = arith.constant 2304000 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<1x576000xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<1x576000xi32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<1x576000xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(32) offset(%c2304000) : !flow.dispatch.tensor<writeonly:tensor<1x576000xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<1x576000xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<1x576000xi32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<1x576000xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(32) offset(%c2304000) : !flow.dispatch.tensor<writeonly:tensor<1x576000xi32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 576000], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x576000xf32>> -> tensor<1x576000xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1, 576000], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x576000xi32>> -> tensor<1x576000xi32>
   %6:2 = iree_linalg_ext.sort dimension(1) outs(%4, %5 : tensor<1x576000xf32>, tensor<1x576000xi32>) {
@@ -339,21 +319,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_config_sm35() {
   %cst = arith.constant 0.000000e+00 : f32
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>> -> tensor<256x1024xf32>
   %5 = tensor.empty() : tensor<128x1024xf32>
@@ -369,21 +347,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_config_sm80() {
   %cst = arith.constant 0.000000e+00 : f32
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>> -> tensor<256x1024xf32>
   %5 = tensor.empty() : tensor<128x1024xf32>
@@ -399,21 +375,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_config_sm86() {
   %cst = arith.constant 0.000000e+00 : f32
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>> -> tensor<256x1024xf32>
   %5 = tensor.empty() : tensor<128x1024xf32>
@@ -429,12 +403,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 128, 32]]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
@@ -445,9 +417,9 @@
   %c40064 = arith.constant 40064 : index
   %c34752 = arith.constant 34752 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x7xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c40064) : !flow.dispatch.tensor<readonly:tensor<3x64x4x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c34752) : !flow.dispatch.tensor<writeonly:tensor<3x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x7xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c40064) : !flow.dispatch.tensor<readonly:tensor<3x64x4x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c34752) : !flow.dispatch.tensor<writeonly:tensor<3x64xf32>>
   %3 = tensor.empty() : tensor<3x64xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 4], sizes = [3, 64, 4, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x64x4x8xf32>> -> tensor<3x64x4xf32>
   %5 = linalg.fill {lowering_config = #config} ins(%cst : f32) outs(%3 : tensor<3x64xf32>) -> tensor<3x64xf32>
@@ -470,11 +442,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dynamic_pack_2x2() {
   %c0 = arith.constant 0 : index
@@ -487,8 +457,8 @@
   %5 = arith.index_castui %1 : i32 to index
   %6 = arith.index_castui %2 : i32 to index
   %7 = arith.index_castui %3 : i32 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%4, %5}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x2x2xi32>>{%6, %7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x2x2xi32>>{%6, %7}
   %10 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [%4, %5], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?xi32>>{%4, %5} -> tensor<?x?xi32>
   %11 = tensor.empty(%6, %7) : tensor<?x?x2x2xi32>
   %pack = tensor.pack %10 inner_dims_pos = [0, 1] inner_tiles = [2, 2] into %11 : tensor<?x?xi32> -> tensor<?x?x2x2xi32>
@@ -505,21 +475,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @large_matmul_f16() {
   %cst = arith.constant 0.000000e+00 : f16
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2560x1792xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1792x2048xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2560x2048xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2560x1792xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1792x2048xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2560x2048xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2560, 1792], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2560x1792xf16>> -> tensor<2560x1792xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1792, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1792x2048xf16>> -> tensor<1792x2048xf16>
   %5 = tensor.empty() : tensor<2560x2048xf16>
@@ -539,21 +507,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @large_matmul_f32() {
   %cst = arith.constant 0.000000e+00 : f32
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2560x1792xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1792x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2560x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2560x1792xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1792x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2560x2048xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2560, 1792], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2560x1792xf32>> -> tensor<2560x1792xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1792, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1792x2048xf32>> -> tensor<1792x2048xf32>
   %5 = tensor.empty() : tensor<2560x2048xf32>
@@ -574,19 +540,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @inner_unit_dim() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384x1xf32>>
   %3 = tensor.empty() : tensor<16384x1xf32>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [16384, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>> -> tensor<16384x1xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [16384, 1], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16384x1xf32>> -> tensor<16384x1xf32>
@@ -608,12 +572,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d3)>
@@ -627,9 +589,9 @@
   %cst_3 = arith.constant dense_resource<__elided__> : tensor<64xf32>
   %cst_4 = arith.constant dense_resource<__elided__> : tensor<64xf32>
   %cst_5 = arith.constant dense_resource<__elided__> : tensor<64xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x230x230x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c162508800) : !flow.dispatch.tensor<writeonly:tensor<256x112x112x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x230x230x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c162508800) : !flow.dispatch.tensor<writeonly:tensor<256x112x112x64xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [256, 230, 230, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<256x230x230x3xf32>> -> tensor<256x230x230x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [7, 7, 3, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<7x7x3x64xf32>> -> tensor<7x7x3x64xf32>
   %5 = tensor.empty() : tensor<256x112x112x64xf32>
@@ -660,11 +622,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3, d4) -> (d0, d2, d1, d4)>
 #map1 = affine_map<(d0, d1, d2, d3, d4) -> (d0, d3, d1, d4)>
@@ -679,9 +639,9 @@
   %3 = arith.index_castui %0 {stream.alignment = 64 : index, stream.values = [35524672 : index, 240930880 : index, 446337088 : index, 651743296 : index]} : i32 to index
   %4 = arith.index_castui %1 {stream.alignment = 64 : index, stream.values = [57544768 : index, 262950976 : index, 468357184 : index, 673763392 : index]} : i32 to index
   %5 = arith.index_castui %2 {stream.alignment = 64 : index, stream.values = [1728 : index, 36472832 : index, 72943744 : index, 109415936 : index]} : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>>
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<512x4x42x42xf32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%3) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<512x4x42x42xf32>>
   %9 = flow.dispatch.tensor.load %6, offsets = [0, 0, 0, 0], sizes = [512, 42, 4, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>> -> tensor<512x42x4x64xf32>
   %10 = flow.dispatch.tensor.load %7, offsets = [0, 0, 0, 0], sizes = [512, 42, 4, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<512x42x4x64xf32>> -> tensor<512x42x4x64xf32>
   %11 = tensor.empty() : tensor<512x4x42x42xf32>
@@ -712,13 +672,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 #map1 = affine_map<(d0, d1) -> (d0)>
@@ -755,12 +713,12 @@
   %24 = arith.shli %23, %c32_i64 : i64
   %25 = arith.ori %22, %24 : i64
   %26 = arith.index_castui %25 : i64 to index
-  %27 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x11008xi4>>
-  %28 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf32>>
-  %29 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf32>>
+  %27 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x11008xi4>>
+  %28 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf32>>
+  %29 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf32>>
   %30 = flow.dispatch.workload.ordinal %26, 0 : index
-  %31 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x11008xf32>>{%30}
-  %32 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
+  %31 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x11008xf32>>{%30}
+  %32 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
   %33 = flow.dispatch.tensor.load %27, offsets = [0, 0], sizes = [4096, 11008], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x11008xi4>> -> tensor<4096x11008xi4>
   %34 = flow.dispatch.tensor.load %28, offsets = [0], sizes = [4096], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4096xf32>> -> tensor<4096xf32>
   %35 = flow.dispatch.tensor.load %29, offsets = [0], sizes = [4096], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4096xf32>> -> tensor<4096xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/illegal_configuration.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/illegal_configuration.mlir
index 7313f52..436ef52 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/illegal_configuration.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/illegal_configuration.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt --iree-gpu-test-target=sm_60 --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy)" --verify-diagnostics --split-input-file %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = []>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulSimt workgroup_size = [32, 8, 8], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{Total number of threads in a thread block 2048 exceeds the limit of 1024 with compilation pipeline LLVMGPUMatmulSimt}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -21,20 +19,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = []>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulSimt workgroup_size = [32, 8, 2], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{Expected workgroup size in z-dim = 1, but got 2 with compilation pipeline LLVMGPUMatmulSimt}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -42,20 +38,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [64, 2, 10], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<32x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<32x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<32x32xf32>
   // expected-error @+1 {{Total number of threads in a thread block 1280 exceeds the limit of 1024 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<32x16xf32>, memref<16x32xf32>) outs(%2 : memref<32x32xf32>)
   return
@@ -63,20 +57,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [48, 2, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<32x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<32x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<32x32xf32>
   // expected-error @+1 {{Number of threads in x-dim 48 is not a multiple of warp size (32) or integer units of warps in x-dim with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<32x16xf32>, memref<16x32xf32>) outs(%2 : memref<32x32xf32>)
   return
@@ -84,20 +76,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [64, 2, 2], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<32x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<32x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<32x32xf32>
   // expected-error @+1 {{Expected workgroup size in z-dim = 1, but got 2 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<32x16xf32>, memref<16x32xf32>) outs(%2 : memref<32x32xf32>)
   return
@@ -105,20 +95,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 20]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [64, 2, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<32x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<32x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<32x32xf32>
   // expected-error @+1 {{Thread block shape 32, 32, 20 cannot be tiled on matmul shape 32, 32, 16 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<32x16xf32>, memref<16x32xf32>) outs(%2 : memref<32x32xf32>)
   return
@@ -126,20 +114,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [128, 1, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x512xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<512x256xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x256xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x512xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<512x256xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x256xf32>
   // expected-error @+1 {{Tensor Core instruction shape 16, 16, 8 cannot be tiled on warp shape 64, 8, 16 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<1024x512xf32>, memref<512x256xf32>) outs(%2 : memref<1024x256xf32>)
   return
@@ -147,20 +133,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [64, 2, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<48x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<48x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<48x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<48x32xf32>
   // expected-error @+1 {{Thread block shape 32, 32, 16 cannot be tiled on matmul shape 48, 32, 16 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<48x16xf32>, memref<16x32xf32>) outs(%2 : memref<48x32xf32>)
   return
@@ -168,20 +152,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 16]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCore workgroup_size = [64, 2, 1], {pipeline_depth = 0 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<32x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x48xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<32x48xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<32x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x48xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<32x48xf32>
   // expected-error @+1 {{Thread block shape 32, 32, 16 cannot be tiled on matmul shape 32, 48, 16 with compilation pipeline LLVMGPUMatmulTensorCore}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<32x16xf32>, memref<16x48xf32>) outs(%2 : memref<32x48xf32>)
   return
@@ -189,12 +171,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[2, 32, 32, 16]]>
 #map = affine_map<()[s0] -> (s0 * 8)>
@@ -209,11 +189,11 @@
   %c4 = arith.constant 4 : index
   %c32 = arith.constant 32 : index
   %c64 = arith.constant 64 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : memref<4x32x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : memref<4x32x1024xf32>
   memref.assume_alignment %0, 32 : memref<4x32x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : memref<4x1024x64xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : memref<4x1024x64xf32>
   memref.assume_alignment %1, 32 : memref<4x1024x64xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : memref<4x32x64xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : memref<4x32x64xf32>
   memref.assume_alignment %2, 32 : memref<4x32x64xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -242,20 +222,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 32, 48]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCoreMmaSync workgroup_size = [128, 1, 1], {pipeline_depth = 4 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x512xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<512x256xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x256xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x512xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<512x256xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x256xf32>
   // expected-error @+1 {{Thread block shape 64, 32, 48 cannot be tiled on matmul shape 1024, 256, 512 with compilation pipeline LLVMGPUMatmulTensorCoreMmaSync}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<1024x512xf32>, memref<512x256xf32>) outs(%2 : memref<1024x256xf32>)
   return
@@ -263,20 +241,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 32, 4]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCoreMmaSync workgroup_size = [128, 1, 1], {pipeline_depth = 4 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x512xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<512x256xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x256xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x512xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<512x256xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x256xf32>
   // expected-error @+1 {{Tensor Core instruction shape 16, 8, 8 cannot be tiled on warp shape 64, 8, 4 with compilation pipeline LLVMGPUMatmulTensorCoreMmaSync}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<1024x512xf32>, memref<512x256xf32>) outs(%2 : memref<1024x256xf32>)
   return
@@ -284,20 +260,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 64]]>
 #translation = #iree_codegen.translation_info<LLVMGPUMatmulTensorCoreMmaSync workgroup_size = [128, 1, 1], {pipeline_depth = 4 : i64, store_stage = 1 : i64}>
 func.func @illegal() attributes {translation_info = #translation} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1024x512xi8>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<512x256xi8>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1024x256xi8>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1024x512xi8>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<512x256xi8>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1024x256xi8>
   // expected-error @+1 {{Expected f16, bf16 or f32 for Tensor Core (MMA.SYNC) pipeline}}
   linalg.matmul {lowering_config = #config} ins(%0, %1 : memref<1024x512xi8>, memref<512x256xi8>) outs(%2 : memref<1024x256xi8>)
   return
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/linalg_transform.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/linalg_transform.mlir
index 18100b1..f292df7 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/linalg_transform.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/linalg_transform.mlir
@@ -10,19 +10,17 @@
 // RUN:     --iree-codegen-transform-dialect-library=%p/transform_dialect_codegen_foreach_to_gpu_spec.mlir@__transform_main | \
 // RUN: FileCheck %s --check-prefix=FOREACH-TO-GPU
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 func.func @matmul_static_dispatch_0() attributes {hal.executable.target = #executable_target_cuda_nvptx_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [250, 500], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<250x500xf32>> -> tensor<250x500xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [500, 1020], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>> -> tensor<500x1020xf32>
   %5 = tensor.empty() : tensor<250x1020xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/llvmgpu_bufferize.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/llvmgpu_bufferize.mlir
index 1082caf..73bdb91 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/llvmgpu_bufferize.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/llvmgpu_bufferize.mlir
@@ -1,17 +1,15 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-codegen-llvmgpu-bufferization-pipeline))" --split-input-file %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @bufferize_with_thread_private_memory(%arg0: index) {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
   %cst_ved = arith.constant dense<0.000000e+00> : vector<1x1x4x4xf16>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<320xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf16>>
   %2 = flow.dispatch.tensor.load %1, offsets = [%arg0, %arg0, %arg0, %arg0], sizes = [1, 1, 8, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<writeonly:tensor<2x320x64x64xf16>> -> tensor<1x1x8x64xf16>
   %3 = flow.dispatch.tensor.load %0, offsets = [%arg0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<320xf16>> -> tensor<1xf16>
   %4 = scf.forall (%arg1, %arg2) in (2, 16) shared_outs(%arg3 = %2) -> (tensor<1x1x8x64xf16>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_extract_address_computation.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_extract_address_computation.mlir
index 8ce12fb..25b5957 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_extract_address_computation.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_extract_address_computation.mlir
@@ -73,7 +73,11 @@
 // Just double check that we captured the IV
 // CHECK: %[[IV_NEXT:.*]] = llvm.mul %[[IV]], %[[C8192]]  : i64
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
 hal.executable private @matmul_dispatch_0 {
   hal.executable.variant public @cuda_nvptx_fb target(#executable_target_cuda_nvptx_fb) {
     hal.executable.export public @matmul_dispatch_0_matmul_2560x2560x2560 ordinal(0) layout(#pipeline_layout) {
@@ -85,9 +89,9 @@
       func.func @matmul_dispatch_0_matmul_2560x2560x2560() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2560x2560xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2560x2560xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2560, 2560], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>> -> tensor<2560x2560xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2560, 2560], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2560x2560xf16>> -> tensor<2560x2560xf16>
         %5 = tensor.empty() : tensor<2560x2560xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_mma_sync_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_mma_sync_pipeline_test.mlir
index ab7136c..28ae306 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_mma_sync_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_mma_sync_pipeline_test.mlir
@@ -5,12 +5,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @mma_fused_fp16 {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -25,10 +23,10 @@
       %cst = arith.constant 0.000000e+00 : f16
       %c2048 = arith.constant 2048 : index
       %c512 = arith.constant 512 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf16>>
-      %di = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf16>>
+      %di = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf16>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>> -> tensor<2048x1024xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1]
@@ -87,12 +85,10 @@
 // -----
 
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @mma_fused_f32 {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -107,10 +103,10 @@
       %cst = arith.constant 0.000000e+00 : f32
       %c2048 = arith.constant 2048 : index
       %c512 = arith.constant 512 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
-      %di = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
+      %di = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>> -> tensor<2048x1024xf32>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1]
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_pipeline_test.mlir
index 975f73d..c591885 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/nvvm_pipeline_test.mlir
@@ -4,12 +4,10 @@
 // Verify that a simple element-wise op gets lowered successfully all the way to
 // nvvm/llvm dialect.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @simpleMath_ex_dispatch_0 {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -21,9 +19,9 @@
   builtin.module {
     func.func @add_dispatch_0() {
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
       %3 = tensor.empty() : tensor<16xf32>
       %4 = flow.dispatch.tensor.load %0, offsets=[0], sizes=[16], strides=[1] : !flow.dispatch.tensor<readonly:tensor<16xf32>> -> tensor<16xf32>
       %5 = flow.dispatch.tensor.load %1, offsets=[0], sizes=[16], strides=[1] : !flow.dispatch.tensor<readonly:tensor<16xf32>> -> tensor<16xf32>
@@ -48,12 +46,10 @@
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map1 = affine_map<(d0)[s0] -> (s0, -d0 + 1024)>
 #map2 = affine_map<(d0)[s0] -> (-d0 + 1024, s0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @dot_dispatch_0 {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -68,9 +64,9 @@
         %c0 = arith.constant 0 : index
         %c1024 = arith.constant 1024 : index
         %c1 = arith.constant 1 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
         %8 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
         %10 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
@@ -119,12 +115,10 @@
   ],
   iterator_types = ["parallel", "parallel", "reduction"]
 }
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @dot_dispatch_0 {
   hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -139,9 +133,9 @@
         %c0 = arith.constant 0 : index
         %c1024 = arith.constant 1024 : index
         %c1 = arith.constant 1 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
         %8 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
         %10 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
@@ -172,12 +166,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @conv2d_dispatch_0 {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -193,9 +185,9 @@
       %c2 = arith.constant 2 : index
       %c3 = arith.constant 3 : index
       %c1 = arith.constant 1 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x4x4x2xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x2x2x1xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x2x3x1xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x4x4x2xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x2x2x1xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x2x3x1xf32>>
       %11 = flow.dispatch.tensor.load %0, offsets = [0, 0 ,0, 0], sizes = [1, 4, 4, 2], strides = [1, 1, 1, 1]
           : !flow.dispatch.tensor<readonly:tensor<1x4x4x2xf32>> -> tensor<1x4x4x2xf32>
       %13 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 2, 2, 1], strides = [1, 1, 1, 1]
@@ -221,11 +213,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @simpleMath_ex_dispatch_0 {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -237,8 +227,8 @@
   builtin.module {
     func.func @add_dispatch_0() {
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
       %3 = tensor.empty() : tensor<16xf32>
       %4 = flow.dispatch.tensor.load %0, offsets=[0], sizes=[16], strides=[1] : !flow.dispatch.tensor<readonly:tensor<16xf32>> -> tensor<16xf32>
       %5 = arith.constant dense<[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]> : tensor<16xf32>
@@ -261,11 +251,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @reduction_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -279,8 +267,8 @@
       %c0 = arith.constant 0 : index
       %cst = arith.constant 0.000000e+00 : f32
       %c96 = arith.constant 96 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<14x14x96xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<96xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<14x14x96xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<96xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [14, 14, 96], strides = [1, 1, 1]
           : !flow.dispatch.tensor<readonly:tensor<14x14x96xf32>> -> tensor<14x14x96xf32>
       %8 = tensor.empty() : tensor<96xf32>
@@ -307,12 +295,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @vector_add_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -325,9 +311,9 @@
     func.func @vector_add_dispatch() {
       %c0 = arith.constant 0 : index
       %c16384 = arith.constant 16384 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16384xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
       %6 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [16384], strides = [1]
           : !flow.dispatch.tensor<readonly:tensor<16384xf32>> -> tensor<16384xf32>
       %8 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [16384], strides = [1]
@@ -361,11 +347,9 @@
 #map2 = affine_map<(d0)[s0] -> (-d0 + 16384, s0)>
 #map3 = affine_map<(d0, d1) -> (d1, d0)>
 #map4 = affine_map<(d0, d1) -> (d0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @vector_reduction_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -379,8 +363,8 @@
       %c0 = arith.constant 0 : index
       %c16384 = arith.constant 16384 : index
       %cst = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x16384xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x16384xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<16384xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 16384], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<512x16384xf32>> -> tensor<512x16384xf32>
       %8 = tensor.empty() : tensor<16384xf32>
@@ -406,16 +390,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @mma_fused {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
-  hal.executable.export public @_large_aligned_dispatch_0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>, #hal.descriptor_set.binding<2, storage_buffer>]>]>) {
+  hal.executable.export public @_large_aligned_dispatch_0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<storage_buffer>
+  ]>) {
   ^bb0(%arg0: !hal.device, %arg1: index, %arg2 : index):
     %x, %y, %z = flow.dispatch.workgroup_count_from_dag_root %arg1, %arg2
     hal.return %x, %y, %z : index, index, index
@@ -426,10 +412,10 @@
       %cst = arith.constant 0.000000e+00 : f32
       %c2048 = arith.constant 2048 : index
       %c512 = arith.constant 512 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
-      %di = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
+      %di = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<2048x1024xf32>> -> tensor<2048x1024xf32>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1]
@@ -489,16 +475,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @mma_fused_fp16 {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
-  hal.executable.export public @_large_aligned_dispatch_0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [#hal.descriptor_set.layout<0, bindings = [#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>, #hal.descriptor_set.binding<2, storage_buffer>]>]>) {
+  hal.executable.export public @_large_aligned_dispatch_0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<storage_buffer>
+  ]>) {
   ^bb0(%arg0: !hal.device, %arg1: index, %arg2 : index):
     %x, %y, %z = flow.dispatch.workgroup_count_from_dag_root %arg1, %arg2
     hal.return %x, %y, %z : index, index, index
@@ -509,10 +497,10 @@
       %cst = arith.constant 0.000000e+00 : f16
       %c2048 = arith.constant 2048 : index
       %c512 = arith.constant 512 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf16>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf16>>
-      %di = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf16>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf16>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf16>>
+      %di = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<2048x512xf16>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<2048x1024xf16>> -> tensor<2048x1024xf16>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1]
@@ -568,12 +556,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
@@ -597,11 +583,11 @@
           %c4 = arith.constant 4 : index
           %cst = arith.constant 0.000000e+00 : f32
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0)
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0)
               : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0)
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0)
               : !flow.dispatch.tensor<readonly:tensor<4x1024x64xf32>>
-          %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0)
+          %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0)
               : !flow.dispatch.tensor<writeonly:tensor<4x32x64xf32>>
           %11 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 32, 1024], strides = [1, 1, 1]
               : !flow.dispatch.tensor<readonly:tensor<4x32x1024xf32>> -> tensor<4x32x1024xf32>
@@ -648,12 +634,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 #map0 = affine_map<(d0, d1, d2, d3) -> (d1, d0, d3)>
@@ -674,9 +658,9 @@
         func.func @split_k_gemm() {
           %cst = arith.constant 0.000000e+00 : f32
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x4x256xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x256x512xf32>>
-          %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x2048x512xf32>>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x4x256xf32>>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x256x512xf32>>
+          %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x2048x512xf32>>
           %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2048, 4, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x4x256xf32>> -> tensor<2048x4x256xf32>
           %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 256, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x256x512xf32>> -> tensor<4x256x512xf32>
           %5 = tensor.empty() : tensor<4x2048x512xf32>
@@ -718,12 +702,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
   hal.executable public @pooling_dynamic {
@@ -740,8 +722,8 @@
           %cst = arith.constant 0.000000e+00 : f32
           %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
           %s = arith.index_cast %0 : i32 to index
-          %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%s) : !flow.dispatch.tensor<readonly:tensor<?x2048x?x?xf32>>{%s, %s, %s}
-          %15 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%s) : !flow.dispatch.tensor<writeonly:tensor<?x2048x1x1xf32>>{%s}
+          %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%s) : !flow.dispatch.tensor<readonly:tensor<?x2048x?x?xf32>>{%s, %s, %s}
+          %15 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%s) : !flow.dispatch.tensor<writeonly:tensor<?x2048x1x1xf32>>{%s}
           %16 = flow.dispatch.tensor.load %14, offsets = [0, 0, 0, 0], sizes = [%s, 2048, %s, %s], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x2048x?x?xf32>>{%s, %s, %s} -> tensor<?x2048x?x?xf32>
           %19 = tensor.empty(%s) : tensor<?x2048x1x1xf32>
           %38 = tensor.empty(%s, %s) : tensor<?x?xf32>
@@ -765,11 +747,9 @@
 #map2 = affine_map<(d0)[s0] -> (-d0 + 16384, s0)>
 #map3 = affine_map<(d0, d1) -> (d0, d1)>
 #map4 = affine_map<(d0, d1) -> (d0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @warp_reduction_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -783,8 +763,8 @@
       %c0 = arith.constant 0 : index
       %c1024 = arith.constant 1024 : index
       %cst = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
       %8 = tensor.empty() : tensor<512xf32>
@@ -818,11 +798,9 @@
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map3 = affine_map<(d0, d1) -> (d0, d1)>
 #map4 = affine_map<(d0, d1) -> (d0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @warp_reduction_broadcast_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -837,8 +815,8 @@
       %c1024 = arith.constant 1024 : index
       %cst_0 = arith.constant 3.840000e+02 : f32
       %cst = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x1024xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x1024xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<512x1024xf32>> -> tensor<512x1024xf32>
       %8 = tensor.empty() : tensor<512xf32>
@@ -879,15 +857,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @shared_mem_alloc {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
-    hal.executable.export public @shared_mem_alloc ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>, <1, storage_buffer>]>]>) {
+    hal.executable.export public @shared_mem_alloc ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index, %arg3: index, %arg4: index, %arg5: index):
       %x, %y, %z = flow.dispatch.workgroup_count_from_dag_root %arg1, %arg2, %arg3, %arg4, %arg5
       hal.return %x, %y, %z : index, index, index
@@ -896,8 +872,8 @@
       func.func @shared_mem_alloc() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant dense<0xFF800000> : tensor<14x14x480xf32>
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<29x29x480xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<14x14x480xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<29x29x480xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<14x14x480xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [29, 29, 480], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<29x29x480xf32>> -> tensor<29x29x480xf32>
         %3 = tensor.empty() : tensor<3x3xf32>
         %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3, d4) -> (d0 * 2 + d3, d1 * 2 + d4, d2)>, affine_map<(d0, d1, d2, d3, d4) -> (d3, d4)>, affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction", "reduction"]} ins(%2, %3 : tensor<29x29x480xf32>, tensor<3x3xf32>) outs(%cst : tensor<14x14x480xf32>) {
@@ -928,11 +904,9 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[32,32]]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map0 = affine_map<(d0, d1) -> (d1, d0)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
@@ -946,8 +920,8 @@
     builtin.module {
         func.func @shared_mem_transpose() {
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048xf32>>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048xf32>>
           %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 768], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>> -> tensor<2048x768xf32>
           %3 = tensor.empty() : tensor<768x2048xf32>
           %4 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<2048x768xf32>) outs(%3 : tensor<768x2048xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/pack_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/pack_pipeline_test.mlir
index bb741ac..bb7722c 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/pack_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/pack_pipeline_test.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_pack() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x16x16x32xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x256xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x16x16x32xi32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xi32>> -> tensor<128x256xi32>
   %3 = tensor.empty() : tensor<4x16x16x32xi32>
   %pack = tensor.pack %2 inner_dims_pos = [1, 0] inner_tiles = [16, 32] into %3 : tensor<128x256xi32> -> tensor<4x16x16x32xi32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/promote_matmul_to_fit_mma.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/promote_matmul_to_fit_mma.mlir
index 45eb7ad..e2ef4ee 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/promote_matmul_to_fit_mma.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/promote_matmul_to_fit_mma.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-llvmgpu-promote-matmul-to-fit-mma{target-dimensions=parallel}))"  %s | FileCheck %s --check-prefixes=ALL,PARALLEL
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-llvmgpu-promote-matmul-to-fit-mma{target-dimensions=reduction}))" %s | FileCheck %s --check-prefixes=ALL,REDUCTION
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<()[s0] -> (s0 * 128)>
@@ -17,9 +15,9 @@
 func.func @batch_matmul_f16() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
   %workgroup_id_z = hal.interface.workgroup.id[2] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %3 = affine.apply #map()[%workgroup_id_y]
@@ -36,9 +34,9 @@
   return
 }
 // ALL-LABEL:     func.func @batch_matmul_f16
-// ALL:             %[[LHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-// ALL:             %[[RHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-// ALL:             %[[OUT_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+// ALL:             %[[LHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+// ALL:             %[[RHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+// ALL:             %[[OUT_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
 // ALL-DAG:         %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_HANDLE]]
 // ALL-DAG:         %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_HANDLE]]
 // PARALLEL:        %[[PADDED_LHS:.+]] = tensor.pad %[[LHS]]
@@ -67,12 +65,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 64)>
 #map1 = affine_map<()[s0] -> (s0 * 128)>
@@ -88,9 +84,9 @@
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
   %workgroup_id_z = hal.interface.workgroup.id[2] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %3 = affine.apply #map()[%workgroup_id_y]
@@ -128,9 +124,9 @@
 // The padding on parallel dims is a nop because they are already padded. Skip
 // the check for the testcase.
 // ALL-LABEL:     func.func @batch_matmul_pad_reduction_after_tiling
-// ALL:             %[[LHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
-// ALL:             %[[RHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
-// ALL:             %[[OUT_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
+// ALL:             %[[LHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x968x1281xf16>>
+// ALL:             %[[RHS_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<64x1281x1281xf16>>
+// ALL:             %[[OUT_HANDLE:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x968x1281xf16>>
 // ALL-DAG:         %[[LHS:.+]] = flow.dispatch.tensor.load %[[LHS_HANDLE]]
 // ALL-DAG:         %[[RHS:.+]] = flow.dispatch.tensor.load %[[RHS_HANDLE]]
 // REDUCTION:       %[[INIT:.+]] = tensor.empty() : tensor<1x64x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_cuda.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_cuda.mlir
index bfc69ed..cbab841 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_cuda.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_cuda.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --iree-codegen-llvmgpu-enable-transform-dialect-jit=true --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(func.func(iree-codegen-decompose-softmax), iree-llvmgpu-select-lowering-strategy, iree-codegen-lower-executable-using-transform-dialect, func.func(iree-llvmgpu-lower-executable-target)))))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @warp_reduction_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -18,8 +16,8 @@
       %c0 = arith.constant 0 : index
       %c10240 = arith.constant 10240 : index
       %cst = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 10240], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>> -> tensor<512x10240xf32>
       %8 = tensor.empty() : tensor<512xf32>
@@ -103,11 +101,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @warp_reduction_broadcast_dispatch {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -122,8 +118,8 @@
       %c10240 = arith.constant 10240 : index
       %cst_0 = arith.constant 3.840000e+02 : f32
       %cst = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x10240xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512x10240xf32>>
       %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 1024], strides = [1, 1]
           : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>> -> tensor<512x10240xf32>
       %8 = tensor.empty() : tensor<512xf32>
@@ -196,11 +192,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @softmax {
 hal.executable.variant @cuda target(<"cuda", "cuda-nvptx-fb">) {
@@ -215,8 +209,8 @@
       %cst = arith.constant -3.40282347E+38 : f32
       %cst_0 = arith.constant 0.000000e+00 : f32
       %cst_1 = arith.constant 1.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 40960], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>> -> tensor<12x128x40960xf32>
       %3 = tensor.empty() : tensor<12x128x40960xf32>
       %4 = linalg.softmax dimension(2) ins(%2 : tensor<12x128x40960xf32>) outs(%3 : tensor<12x128x40960xf32>) -> tensor<12x128x40960xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_rocm.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_rocm.mlir
index b890571..c46f738 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_rocm.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_rocm.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx1100 --pass-pipeline="builtin.module(func.func(iree-codegen-decompose-softmax), iree-llvmgpu-select-lowering-strategy,  func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx940 --pass-pipeline="builtin.module(func.func(iree-codegen-decompose-softmax), iree-llvmgpu-select-lowering-strategy,  func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s --check-prefix=CDNA3
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @softmax() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant -3.40282347E+38 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
   %cst_1 = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 40960], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>> -> tensor<12x128x40960xf32>
   %3 = tensor.empty() : tensor<12x128x40960xf32>
   %4 = linalg.softmax dimension(2) ins(%2 : tensor<12x128x40960xf32>) outs(%3 : tensor<12x128x40960xf32>) -> tensor<12x128x40960xf32>
@@ -28,19 +26,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @softmax() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant -3.40282347E+38 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
   %cst_1 = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 40960], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>> -> tensor<12x128x40960xf32>
   %3 = tensor.empty() : tensor<12x128x40960xf32>
   %4 = linalg.softmax dimension(2) ins(%2 : tensor<12x128x40960xf32>) outs(%3 : tensor<12x128x40960xf32>) -> tensor<12x128x40960xf32>
@@ -57,11 +53,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dynamic_softmax() {
   %c32_i64 = arith.constant 32 : i64
@@ -74,8 +68,8 @@
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
   %7 = flow.dispatch.workload.ordinal %6, 0 : index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%7}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?xf16>>{%7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%7}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?xf16>>{%7}
   %10 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [32, %7], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%7} -> tensor<32x?xf16>
   %11 = tensor.empty(%7) : tensor<32x?xf16>
   %12 = linalg.softmax dimension(1) ins(%10 : tensor<32x?xf16>) outs(%11 : tensor<32x?xf16>) -> tensor<32x?xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_cuda.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_cuda.mlir
index 3e22bd8..e3b16eb 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_cuda.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_cuda.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_60 --iree-codegen-llvmgpu-enable-transform-dialect-jit=true --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-llvmgpu-select-lowering-strategy, iree-codegen-lower-executable-using-transform-dialect, func.func(iree-llvmgpu-lower-executable-target)))))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @small_reduction {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -17,8 +15,8 @@
     func.func @small_reduction() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x13xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1024x13xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1024xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 13], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x13xf32>> -> tensor<1024x13xf32>
       %3 = tensor.empty() : tensor<1024xf32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<1024xf32>) -> tensor<1024xf32>
@@ -52,11 +50,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -69,8 +65,8 @@
     func.func @group_reduction() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [8, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x64xf32>> -> tensor<8x64xf32>
       %3 = tensor.empty() : tensor<8xf32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<8xf32>) -> tensor<8xf32>
@@ -121,11 +117,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_elementwise_reduction_elementwise {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -138,8 +132,8 @@
     func.func @group_elementwise_reduction_elementwise() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [8, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x64xf32>> -> tensor<8x64xf32>
       %3 = tensor.empty() : tensor<8xf32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<8xf32>) -> tensor<8xf32>
@@ -198,11 +192,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction_larger {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -215,8 +207,8 @@
     func.func @group_reduction_larger() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x1024xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<33xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<33x1024xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<33xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [33, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<33x1024xf32>> -> tensor<33x1024xf32>
       %3 = tensor.empty() : tensor<33xf32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<33xf32>) -> tensor<33xf32>
@@ -268,11 +260,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction_1d {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -285,8 +275,8 @@
     func.func @group_reduction_1d() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [64], strides = [1] : !flow.dispatch.tensor<readonly:tensor<64xf32>> -> tensor<64xf32>
       %3 = tensor.empty() : tensor<f32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<f32>) -> tensor<f32>
@@ -307,11 +297,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_elementwise_reduction_elementwise_4d {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -324,8 +312,8 @@
     func.func @group_elementwise_reduction_elementwise_4d() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x8x64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x4x8xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x4x8x64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x4x8xf32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 4, 8, 64], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x4x8x64xf32>> -> tensor<2x4x8x64xf32>
       %3 = tensor.empty() : tensor<2x4x8xf32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2x4x8xf32>) -> tensor<2x4x8xf32>
@@ -355,11 +343,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction_i8_12345 {
 hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -372,8 +358,8 @@
     func.func @group_reduction_i8_12345() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant 0 : i8
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x12345xi8>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x12345xi8>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x12345xi8>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x12345xi8>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [8, 12345], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x12345xi8>> -> tensor<8x12345xi8>
       %3 = tensor.empty() : tensor<8x12345xi8>
       %4 = tensor.empty() : tensor<8xi8>
@@ -440,11 +426,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -460,8 +444,8 @@
       func.func @reduction_2d_trailing_elementwise_static_dispatch_0_generic_128x10_f32() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x10xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x10xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x10xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x10xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 10], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x10xf32>> -> tensor<128x10xf32>
         %3 = tensor.empty() : tensor<128x10xf32>
         %4 = tensor.empty() : tensor<128xf32>
@@ -509,14 +493,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @i4_dequant_matvec {
   hal.executable.variant public @cuda_nvptx_fb target(<"cuda", "cuda-nvptx-fb">) {
@@ -529,11 +511,11 @@
       func.func @i4_dequant_matvec() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
         %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>> -> tensor<4096x32x128xi4>
         %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
         %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_rocm.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_rocm.mlir
index cfa16f7..fea7846 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_rocm.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/reduction_pipeline_transform_rocm.mlir
@@ -5,11 +5,9 @@
 // RUN:  --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target)))))" \
 // RUN:  %s | FileCheck %s --check-prefix=CDNA3
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction_1d {
 hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -22,8 +20,8 @@
     func.func @group_reduction_1d() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [64], strides = [1] : !flow.dispatch.tensor<readonly:tensor<64xf32>> -> tensor<64xf32>
       %3 = tensor.empty() : tensor<f32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<f32>) -> tensor<f32>
@@ -46,11 +44,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @group_reduction_1d {
 hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -63,8 +59,8 @@
     func.func @group_reduction_1d() {
       %c0 = arith.constant 0 : index
       %cst = arith.constant -0.000000e+00 : f32
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<f32>>
       %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [64], strides = [1] : !flow.dispatch.tensor<readonly:tensor<64xf32>> -> tensor<64xf32>
       %3 = tensor.empty() : tensor<f32>
       %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<f32>) -> tensor<f32>
@@ -88,14 +84,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @i4_dequant_matvec {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -108,11 +102,11 @@
       func.func @i4_dequant_matvec() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
         %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>> -> tensor<4096x32x128xi4>
         %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
         %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
@@ -165,14 +159,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @i4_dequant_matvec {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -185,11 +177,11 @@
       func.func @i4_dequant_matvec() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
         %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>> -> tensor<4096x32x128xi4>
         %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
         %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32xf16>> -> tensor<4096x32xf16>
@@ -224,12 +216,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matvec_fp16 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -242,9 +232,9 @@
       func.func @matvec_fp16() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>> -> tensor<1x4096xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
         %5 = tensor.empty() : tensor<1x32000xf16>
@@ -287,12 +277,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matvec_fp16 {
   hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -305,9 +293,9 @@
       func.func @matvec_fp16() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x32000xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096xf16>> -> tensor<1x4096xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
         %5 = tensor.empty() : tensor<1x32000xf16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/rocdl_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/rocdl_pipeline_test.mlir
index 6a060af..7736a40 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/rocdl_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/rocdl_pipeline_test.mlir
@@ -5,12 +5,10 @@
 // Verify that a simple element-wise op gets lowered successfully all the way
 // to the llvm dialect.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @simpleMath_ex_dispatch_0 {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -22,9 +20,9 @@
   builtin.module {
     func.func @add_dispatch_0() {
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<16xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
       %3 = tensor.empty() : tensor<16xf32>
       %4 = flow.dispatch.tensor.load %0, offsets=[0], sizes=[16], strides=[1] : !flow.dispatch.tensor<readonly:tensor<16xf32>> -> tensor<16xf32>
       %5 = flow.dispatch.tensor.load %1, offsets=[0], sizes=[16], strides=[1] : !flow.dispatch.tensor<readonly:tensor<16xf32>> -> tensor<16xf32>
@@ -49,12 +47,10 @@
 #map0 = affine_map<()[s0, s1] -> (s0 * s1)>
 #map1 = affine_map<(d0)[s0] -> (s0, -d0 + 1024)>
 #map2 = affine_map<(d0)[s0] -> (-d0 + 1024, s0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @dot_dispatch_0 {
   hal.executable.variant @rocm target(<"rocm", "rocm-hsaco-fb">) {
@@ -69,9 +65,9 @@
         %c0 = arith.constant 0 : index
         %c1024 = arith.constant 1024 : index
         %c1 = arith.constant 1 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x1024xf32>>
         %8 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
             : !flow.dispatch.tensor<readonly:tensor<1024x1024xf32>> -> tensor<1024x1024xf32>
         %10 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 1024], strides = [1, 1]
@@ -106,12 +102,10 @@
 // -----
 
 #map = affine_map<(d0) -> (d0)>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @ext_fp8_dispatch {
   hal.executable.variant @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb">) {
@@ -123,9 +117,9 @@
     builtin.module {
       func.func @ext_fp8_dispatch() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf8E4M3FNUZ>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf8E5M2FNUZ>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf8E4M3FNUZ>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096xf8E5M2FNUZ>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [4096], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4096xf8E4M3FNUZ>> -> tensor<4096xf8E4M3FNUZ>
         %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [4096], strides = [1] : !flow.dispatch.tensor<readonly:tensor<4096xf8E5M2FNUZ>> -> tensor<4096xf8E5M2FNUZ>
         %5 = tensor.empty() : tensor<4096xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_batch_matmul.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_batch_matmul.mlir
index 186fcf5..f1ced7b 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_batch_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_batch_matmul.mlir
@@ -14,12 +14,10 @@
 // RUN: -td-matmul-strategy-use-fma=true \
 // RUN:   | FileCheck %s --check-prefixes=CHECK,OPTIONS
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -27,9 +25,9 @@
 func.func @batch_matmul_dispatch_0_generic_128x80x320x32_f32() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x320xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x80x320xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [128, 80, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x80x32xf32>> -> tensor<128x80x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [128, 32, 320], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x32x320xf32>> -> tensor<128x32x320xf32>
   %5 = tensor.empty() : tensor<128x80x320xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_convolution.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_convolution.mlir
index 4c0ecc9..445a64c 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_convolution.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_convolution.mlir
@@ -1,19 +1,17 @@
 // RUN: iree-opt %s --split-input-file --iree-codegen-llvmgpu-enable-transform-dialect-jit= --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy)" \
 // RUN:  --iree-gpu-test-target=sm_80 --iree-codegen-llvmgpu-enable-transform-dialect-implicit-gemm-strategy | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @nchw_convolution() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x128x258x258xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x128x3x3xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x128x258x258xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<256x128x3x3xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x256xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [8, 128, 258, 258], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x128x258x258xf32>> -> tensor<8x128x258x258xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [256, 128, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<256x128x3x3xf32>> -> tensor<256x128x3x3xf32>
   %5 = tensor.empty() : tensor<8x256x256x256xf32>
@@ -67,19 +65,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @nhwc_convolution() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x258x258x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x128x256xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x256xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x258x258x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x128x256xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x256xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [8, 258, 258, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x258x258x128xf32>> -> tensor<8x258x258x128xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 128, 256], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x128x256xf32>> -> tensor<3x3x128x256xf32>
   %5 = tensor.empty() : tensor<8x256x256x256xf32>
@@ -107,19 +103,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unaligned_convolution() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x258x258x132xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x132x264xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x264xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x258x258x132xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x132x264xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x256x256x264xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [8, 258, 258, 132], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x258x258x132xf32>> -> tensor<8x258x258x132xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 132, 264], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x132x264xf32>> -> tensor<3x3x132x264xf32>
   %5 = tensor.empty() : tensor<8x256x256x264xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_matmul.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_matmul.mlir
index f1eecaf..2e41bfe 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_matmul.mlir
@@ -43,19 +43,17 @@
 // RUN: iree-opt %s --split-input-file --iree-codegen-llvmgpu-enable-transform-dialect-jit=true --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy)" \
 // RUN:   --iree-gpu-test-target=sm_80 --iree-codegen-llvmgpu-enable-transform-dialect-small-matmul | FileCheck --check-prefix=SMALL %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2052, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>> -> tensor<2052x2556xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2556, 2052], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>> -> tensor<2556x2052xf32>
   %5 = tensor.empty() : tensor<2052x2052xf32>
@@ -205,19 +203,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_2() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2051x2555xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2555x2050xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2051x2050xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2051x2555xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2555x2050xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2051x2050xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2051, 2555], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2051x2555xf32>> -> tensor<2051x2555xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2555, 2051], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2555x2050xf32>> -> tensor<2555x2050xf32>
   %5 = tensor.empty() : tensor<2051x2050xf32>
@@ -255,19 +251,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_3() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2556xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2556xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x2556xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2556xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2556xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x2556xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x2556xf32>> -> tensor<2048x2556xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2556, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2556x2556xf32>> -> tensor<2556x2556xf32>
   %5 = tensor.empty() : tensor<2048x2556xf32>
@@ -287,19 +281,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_4_partially_unaligned() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2044xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2044xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x2044xf32>> -> tensor<2048x2044xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2048, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>> -> tensor<2044x1024xf32>
   %5 = tensor.empty() : tensor<2048x1024xf32>
@@ -355,19 +347,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @aligned_matmul() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x2048xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>> -> tensor<2048x2048xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2048, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x2048xf32>> -> tensor<2048x2048xf32>
   %5 = tensor.empty() : tensor<2048x2048xf32>
@@ -422,19 +412,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_5_small() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x2044xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x2044xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 2044], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x2044xf32>> -> tensor<2x2044xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2044, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2044x1024xf32>> -> tensor<2044x1024xf32>
   %5 = tensor.empty() : tensor<2x1024xf32>
@@ -461,19 +449,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @f16_matmul() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2052, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2052x2556xf16>> -> tensor<2052x2556xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2556, 2052], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2556x2052xf16>> -> tensor<2556x2052xf16>
   %5 = tensor.empty() : tensor<2052x2052xf16>
@@ -494,19 +480,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @int8_matmul() {
   %c0 = arith.constant 0 : index
   %c0_i8 = arith.constant 0 : i8
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4x2556xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x2052xi8>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4x2556xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4x2052xi8>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x2556xi8>> -> tensor<4x2556xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2556, 2052], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2556x2052xi8>> -> tensor<2556x2052xi8>
   %5 = tensor.empty() : tensor<4x2052xi8>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_pad.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_pad.mlir
index 76fb368..599ea92 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_pad.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/set_transform_strategy_pad.mlir
@@ -16,18 +16,16 @@
 // RUN:   --td-pad-strategy-use-async-copies=false \
 // RUN: | FileCheck --check-prefix=WITH_OPTIONS %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pad() {
   %c0 = arith.constant 0 : index
   %c56 = arith.constant 56 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [123, 456], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<123x456xf32>> -> tensor<123x456xf32>
   %cst_0 = arith.constant 0.000000e+00 : f32
   %padded = tensor.pad %2 low[%c0, 0] high[5, %c56] {
@@ -98,17 +96,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pad_low() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [123, 456], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<123x456xf32>> -> tensor<123x456xf32>
   %cst_0 = arith.constant 0.000000e+00 : f32
   %padded = tensor.pad %2 low[5, 0] high[0, 56] {
@@ -127,17 +123,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pad_local() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<123x456xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [123, 456], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<123x456xf32>> -> tensor<123x456xf32>
   %padded = tensor.pad %2 low[0, 0] high[5, 56] {
   ^bb0(%arg0: index, %arg1: index):
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensor_pad.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensor_pad.mlir
index 01904e1..48fc842 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensor_pad.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensor_pad.mlir
@@ -1,17 +1,15 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-llvmgpu-tensor-pad),fold-memref-alias-ops,canonicalize,cse)" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @transpose_no_align_dispatch_0_generic_48x32() {
   %c48 = arith.constant 48 : index
   %c32 = arith.constant 32 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x48xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<48x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x48xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<48x32xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -40,8 +38,8 @@
 //       CHECK:  %[[C48:.*]] = arith.constant 48 : index
 //       CHECK:  %[[C32:.*]] = arith.constant 32 : index
 //       CHECK:  %[[C0:.*]] = arith.constant 0 : index
-//       CHECK:  %[[D0:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : !flow.dispatch.tensor<readonly:tensor<32x48xf32>>
-//       CHECK:  %[[D1:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]]) : !flow.dispatch.tensor<writeonly:tensor<48x32xf32>>
+//       CHECK:  %[[D0:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : !flow.dispatch.tensor<readonly:tensor<32x48xf32>>
+//       CHECK:  %[[D1:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]]) : !flow.dispatch.tensor<writeonly:tensor<48x32xf32>>
 //       CHECK:  %[[WORKGROUP_ID_X:.*]] = hal.interface.workgroup.id[0] : index
 //       CHECK:  %[[WORKGROUP_COUNT_X:.*]] = hal.interface.workgroup.count[0] : index
 //       CHECK:  %[[WORKGROUP_ID_Y:.*]] = hal.interface.workgroup.id[1] : index
@@ -73,11 +71,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (s0 * 16)>
 #map1 = affine_map<(d0)[s0] -> (-d0 + s0, 16)>
@@ -94,8 +90,8 @@
   %5 = arith.index_castui %1 : i32 to index
   %6 = arith.index_castui %2 : i32 to index
   %7 = arith.index_castui %3 : i32 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x2x2xi32>>{%4, %5}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c64) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x2x2xi32>>{%4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -125,7 +121,7 @@
   return
 }
 // CHECK-LABEL: func.func @unpack_dynamic
-// CHECK:         %[[DEST_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+// CHECK:         %[[DEST_BUF:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 // CHECK:           %[[LOAD:.+]] = flow.dispatch.tensor.load %[[DEST_BUF]]
 // CHECK:           %[[PAD:.+]] = tensor.pad %[[LOAD]]
 // CHECK:           %[[UNPACK:.+]] = tensor.unpack {{.+}} into %[[PAD]]
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensorcore_vectorization.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensorcore_vectorization.mlir
index edc882b..ba696c7 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensorcore_vectorization.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/tensorcore_vectorization.mlir
@@ -1,20 +1,18 @@
 // RUN: iree-opt --pass-pipeline="builtin.module(func.func(iree-llvmgpu-tensorcore-vectorization))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dot() {
   %c16 = arith.constant 16 : index
   %c1024 = arith.constant 1024 : index
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048x1024xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<1024x512xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<2048x512xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048x1024xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<1024x512xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<2048x512xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %3 = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_y]
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_bufferize.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_bufferize.mlir
index 25e19a8..5354ca0 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_bufferize.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_bufferize.mlir
@@ -1,17 +1,15 @@
 // RUN: iree-opt %s -iree-transform-dialect-interpreter -transform-dialect-drop-schedule | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @pad_matmul_static_dispatch_0() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<250x1020xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [250, 500], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<250x500xf32>> -> tensor<250x500xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [500, 1020], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>> -> tensor<500x1020xf32>
 
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_promote_operands.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_promote_operands.mlir
index 024c901..4b4a465 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_promote_operands.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_promote_operands.mlir
@@ -1,17 +1,15 @@
 // RUN: iree-opt %s -iree-transform-dialect-interpreter -transform-dialect-drop-schedule | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @pad_matmul_static_dispatch_0  {
   builtin.module {
     func.func @pad_matmul_static_dispatch_0(%arg0: tensor<250x500xf32>, %arg1: tensor<500x1020xf32>) -> tensor<250x1020xf32> {
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<250x500xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>>
       %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [250, 500], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<250x500xf32>> -> tensor<250x500xf32>
       %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [500, 1020], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<500x1020xf32>> -> tensor<500x1020xf32>
 
@@ -19,8 +17,8 @@
       %cst = arith.constant 0.000000e+00 : f32
       %5 = linalg.fill ins(%cst : f32) outs(%50 : tensor<250x1020xf32>) -> tensor<250x1020xf32>
       // CHECK:      %[[CST:.+]] = arith.constant 0.000000e+00 : f32
-      // CHECK:      %[[D0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64)
-      // CHECK:      %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64)
+      // CHECK:      %[[D0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64)
+      // CHECK:      %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64)
       // CHECK:      %[[D2:.+]] = flow.dispatch.tensor.load %[[D0]], offsets = [0, 0], sizes = [250, 500]
       // CHECK:      %[[D3:.+]] = flow.dispatch.tensor.load %[[D1]], offsets = [0, 0], sizes = [500, 1020]
       // CHECK:      %[[D4:.+]] = tensor.empty() : tensor<250x1020xf32>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_vector_distribution.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_vector_distribution.mlir
index da19ad2..3e47fe8 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_vector_distribution.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_dialect_vector_distribution.mlir
@@ -6,16 +6,14 @@
 // RUN: --allow-unregistered-dialect | \
 // RUN: FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #translation_info = #iree_codegen.translation_info<None workgroup_size = [64, 1, 1] subgroup_size = 32>
 func.func @reduce_dispatch_0() attributes {translation_info = #translation_info} {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128xf32>
   memref.assume_alignment %0, 64 : memref<128xf32>
   %1 = gpu.thread_id  x
   %2 = arith.cmpi ult, %1, %c1 : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_distribute_forall.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_distribute_forall.mlir
index 4102664..4056c42 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_distribute_forall.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_distribute_forall.mlir
@@ -1,9 +1,7 @@
 // RUN: iree-opt %s --pass-pipeline="builtin.module(iree-codegen-lower-executable-using-transform-dialect)" | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 #translation = #iree_codegen.translation_info<TransformDialectCodegen, { config_test = "config_test" }>
@@ -13,7 +11,7 @@
     %c250 = arith.constant 250 : index
     %c8 = arith.constant 8 : index
     %c0 = arith.constant 0 : index
-    %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2xf16>
+    %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<2xf16>
     memref.assume_alignment %0, 64 : memref<2xf16>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %subview = memref.subview %0[%workgroup_id_x] [1] [1] : memref<2xf16> to memref<1xf16, strided<[1], offset: ?>>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_gpu_pipelining.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_gpu_pipelining.mlir
index 57f8c82..4975c58 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_gpu_pipelining.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_gpu_pipelining.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt %s -iree-transform-dialect-interpreter -transform-dialect-drop-schedule | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matmul_pipelining  {
 builtin.module {
@@ -21,11 +19,11 @@
   %3 = gpu.thread_id  z
   %4 = memref.alloc() : memref<4x32x40xf16, 3>
   %5 = memref.alloc() : memref<4x32x40xf16, 3>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<3456x2048xf16>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<3456x2048xf16>
   memref.assume_alignment %6, 64 : memref<3456x2048xf16>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<2048x1024xf16>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<2048x1024xf16>
   memref.assume_alignment %7, 64 : memref<2048x1024xf16>
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf16>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<3456x1024xf16>
   memref.assume_alignment %8, 64 : memref<3456x1024xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_vector_to_mma.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_vector_to_mma.mlir
index 2159e58..9fae267 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_vector_to_mma.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transform_vector_to_mma.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt %s --split-input-file -iree-transform-dialect-interpreter -transform-dialect-drop-schedule | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matmul  {
 builtin.module {
@@ -17,11 +15,11 @@
   %c16 = arith.constant 16 : index
   %c32 = arith.constant 32 : index
   %cst_0 = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %0, 64 : memref<32x32xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x32xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %1, 64 : memref<32x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %2, 64 : memref<32x32xf32>
   %3 = gpu.thread_id  x
   %4 = gpu.thread_id  y
@@ -77,12 +75,10 @@
 // -----
 
 // Verify that unrolling does not apply to rank 1 elementwise vector ops.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @gathered_matmul  {
 builtin.module {
@@ -98,11 +94,11 @@
   %cst_0 = arith.constant 0.000000e+00 : f32
   %cst_1 = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
   %cst_2 = arith.constant dense<1> : vector<4x4xindex>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<32x32xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %0, 64 : memref<32x32xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x32xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %1, 64 : memref<32x32xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x32xf32>
   memref.assume_alignment %2, 64 : memref<32x32xf32>
   %alloc = memref.alloc() {alignment = 64 : i64} : memref<32x32xf32>
   %3 = gpu.thread_id  x
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transpose_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transpose_pipeline_test.mlir
index 09357d9..8aa8774 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transpose_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/transpose_pipeline_test.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=sm_80 \
 // RUN:   --pass-pipeline="builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target, fold-memref-alias-ops, canonicalize, cse)))))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_dispatch_0 {
@@ -18,8 +16,8 @@
     builtin.module {
       func.func @transpose_dispatch_0_generic_4096x4096() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4096, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>> -> tensor<4096x4096xf32>
         %3 = tensor.empty() : tensor<4096x4096xf32>
         %4 = linalg.generic {indexing_maps = [ affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%2 : tensor<4096x4096xf32>) outs(%3 : tensor<4096x4096xf32>) {
@@ -40,9 +38,9 @@
 //   CHECK-DAG:  %[[D1:.*]] = gpu.thread_id  y
 //   CHECK-DAG:  %[[D2:.*]] = gpu.thread_id  z
 //   CHECK-DAG:  %[[D3:.*]] = memref.alloc() : memref<32x33xf32, #gpu.address_space<workgroup>>
-//       CHECK:  %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:  %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  memref.assume_alignment %[[D4]], 64 : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:  %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]]) : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:  %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]]) : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  memref.assume_alignment %[[D5]], 64 : memref<4096x4096xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  gpu.barrier
 //       CHECK:  %[[D6:.*]] = affine.apply #{{.*}}()[%{{.*}}, %[[D0]], %[[D1]], %[[D2]]]
@@ -61,12 +59,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_single_operand_dispatch_0_generic_768x2048 {
@@ -79,9 +75,9 @@
     builtin.module {
       func.func @transpose_single_operand_dispatch_0_generic_768x2048() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<768x2048xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<768x2048xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2048, 768], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x768xf32>> -> tensor<2048x768xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [768, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<768x2048xf32>> -> tensor<768x2048xf32>
         %5 = tensor.empty() : tensor<768x2048xf32>
@@ -104,11 +100,11 @@
 //       CHECK:  %[[D1:.*]] = gpu.thread_id  y
 //       CHECK:  %[[D2:.*]] = gpu.thread_id  z
 //       CHECK:  %[[D3:.*]] = memref.alloc() : memref<32x33xf32, #gpu.address_space<workgroup>>
-//       CHECK:  %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<2048x768xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:  %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<2048x768xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  memref.assume_alignment %[[D4]], 64 : memref<2048x768xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:  %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]]) : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:  %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]]) : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  memref.assume_alignment %[[D5]], 64 : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:  %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%[[C0]]) : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:  %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%[[C0]]) : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  memref.assume_alignment %[[D6]], 64 : memref<768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:  gpu.barrier
 //       CHECK:  %[[D7:.*]] = affine.apply #{{.*}}()[%{{.*}}, %[[D0]], %[[D1]], %[[D2]]]
@@ -129,12 +125,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_3d_no_dispatch_0_generic_768x2048x1024 {
@@ -147,9 +141,9 @@
     builtin.module {
       func.func @transpose_3d_no_dispatch_0_generic_768x2048x1024() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768x1024xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<768x2048x1024xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048x1024xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768x1024xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<768x2048x1024xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<768x2048x1024xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2048, 768, 1024], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x768x1024xf32>> -> tensor<2048x768x1024xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [768, 2048, 1024], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<768x2048x1024xf32>> -> tensor<768x2048x1024xf32>
         %5 = tensor.empty() : tensor<768x2048x1024xf32>
@@ -172,12 +166,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_3d_yes_dispatch_0_generic_10x768x2048 {
@@ -190,9 +182,9 @@
     builtin.module {
       func.func @transpose_3d_yes_dispatch_0_generic_10x768x2048() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x2048x768xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x768x2048xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x2048x768xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x768x2048xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [10, 2048, 768], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x2048x768xf32>> -> tensor<10x2048x768xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [10, 768, 2048], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>> -> tensor<10x768x2048xf32>
         %5 = tensor.empty() : tensor<10x768x2048xf32>
@@ -215,11 +207,11 @@
 //       CHECK:   %[[D1:.*]] = gpu.thread_id  y
 //       CHECK:   %[[D2:.*]] = gpu.thread_id  z
 //       CHECK:   %[[D3:.*]] = memref.alloc() : memref<1x32x33xf32, #gpu.address_space<workgroup>>
-//       CHECK:   %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D4:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D4]], 64 : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:   %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D5]], 64 : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:   %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D6]], 64 : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   gpu.barrier
 //       CHECK:   %[[D7:.*]] = affine.apply #{{.*}}()[%{{.*}}, %[[D0]], %[[D1]], %[[D2]]]
@@ -240,12 +232,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_3d_trans_out_dispatch_0_generic_10x2048x768 {
@@ -258,9 +248,9 @@
     builtin.module {
       func.func @transpose_3d_trans_out_dispatch_0_generic_10x2048x768() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x2048x768xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x2048x768xf32>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [10, 768, 2048], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>> -> tensor<10x768x2048xf32>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [10, 768, 2048], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x768x2048xf32>> -> tensor<10x768x2048xf32>
         %5 = tensor.empty() : tensor<10x2048x768xf32>
@@ -284,11 +274,11 @@
 //       CHECK:   %[[D2:.*]] = gpu.thread_id  z
 //       CHECK:   %[[D3:.*]] = memref.alloc() : memref<1x32x33xf32, #gpu.address_space<workgroup>>
 //       CHECK:   %[[D4:.*]] = memref.alloc() : memref<1x32x33xf32, #gpu.address_space<workgroup>>
-//       CHECK:   %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D5:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D5]], 64 : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:   %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D6:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]]) : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D6]], 64 : memref<10x768x2048xf32, #hal.descriptor_type<storage_buffer>>
-//       CHECK:   %[[D7:.*]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) alignment(64) offset(%[[C0]]) : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
+//       CHECK:   %[[D7:.*]] = hal.interface.binding.subspan layout({{.+}}) binding(2) alignment(64) offset(%[[C0]]) : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   memref.assume_alignment %[[D7]], 64 : memref<10x2048x768xf32, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   gpu.barrier
 //       CHECK:   %[[D8:.*]] = affine.apply #{{.*}}()[%{{.*}}, %[[D0]], %[[D1]], %[[D2]]]
@@ -311,12 +301,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_cuda_nvptx_fb = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 hal.executable @transpose_3d_diff_dispatch_0_generic_10x768x2048 {
@@ -333,9 +321,9 @@
       %c768 = arith.constant 768 : index
       %c2048 = arith.constant 2048 : index
       %c0 = arith.constant 0 : index
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x2048x768xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768x10xf32>>
-      %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x768x2048xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x2048x768xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2048x768x10xf32>>
+      %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x768x2048xf32>>
       %workgroup_id_x = hal.interface.workgroup.id[0] : index
       %workgroup_count_x = hal.interface.workgroup.count[0] : index
       %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ukernel_pipeline_transform.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ukernel_pipeline_transform.mlir
index f231c8b..857f4bd 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ukernel_pipeline_transform.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/ukernel_pipeline_transform.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx1100 --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb", {ukernels = "argmax"}>
 #map = affine_map<(d0) -> (d0)>
@@ -21,9 +19,9 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
   %10 = flow.dispatch.tensor.load %9, offsets = [0], sizes = [%8], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8} -> tensor<?xf16>
   %11 = tensor.empty() : tensor<i64>
   %12 = tensor.empty() : tensor<f16>
@@ -49,11 +47,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb", {ukernels = "argmax"}>
 #map = affine_map<(d0, d1) -> (d0, d1)>
@@ -70,9 +66,9 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16xi64>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16xi64>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x?xf32>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x?xf32>>{%8}
   %10 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [16, %8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16x?xf32>>{%8} -> tensor<16x?xf32>
   %11 = tensor.empty() : tensor<16xi64>
   %12 = tensor.empty() : tensor<16xf32>
@@ -100,11 +96,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb">
 #map = affine_map<(d0) -> (d0)>
@@ -121,9 +115,9 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
   %10 = flow.dispatch.tensor.load %9, offsets = [0], sizes = [%8], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8} -> tensor<?xf16>
   %11 = tensor.empty() : tensor<i64>
   %12 = tensor.empty() : tensor<f16>
@@ -149,11 +143,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_rocm_hsaco_fb = #hal.executable.target<"rocm", "rocm-hsaco-fb", {ukernels = "argmax"}>
 #map = affine_map<(d0) -> (d0)>
@@ -170,9 +162,9 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8}
   %10 = flow.dispatch.tensor.load %9, offsets = [0], sizes = [%8], strides = [1] : !flow.dispatch.tensor<readonly:tensor<?xf16>>{%8} -> tensor<?xf16>
   %11 = tensor.empty() : tensor<i64>
   %12 = tensor.empty() : tensor<f16>
diff --git a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/winograd_pipeline_test.mlir b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/winograd_pipeline_test.mlir
index fcb1192..4173142 100644
--- a/compiler/src/iree/compiler/Codegen/LLVMGPU/test/winograd_pipeline_test.mlir
+++ b/compiler/src/iree/compiler/Codegen/LLVMGPU/test/winograd_pipeline_test.mlir
@@ -1,15 +1,13 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=gfx1100 --pass-pipeline="builtin.module(iree-llvmgpu-select-lowering-strategy, func.func(iree-llvmgpu-lower-executable-target))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_filter_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x64x128xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [3, 3, 64, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x64x128xf32>> -> tensor<3x3x64x128xf32>
   %3 = tensor.empty() : tensor<8x8x64x128xf32>
   %4 = iree_linalg_ext.winograd.filter_transform output_tile_size(6) kernel_size(3) kernel_dimensions([0, 1]) ins(%2 : tensor<3x3x64x128xf32>) outs(%3 : tensor<8x8x64x128xf32>) -> tensor<8x8x64x128xf32>
@@ -29,16 +27,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_input_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 34, 34, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>> -> tensor<2x34x34x128xf16>
   %3 = tensor.empty() : tensor<8x8x2x6x6x128xf16>
   %4 = iree_linalg_ext.winograd.input_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<2x34x34x128xf16>) outs(%3 : tensor<8x8x2x6x6x128xf16>) -> tensor<8x8x2x6x6x128xf16>
@@ -59,16 +55,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_output_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0, 0], sizes = [8, 8, 2, 6, 6, 128], strides = [1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>> -> tensor<8x8x2x6x6x128xf16>
   %3 = tensor.empty() : tensor<2x36x36x128xf16>
   %4 = iree_linalg_ext.winograd.output_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<8x8x2x6x6x128xf16>) outs(%3 : tensor<2x36x36x128xf16>) -> tensor<2x36x36x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/ConvertToSPIRVPass.cpp b/compiler/src/iree/compiler/Codegen/SPIRV/ConvertToSPIRVPass.cpp
index a5c7df9..b785b3d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/ConvertToSPIRVPass.cpp
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/ConvertToSPIRVPass.cpp
@@ -117,7 +117,7 @@
 /// Returns the (set, binding) pair for the given interface op.
 static std::pair<uint32_t, uint32_t>
 getInterfaceSetAndBinding(IREE::HAL::InterfaceBindingSubspanOp op) {
-  return {op.getSet().getSExtValue(), op.getBinding().getSExtValue()};
+  return {0, op.getBinding().getSExtValue()};
 }
 
 /// Scans all hal.interface.binding.subspan ops in `module`, creates their
@@ -289,7 +289,7 @@
     assert(exportOps.size() == 1);
     auto layoutAttr = exportOps.front().getLayout();
 
-    uint64_t elementCount = layoutAttr.getPushConstants();
+    uint64_t elementCount = layoutAttr.getConstants();
     unsigned index = loadOp.getOrdinal().getZExtValue();
 
     // The following function generates SPIR-V ops with i32 types. So it does
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEmulateI64.cpp b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEmulateI64.cpp
index 6861e81..0e2734b 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEmulateI64.cpp
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEmulateI64.cpp
@@ -61,10 +61,9 @@
 
     auto newOp =
         rewriter.replaceOpWithNewOp<IREE::HAL::InterfaceBindingSubspanOp>(
-            op, newResultTy, adaptor.getLayout(), adaptor.getSet(),
-            adaptor.getBinding(), adaptor.getByteOffset(),
-            adaptor.getDynamicDims(), adaptor.getAlignmentAttr(),
-            adaptor.getDescriptorFlagsAttr());
+            op, newResultTy, adaptor.getLayout(), adaptor.getBinding(),
+            adaptor.getByteOffset(), adaptor.getDynamicDims(),
+            adaptor.getAlignmentAttr(), adaptor.getDescriptorFlagsAttr());
     LLVM_DEBUG(llvm::dbgs()
                << "WideIntegerEmulation: new op: " << newOp << "\n");
     (void)newOp;
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEraseStorageBufferStaticShape.cpp b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEraseStorageBufferStaticShape.cpp
index 817b121..94a3dc2 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEraseStorageBufferStaticShape.cpp
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVEraseStorageBufferStaticShape.cpp
@@ -50,7 +50,7 @@
 /// e.g.,
 ///
 /// ```mlir
-///  hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+///  hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
 ///  offset(%offset)
 ///      : memref<16xf32>
 /// ```
@@ -58,7 +58,7 @@
 /// is re-written to
 ///
 /// ```mlir
-///  hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0)
+///  hal.interface.binding.subspan layout(#pipeline_layout) binding(0)
 ///  offset(%offset)
 ///      : memref<?xf32>{%c16}
 /// ```
@@ -87,9 +87,8 @@
 
   auto newOp = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
       subspanOp.getLoc(), newType, subspanOp.getLayoutAttr(),
-      subspanOp.getSetAttr(), subspanOp.getBindingAttr(),
-      subspanOp.getByteOffset(), dynamicDims, subspanOp.getAlignmentAttr(),
-      subspanOp.getDescriptorFlagsAttr());
+      subspanOp.getBindingAttr(), subspanOp.getByteOffset(), dynamicDims,
+      subspanOp.getAlignmentAttr(), subspanOp.getDescriptorFlagsAttr());
 
   LLVM_DEBUG({
     llvm::dbgs() << "Rewritten to: ";
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVMaterializeExecutableConditions.cpp b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVMaterializeExecutableConditions.cpp
index 50086d0..0228656 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVMaterializeExecutableConditions.cpp
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVMaterializeExecutableConditions.cpp
@@ -78,7 +78,7 @@
 // and updates features.
 //
 // Note that the device queries used here should match the ones used in
-// iree_hal_vulkan_get_device_properties() on the runtime side.
+// iree_hal_vulkan_query_device_properties() on the runtime side.
 LogicalResult mapToDeviceQuery(IREE::HAL::ExecutableExportOp entryPoint,
                                spirv::Capability cap,
                                KernelFeatures &features) {
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVVectorizeLoadStore.cpp b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVVectorizeLoadStore.cpp
index 5934189..de4c7fa 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVVectorizeLoadStore.cpp
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/SPIRVVectorizeLoadStore.cpp
@@ -649,10 +649,9 @@
                                          "cannot get vectorized memref type");
     }
     rewriter.replaceOpWithNewOp<IREE::HAL::InterfaceBindingSubspanOp>(
-        subspanOp, *vecMemRef, subspanOp.getLayout(), subspanOp.getSet(),
-        subspanOp.getBinding(), subspanOp.getByteOffset(),
-        subspanOp.getDynamicDims(), subspanOp.getAlignmentAttr(),
-        subspanOp.getDescriptorFlagsAttr());
+        subspanOp, *vecMemRef, subspanOp.getLayout(), subspanOp.getBinding(),
+        subspanOp.getByteOffset(), subspanOp.getDynamicDims(),
+        subspanOp.getAlignmentAttr(), subspanOp.getDescriptorFlagsAttr());
     return success();
   }
 };
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/annotate_winograd_loops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/annotate_winograd_loops.mlir
index 7796ffc..1f541ff 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/annotate_winograd_loops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/annotate_winograd_loops.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-spirv-annotate-winograd-loops))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @_wino_input_dispatch_0() {
   %c0 = arith.constant 0 : index
@@ -16,8 +14,8 @@
   %c1 = arith.constant 1 : index
   %c32 = arith.constant 32 : index
   %0 = tensor.empty() : tensor<8x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x10x10x1280xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x2x2x1280xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x10x10x1280xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x2x2x1280xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -68,9 +66,9 @@
 // CHECK:        %[[C1:.+]] = arith.constant 1 : index
 // CHECK:        %[[C32:.+]] = arith.constant 32 : index
 // CHECK:        %[[D0:.+]] = tensor.empty() : tensor<8x8xf32>
-// CHECK:        %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[C0]])
+// CHECK:        %[[D1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[C0]])
 // CHECK-SAME:     : !flow.dispatch.tensor<readonly:tensor<2x10x10x1280xf32>>
-// CHECK:        %[[D2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[C0]])
+// CHECK:        %[[D2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[C0]])
 // CHECK-SAME:     : !flow.dispatch.tensor<writeonly:tensor<8x8x2x2x2x1280xf32>>
 // CHECK:        %[[WORKGROUP_ID_X:.+]] = hal.interface.workgroup.id[0] : index
 // CHECK:        %[[WORKGROUP_COUNT_X:.+]] = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_conv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_conv.mlir
index 4e6ea89..221790a 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_conv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_conv.mlir
@@ -2,21 +2,19 @@
 
 // Conv - large OC - distribute to only one workgroup dimension.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_112x112x512() {
   %c0 = arith.constant 0 : index
   %c512 = arith.constant 512 : index
   %c112 = arith.constant 112 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 512], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>> -> tensor<3x3x3x512xf32>
   %5 = tensor.empty() : tensor<1x112x112x512xf32>
@@ -37,21 +35,19 @@
 
 // Conv - medium OC/OW/OH - distribute to two workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_112x112x32() {
   %c0 = arith.constant 0 : index
   %c32 = arith.constant 32 : index
   %c112 = arith.constant 112 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>> -> tensor<3x3x3x32xf32>
   %5 = tensor.empty() : tensor<1x112x112x32xf32>
@@ -72,20 +68,18 @@
 
 // Conv - small OC/OW/OH - distribute to all three workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_16x16x16() {
   %c0 = arith.constant 0 : index
   %c16 = arith.constant 16 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x16x16x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x16x16x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 33, 33, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>> -> tensor<1x33x33x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>> -> tensor<3x3x3x16xf32>
   %5 = tensor.empty() : tensor<1x16x16x16xf32>
@@ -105,21 +99,19 @@
 
 // Depthwise conv - small OC/OW/OH - distribute to all three workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dwconv_28x28x144() {
   %c0 = arith.constant 0 : index
   %c144 = arith.constant 144 : index
   %c28 = arith.constant 28 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x144xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x144xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 57, 57, 144], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>> -> tensor<1x57x57x144xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 144], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>> -> tensor<3x3x144xf32>
   %5 = tensor.empty() : tensor<1x28x28x144xf32>
@@ -140,21 +132,19 @@
 
 // Depthwise conv - tiny OC/OW/OH - starving the GPU.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dwconv_4x4x8() {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
   %c4 = arith.constant 4 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x9x9x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x4x4x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x9x9x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x4x4x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 9, 9, 8], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x9x9x8xf32>> -> tensor<1x9x9x8xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>> -> tensor<3x3x8xf32>
   %5 = tensor.empty() : tensor<1x4x4x8xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_matmul.mlir
index 60a9e65..08d7676 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_adreno_matmul.mlir
@@ -2,21 +2,19 @@
 
 // Large matmul that can match the best tiling scheme.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1024x2048x512() {
   %c0 = arith.constant 0 : index
   %c2048 = arith.constant 2048 : index
   %c1024 = arith.constant 1024 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>> -> tensor<512x2048xf32>
   %5 = tensor.empty() : tensor<1024x2048xf32>
@@ -37,21 +35,19 @@
 
 // Small matmul N that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_3136x24x96() {
   %c0 = arith.constant 0 : index
   %c24 = arith.constant 24 : index
   %c3136 = arith.constant 3136 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<96x24xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3136x24xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<96x24xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3136x24xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [3136, 96], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>> -> tensor<3136x96xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [96, 24], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<96x24xf32>> -> tensor<96x24xf32>
   %5 = tensor.empty() : tensor<3136x24xf32>
@@ -72,21 +68,19 @@
 
 // Small matmul M that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_196x64x192() {
   %c0 = arith.constant 0 : index
   %c64 = arith.constant 64 : index
   %c196 = arith.constant 196 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x192xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<192x64xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x192xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<192x64xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x64xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [196, 192], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<196x192xf32>> -> tensor<196x192xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [192, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<192x64xf32>> -> tensor<192x64xf32>
   %5 = tensor.empty() : tensor<196x64xf32>
@@ -107,21 +101,19 @@
 
 // Small matmul K that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_12544x96x16() {
   %c0 = arith.constant 0 : index
   %c96 = arith.constant 96 : index
   %c12544 = arith.constant 12544 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<12544x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x96xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<12544x96xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<12544x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x96xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<12544x96xf32>
   linalg.fill ins(%cst : f32) outs(%2 : memref<12544x96xf32>)
   linalg.matmul {__internal_linalg_transform__ = "workgroup"} ins(%0, %1 : memref<12544x16xf32>, memref<16x96xf32>) outs(%2 : memref<12544x96xf32>)
   return
@@ -138,21 +130,19 @@
 
 // Odd matmul M and small N that cannot utilize all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_49x160x576() {
   %c0 = arith.constant 0 : index
   %c160 = arith.constant 160 : index
   %c49 = arith.constant 49 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<49x576xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<576x160xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<49x160xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<49x576xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<576x160xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<49x160xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [49, 576], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<49x576xf32>> -> tensor<49x576xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [576, 160], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<576x160xf32>> -> tensor<576x160xf32>
   %5 = tensor.empty() : tensor<49x160xf32>
@@ -173,21 +163,19 @@
 
 // Large batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_4x384x384() {
   %c0 = arith.constant 0 : index
   %c384 = arith.constant 384 : index
   %c4 = arith.constant 4 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x384x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x384x384xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 384, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>> -> tensor<4x384x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 32, 384], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>> -> tensor<4x32x384xf32>
   %5 = tensor.empty() : tensor<4x384x384xf32>
@@ -208,21 +196,19 @@
 
 // Small batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_4x8x8() {
   %c0 = arith.constant 0 : index
   %c8 = arith.constant 8 : index
   %c4 = arith.constant 4 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x8x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x8x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x8x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x8x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 8, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x8x32xf32>> -> tensor<4x8x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 32, 8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>> -> tensor<4x32x8xf32>
   %5 = tensor.empty() : tensor<4x8x8xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_conv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_conv.mlir
index 2fa2c4d..e499af8 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_conv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_conv.mlir
@@ -1,21 +1,19 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=rdna2@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @nhwc_conv_pointwise_2x64x64x320() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x66x66x320xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x320x320xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x64x320xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x64x64x320xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x66x66x320xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x320x320xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x64x320xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x64x64x320xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 66, 66, 320], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x66x66x320xf16>> -> tensor<2x66x66x320xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 320, 320], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x320x320xf16>> -> tensor<3x3x320x320xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [2, 64, 64, 320], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x64x64x320xf16>> -> tensor<2x64x64x320xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul.mlir
index b986068..37bd863 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul.mlir
@@ -1,17 +1,15 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=rdna2@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_f32_16x4096x40x4096() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x40xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<16x4096x40xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x40xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<16x4096x40xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 4096, 4096], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>> -> tensor<16x4096x4096xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 4096, 40], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4096x40xf32>> -> tensor<16x4096x40xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [16, 4096, 40], strides = [1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<16x4096x40xf32>> -> tensor<16x4096x40xf32>
@@ -30,19 +28,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_f16_64x640x320() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x320xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<320x640xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x640xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x320xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<320x640xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<64x640xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 320], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x320xf16>> -> tensor<64x320xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [320, 640], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<320x640xf16>> -> tensor<320x640xf16>
   %5 = tensor.empty() : tensor<64x640xf16>
@@ -61,18 +57,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_f32_16x4096x40x4096() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x48xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x4096x48xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x48xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x4096x48xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 4096, 4096], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf32>> -> tensor<16x4096x4096xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 4096, 48], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4096x48xf32>> -> tensor<16x4096x48xf32>
   %5 = tensor.empty() : tensor<16x4096x48xf32>
@@ -91,20 +85,18 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 func.func @batch_matmul_f16_1x4096x4096x512() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x4096x512xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x512x4096xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x4096x4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x4096x512xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x512x4096xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x4096x4096xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1, 4096, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096x512xf16>> -> tensor<1x4096x512xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1, 512, 4096], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x512x4096xf16>> -> tensor<1x512x4096xf16>
   %5 = tensor.empty() : tensor<1x4096x4096xf32>
@@ -129,13 +121,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -154,11 +144,11 @@
   %7 = arith.index_castui %2 : i32 to index
   %8 = arith.index_castui %3 : i32 to index
   %9 = arith.index_castui %4 : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>>
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>>
-  %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>>
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x32x128xf32>>
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<512x11008xf32>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>>
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>>
+  %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>>
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x32x128xf32>>
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<512x11008xf32>>
   %15 = flow.dispatch.tensor.load %10, offsets = [0, 0, 0], sizes = [11008, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>> -> tensor<11008x32x128xi4>
   %16 = flow.dispatch.tensor.load %11, offsets = [0, 0], sizes = [11008, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>> -> tensor<11008x32xf32>
   %17 = flow.dispatch.tensor.load %12, offsets = [0, 0], sizes = [11008, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32xf32>> -> tensor<11008x32xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul_cooperative_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul_cooperative_ops.mlir
index c07fe95..c3755fb 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul_cooperative_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matmul_cooperative_ops.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=rdna3@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @matmul_256x1024x128_div_add() {
@@ -15,11 +13,11 @@
   %c1024 = arith.constant 1024 : index
   %c256 = arith.constant 256 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %7 = tensor.empty() : tensor<256x1024xf16>
@@ -47,22 +45,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 func.func @batch_matmul_16x128x256x512_div() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 128, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>> -> tensor<16x128x512xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>> -> tensor<16x512x256xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [16, 128, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>> -> tensor<16x128x256xf16>
@@ -89,12 +85,10 @@
 
 // Linalg.generic that is a batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d1, d0, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -105,9 +99,9 @@
 func.func @generic_batch_matmul_32x8x512x64() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 32, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>> -> tensor<128x32x64xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [32, 64, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>> -> tensor<32x64x512xf16>
   %5 = tensor.empty() : tensor<32x128x512xf16>
@@ -133,19 +127,17 @@
 
// K dim size not divisible by 32.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_16x1024x1024x80() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x1024x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x1024x1024xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 1024, 80], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>> -> tensor<16x1024x80xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 80, 1024], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>> -> tensor<16x80x1024xf16>
   %5 = tensor.empty() : tensor<16x1024x1024xf16>
@@ -166,21 +158,19 @@
 
 // Small K - not supported by cooperative matrix.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_256x1024x8() {
   %c0 = arith.constant 0 : index
   %c1024 = arith.constant 1024 : index
   %c256 = arith.constant 256 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x8xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x8xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x8xf16>> -> tensor<256x8xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [8, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>> -> tensor<8x1024xf16>
   %5 = tensor.empty() : tensor<256x1024xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matvec.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matvec.mlir
index fcf5341..f00e7b5 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matvec.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_amd_matvec.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=cdna2@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -15,11 +13,11 @@
 #map3 = affine_map<(d0, d1, d2) -> (d0)>
 func.func @i4_dequant_matvec_f32() {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
@@ -54,12 +52,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1, 0)>
@@ -70,11 +66,11 @@
   %c32_i64 = arith.constant 32 : i64
   %cst = arith.constant 0.000000e+00 : f32
   %c4294967296_i64 = arith.constant 4294967296 : i64
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x1x32x128xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x1x32x128xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf32>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x128xi4>> -> tensor<4096x32x128xi4>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4096, 32, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>> -> tensor<4096x32x1xf32>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [4096, 32, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x32x1xf32>> -> tensor<4096x32x1xf32>
@@ -109,13 +105,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -152,12 +146,12 @@
   %24 = arith.shli %23, %c32_i64 : i64
   %25 = arith.ori %22, %24 : i64
   %26 = arith.index_castui %25 : i64 to index
-  %27 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %28 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %29 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %27 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %28 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %29 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
   %30 = flow.dispatch.workload.ordinal %26, 0 : index
-  %31 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x86x128xf32>>{%30}
-  %32 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
+  %31 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x86x128xf32>>{%30}
+  %32 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
   %33 = flow.dispatch.tensor.load %27, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
   %34 = flow.dispatch.tensor.load %28, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
   %35 = flow.dispatch.tensor.load %29, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
@@ -192,14 +186,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1, 0)>
@@ -209,11 +201,11 @@
 func.func @i4_dequant_matvec_f16() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x86x128xf16>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x86x128xf16>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf16>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4096, 86, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>> -> tensor<4096x86x1xf16>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [4096, 86, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>> -> tensor<4096x86x1xf16>
@@ -248,14 +240,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -292,12 +282,12 @@
   %24 = arith.shli %23, %c32_i64 : i64
   %25 = arith.ori %22, %24 : i64
   %26 = arith.index_castui %25 : i64 to index
-  %27 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %28 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %29 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %27 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %28 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%10) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %29 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%11) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
   %30 = flow.dispatch.workload.ordinal %26, 0 : index
-  %31 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x86x128xf32>>{%30}
-  %32 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
+  %31 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%16) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x86x128xf32>>{%30}
+  %32 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%21) : !flow.dispatch.tensor<writeonly:tensor<?x4096xf32>>{%30}
   %33 = flow.dispatch.tensor.load %27, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
   %34 = flow.dispatch.tensor.load %28, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
   %35 = flow.dispatch.tensor.load %29, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
@@ -332,12 +322,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 7, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 7, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -368,12 +356,12 @@
   %17 = arith.shli %16, %c32_i64 : i64
   %18 = arith.ori %15, %17 : i64
   %19 = arith.index_castui %18 : i64 to index
-  %20 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>>
-  %21 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>>
-  %22 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>>
+  %20 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>>
+  %21 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>>
+  %22 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%9) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>>
   %23 = flow.dispatch.workload.ordinal %19, 0 : index
-  %24 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x32x128xf16>>{%23}
-  %25 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%14) : !flow.dispatch.tensor<writeonly:tensor<?x11008xf16>>{%23}
+  %24 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x32x128xf16>>{%23}
+  %25 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%14) : !flow.dispatch.tensor<writeonly:tensor<?x11008xf16>>{%23}
   %26 = flow.dispatch.tensor.load %20, offsets = [0, 0, 0], sizes = [11008, 32, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32x128xi4>> -> tensor<11008x32x128xi4>
   %27 = flow.dispatch.tensor.load %21, offsets = [0, 0], sizes = [11008, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>> -> tensor<11008x32xf16>
   %28 = flow.dispatch.tensor.load %22, offsets = [0, 0], sizes = [11008, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<11008x32xf16>> -> tensor<11008x32xf16>
@@ -408,12 +396,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dynamic_batch_matvec() {
   %c32_i64 = arith.constant 32 : i64
@@ -428,11 +414,11 @@
   %7 = arith.index_castui %2 : i32 to index
   %8 = arith.index_castui %3 : i32 to index
   %9 = arith.index_castui %4 : i32 to index
-  %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<32x1x128xf16>>
+  %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%7) : !flow.dispatch.tensor<writeonly:tensor<32x1x128xf16>>
   %11 = flow.dispatch.workload.ordinal %8, 0 : index
   %12 = flow.dispatch.workload.ordinal %9, 1 : index
-  %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11}
-  %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12}
+  %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11}
+  %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12}
   %15 = flow.dispatch.tensor.load %13, offsets = [0, 0, 0], sizes = [32, 1, %11], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x1x?xf16>>{%11} -> tensor<32x1x?xf16>
   %16 = flow.dispatch.tensor.load %14, offsets = [0, 0, 0], sizes = [32, %12, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?x128xf16>>{%12} -> tensor<32x?x128xf16>
   %17 = tensor.empty() : tensor<32x1x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_conv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_conv.mlir
index 88f9f72..aa9ae5e 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_conv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_conv.mlir
@@ -2,13 +2,11 @@
 
 // Convolution with consumer pointwise ops.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @nhwc_conv_pointwise_112x112x32() {
@@ -16,10 +14,10 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c112 = arith.constant 112 : index
   %c32 = arith.constant 32 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 112, 112, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>> -> tensor<1x112x112x32xf32>
   %5 = tensor.empty() : tensor<1x112x112x32xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
@@ -45,18 +43,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @nchw_conv_2x1280x8x8() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x1280x10x10xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280x3x3xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x1280x8x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x1280x10x10xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280x3x3xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x1280x8x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 1280, 10, 10], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x1280x10x10xf32>> -> tensor<2x1280x10x10xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [1280, 1280, 3, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1280x1280x3x3xf32>> -> tensor<1280x1280x3x3xf32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0], sizes = [2, 1280, 8, 8], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<2x1280x8x8xf32>> -> tensor<2x1280x8x8xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ext_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ext_ops.mlir
index 29d8e04..7f0702c 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ext_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ext_ops.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_1d_sort() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<1000xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<1000xi32>>
   %1 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [1000], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<1000xi32>> -> tensor<1000xi32>
   %2 = iree_linalg_ext.sort dimension(0) outs(%1 : tensor<1000xi32>) {
   ^bb0(%arg0: i32, %arg1: i32):
@@ -29,19 +27,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 func.func @static_3d_sort() {
   %c64 = arith.constant 64 : index
   %c128 = arith.constant 128 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x32x128xi32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<64x32x128xi32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x32x128xi32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<64x32x128xi32>
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel"]} ins(%0 : memref<64x32x128xi32>) outs(%1 : memref<64x32x128xi32>) {
   ^bb0(%in: i32, %out: i32):
     linalg.yield %in : i32
@@ -63,19 +59,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_1d_fft_stage2() {
   %c0 = arith.constant 0 : index
   %c2 = arith.constant 2 : index
   %cst = arith.constant dense<[1.000000e+00, 6.12323426E-17]> : tensor<2xf32>
   %cst_0 = arith.constant dense<[-0.000000e+00, -1.000000e+00]> : tensor<2xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %4:2 = iree_linalg_ext.fft ins(%c2, %cst, %cst_0 : index, tensor<2xf32>, tensor<2xf32>) outs(%2, %3 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
@@ -93,11 +87,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_3d_fft_stage3() {
   %c0 = arith.constant 0 : index
@@ -109,8 +101,8 @@
   %cst_0 = arith.constant dense<[-0.000000e+00, -0.707106769, -1.000000e+00, -0.707106769]> : tensor<4xf32>
   %0 = bufferization.to_memref %cst_0 : memref<4xf32>
   %1 = bufferization.to_memref %cst : memref<4xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x128x32xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<64x128x32xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x128x32xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<64x128x32xf32>
   iree_linalg_ext.fft ins(%c3, %1, %0 : index, memref<4xf32>, memref<4xf32>) outs(%2, %3 : memref<64x128x32xf32>, memref<64x128x32xf32>)
   return
 }
@@ -124,16 +116,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_input_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x8x2x6x6x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [2, 34, 34, 128], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2x34x34x128xf16>> -> tensor<2x34x34x128xf16>
   %3 = tensor.empty() : tensor<8x8x2x6x6x128xf16>
   %4 = iree_linalg_ext.winograd.input_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<2x34x34x128xf16>) outs(%3 : tensor<8x8x2x6x6x128xf16>) -> tensor<8x8x2x6x6x128xf16>
@@ -150,16 +140,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @winograd_output_transform() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x36x36x128xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0, 0, 0], sizes = [8, 8, 2, 6, 6, 128], strides = [1, 1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x8x2x6x6x128xf16>> -> tensor<8x8x2x6x6x128xf16>
   %3 = tensor.empty() : tensor<2x36x36x128xf16>
   %4 = iree_linalg_ext.winograd.output_transform output_tile_size(6) kernel_size(3) image_dimensions([1, 2]) ins(%2 : tensor<8x8x2x6x6x128xf16>) outs(%3 : tensor<2x36x36x128xf16>) -> tensor<2x36x36x128xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ops.mlir
index 13f86b7..dcec345 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_linalg_ops.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -18,8 +16,8 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xi32>{%0, %1}
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xi32>{%0, %1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xi32>{%0, %1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xi32>{%0, %1}
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%2 : memref<?x?xi32>) outs(%3 : memref<?x?xi32>) {
   ^bb0(%in: i32, %out: i32):
     linalg.yield %in : i32
@@ -35,11 +33,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -53,8 +49,8 @@
   %c0 = arith.constant 0 : index
   %c224 = arith.constant 224 : index
   %c3 = arith.constant 3 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1x224x224x3xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<1x224x224x3xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1x224x224x3xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<1x224x224x3xf32>
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%0 : memref<1x224x224x3xf32>) outs(%1 : memref<1x224x224x3xf32>) {
   ^bb0(%in: f32, %out: f32):
     linalg.yield %in : f32
@@ -73,11 +69,9 @@
 
 // Average pooling op with nice tilable input.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -91,8 +85,8 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c2 = arith.constant 2 : index
   %c8 = arith.constant 8 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x24x24x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x24x24x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x8xf32>>
   %2 = tensor.empty() : tensor<12x12xf32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 24, 24, 8], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x24x24x8xf32>> -> tensor<1x24x24x8xf32>
   %4 = tensor.empty() : tensor<1x2x2x8xf32>
@@ -111,11 +105,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -129,8 +121,8 @@
   %cst = arith.constant 0.000000e+00 : f32
   %cst_0 = arith.constant 4.900000e+01 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x7x7x1280xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x1280xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x7x7x1280xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x1x1280xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 7, 7, 1280], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x7x7x1280xf32>> -> tensor<1x7x7x1280xf32>
   %3 = tensor.empty() : tensor<7x7xf32>
   %4 = tensor.empty() : tensor<1x1x1x1280xf32>
@@ -156,11 +148,9 @@
 
 // Max pooling op with odd size-1 dimension sizes.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -175,8 +165,8 @@
   %c1 = arith.constant 1 : index
   %c0 = arith.constant 0 : index
   %c320 = arith.constant 320 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x76x1x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x38x1x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x76x1x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x38x1x1xf32>>
   %2 = tensor.empty() : tensor<2x1xf32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 76, 1, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x76x1x1xf32>> -> tensor<1x76x1x1xf32>
   %4 = tensor.empty() : tensor<1x38x1x1xf32>
@@ -197,12 +187,10 @@
 
 // Element wise op with mismatched input and output rank.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -217,9 +205,9 @@
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c10 = arith.constant 10 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x10xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x10xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<10xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<10xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 10], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x10xf32>> -> tensor<1x10xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [10], strides = [1] : !flow.dispatch.tensor<readonly:tensor<10xf32>> -> tensor<10xf32>
   %5 = tensor.empty() : tensor<10xf32>
@@ -240,11 +228,9 @@
 
 // Fused depthwise convolution and element wise ops: don't vectorize with partially active subgroups.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -263,8 +249,8 @@
   %c4 = arith.constant 4 : index
   %c4576 = arith.constant 4576 : index
   %c6272 = arith.constant 6272 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x21x20x1xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x19x18x1x4xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x21x20x1xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<1x19x18x1x4xf32>>
   %2 = tensor.empty() : tensor<1x19x18x1x4xf32>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 21, 20, 1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x21x20x1xf32>> -> tensor<1x21x20x1xf32>
   %4 = tensor.empty() : tensor<1x19x18x1x4xf32>
@@ -289,11 +275,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -307,8 +291,8 @@
 func.func @outermost_reduction() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x2048x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x2048x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 2048, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x2048x512xf32>> -> tensor<4x2048x512xf32>
   %3 = tensor.empty() : tensor<2048x512xf32>
   %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2048x512xf32>) -> tensor<2048x512xf32>
@@ -330,11 +314,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -353,9 +335,9 @@
   %3 = arith.index_cast %0 {stream.alignment = 512 : index, stream.values = [0 : index, 394752 : index, 984064 : index]} : i32 to index
   %4 = arith.index_cast %1 {stream.alignment = 512 : index, stream.values = [0 : index, 196608 : index, 197120 : index]} : i32 to index
   %5 = arith.index_cast %2 {stream.alignment = 512 : index, stream.values = [512 : index, 197120 : index, 197632 : index]} : i32 to index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%3) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%4) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%3) : !flow.dispatch.tensor<readonly:tensor<128x384xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%4) : !flow.dispatch.tensor<readonly:tensor<128xf32>>
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%5) : !flow.dispatch.tensor<writeonly:tensor<128xf32>>
   %9 = flow.dispatch.tensor.load %6, offsets = [0, 0], sizes = [128, 384], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x384xf32>> -> tensor<128x384xf32>
   %10 = flow.dispatch.tensor.load %7, offsets = [0], sizes = [128], strides = [1] : !flow.dispatch.tensor<readonly:tensor<128xf32>> -> tensor<128xf32>
   %11 = tensor.empty() : tensor<128xf32>
@@ -380,11 +362,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -397,8 +377,8 @@
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
 func.func @four_dim_elementwise() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x8x256x4xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256x4x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x8x256x4xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256x4x8xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [128, 8, 256, 4], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x8x256x4xf32>> -> tensor<128x8x256x4xf32>
   %3 = tensor.empty() : tensor<128x256x4x8xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<128x8x256x4xf32>) outs(%3 : tensor<128x256x4x8xf32>) {
@@ -418,11 +398,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -436,8 +414,8 @@
 func.func @odd_reduction_dimension_size_501() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0xFF800000 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x501xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x501xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x501xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x501xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 501], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x501xf32>> -> tensor<512x501xf32>
   %3 = tensor.empty() : tensor<512x501xf32>
   %4 = tensor.empty() : tensor<512xf32>
@@ -466,11 +444,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -484,8 +460,8 @@
 func.func @odd_reduction_dimension_size_2809() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0xFF800000 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x2809xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x2809xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x2809xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x2809xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 2809], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x2809xf32>> -> tensor<512x2809xf32>
   %3 = tensor.empty() : tensor<512x2809xf32>
   %4 = tensor.empty() : tensor<512xf32>
@@ -514,11 +490,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -533,8 +507,8 @@
   %c0 = arith.constant 0 : index
   %cst = arith.constant 1.000000e-10 : f32
   %cst_0 = arith.constant -1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<f32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x1x1x1xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<f32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2048x1x1x1xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<f32>> -> tensor<f32>
   %3 = tensor.empty() : tensor<2048x1x1x1xf32>
   %4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%2 : tensor<f32>) outs(%3 : tensor<2048x1x1x1xf32>) {
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_matmul.mlir
index 9370b6c..79df8d2 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_matmul.mlir
@@ -2,12 +2,10 @@
 
 // Odd K that forbids vectorization.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -22,9 +20,9 @@
   %c3 = arith.constant 3 : index
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x3x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x3x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x3x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x3x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1x3x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x3x32xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [1, 3, 3], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x3x3xf32>> -> tensor<1x3x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [1, 3, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x3x32xf32>> -> tensor<1x3x32xf32>
   %5 = tensor.empty() : tensor<1x3x32xf32>
@@ -45,12 +43,10 @@
 
 // 8-bit integers can be vectorized.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -64,9 +60,9 @@
   %c16 = arith.constant 16 : index
   %c64 = arith.constant 64 : index
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x16xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x16xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x16xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x16xi32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xi8>> -> tensor<64x32xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x16xi8>> -> tensor<32x16xi8>
   %5 = tensor.empty() : tensor<64x16xi32>
@@ -87,12 +83,10 @@
 
 // Vectorize non-32 bit types.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -106,9 +100,9 @@
   %c16 = arith.constant 16 : index
   %c64 = arith.constant 64 : index
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xi64>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x16xi64>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x16xi64>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xi64>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x16xi64>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x16xi64>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xi64>> -> tensor<64x32xi64>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x16xi64>> -> tensor<32x16xi64>
   %5 = tensor.empty() : tensor<64x16xi64>
@@ -129,12 +123,10 @@
 
 // Odd N that forbids vectorization.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -151,10 +143,10 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c400 = arith.constant 400 : index
   %c273 = arith.constant 273 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c11775744) : !flow.dispatch.tensor<readonly:tensor<273xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<400x576xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<576x273xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<400x273xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c11775744) : !flow.dispatch.tensor<readonly:tensor<273xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<400x576xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<576x273xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<400x273xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [273], strides = [1] : !flow.dispatch.tensor<readonly:tensor<273xf32>> -> tensor<273xf32>
   %5 = tensor.empty() : tensor<400x273xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [400, 576], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<400x576xf32>> -> tensor<400x576xf32>
@@ -182,12 +174,10 @@
 
 // Odd M and non-4-multiplier N
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -204,10 +194,10 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c25 = arith.constant 25 : index
   %c546 = arith.constant 546 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c15842560) : !flow.dispatch.tensor<readonly:tensor<546xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<25x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x546xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<25x546xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c15842560) : !flow.dispatch.tensor<readonly:tensor<546xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<25x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x546xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<25x546xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [546], strides = [1] : !flow.dispatch.tensor<readonly:tensor<546xf32>> -> tensor<546xf32>
   %5 = tensor.empty() : tensor<25x546xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [25, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<25x512xf32>> -> tensor<25x512xf32>
@@ -235,14 +225,12 @@
 
 // Matmul with consumer pointwise ops
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -257,11 +245,11 @@
   %cst = arith.constant 0.000000e+00 : f16
   %c256 = arith.constant 256 : index
   %c1024 = arith.constant 1024 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %7 = tensor.empty() : tensor<256x1024xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_misc.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_misc.mlir
index 8deaa11..965c7dc 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_misc.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_misc.mlir
@@ -1,22 +1,20 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d1)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 func.func @complex_view_as_real() {
   %c1 = arith.constant 1 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x50xcomplex<f32>>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x32x50x2xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x50x2xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1xi32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x50xcomplex<f32>>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x32x50x2xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x50x2xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xi32>> -> tensor<1xi32>
   %5 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0, 0, 0], sizes = [1, 1, 32, 50, 2], strides = [1, 1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x1x32x50x2xf32>> -> tensor<1x1x32x50x2xf32>
   %6 = tensor.empty() : tensor<32x50x2xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_reduction.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_reduction.mlir
index 0f31e2a..5fadf17 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_reduction.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_reduction.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -18,8 +16,8 @@
 func.func @subgroup_reduce_f32() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x512xf32>> -> tensor<2x512xf32>
   %3 = tensor.empty() : tensor<2xf32>
   %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2xf32>) -> tensor<2xf32>
@@ -41,11 +39,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -59,8 +55,8 @@
 func.func @subgroup_reduce_f16() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x4096x4096xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x4096x4096xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 4096, 4096], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x4096x4096xf16>> -> tensor<16x4096x4096xf16>
   %3 = tensor.empty() : tensor<16x4096x4096xf16>
   %4 = tensor.empty() : tensor<16x4096xf16>
@@ -88,11 +84,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1], [0, 64]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -116,9 +110,9 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8xf32>>
   %8 = flow.dispatch.workload.ordinal %6, 0 : index
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x?xf32>>{%8}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<8x?xf32>>{%8}
   %10 = flow.dispatch.tensor.load %9, offsets = [0, 0], sizes = [8, %8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x?xf32>>{%8} -> tensor<8x?xf32>
   %11 = tensor.empty() : tensor<8xf32>
   %12 = linalg.fill {lowering_config = #config} ins(%cst : f32) outs(%11 : tensor<8xf32>) -> tensor<8xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_sub_byte_types.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_sub_byte_types.mlir
index fefcfe0..f2bdec2 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_sub_byte_types.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_default_sub_byte_types.mlir
@@ -1,21 +1,19 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 #map1 = affine_map<(d0, d1) -> (d0)>
 func.func @i4_dequant() {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<131072x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<131072x128xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [131072, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>> -> tensor<131072x128xi4>
   %5 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [131072], strides = [1] : !flow.dispatch.tensor<readonly:tensor<131072xf32>> -> tensor<131072xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [131072], strides = [1] : !flow.dispatch.tensor<readonly:tensor<131072xf32>> -> tensor<131072xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_conv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_conv.mlir
index 8ae533f..78a057e 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_conv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_conv.mlir
@@ -2,21 +2,19 @@
 
 // Conv - large OC - distribute to only one workgroup dimension.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_112x112x512() {
   %c0 = arith.constant 0 : index
   %c512 = arith.constant 512 : index
   %c112 = arith.constant 112 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 512], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x512xf32>> -> tensor<3x3x3x512xf32>
   %5 = tensor.empty() : tensor<1x112x112x512xf32>
@@ -37,21 +35,19 @@
 
 // Conv - medium OC/OW/OH - distribute to two workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_112x112x32() {
   %c0 = arith.constant 0 : index
   %c32 = arith.constant 32 : index
   %c112 = arith.constant 112 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 225, 225, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x225x225x3xf32>> -> tensor<1x225x225x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 32], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>> -> tensor<3x3x3x32xf32>
   %5 = tensor.empty() : tensor<1x112x112x32xf32>
@@ -72,20 +68,18 @@
 
 // Conv - small OC/OW/OH - distribute to all three workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @conv_16x16x16() {
   %c0 = arith.constant 0 : index
   %c16 = arith.constant 16 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x16x16x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x16x16x16xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 33, 33, 3], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x33x33x3xf32>> -> tensor<1x33x33x3xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0, 0], sizes = [3, 3, 3, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x3x16xf32>> -> tensor<3x3x3x16xf32>
   %5 = tensor.empty() : tensor<1x16x16x16xf32>
@@ -106,21 +100,19 @@
 
 // Depthwise conv - small OC/OW/OH - distribute to all three workgroup dimensions.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dwconv_28x28x144() {
   %c0 = arith.constant 0 : index
   %c144 = arith.constant 144 : index
   %c28 = arith.constant 28 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x144xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x28x28x144xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [0, 57, 57, 144], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x57x57x144xf32>> -> tensor<1x57x57x144xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 144], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x144xf32>> -> tensor<3x3x144xf32>
   %5 = tensor.empty() : tensor<1x28x28x144xf32>
@@ -141,12 +133,10 @@
 
 // Depthwise conv - tiny OC/OW/OH - starving the GPU.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @dwconv_1x2x8() {
   %c0 = arith.constant 0 : index
@@ -154,9 +144,9 @@
   %c2 = arith.constant 2 : index
   %c1 = arith.constant 1 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x3x5x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x2x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1x3x5x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1x1x2x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1, 3, 5, 8], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1x3x5x8xf32>> -> tensor<1x3x5x8xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [3, 3, 8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<3x3x8xf32>> -> tensor<3x3x8xf32>
   %5 = tensor.empty() : tensor<1x1x2x8xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_matmul.mlir
index 5f30177..7facf67 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_mali_matmul.mlir
@@ -2,21 +2,19 @@
 
 // Large matmul that can match the best tiling scheme.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1024x2048x512() {
   %c0 = arith.constant 0 : index
   %c2048 = arith.constant 2048 : index
   %c1024 = arith.constant 1024 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x2048xf32>> -> tensor<512x2048xf32>
   %5 = tensor.empty() : tensor<1024x2048xf32>
@@ -37,21 +35,19 @@
 
 // Small matmul N that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_3136x24x96() {
   %c0 = arith.constant 0 : index
   %c24 = arith.constant 24 : index
   %c3136 = arith.constant 3136 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<96x24xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3136x24xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<96x24xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<3136x24xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [3136, 96], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3136x96xf32>> -> tensor<3136x96xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [96, 24], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<96x24xf32>> -> tensor<96x24xf32>
   %5 = tensor.empty() : tensor<3136x24xf32>
@@ -72,21 +68,19 @@
 
 // Small matmul M that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_196x64x192() {
   %c0 = arith.constant 0 : index
   %c64 = arith.constant 64 : index
   %c196 = arith.constant 196 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x192xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<192x64xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x64xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<196x192xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<192x64xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<196x64xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [196, 192], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<196x192xf32>> -> tensor<196x192xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [192, 64], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<192x64xf32>> -> tensor<192x64xf32>
   %5 = tensor.empty() : tensor<196x64xf32>
@@ -107,21 +101,19 @@
 
 // Small matmul K that can still tile to all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_12544x96x16() {
   %c0 = arith.constant 0 : index
   %c96 = arith.constant 96 : index
   %c12544 = arith.constant 12544 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<12544x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x96xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<12544x96xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<12544x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x96xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<12544x96xf32>
   linalg.fill ins(%cst : f32) outs(%2 : memref<12544x96xf32>)
   linalg.matmul ins(%0, %1 : memref<12544x16xf32>, memref<16x96xf32>) outs(%2 : memref<12544x96xf32>)
   return
@@ -138,21 +130,19 @@
 
 // Odd matmul M and small N that cannot utilize all threads in a workgroup.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_49x160x576() {
   %c0 = arith.constant 0 : index
   %c160 = arith.constant 160 : index
   %c49 = arith.constant 49 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<49x576xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<576x160xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<49x160xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<49x576xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<576x160xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<49x160xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [49, 576], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<49x576xf32>> -> tensor<49x576xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [576, 160], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<576x160xf32>> -> tensor<576x160xf32>
   %5 = tensor.empty() : tensor<49x160xf32>
@@ -173,12 +163,10 @@
 
 // Small matmul M to "shift" parallelism to N.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_2x1024x576() {
   %cst = arith.constant 0.000000e+00 : f32
@@ -189,10 +177,10 @@
   %c3436864 = arith.constant 3436864 : index
   %c10141312 = arith.constant 10141312 : index
   %c2304 = arith.constant 2304 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x576xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c3436864) : !flow.dispatch.tensor<readonly:tensor<576x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c10141312) : !flow.dispatch.tensor<readonly:tensor<2x1024xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x576xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c3436864) : !flow.dispatch.tensor<readonly:tensor<576x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c10141312) : !flow.dispatch.tensor<readonly:tensor<2x1024xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x1024xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 576], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x576xf32>> -> tensor<2x576xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [576, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<576x1024xf32>> -> tensor<576x1024xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x1024xf32>> -> tensor<2x1024xf32>
@@ -214,21 +202,19 @@
 
 // Large matmul with i8 inputs.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1024x2048x512xi8() {
   %c0 = arith.constant 0 : index
   %c2048 = arith.constant 2048 : index
   %c1024 = arith.constant 1024 : index
   %c0_i32 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xi8>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xi8>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xi32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x512xi8>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x2048xi8>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<1024x2048xi32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512xi8>> -> tensor<1024x512xi8>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x2048xi8>> -> tensor<512x2048xi8>
   %5 = tensor.empty() : tensor<1024x2048xi32>
@@ -240,21 +226,19 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_4x384x384() {
   %c0 = arith.constant 0 : index
   %c384 = arith.constant 384 : index
   %c4 = arith.constant 4 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x384x384xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x384x384xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 384, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x384x32xf32>> -> tensor<4x384x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 32, 384], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x384xf32>> -> tensor<4x32x384xf32>
   %5 = tensor.empty() : tensor<4x384x384xf32>
@@ -275,12 +259,10 @@
 
 // Small batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_4x2x8() {
   %c0 = arith.constant 0 : index
@@ -288,9 +270,9 @@
   %c2 = arith.constant 2 : index
   %c4 = arith.constant 4 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x2x32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x2x8xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x2x32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x2x8xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4, 2, 32], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x2x32xf32>> -> tensor<4x2x32xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4, 32, 8], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4x32x8xf32>> -> tensor<4x32x8xf32>
   %5 = tensor.empty() : tensor<4x2x8xf32>
@@ -311,12 +293,10 @@
 
 // Linalg.generic that is a batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d1, d0, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -327,9 +307,9 @@
 func.func @generic_batch_matmul_32x2x512() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x32x64xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x8x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x32x64xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x8x512xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 32, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x32x64xf32>> -> tensor<8x32x64xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [32, 64, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x64x512xf32>> -> tensor<32x64x512xf32>
   %5 = tensor.empty() : tensor<32x8x512xf32>
@@ -355,13 +335,11 @@
 
 // Linalg.generic that is a batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d3, d2)>
@@ -372,11 +350,11 @@
   %c537247744 = arith.constant 537247744 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c168607744) : !flow.dispatch.tensor<readonly:tensor<8x2500x4608xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4608x512xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c537247744) : !flow.dispatch.tensor<readonly:tensor<8x2500x512xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x2500x512xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x2500x512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c168607744) : !flow.dispatch.tensor<readonly:tensor<8x2500x4608xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4608x512xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c537247744) : !flow.dispatch.tensor<readonly:tensor<8x2500x512xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<8x2500x512xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<8x2500x512xf32>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [8, 2500, 4608], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x2500x4608xf32>> -> tensor<8x2500x4608xf32>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4608, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4608x512xf32>> -> tensor<4608x512xf32>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [8, 2500, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<8x2500x512xf32>> -> tensor<8x2500x512xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul.mlir
index 4c3f060..8d4f3f0 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=pascal@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_4x4096x9216() {
   %c36864 = arith.constant 36864 : index
@@ -13,10 +11,10 @@
   %c209920 = arith.constant 209920 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x9216xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c209920) : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c667974912) : !flow.dispatch.tensor<readonly:tensor<4x4096xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c36864) : !flow.dispatch.tensor<writeonly:tensor<4x4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<4x9216xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c209920) : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c667974912) : !flow.dispatch.tensor<readonly:tensor<4x4096xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c36864) : !flow.dispatch.tensor<writeonly:tensor<4x4096xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 9216], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x9216xf32>> -> tensor<4x9216xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [9216, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>> -> tensor<9216x4096xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4x4096xf32>> -> tensor<4x4096xf32>
@@ -36,12 +34,10 @@
 
 // Matvec does not go down matmul pipelines.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_1x4096x9216() {
   %c36864 = arith.constant 36864 : index
@@ -49,10 +45,10 @@
   %c209920 = arith.constant 209920 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x9216xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c209920) : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c667974912) : !flow.dispatch.tensor<readonly:tensor<1x4096xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c36864) : !flow.dispatch.tensor<writeonly:tensor<1x4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x9216xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c209920) : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c667974912) : !flow.dispatch.tensor<readonly:tensor<1x4096xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c36864) : !flow.dispatch.tensor<writeonly:tensor<1x4096xf32>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1, 9216], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x9216xf32>> -> tensor<1x9216xf32>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [9216, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<9216x4096xf32>> -> tensor<9216x4096xf32>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [1, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1x4096xf32>> -> tensor<1x4096xf32>
@@ -72,12 +68,10 @@
 
 // Multi-reduction-dimension transposed-B matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d1, d2, d3)>
@@ -85,9 +79,9 @@
 func.func @multi_reduction_transposed_b_matmul() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x86x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2048x86x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>> -> tensor<4096x86x128xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [2048, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<2048x86x128xf32>> -> tensor<2048x86x128xf32>
   %5 = tensor.empty() : tensor<4096x2048xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul_cooperative_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul_cooperative_ops.mlir
index db27e00..b818edd 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul_cooperative_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_nvidia_matmul_cooperative_ops.mlir
@@ -2,14 +2,12 @@
 // RUN:   --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass)' %s |  \
 // RUN:   FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @matmul_256x1024x128_div_add() {
@@ -17,11 +15,11 @@
   %c1024 = arith.constant 1024 : index
   %c256 = arith.constant 256 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
   %7 = tensor.empty() : tensor<256x1024xf16>
@@ -49,22 +47,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 func.func @batch_matmul_16x128x256x512_div() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
   %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 128, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>> -> tensor<16x128x512xf16>
   %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>> -> tensor<16x512x256xf16>
   %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [16, 128, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>> -> tensor<16x128x256xf16>
@@ -91,12 +87,10 @@
 
 // Linalg.generic that is a batch matmul.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2, d3) -> (d1, d0, d3)>
 #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>
@@ -107,9 +101,9 @@
 func.func @generic_batch_matmul_32x8x512x64() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [2, 32, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<128x32x64xf16>> -> tensor<128x32x64xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [32, 64, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x64x512xf16>> -> tensor<32x64x512xf16>
   %5 = tensor.empty() : tensor<32x128x512xf16>
@@ -135,19 +129,17 @@
 
 // K dim size not divisible by 32.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @batch_matmul_16x1024x1024x80() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x1024x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x1024x1024xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 1024, 80], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x1024x80xf16>> -> tensor<16x1024x80xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 80, 1024], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x80x1024xf16>> -> tensor<16x80x1024xf16>
   %5 = tensor.empty() : tensor<16x1024x1024xf16>
@@ -168,21 +160,19 @@
 
 // Small K - not supported by cooperative matrix.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_256x1024x8() {
   %c0 = arith.constant 0 : index
   %c1024 = arith.constant 1024 : index
   %c256 = arith.constant 256 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x8xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x8xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 8], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x8xf16>> -> tensor<256x8xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [8, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<8x1024xf16>> -> tensor<8x1024xf16>
   %5 = tensor.empty() : tensor<256x1024xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_user.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_user.mlir
index f42b1c1..ed93869 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/config_user.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/config_user.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(iree-codegen-materialize-user-configs, iree-spirv-select-lowering-strategy-pass)' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[128, 256], [16, 16]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize workgroup_size = [16, 8, 1] subgroup_size = 64>
@@ -15,9 +13,9 @@
   %c128 = arith.constant 128 : index
   %c1024 = arith.constant 1024 : index
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<128x1024xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf32>> -> tensor<256x1024xf32>
   %5 = tensor.empty() : tensor<128x1024xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_gpu_target.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_gpu_target.mlir
index eca6f4a..73f09e6 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_gpu_target.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_gpu_target.mlir
@@ -6,7 +6,9 @@
       wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<WMMA_F32_16x16x16_F16>, <WMMA_F16_16x16x16_F16>],
       subgroup_size_choices = [32, 64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536,
       max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>}>) {
-  hal.executable.export public @dispatch ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+  hal.executable.export public @dispatch ordinal(0) layout(#hal.pipeline.layout<bindings = [
+    #hal.pipeline.binding<storage_buffer>]>
+  ) {
   ^bb0(%arg0: !hal.device):
     %x, %y, %z = flow.dispatch.workgroup_count_from_slice
     hal.return %x, %y, %z : index, index, index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_to_spirv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_to_spirv.mlir
index 9fcdab1..b437006 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_to_spirv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/convert_to_spirv.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-spirv))))' %s | FileCheck %s
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-spirv{index-bits=64}))))' %s | FileCheck %s --check-prefix=INDEX64
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @push_constant {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -38,14 +36,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<3, bindings = [
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @resource_bindings_in_same_func {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -54,10 +48,10 @@
     }
     builtin.module attributes {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Int64, Shader], []>, #spirv.resource_limits<>>} {
       // CHECK-LABEL: spirv.module
-      // CHECK: spirv.GlobalVariable @[[ARG0:.+]] bind(1, 2) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[ARG1_0:.+]] bind(1, 3) {aliased} : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[ARG1_1:.+]] bind(1, 3) {aliased} : !spirv.ptr<!spirv.struct<(!spirv.array<4 x vector<4xf32>, stride=16> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[RET0:.+]] bind(3, 4) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[ARG0:.+]] bind(0, 0) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[ARG1_0:.+]] bind(0, 1) {aliased} : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[ARG1_1:.+]] bind(0, 1) {aliased} : !spirv.ptr<!spirv.struct<(!spirv.array<4 x vector<4xf32>, stride=16> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[RET0:.+]] bind(0, 2) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
       // CHECK: spirv.func @resource_bindings_in_same_entry_func()
       func.func @resource_bindings_in_same_entry_func() -> f32 {
         %c0 = arith.constant 0 : index
@@ -65,17 +59,17 @@
         // Same type
         // CHECK: spirv.mlir.addressof @[[ARG0]]
         // CHECK: spirv.mlir.addressof @[[ARG0]]
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
 
         // Different type
         // CHECK: spirv.mlir.addressof @[[ARG1_0]]
         // CHECK: spirv.mlir.addressof @[[ARG1_1]]
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(3) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(3) : memref<4xvector<4xf32>, #spirv.storage_class<StorageBuffer>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4xvector<4xf32>, #spirv.storage_class<StorageBuffer>>
 
         // CHECK: spirv.mlir.addressof @[[RET0]]
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(3) binding(4) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
 
         %5 = memref.load %0[%c0, %c0] : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
         %6 = memref.load %1[%c0, %c0] : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
@@ -99,13 +93,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<3, bindings = [
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @resource_bindings_in_multi_entry_func {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -117,18 +107,18 @@
     }
     builtin.module attributes {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Int64, Shader], []>, #spirv.resource_limits<>>} {
       // CHECK-LABEL: spirv.module
-      // CHECK: spirv.GlobalVariable @[[FUNC1_ARG:.+]] bind(1, 2) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[FUNC1_RET:.+]] bind(3, 4) : !spirv.ptr<!spirv.struct<(!spirv.array<4 x vector<4xf32>, stride=16> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[FUNC2_ARG:.+]] bind(1, 2) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
-      // CHECK: spirv.GlobalVariable @[[FUNC2_RET:.+]] bind(3, 4) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[FUNC1_ARG:.+]] bind(0, 0) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[FUNC1_RET:.+]] bind(0, 1) : !spirv.ptr<!spirv.struct<(!spirv.array<4 x vector<4xf32>, stride=16> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[FUNC2_ARG:.+]] bind(0, 0) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
+      // CHECK: spirv.GlobalVariable @[[FUNC2_RET:.+]] bind(0, 1) : !spirv.ptr<!spirv.struct<(!spirv.array<16 x f32, stride=4> [0])>, StorageBuffer>
 
       // CHECK: spirv.func @resource_bindings_in_entry_func1()
       func.func @resource_bindings_in_entry_func1() -> f32 {
         // CHECK: spirv.mlir.addressof @[[FUNC1_ARG]]
         // CHECK: spirv.mlir.addressof @[[FUNC1_RET]]
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(3) binding(4) : memref<4xvector<4xf32>, #spirv.storage_class<StorageBuffer>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4xvector<4xf32>, #spirv.storage_class<StorageBuffer>>
 
         %2 = memref.load %0[%c0, %c0] : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
         %3 = memref.load %1[%c0] : memref<4xvector<4xf32>, #spirv.storage_class<StorageBuffer>>
@@ -144,8 +134,8 @@
         // CHECK: spirv.mlir.addressof @[[FUNC2_ARG]]
         // CHECK: spirv.mlir.addressof @[[FUNC2_RET]]
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>> // Same type as previous function
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(3) binding(4) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>> // Different type as previous function
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>> // Same type as previous function
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4x4xf32, #spirv.storage_class<StorageBuffer>> // Different type as previous function
 
         %2 = memref.load %0[%c0, %c0] : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
         %3 = memref.load %1[%c0, %c0] : memref<4x4xf32, #spirv.storage_class<StorageBuffer>>
@@ -160,12 +150,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_binding {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -175,9 +163,9 @@
     builtin.module attributes {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Int64, Shader], []>, #spirv.resource_limits<>>} {
       func.func @interface_binding() -> f32 {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<8x5xf32, #spirv.storage_class<StorageBuffer>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5xf32, #spirv.storage_class<StorageBuffer>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<8x5xf32, #spirv.storage_class<StorageBuffer>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<8x5xf32, #spirv.storage_class<StorageBuffer>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5xf32, #spirv.storage_class<StorageBuffer>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<8x5xf32, #spirv.storage_class<StorageBuffer>>
 
         %3 = memref.load %0[%c0, %c0] : memref<8x5xf32, #spirv.storage_class<StorageBuffer>>
         %4 = memref.load %1[%c0] : memref<5xf32, #spirv.storage_class<StorageBuffer>>
@@ -205,12 +193,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_wg_id {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -240,12 +226,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_wg_size {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -257,7 +241,7 @@
         %c0 = arith.constant 0.0 : f32
         %workgroup_size_x = hal.interface.workgroup.size[0] : index
         %workgroup_size_y = hal.interface.workgroup.size[1] : index
-        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x64xf32, #spirv.storage_class<StorageBuffer>>
+        %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x64xf32, #spirv.storage_class<StorageBuffer>>
         memref.store %c0, %subspan[%workgroup_size_x, %workgroup_size_y] : memref<64x64xf32, #spirv.storage_class<StorageBuffer>>
         return
       }
@@ -278,12 +262,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_wg_count {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/emulate_i64.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/emulate_i64.mlir
index eb1c281..2840572 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/emulate_i64.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/emulate_i64.mlir
@@ -2,12 +2,10 @@
 // RUN:   --pass-pipeline='builtin.module(func.func(iree-spirv-emulate-i64))' %s | \
 // RUN:   FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -19,9 +17,9 @@
 func.func @buffer_types() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %c1_i64 = arith.constant 1 : i64
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
   %3 = memref.load %0[%c0] : memref<8xi32, #spirv.storage_class<StorageBuffer>>
   %4 = memref.load %1[%c0] : memref<8xi64, #spirv.storage_class<StorageBuffer>>
   %5 = arith.addi %4, %c1_i64 : i64
@@ -32,8 +30,8 @@
 // Check that without the Int64 capability emulation produces expected i32 ops.
 //
 // CHECK-LABEL: func.func @buffer_types
-//       CHECK:   [[REF_I64_0:%.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
-//       CHECK:   [[REF_I64_1:%.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
+//       CHECK:   [[REF_I64_0:%.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
+//       CHECK:   [[REF_I64_1:%.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
 //       CHECK:   [[VI64:%.+]]      = memref.load [[REF_I64_0]][{{%.+}}] : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
 //       CHECK:   {{%.+}}           = arith.addui_extended {{%.+}}, {{%.+}} : i32, i1
 //       CHECK:   memref.store {{%.+}}, [[REF_I64_1]][{{%.+}}] : memref<8xvector<2xi32>, #spirv.storage_class<StorageBuffer>>
@@ -41,11 +39,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -61,9 +57,9 @@
   %c36864 = arith.constant 36864 : index
   %c1523712 = arith.constant 1523712 : index
   %c96 = arith.constant 96 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c96}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c1523712) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c36864}
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c36864}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c96}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c1523712) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c36864}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<?xvector<4xi32>, #spirv.storage_class<StorageBuffer>>{%c36864}
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %thread_id_x = gpu.thread_id  x
@@ -94,12 +90,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -111,9 +105,9 @@
 func.func @no_emulation() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %c1_i64 = arith.constant 1 : i64
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
   %3 = memref.load %0[%c0] : memref<8xi32, #spirv.storage_class<StorageBuffer>>
   %4 = memref.load %1[%c0] : memref<8xi64, #spirv.storage_class<StorageBuffer>>
   %5 = arith.addi %4, %c1_i64 : i64
@@ -125,9 +119,9 @@
 //
 // CHECK-LABEL: func.func @no_emulation
 //       CHECK:   [[CST1:%.+]]      = arith.constant 1 : i64
-//       CHECK:   [[REF_I32:%.+]]   = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
-//       CHECK:   [[REF_I64_0:%.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
-//       CHECK:   [[REF_I64_1:%.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+//       CHECK:   [[REF_I32:%.+]]   = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<8xi32, #spirv.storage_class<StorageBuffer>>
+//       CHECK:   [[REF_I64_0:%.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
+//       CHECK:   [[REF_I64_1:%.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) : memref<8xi64, #spirv.storage_class<StorageBuffer>>
 //       CHECK:   [[VI32:%.+]]      = memref.load [[REF_I32]][{{%.+}}] : memref<8xi32, #spirv.storage_class<StorageBuffer>>
 //       CHECK:   [[VI64:%.+]]      = memref.load [[REF_I64_0]][{{%.+}}] : memref<8xi64, #spirv.storage_class<StorageBuffer>>
 //       CHECK:   {{%.+}}           = arith.addi {{%.+}} : i64
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/erase_storage_buffer_static_shape.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/erase_storage_buffer_static_shape.mlir
index aa25417..6031f46 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/erase_storage_buffer_static_shape.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/erase_storage_buffer_static_shape.mlir
@@ -1,14 +1,12 @@
 // RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-spirv-erase-storage-buffer-static-shape))" %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @storage_buffer_load_store(%offset: index, %i0: index, %i1: index) {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%offset) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%offset) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
   %val = memref.load %0[%i0] : memref<256xf32, #hal.descriptor_type<storage_buffer>>
   memref.store %val, %1[%i1] : memref<256xf32, #hal.descriptor_type<storage_buffer>>
   return
@@ -17,8 +15,8 @@
 // CHECK-LABEL: func.func @storage_buffer_load_store
 //  CHECK-SAME: (%[[OFFSET:.+]]: index, %[[I0:.+]]: index, %[[I1:.+]]: index)
 //       CHECK:   %[[C256:.+]] = arith.constant 256 : index
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%[[OFFSET]]) flags(ReadOnly) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C256]]}
-//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) alignment(64) offset(%[[OFFSET]]) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C256]]}
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%[[OFFSET]]) flags(ReadOnly) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C256]]}
+//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) alignment(64) offset(%[[OFFSET]]) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C256]]}
 //       CHECK:   %[[LD:.+]] = memref.load %[[SPAN0]][%[[I0]]]
 //       CHECK:   memref.store %[[LD]], %[[SPAN1]][%[[I1]]]
 
@@ -26,51 +24,45 @@
 
 // Test that we don't rewrite memref for uniform buffers.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @uniform_buffer_load(%offset: index, %i0: index) -> f32 {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<uniform_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<uniform_buffer>>
   %val = memref.load %0[%i0] : memref<256xf32, #hal.descriptor_type<uniform_buffer>>
   return %val : f32
 }
 
 // CHECK-LABEL: func.func @uniform_buffer_load
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%{{.+}}) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<uniform_buffer>>
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%{{.+}}) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<uniform_buffer>>
 //       CHECK:   memref.load %[[SPAN0]]
 
 // -----
 
 // Test that we don't rewrite memref without HAL descriptor types.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, uniform_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<uniform_buffer>
 ]>
 func.func @uniform_buffer_load(%offset: index, %i0: index) -> f32 {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32>
   %val = memref.load %0[%i0] : memref<256xf32>
   return %val : f32
 }
 
 // CHECK-LABEL: func.func @uniform_buffer_load
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%{{.+}}) flags(ReadOnly) : memref<256xf32>
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%{{.+}}) flags(ReadOnly) : memref<256xf32>
 //       CHECK:   memref.load %[[SPAN0]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @storage_buffer_transfer_read_write(%offset: index, %i0: index, %i1: index) {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%offset) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%offset) flags(ReadOnly) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%offset) : memref<256xf32, #hal.descriptor_type<storage_buffer>>
   %f0 = arith.constant 0.0 : f32
   %val = vector.transfer_read %0[%i0], %f0 {in_bounds = [true]} : memref<256xf32, #hal.descriptor_type<storage_buffer>>, vector<4xf32>
   vector.transfer_write %val, %1[%i1] {in_bounds = [true]} : vector<4xf32>, memref<256xf32, #hal.descriptor_type<storage_buffer>>
@@ -83,14 +75,12 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @storage_buffer_subview(%offset : index, %i0: index, %i1: index) -> f32 {
   %c0 = arith.constant 0 : index
-  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<128xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>
+  %subspan = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<128xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>
   %subview = memref.subview %subspan[%i0][16][1] : memref<128xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>> to memref<16xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>
   %value = memref.load %subview[%c0] : memref<16xf32, strided<[1], offset: ?>, #hal.descriptor_type<storage_buffer>>
   return %value : f32
@@ -101,18 +91,16 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @storage_buffer_cast(%offset: index) -> memref<?xf32, #hal.descriptor_type<storage_buffer>> {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%offset) : memref<16xf32, #hal.descriptor_type<storage_buffer>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%offset) : memref<16xf32, #hal.descriptor_type<storage_buffer>>
   %1 = memref.cast %0 : memref<16xf32, #hal.descriptor_type<storage_buffer>> to memref<?xf32, #hal.descriptor_type<storage_buffer>>
   return %1 : memref<?xf32, #hal.descriptor_type<storage_buffer>>
 }
 
 // CHECK-LABEL: func.func @storage_buffer_cast
 //       CHECK:   %[[C16:.+]] = arith.constant 16 : index
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) alignment(64) offset(%{{.+}}) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C16]]}
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) alignment(64) offset(%{{.+}}) : memref<?xf32, #hal.descriptor_type<storage_buffer>>{%[[C16]]}
 //       CHECK:   return %[[SPAN0]]
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/illegal_configuration.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/illegal_configuration.mlir
index 681cf99..6127e84 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/illegal_configuration.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/illegal_configuration.mlir
@@ -2,12 +2,10 @@
 // RUN:   --pass-pipeline='builtin.module(iree-codegen-materialize-user-configs, iree-spirv-select-lowering-strategy-pass)' \
 // RUN:   --verify-diagnostics --split-input-file %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = []>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -21,9 +19,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x8xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<8x16xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x16xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x8xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<8x16xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x16xf32>
   // expected-error @+1 {{expected 1 levels of tiling sizes, got 0}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<4x8xf32>, memref<8x16xf32>) outs(%2 : memref<4x16xf32>)
   return
@@ -31,12 +29,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64], [4, 4], [0, 0, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -51,21 +47,19 @@
 // expected-error @+1 {{expected workgroup size to have three dimensions for SPIR-V pipelines}}
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x128xf32>
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x128xf32>) outs(%2 : memref<64x128xf32>)
   return
 }
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64], [4, 4], [0, 0, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -79,9 +73,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x128xf32>
   // expected-error @+1 {{expected workgroup size dimensions not exceeding [128, 128, 64]}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x128xf32>) outs(%2 : memref<64x128xf32>)
   return
@@ -89,12 +83,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64], [4, 2], [0, 0, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -108,9 +100,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x128xf32>
   // expected-error @+1 {{expected total invocation count in workgroup to be <= 128}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x128xf32>) outs(%2 : memref<64x128xf32>)
   return
@@ -118,12 +110,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64], [16, 8], [0, 0, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -137,9 +127,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x128xf32>
   // expected-error @+1 {{expected total workgroup size to be multiple of 32}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x128xf32>) outs(%2 : memref<64x128xf32>)
   return
@@ -147,12 +137,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 60], [4, 4], [0, 0, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -166,9 +154,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x128xf32>
   // expected-error @+1 {{expected each workgroup size dimension to be power of two}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x128xf32>) outs(%2 : memref<64x128xf32>)
   return
@@ -176,12 +164,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -195,9 +181,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<48x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x128xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<48x128xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<48x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x128xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<48x128xf32>
   // expected-error @+1 {{LHS shape is indivisible by first level tile size}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<48x16xf32>, memref<16x128xf32>) outs(%2 : memref<48x128xf32>)
   return
@@ -205,12 +191,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 64, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -224,9 +208,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<64x16xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<16x80xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<64x80xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<64x16xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<16x80xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<64x80xf32>
   // expected-error @+1 {{RHS shape is indivisible by first level tile size}}
   linalg.matmul {compilation_info = #compilation} ins(%0, %1 : memref<64x16xf32>, memref<16x80xf32>) outs(%2 : memref<64x80xf32>)
   return
@@ -234,12 +218,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [32, 32], [0, 0, 16]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -255,9 +237,9 @@
 func.func @matmul_tensor() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xf16>> -> tensor<64x32xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128xf16>> -> tensor<32x128xf16>
   %5 = tensor.empty() : tensor<64x128xf16>
@@ -270,12 +252,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [32, 32], [0, 0, 16], [8, 8, 8]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -291,9 +271,9 @@
 func.func @matmul_tensor() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xf16>> -> tensor<64x32xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128xf16>> -> tensor<32x128xf16>
   %5 = tensor.empty() : tensor<64x128xf16>
@@ -306,12 +286,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32], [8, 8], [0, 0, 4], [16, 16, 16]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -327,9 +305,9 @@
 func.func @matmul_tensor() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xf16>> -> tensor<64x32xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128xf16>> -> tensor<32x128xf16>
   %5 = tensor.empty() : tensor<64x128xf16>
@@ -342,12 +320,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [32, 32], [0, 0, 16], [16, 16, 16]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -363,9 +339,9 @@
 func.func @matmul_tensor() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xf16>> -> tensor<64x32xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128xf16>> -> tensor<32x128xf16>
   %5 = tensor.empty() : tensor<64x128xf16>
@@ -378,12 +354,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 64], [32, 32], [0, 0, 16], [16, 16, 16]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -399,9 +373,9 @@
 func.func @matmul_tensor() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<64x32xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<32x128xf16>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<64x128xf16>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [64, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x32xf16>> -> tensor<64x32xf16>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128xf16>> -> tensor<32x128xf16>
   %5 = tensor.empty() : tensor<64x128xf16>
@@ -414,12 +388,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 4, 4, 16], [0, 2, 2, 2], [0, 0, 0, 0, 1, 1, 4]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -439,9 +411,9 @@
   %c16 = arith.constant 16 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -474,12 +446,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 6, 6, 16], [0, 3, 3, 2], [0, 0, 0, 0, 1, 1, 4], [0, 1, 0, 0]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -499,9 +469,9 @@
   %c16 = arith.constant 16 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -534,12 +504,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 4, 4, 16], [0, 2, 2, 4], [0, 0, 0, 0, 1, 1, 4], [0, 1, 0, 0]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -559,9 +527,9 @@
   %c16 = arith.constant 16 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -594,12 +562,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 7, 64], [0, 1, 7, 2], [0, 0, 0, 0, 5, 5], [0, 1, 0, 0]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -613,9 +579,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1x11x11x576xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5x5x576xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1x7x7x576xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1x11x11x576xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5x5x576xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1x7x7x576xf32>
   // expected-error @+1 {{expected tile sizes for KH and KW to be 1}}
   linalg.depthwise_conv_2d_nhwc_hwc {compilation_info = #compilation} ins(%0, %1 : memref<1x11x11x576xf32>, memref<5x5x576xf32>) outs(%2 : memref<1x7x7x576xf32>)
   return
@@ -623,12 +589,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 7, 64], [0, 1, 7, 2], [0, 0, 0, 0, 1, 1], [0, 0, 1, 1]]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -642,9 +606,9 @@
 #compilation = #iree_codegen.compilation_info<lowering_config = #config, translation_info = #translation>
 func.func @illegal() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<1x11x11x576xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5x5x576xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<1x7x7x576xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<1x11x11x576xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5x5x576xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<1x7x7x576xf32>
   // expected-error @+1 {{expected the fourth level of tile size to be [0, 1, 0, 0]}}
   linalg.depthwise_conv_2d_nhwc_hwc {compilation_info = #compilation} ins(%0, %1 : memref<1x11x11x576xf32>, memref<5x5x576xf32>) outs(%2 : memref<1x7x7x576xf32>)
   return
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/link_executables.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/link_executables.mlir
index bee6557..4ca2219 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/link_executables.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/link_executables.mlir
@@ -8,11 +8,9 @@
 
 #vulkan_target = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {iree.spirv.features = ["vulkan-spirv"]}>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_0 {
@@ -94,9 +92,9 @@
   %dispatch_0_ordinal = hal.executable.export.ordinal target(@dispatch_0::@spirv::@dispatch_0) : index
   %dispatch_1_ordinal = hal.executable.export.ordinal target(@dispatch_1::@spirv::@dispatch_1) : index
   %dispatch_2_ordinal = hal.executable.export.ordinal target(@dispatch_2::@spirv::@dispatch_2) : index
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
   return
 }
 util.initializer {
@@ -111,9 +109,9 @@
   %dispatch_0_ordinal = hal.executable.export.ordinal target(@dispatch_0::@spirv::@dispatch_0) : index
   %dispatch_1_ordinal = hal.executable.export.ordinal target(@dispatch_1::@spirv::@dispatch_1) : index
   %dispatch_2_ordinal = hal.executable.export.ordinal target(@dispatch_2::@spirv::@dispatch_2) : index
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
   util.return
 }
 
@@ -165,8 +163,8 @@
 // CHECK-DAG:     %[[DISPATCH_1_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_spirv::@vulkan_spirv_fb::@dispatch_1)
 // CHECK-DAG:     %[[DISPATCH_2_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_spirv::@vulkan_spirv_fb::@dispatch_2)
 // CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_0_EXE]] : !hal.executable)[%[[DISPATCH_0_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
 //
 // CHECK:       util.initializer
 // CHECK-DAG:     %[[DISPATCH_0_EXE:.+]] = hal.executable.lookup device(%{{.+}}) executable(@link_executables_linked_spirv) : !hal.executable
@@ -176,8 +174,8 @@
 // CHECK-DAG:     %[[DISPATCH_1_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_spirv::@vulkan_spirv_fb::@dispatch_1)
 // CHECK-DAG:     %[[DISPATCH_2_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_spirv::@vulkan_spirv_fb::@dispatch_2)
 // CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_0_EXE]] : !hal.executable)[%[[DISPATCH_0_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
 
 // -----
 
@@ -193,11 +191,9 @@
 #vulkan_target_1 = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.spirv.features = ["vulkan-spirv", "subgroup=1"]}>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_0 {
@@ -304,10 +300,10 @@
   %dispatch_1_ordinal = hal.executable.export.ordinal target(@dispatch_1::@spirv::@dispatch_1) : index
   %dispatch_2_ordinal = hal.executable.export.ordinal target(@dispatch_2::@spirv::@dispatch_2) : index
   %dispatch_3_ordinal = hal.executable.export.ordinal target(@dispatch_3::@spirv::@dispatch_3) : index
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_3_exe : !hal.executable)[%dispatch_3_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_3_exe : !hal.executable)[%dispatch_3_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
   return
 }
 
@@ -387,11 +383,9 @@
 #vulkan_target_2 = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.spirv.features = ["vulkan-spirv", "subgroup=2"]}>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_0 {
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_fusion.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_fusion.mlir
index 5828901..ca050b5 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_fusion.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_fusion.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=cdna2@vulkan --pass-pipeline='builtin.module(iree-codegen-spirv-configuration-pipeline, func.func(iree-spirv-lower-executable-target-pass))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 128, 1, 32]]>
 #map = affine_map<()[s0] -> (s0 * 32)>
@@ -24,11 +22,11 @@
   %c128 = arith.constant 128 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128x2048xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xi4>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128x2048xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x2048xi4>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<4096x2048xf32>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %5 = affine.apply #map()[%workgroup_id_y]
@@ -71,9 +69,9 @@
 //     CHECK-LABEL: func.func @matmul_i4_quant_weight()
 //           CHECK:   %[[A_ALLOC:.+]] = memref.alloc() : memref<32x1x36xf32, #gpu.address_space<workgroup>>
 //           CHECK:   %[[B_ALLOC:.+]] = memref.alloc() : memref<1x32x132xf32, #gpu.address_space<workgroup>>
-//           CHECK:   %[[WEIGHT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//           CHECK:   %[[SCALE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//           CHECK:   %[[ZP_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//           CHECK:   %[[WEIGHT_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//           CHECK:   %[[SCALE_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//           CHECK:   %[[ZP_BINDING:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //           CHECK:   scf.for %arg0 = %c0 to %c86 step %c1 iter_args({{.+}}) -> (vector<4xf32>, vector<4xf32>, vector<4xf32>, vector<4xf32>)
 //           CHECK:     %[[SCALE0:.+]] = vector.transfer_read %[[SCALE_BINDING]]
 //           CHECK:     %[[SCALE1:.+]] = vector.transfer_read %[[SCALE_BINDING]]
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_promotion.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_promotion.mlir
index 274669d..d3ebf52 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_promotion.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matmul_promotion.mlir
@@ -6,13 +6,11 @@
 
 // Verify pipelining + multi-buffering.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #compilation = #iree_codegen.compilation_info<
     lowering_config  = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 16]]>,
@@ -29,10 +27,10 @@
       func.func @matmul_f32_128x256x64() {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x512xf32>> -> tensor<128x512xf32>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x256xf32>> -> tensor<512x256xf32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
@@ -94,13 +92,11 @@
 
 // Store in stage 0 of pipeline.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #compilation = #iree_codegen.compilation_info<
     lowering_config  = #iree_codegen.lowering_config<tile_sizes = [[64, 64, 16]]>,
@@ -117,10 +113,10 @@
       func.func @matmul_f32_128x256x64() {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x512xf32>> -> tensor<128x512xf32>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x256xf32>> -> tensor<512x256xf32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
@@ -195,13 +191,11 @@
 
 // Check that fused transposed consumer elementwise op does not cause extra workgroup memory allocations.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #compilation = #iree_codegen.compilation_info<
     lowering_config  = #iree_codegen.lowering_config<tile_sizes = [[64, 256, 32]]>,
@@ -217,10 +211,10 @@
       func.func @matmul_f16_4096x512x512() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x512xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x512xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x512xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512x512xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<512xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<512x4096xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4096, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x512xf16>> -> tensor<4096x512xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x512xf16>> -> tensor<512x512xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [512], strides = [1] : !flow.dispatch.tensor<readonly:tensor<512xf16>> -> tensor<512xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matvec.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matvec.mlir
index 0948ba9..24c7b74 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matvec.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_matvec.mlir
@@ -1,13 +1,11 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=cdna2@vulkan --pass-pipeline='builtin.module(iree-spirv-select-lowering-strategy-pass, func.func(iree-spirv-lower-executable-target-pass))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
 #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
@@ -15,11 +13,11 @@
 #map3 = affine_map<(d0, d1, d2) -> (d0)>
 func.func @i4_dequant_matvec_f32() {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<86x128xf32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<4096xf32>>
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
   %6 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
   %7 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf32>> -> tensor<4096x86xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_reduction.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_reduction.mlir
index 6d4d163..9511790 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_reduction.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_reduction.mlir
@@ -2,11 +2,9 @@
 // RUN:   --pass-pipeline='builtin.module(func.func(iree-codegen-decompose-softmax), iree-spirv-select-lowering-strategy-pass, func.func(iree-spirv-lower-executable-target-pass))' \
 // RUN:   %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -21,8 +19,8 @@
   %c0 = arith.constant 0 : index
   %c10240 = arith.constant 10240 : index
   %cst = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<writeonly:tensor<512xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [512, 10240], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x10240xf32>> -> tensor<512x10240xf32>
   %3 = tensor.empty() : tensor<512xf32>
   %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<512xf32>) -> tensor<512xf32>
@@ -91,11 +89,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -109,8 +105,8 @@
 func.func @warp_reduction_dispatch() attributes {hal.executable.target = #executable_target_vulkan_spirv_fb} {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x9216x9216xf16>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x9216x9216xf16>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<10x9216x9216xf16>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<10x9216x9216xf16>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [10, 9216, 9216], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<10x9216x9216xf16>> -> tensor<10x9216x9216xf16>
   %3 = tensor.empty() : tensor<10x9216x9216xf16>
   %4 = tensor.empty() : tensor<10x9216xf16>
@@ -150,8 +146,8 @@
 //     CHECK-DAG:    %[[WGIDY:.+]] = hal.interface.workgroup.id[1] : index
 //     CHECK-DAG:    %[[TIDX:.+]] = gpu.thread_id  x
 
-//     CHECK-DAG:    %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:    %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//     CHECK-DAG:    %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:    %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 
 //         CHECK:    gpu.barrier
 //         CHECK:    %{{.+}}, %{{.+}} = gpu.shuffle  xor %{{.+}}, %[[I1]], %[[I32]] : i32
@@ -175,11 +171,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -193,8 +187,8 @@
   %cst = arith.constant -3.40282347E+38 : f32
   %cst_0 = arith.constant 0.000000e+00 : f32
   %cst_1 = arith.constant 1.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<12x128x40960xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [12, 128, 40960], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<12x128x40960xf32>> -> tensor<12x128x40960xf32>
   %3 = tensor.empty() : tensor<12x128x40960xf32>
   %4 = linalg.softmax dimension(2) ins(%2 : tensor<12x128x40960xf32>) outs(%3 : tensor<12x128x40960xf32>) -> tensor<12x128x40960xf32>
@@ -283,11 +277,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   iree.gpu.target = #iree_gpu.target<arch = "", features = "spirv:v1.6,cap:Shader", wgp = <
@@ -306,8 +298,8 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%6}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?xf16>>{%6}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%6}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x?xf16>>{%6}
   %10 = flow.dispatch.tensor.load %8, offsets = [0, 0], sizes = [32, %6], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x?xf16>>{%6} -> tensor<32x?xf16>
   %11 = tensor.empty(%6) : tensor<32x?xf16>
   %12 = linalg.softmax dimension(1) ins(%10 : tensor<32x?xf16>) outs(%11 : tensor<32x?xf16>) -> tensor<32x?xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_scalar_dispatch.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_scalar_dispatch.mlir
index 3cd1cc8..8aee761 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_scalar_dispatch.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/lowering_scalar_dispatch.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=pascal@vulkan --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-spirv-select-lowering-strategy-pass, func.func(iree-spirv-lower-executable-target-pass)))))' -mlir-print-local-scope %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @scalar_dispatch {
   hal.executable.variant public @vulkan_spirv_fb target(#hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -18,8 +16,8 @@
         %c0 = arith.constant 0 : index
         %c6364136223846793005_i64 = arith.constant 6364136223846793005 : i64
         %c1442695040888963407_i64 = arith.constant 1442695040888963407 : i64
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<i64>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<i64>>
         %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:tensor<i64>> -> tensor<i64>
         %extracted = tensor.extract %2[] : tensor<i64>
         %3 = arith.muli %extracted, %c6364136223846793005_i64 : i64
@@ -34,8 +32,8 @@
 
 //       CHECK: func.func @scalar_dispatch()
 //  CHECK-SAME:     translation_info = #iree_codegen.translation_info<SPIRVBaseLowering workgroup_size = [1, 1, 1]>
-//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[SPAN0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//       CHECK:   %[[SPAN1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK:   memref.load %[[SPAN0]][] : memref<i64, #hal.descriptor_type<storage_buffer>>
 //       CHECK:   arith.muli {{.+}} : i64
 //       CHECK:   arith.addi {{.+}} : i64
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/materialize_executable_conditions.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/materialize_executable_conditions.mlir
index cef2338..d302ed4 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/materialize_executable_conditions.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/materialize_executable_conditions.mlir
@@ -1,19 +1,15 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(iree-spirv-materialize-executable-conditions)))' --mlir-print-local-scope %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  <0, bindings = [
-    <0, storage_buffer, ReadOnly>,
-    <1, storage_buffer, ReadOnly>,
-    <2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
-#indirect_pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  <0, bindings = [
-    <0, storage_buffer, ReadOnly>,
-    <1, storage_buffer, ReadOnly>,
-    <2, storage_buffer>
-  ], flags = Indirect>
+#indirect_pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_executable {
@@ -21,10 +17,10 @@
   //  CHECK-SAME: target(<"vulkan-spirv", "vulkan-spirv-fb", {iree.spirv.features = ["vulkan-spirv"]}>)
   //   CHECK-NOT:   hal.executable.condition
   hal.executable.variant public @test_assumed_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader, GroupNonUniform], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader, GroupNonUniform], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_assumed_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -54,10 +50,10 @@
   //  CHECK-NEXT:   hal.return %[[RESULT]] : i1
   //  CHECK-NEXT: }
   hal.executable.variant public @test_subgroup_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [GroupNonUniformShuffle, GroupNonUniformArithmetic], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [GroupNonUniformShuffle, GroupNonUniformArithmetic], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_subgroup_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -87,10 +83,10 @@
   //  CHECK-NEXT:   hal.return %[[RESULT]] : i1
   //  CHECK-NEXT: }
   hal.executable.variant public @test_8bit_storage_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [UniformAndStorageBuffer8BitAccess, StorageBuffer8BitAccess], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [UniformAndStorageBuffer8BitAccess, StorageBuffer8BitAccess], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_8bit_storage_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -121,10 +117,10 @@
   //  CHECK-NEXT:   hal.return %[[RESULT]] : i1
   //  CHECK-NEXT: }
   hal.executable.variant public @test_16bit_storage_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [StorageBuffer16BitAccess, StorageUniform16], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [StorageBuffer16BitAccess, StorageUniform16], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_16bit_storage_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -147,10 +143,10 @@
   //       CHECK:   %[[TARGET:.+]] = arith.constant 7 : i32
   //       CHECK:   %{{.+}} = arith.andi %[[V]], %[[TARGET]] : i32
   hal.executable.variant public @test_int_compute_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Int64, Int16, Int8], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Int64, Int16, Int8], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_int_compute_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -172,10 +168,10 @@
   //       CHECK:   %[[TARGET:.+]] = arith.constant 3 : i32
   //       CHECK:   %{{.+}} = arith.andi %[[V]], %[[TARGET]] : i32
   hal.executable.variant public @test_float_compute_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Float16, Float64], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Float16, Float64], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_float_compute_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -197,10 +193,10 @@
   //       CHECK:   %[[TARGET:.+]] = arith.constant 1 : i32
   //       CHECK:   %{{.+}} = arith.andi %[[V]], %[[TARGET]] : i32
   hal.executable.variant public @test_dot_product_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [DotProduct, DotProductInput4x8Bit], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [DotProduct, DotProductInput4x8Bit], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_dot_product_capabilities ordinal(0) layout(#pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
@@ -222,10 +218,10 @@
   //       CHECK:   %[[TARGET:.+]] = arith.constant 1 : i32
   //       CHECK:   %{{.+}} = arith.andi %[[V]], %[[TARGET]] : i32
   hal.executable.variant public @test_cooperative_matrix_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [CooperativeMatrixKHR], []>, #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [CooperativeMatrixKHR], []>, #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_cooperative_matrix_capabilities ordinal(0) layout(#pipeline_layout) attributes {
       iree.spirv.coopmatrix.shape = array<i64: 16, 16, 16>, iree.spirv.coopmatrix.type = [f16, f16]
     } {
@@ -254,13 +250,13 @@
   //       CHECK:   %[[TARGET1:.+]] = arith.constant 1 : i32
   //       CHECK:   %{{.+}} = arith.andi %[[V1]], %[[TARGET1]] : i32
   hal.executable.variant public @test_address_capabilities target(
-      #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb-ptr", {
-        spirv.target_env = #spirv.target_env<#spirv.vce<v1.5,
-                                                        [Int64, PhysicalStorageBufferAddresses],
-                                                        [SPV_KHR_physical_storage_buffer]>,
-                                             #spirv.resource_limits<>>
-      }>
-    ) {
+    #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb-ptr", {
+      spirv.target_env = #spirv.target_env<#spirv.vce<v1.5,
+                                                      [Int64, PhysicalStorageBufferAddresses],
+                                                      [SPV_KHR_physical_storage_buffer]>,
+                                            #spirv.resource_limits<>>
+    }>
+  ) {
     hal.executable.export public @test_address_capabilities ordinal(0) layout(#indirect_pipeline_layout) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/physical_storage_buffer_addresses.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/physical_storage_buffer_addresses.mlir
index 65cf1e5..b95b1b8 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/physical_storage_buffer_addresses.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/physical_storage_buffer_addresses.mlir
@@ -2,12 +2,10 @@
 // RUN:   --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-convert-to-spirv{index-bits=64}))))' \
 // RUN:   %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ], flags = Indirect>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @interface_binding {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb-ptr">) {
@@ -21,9 +19,9 @@
     } {
       func.func @interface_binding() -> f32 {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<8x5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<4x5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<8x5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<4x5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
 
         %3 = memref.load %0[%c0, %c0] : memref<8x5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
         %4 = memref.load %1[%c0] : memref<5xf32, #spirv.storage_class<PhysicalStorageBuffer>>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_cooperative_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_cooperative_ops.mlir
index ed932b8..254d09d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_cooperative_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_cooperative_ops.mlir
@@ -6,14 +6,12 @@
 // RUN:   --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline, canonicalize, cse)))' \
 // RUN:   %s | FileCheck %s --check-prefix=RDNA3
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable public @matmul_256x1024x128_div_exp {
@@ -29,11 +27,11 @@
         %c1024 = arith.constant 1024 : index
         %c256 = arith.constant 256 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<256x128xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<readonly:tensor<128x1024xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) : !flow.dispatch.tensor<writeonly:tensor<256x1024xf16>>
         %11 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
         %14 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [256, 1024], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<256x1024xf16>> -> tensor<256x1024xf16>
         %17 = tensor.empty() : tensor<256x1024xf16>
@@ -196,13 +194,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable public @batch_matmul_16x128x256x512_div {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -215,10 +211,10 @@
       func.func @batch_matmul_16x128x256x512_div() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16x128x256xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [16, 128, 512], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x512xf16>> -> tensor<16x128x512xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [16, 512, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x512x256xf16>> -> tensor<16x512x256xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [16, 128, 256], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<16x128x256xf16>> -> tensor<16x128x256xf16>
@@ -301,13 +297,11 @@
 
 // Small matmul that each subgroup only handles one tile
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable public @matmul_32x32x32_div {
@@ -321,10 +315,10 @@
       func.func @matmul_32x32x32_div() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x32xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x32xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf16>> -> tensor<32x32xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf16>> -> tensor<32x32xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [32, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x32xf16>> -> tensor<32x32xf16>
@@ -355,12 +349,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable public @generic_batch_matmul_32x128x512x64 {
@@ -374,9 +366,9 @@
       func.func @generic_batch_matmul_32x128x512x64() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x128x64xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x512xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x128x64xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<64x512xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x128x512xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [32, 128, 64], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<32x128x64xf16>> -> tensor<32x128x64xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [64, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<64x512xf16>> -> tensor<64x512xf16>
         %5 = tensor.empty() : tensor<32x128x512xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_promotion.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_promotion.mlir
index 046c289..9e28eb0 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_promotion.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_promotion.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=pascal@vulkan --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 
@@ -21,10 +19,10 @@
       func.func @matmul_f32_128x256x64() {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x512xf32>> -> tensor<128x512xf32>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x256xf32>> -> tensor<512x256xf32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf32>> -> tensor<128x256xf32>
@@ -75,13 +73,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 
@@ -96,10 +92,10 @@
       func.func @matmul_f16_128x256x64() {
         %cst = arith.constant 0.0 : f16
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x512xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<512x256xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<128x256xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<128x256xf16>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [128, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x512xf16>> -> tensor<128x512xf16>
         %5 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x256xf16>> -> tensor<512x256xf16>
         %6 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [128, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<128x256xf16>> -> tensor<128x256xf16>
@@ -153,12 +149,10 @@
 
 // Check scalar load/store for promotion to shared memory.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 #user_config = #iree_codegen.compilation_info<
@@ -176,9 +170,9 @@
       func.func @matmul_f16_32x1280x1280() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x1280xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x1280xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32x1280xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<32x1280xf16>>
         %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [32, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<32x1280xf16>> -> tensor<32x1280xf16>
         %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1280, 1280], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1280x1280xf16>> -> tensor<1280x1280xf16>
         %5 = tensor.empty() : tensor<32x1280xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_vectorization.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_vectorization.mlir
index f9a186c..503781c 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_vectorization.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matmul_vectorization.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=valhall1 --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @fuse_and_vectorize_fill_matmul {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -19,9 +17,9 @@
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
         %c4096 = arith.constant 4096 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
         %8 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [4096, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>> -> tensor<4096x4096xf32>
         %10 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [4096, 4096], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>> -> tensor<4096x4096xf32>
         %15 = tensor.empty() : tensor<4096x4096xf32>
@@ -44,13 +42,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @fuse_and_vectorize_matmul_add {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -65,10 +61,10 @@
         %cst = arith.constant 0.000000e+00 : f32
         %c1024 = arith.constant 1024 : index
         %c256 = arith.constant 256 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x256xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1024x256xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<1024x256xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<readonly:tensor<512x256xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) : !flow.dispatch.tensor<writeonly:tensor<1024x256xf32>>
         %10 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 256], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x256xf32>> -> tensor<1024x256xf32>
         %13 = tensor.empty() : tensor<1024x256xf32>
         %15 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [1024, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x512xf32>> -> tensor<1024x512xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matvec.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matvec.mlir
index aed1a32..41a2c4a 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matvec.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_matvec.mlir
@@ -2,14 +2,12 @@
 // RUN:   --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' \
 // RUN:   %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @i4_dequant_unit_matmul_f16 {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -28,11 +26,11 @@
       func.func @i4_dequant_unit_matmul_f16() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x86x128xf16>>
-        %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x1x86x128xf16>>
+        %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x1x4096xf16>>
         %5 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
         %6 = flow.dispatch.tensor.load %1, offsets = [0, 0, 0], sizes = [4096, 86, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>> -> tensor<4096x86x1xf16>
         %7 = flow.dispatch.tensor.load %2, offsets = [0, 0, 0], sizes = [4096, 86, 1], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x1xf16>> -> tensor<4096x86x1xf16>
@@ -114,12 +112,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 5, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 5, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @i4_dequant_matvec_f16_subgroup_64 {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb", {
@@ -147,11 +143,11 @@
         %7 = arith.index_castui %2 : i32 to index
         %8 = arith.index_castui %3 : i32 to index
         %9 = arith.index_castui %4 : i32 to index
-        %10 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
-        %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>>
-        %12 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>>
-        %13 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128xf16>>
-        %14 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
+        %10 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%5) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>>
+        %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%6) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>>
+        %12 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%7) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>>
+        %13 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%8) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<86x128xf16>>
+        %14 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%9) : !flow.dispatch.tensor<writeonly:tensor<4096xf16>>
         %15 = flow.dispatch.tensor.load %10, offsets = [0, 0, 0], sizes = [4096, 86, 128], strides = [1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86x128xi4>> -> tensor<4096x86x128xi4>
         %16 = flow.dispatch.tensor.load %11, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>> -> tensor<4096x86xf16>
         %17 = flow.dispatch.tensor.load %12, offsets = [0, 0], sizes = [4096, 86], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<4096x86xf16>> -> tensor<4096x86xf16>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_reduction_subgroup.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_reduction_subgroup.mlir
index 03a12aa..f514357 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_reduction_subgroup.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_reduction_subgroup.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=valhall1 --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' %s | FileCheck %s
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' %s | FileCheck %s --check-prefix=NOSHUFFLE
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @subgroup_reduce {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -18,8 +16,8 @@
       func.func @subgroup_reduce() {
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x512xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
         %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2x512xf32>> -> tensor<2x512xf32>
         %3 = tensor.empty() : tensor<2xf32>
         %4 = linalg.fill ins(%cst : f32) outs(%3 : tensor<2xf32>) -> tensor<2xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_sub_byte_dequant.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_sub_byte_dequant.mlir
index b2e8d5b..c799a0d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_sub_byte_dequant.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/pipeline_sub_byte_dequant.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --iree-gpu-test-target=vp_android_baseline_2022@vulkan --pass-pipeline='builtin.module(hal.executable(hal.executable.variant(builtin.module(iree-codegen-spirv-configuration-pipeline), iree-codegen-linalg-to-spirv-pipeline)))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable @i4_dequant {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -18,10 +16,10 @@
     builtin.module {
       func.func @i4_dequant() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<131072x128xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<131072xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<131072x128xf32>>
         %4 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [131072, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<131072x128xi4>> -> tensor<131072x128xi4>
         %5 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [131072], strides = [1] : !flow.dispatch.tensor<readonly:tensor<131072xf32>> -> tensor<131072xf32>
         %6 = flow.dispatch.tensor.load %2, offsets = [0], sizes = [131072], strides = [1] : !flow.dispatch.tensor<readonly:tensor<131072xf32>> -> tensor<131072xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/set_transform_strategy.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/set_transform_strategy.mlir
index 331832e..d32855d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/set_transform_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/set_transform_strategy.mlir
@@ -6,19 +6,17 @@
 // core, but there are no such wmma intrinsics. Fix it to support fp16-input.
 // TODO: | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul() {
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2052x2052xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [2052, 2556], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2052x2556xf32>> -> tensor<2052x2556xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [2556, 2052], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<2556x2052xf32>> -> tensor<2556x2052xf32>
   %5 = tensor.empty() : tensor<2052x2052xf32>
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute.mlir
index 27dbd92..9bc5667 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute.mlir
@@ -10,12 +10,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[8, 16], [1, 1], [0, 0, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 3, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 3, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matmul {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -29,9 +27,9 @@
         %M = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
         %N = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
         %K = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : index
-        %arg0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xf32>{%M, %K}
-        %arg1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xf32>{%K, %N}
-        %arg2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<?x?xf32>{%M, %N}
+        %arg0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xf32>{%M, %K}
+        %arg1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xf32>{%K, %N}
+        %arg2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<?x?xf32>{%M, %N}
         %c4 = arith.constant 4 : index
         %c1 = arith.constant 1 : index
         %0 = memref.dim %arg0, %c1 : memref<?x?xf32>
@@ -80,12 +78,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 4, 32], [1, 1, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv_1d {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -97,9 +93,9 @@
       func.func @conv_1d() {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<3x6x1xf32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<3x8x1xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<3x1x1xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<3x6x1xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<3x8x1xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<3x1x1xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_id y
         %5 = gpu.block_id z
@@ -125,9 +121,9 @@
 }
 
 // CHECK-LABEL: func.func @conv_1d
-//       CHECK-DAG: %[[RET:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//       CHECK-DAG: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//       CHECK-DAG: %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK-DAG: %[[RET:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//       CHECK-DAG: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//       CHECK-DAG: %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 //       CHECK-DAG: %[[ARG0SV1:.+]] = memref.subview %[[ARG0]]
 //       CHECK-DAG: %[[ARG1SV1:.+]] = memref.subview %[[ARG1]]
 //       CHECK-DAG: %[[RETSV1:.+]] = memref.subview %[[RET]]
@@ -157,12 +153,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 4, 32], [0, 1, 1, 1], [0, 0, 0, 0, 1, 1, 4]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 9, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv_2d {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -182,9 +176,9 @@
         %ic = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : index
         %fh = hal.interface.constant.load layout(#pipeline_layout) ordinal(7) : index
         %fw = hal.interface.constant.load layout(#pipeline_layout) ordinal(8) : index
-        %arg0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?x?x?xf32>{%n, %ih, %iw, %ic}
-        %arg1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?x?x?xf32>{%fh, %fw, %ic, %oc}
-        %arg2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<?x?x?x?xf32>{%n, %oh, %ow, %oc}
+        %arg0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?x?x?xf32>{%n, %ih, %iw, %ic}
+        %arg1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?x?x?xf32>{%fh, %fw, %ic, %oc}
+        %arg2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<?x?x?x?xf32>{%n, %oh, %ow, %oc}
         %c2 = arith.constant 2 : index
         %c3 = arith.constant 3 : index
         %c1 = arith.constant 1 : index
@@ -239,9 +233,9 @@
 //     CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 4)>
 //     CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 * 32)>
 //         CHECK: func.func @conv_2d
-//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //     CHECK-DAG:   %[[C0:.+]] = arith.constant 0
 //     CHECK-DAG:   %[[C1:.+]] = arith.constant 1
 //     CHECK-DAG:   %[[C4:.+]] = arith.constant 4
@@ -272,12 +266,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 0, 1, 4, 32], [0, 0, 1, 1, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @conv_3d {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -289,9 +281,9 @@
       func.func @conv_3d() {
         %cst = arith.constant 0.000000e+00 : f32
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<2x7x7x7x2xf32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2x8x8x8x3xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<2x2x2x3x2xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<2x7x7x7x2xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2x8x8x8x3xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<2x2x2x3x2xf32>
         %3 = gpu.block_id x
         %4 = gpu.block_id y
         %5 = gpu.block_id z
@@ -342,12 +334,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 4, 32], [1, 1, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 module  {
   hal.executable private @pooling_nhwc_max {
@@ -359,9 +349,9 @@
       builtin.module {
         func.func @pooling_nhwc_max() {
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2x16x16x6xf32>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<3x4xf32>
-          %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<2x14x13x6xf32>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2x16x16x6xf32>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<3x4xf32>
+          %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<2x14x13x6xf32>
           %3 = gpu.block_id x
           %4 = gpu.block_id y
           %5 = affine.apply #map0()[%4]
@@ -385,9 +375,9 @@
 //     CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 4)>
 //     CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0] -> (s0 * 32)>
 //         CHECK: func.func @pooling_nhwc_max
-//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//     CHECK-DAG:   %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//     CHECK-DAG:   %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//     CHECK-DAG:   %[[RET0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //         CHECK:   %[[SV1:.+]] = memref.subview %[[ARG0]]
 //         CHECK:   %[[SV2:.+]] = memref.subview %[[RET0]]
 //     CHECK-DAG:   %[[TIDX:.+]] = gpu.thread_id x
@@ -409,12 +399,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[32], [1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable @matvec {
@@ -428,9 +416,9 @@
         %c250 = arith.constant 250 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<250x1024xf32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<1024xf32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<250xf32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<250x1024xf32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<1024xf32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<250xf32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %3 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_x]
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_scatter.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_scatter.mlir
index 8ce2c91..63c4d2e 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_scatter.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_scatter.mlir
@@ -2,12 +2,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 16], [1, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @static_scatter_update_slice  {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -20,9 +18,9 @@
         %c40 = arith.constant 40 : index
         %c500 = arith.constant 500 : index
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<40x500xi32>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<40x1xi32>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : memref<100x500xi32>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<40x500xi32>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<40x1xi32>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : memref<100x500xi32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -50,9 +48,9 @@
 }
 
 // CHECK-LABEL: func.func @static_scatter_update_slice()
-//       CHECK: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//       CHECK: %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
-//       CHECK: %[[ARG2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//       CHECK: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//       CHECK: %[[ARG1:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
+//       CHECK: %[[ARG2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //       CHECK: scf.for
 //       CHECK:   scf.for
 //       CHECK:     %[[WG_UPDATE:.+]] = memref.subview %[[ARG0]]
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_sort.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_sort.mlir
index 42738b7..051cd3d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_sort.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_distribute_sort.mlir
@@ -2,10 +2,8 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 0, 16], [1, 0, 1]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseDistribute>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @static_3d_sort  {
   hal.executable.variant @vulkan_spirv_fb target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -16,7 +14,7 @@
     builtin.module {
       func.func @static_3d_sort() {
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<64x32x128xi32, #hal.descriptor_type<storage_buffer>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<64x32x128xi32, #hal.descriptor_type<storage_buffer>>
         memref.assume_alignment %0, 64 : memref<64x32x128xi32, #hal.descriptor_type<storage_buffer>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -34,7 +32,7 @@
 }
 
 // CHECK-LABEL: func.func @static_3d_sort()
-//       CHECK: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
+//       CHECK: %[[ARG0:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
 //       CHECK: %[[WG_OUTPUT:.+]] = memref.subview %[[ARG0]]
 //       CHECK: %[[TID_X:.+]] = gpu.thread_id x
 //       CHECK: %[[DIM_X:.+]] = gpu.block_dim x
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_cooperative_matrix.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_cooperative_matrix.mlir
index daa7c7e..afc60d5 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_cooperative_matrix.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_cooperative_matrix.mlir
@@ -8,13 +8,11 @@
 
 // Single tile per workgroup means no subview ops for promotion.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32, 32], [16, 16, 16], [0, 0, 32]]>
 #map = affine_map<()[s0] -> (s0 * 32)>
@@ -24,13 +22,13 @@
   %c32 = arith.constant 32 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<32x32xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<32x32xf16>
   memref.assume_alignment %0, 64 : memref<32x32xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x32xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x32xf16>
   memref.assume_alignment %1, 64 : memref<32x32xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x32xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x32xf16>
   memref.assume_alignment %2, 64 : memref<32x32xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<32x32xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<32x32xf16>
   memref.assume_alignment %3, 64 : memref<32x32xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -56,8 +54,8 @@
 
 // CHECK-LABEL: func.func @matmul_f16_32x32x32()
 
-//       CHECK:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0)
-//       CHECK:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1)
+//       CHECK:   %[[LHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0)
+//       CHECK:   %[[RHS:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1)
 
 //   CHECK-NOT:   memref.alloc()
 //   CHECK-NOT:   memref.copy
@@ -69,12 +67,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 32, 32, 32], [1, 16, 16, 16], [0, 0, 0, 32]]>
 #map = affine_map<()[s0] -> (s0 * 32)>
@@ -89,10 +85,10 @@
   %c512 = arith.constant 512 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -183,12 +179,10 @@
 
 // Cooperative matrix fusable elementwise ops do not need promote C.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 32, 32, 32], [1, 16, 16, 16], [0, 0, 0, 32]]>
 #map = affine_map<()[s0] -> (s0 * 32)>
@@ -203,10 +197,10 @@
   %c512 = arith.constant 512 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -267,12 +261,10 @@
 
 // No need to promote C if there is no fused element wise ops.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 32, 32, 32], [1, 16, 16, 16], [0, 0, 0, 32]]>
 #map = affine_map<()[s0] -> (s0 * 32)>
@@ -286,9 +278,9 @@
   %c512 = arith.constant 512 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<128x32x64xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x64x512xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<32x128x512xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -325,7 +317,7 @@
 //      PROMOTEC: %[[LHS_ALLOC:.+]] = memref.alloc() : memref<32x1x32xf16, #gpu.address_space<workgroup>>
 //  PROMOTEC-NOT: memref.alloc()
 
-//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
+//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
 //      PROMOTEC: %[[OUT_VIEW:.+]] = memref.subview %[[SPAN2]]
 
 //      PROMOTEC: linalg.fill
@@ -352,12 +344,10 @@
 
 // No need to promote again with allocations from bufferization.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 64, 128], [1, 32, 64], [0, 0, 0, 32], [1, 16, 16, 16]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -368,9 +358,9 @@
   %c4096 = arith.constant 4096 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<1x4096x512xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<1x512x4096xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<1x4096x4096xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<1x4096x512xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<1x512x4096xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<1x4096x4096xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -429,13 +419,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 128], [32, 64], [0, 0, 32], [16, 16, 16]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -448,10 +436,10 @@
   %c4096 = arith.constant 4096 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<512x64xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<64x4096xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<4096xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<512x4096xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<512x64xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<64x4096xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<4096xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<512x4096xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -485,8 +473,8 @@
 //  PROMOTEC-DAG: %[[RHS_ALLOC:.+]] = memref.alloc() : memref<32x128xf16, #gpu.address_space<workgroup>>
 //  PROMOTEC-NOT: memref.alloc()
 
-//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//      PROMOTEC: %[[SPAN3:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//      PROMOTEC: %[[SPAN3:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //      PROMOTEC: %[[OUT_VIEW:.+]] = memref.subview %[[SPAN3]]
 
 //      PROMOTEC: linalg.fill
@@ -520,13 +508,11 @@
 
 // Transposed+broadcasted elementwise ops does not need promoting C matrix.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 128], [32, 64], [0, 0, 32], [16, 16, 16]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -539,10 +525,10 @@
   %c4096 = arith.constant 4096 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<512x64xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<64x4096xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<512xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<512x4096xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<512x64xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<64x4096xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<512xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<512x4096xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -576,8 +562,8 @@
 //  PROMOTEC-DAG: %[[RHS_ALLOC:.+]] = memref.alloc() : memref<32x128xf16, #gpu.address_space<workgroup>>
 //  PROMOTEC-NOT: memref.alloc()
 
-//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2)
-//      PROMOTEC: %[[SPAN3:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3)
+//      PROMOTEC: %[[SPAN2:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2)
+//      PROMOTEC: %[[SPAN3:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3)
 //      PROMOTEC: %[[OUT_VIEW:.+]] = memref.subview %[[SPAN3]]
 
 //      PROMOTEC: linalg.fill
@@ -611,12 +597,10 @@
 
 // Inlined large constant array needs promoting C matrix.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[64, 128], [32, 64], [0, 0, 32], [16, 16, 16]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -633,9 +617,9 @@
   %cst = arith.constant 0.000000e+00 : f16
   %cst_0 = arith.constant dense<"0x69222B2E40A3002A45AC1AAB2E2E202DA21C212680264C2A102314A041A7D029CB28352E5BAAD3B02F299D9A142B8AA1D1285C28412B25AF9A24EE2BA22C242D53AD9E2948A9289FCF301D28012F08AD68A6DD20ECAC912465290B2E9420C5AA50A222A912AB9526B62ADA2039AD4D912C9FDD287B20B224D329BA2A4D2C41A76DAB7E30B027F62ED1A0F1273A2BAE9D0FA48029812992A65AA92A2C9C2EE9A744A4632C5FA8A9A4CF2D70A482A0F5A2DBA7B6304B9D22A52B1B9DA8E424722AB5ACD0248A2B8B29C82D782E402D1A99F0A60CA4DE2DD32815266F2A6B247FA6FE214E2853AA402390AB6925F1A339307F2664A23CACBE28BA2B3D286DB0BA2E"> : tensor<128xf16>
   %0 = bufferization.to_memref %cst_0 : memref<128xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c96565312) : memref<128x2304xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c806357120) : memref<2304x262144xf16>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c134217728) : memref<128x262144xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c96565312) : memref<128x2304xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c806357120) : memref<2304x262144xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c134217728) : memref<128x262144xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
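The test updates in this diff are entirely mechanical: each set-based `#hal.pipeline.layout<push_constants = 0, sets = [...]>` attribute collapses into the flat `#hal.pipeline.layout<bindings = [...]>` form, and the `set(0)` clause disappears from every `hal.interface.binding.subspan` op. A rough sketch of that rewrite as a standalone script (the regexes and the `migrate` helper are illustrative assumptions, not part of IREE's tooling, and only handle the single-set/zero-push-constant shape seen in these tests):

```python
import re

# Old form: one descriptor set (set 0) with typed, numbered bindings.
LAYOUT_RE = re.compile(
    r"#hal\.pipeline\.layout<push_constants = 0, sets = \[\s*"
    r"#hal\.descriptor_set\.layout<0, bindings = \[(?P<bindings>.*?)\]>\s*\]>",
    re.DOTALL,
)
# Extract just the binding type (e.g. storage_buffer); ordinals become implicit.
BINDING_RE = re.compile(r"#hal\.descriptor_set\.binding<\d+, (\w+)>")
# subspan ops drop the now-meaningless set(0) clause.
SET_CLAUSE_RE = re.compile(r"\bset\(0\) ")

def migrate(mlir_text: str) -> str:
    def rewrite_layout(m: re.Match) -> str:
        types = BINDING_RE.findall(m.group("bindings"))
        body = ",\n  ".join(f"#hal.pipeline.binding<{t}>" for t in types)
        return f"#hal.pipeline.layout<bindings = [\n  {body}\n]>"
    text = LAYOUT_RE.sub(rewrite_layout, mlir_text)
    return SET_CLAUSE_RE.sub("", text)
```

Binding order in the new attribute is positional, which is why the numbered `#hal.descriptor_set.binding<N, ...>` entries reduce to a bare type list.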
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_matmul.mlir
index 36510ae..cc235c4 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_promote_matmul.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --mlir-print-local-scope --iree-gpu-test-target=pascal@vulkan --pass-pipeline='builtin.module(func.func(iree-spirv-tile-and-promote, cse))' %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[128, 128], [16, 4], [0, 0, 32]]>
 #map = affine_map<()[s0] -> (s0 * 128)>
@@ -19,10 +17,10 @@
   %c256 = arith.constant 256 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<256x128xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<128x1024xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<256x1024xf32>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<256x1024xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<256x128xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<128x1024xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<256x1024xf32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<256x1024xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -58,10 +56,10 @@
 //  CHECK-DAG: %[[MEM_A:.+]] = memref.alloc() : memref<128x32xf32, #gpu.address_space<workgroup>>
 //  CHECK-DAG: %[[MEM_B:.+]] = memref.alloc() : memref<32x128xf32, #gpu.address_space<workgroup>>
 
-//  CHECK-DAG: %[[BUFFER_A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) {{.+}} : memref<256x128xf32>
-//  CHECK-DAG: %[[BUFFER_B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) {{.+}} : memref<128x1024xf32>
-//  CHECK-DAG: %[[BUFFER_C:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(3) {{.+}} : memref<256x1024xf32>
-//  CHECK-DAG: %[[BUFFER_D:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(2) {{.+}} : memref<256x1024xf32>
+//  CHECK-DAG: %[[BUFFER_A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) {{.+}} : memref<256x128xf32>
+//  CHECK-DAG: %[[BUFFER_B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) {{.+}} : memref<128x1024xf32>
+//  CHECK-DAG: %[[BUFFER_C:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(3) {{.+}} : memref<256x1024xf32>
+//  CHECK-DAG: %[[BUFFER_D:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(2) {{.+}} : memref<256x1024xf32>
 
 //      CHECK: scf.for
 //      CHECK:   scf.for
@@ -112,12 +110,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 64, 256], [1, 8, 8], [0, 0, 0, 16]]>
 #map = affine_map<()[s0] -> (s0 * 64)>
@@ -130,9 +126,9 @@
   %c1024 = arith.constant 1024 : index
   %cst = arith.constant 0.111803398 : f32
   %cst_0 = arith.constant 0.000000e+00 : f16
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<16x1024x80xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<16x80x1024xf16>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<16x1024x1024xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<16x1024x80xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<16x80x1024xf16>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<16x1024x1024xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -179,12 +175,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 512, 8], [1, 8, 4], [0, 0, 0, 16]]>
 #map = affine_map<()[s0] -> (s0 * 512)>
@@ -196,9 +190,9 @@
   %c40 = arith.constant 40 : index
   %c0 = arith.constant 0 : index
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<16x4096x4096xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<16x4096x40xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<16x4096x40xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<16x4096x4096xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<16x4096x40xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<16x4096x40xf32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_batch_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_batch_matmul.mlir
index f04d688..aa1e45f 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_batch_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_batch_matmul.mlir
@@ -4,12 +4,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 8, 64], [1, 8, 4], [0, 0, 0, 4]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @fused_fill_batch_matmul {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -23,9 +21,9 @@
         %cst = arith.constant 0.000000e+00 : f32
         %c4 = arith.constant 4 : index
         %c1024 = arith.constant 1024 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x1024x1024xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x1024xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x1024x1024xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4x1024x1024xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4x1024x1024xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4x1024x1024xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_conv.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_conv.mlir
index e0faab5..ec4976c 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_conv.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_conv.mlir
@@ -4,12 +4,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 4, 4, 16], [0, 2, 2, 4], [0, 0, 0, 0, 1, 1, 4], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @nhwc_conv_static_shape_f32 {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -23,9 +21,9 @@
         %c16 = arith.constant 16 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x225x225x8xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x8x16xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x16xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -78,12 +76,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 8, 32], [0, 1, 4, 4], [0, 0, 0, 0, 1, 1], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @nhwc_nhwc_depthwise_conv_static_shape_f32 {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -97,9 +93,9 @@
         %c96 = arith.constant 96 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x113x113x96xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x56x56x96xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x113x113x96xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x96xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x56x56x96xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -148,13 +144,11 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 4, 32], [0, 1, 2, 4], [0, 0, 0, 0, 1, 1, 4], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @low_padded_conv {
@@ -176,10 +170,10 @@
         %c0 = arith.constant 0 : index
         %c112 = arith.constant 112 : index
         %c32 = arith.constant 32 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x224x224x3xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x224x224x3xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x3x32xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
         %4 = tensor.empty() : tensor<1x112x112x32xf32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -266,13 +260,11 @@
 
 #config =  #iree_codegen.lowering_config<tile_sizes = [[0, 1, 4, 32], [0, 1, 2, 4], [0, 0, 0, 0, 1, 1], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @low_high_padded_nhwc_depthwise_conv {
@@ -294,10 +286,10 @@
         %c0 = arith.constant 0 : index
         %c112 = arith.constant 112 : index
         %c32 = arith.constant 32 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x32xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1x112x112x32xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x32xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32xf32>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x112x112x32xf32>>
         %4 = tensor.empty() : tensor<1x112x112x32xf32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -388,12 +380,10 @@
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 16, 8, 8], [0, 8, 1, 4], [0, 0, 0, 0, 4, 1, 1], [0, 0, 1, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @nchw_conv_static_shape_f32 {
@@ -407,9 +397,9 @@
         %c1280 = arith.constant 1280 : index
         %c8 = arith.constant 8 : index
         %c0 = arith.constant 0 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x1280x10x10xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280x3x3xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x1280x8x8xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x1280x10x10xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1280x1280x3x3xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<2x1280x8x8xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -462,13 +452,11 @@
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 1, 64, 64], [1, 1, 8, 8], [0, 0, 0, 0, 1, 1, 8], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @nhwc_conv_static_shape_f16_batch2 {
@@ -483,10 +471,10 @@
         %c320 = arith.constant 320 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x66x66x320xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x320x320xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x64x320xf16>>
-        %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x64x64x320xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x66x66x320xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3x3x320x320xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2x64x64x320xf16>>
+        %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2x64x64x320xf16>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_matmul.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_matmul.mlir
index e194b30..5487820 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_matmul.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_matmul.mlir
@@ -3,12 +3,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[8, 64], [8, 4], [0, 0, 4]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matmul_static_shape_f16 {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -21,9 +19,9 @@
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f16
         %c4096 = arith.constant 4096 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf16>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf16>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf16>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf16>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf16>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf16>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -65,12 +63,10 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[8, 64], [8, 4], [0, 0, 4]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @matmul_static_shape_f32 {
   hal.executable.variant @vulkan target(<"vulkan-spirv", "vulkan-spirv-fb">) {
@@ -83,9 +79,9 @@
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
         %c4096 = arith.constant 4096 : index
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
-        %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<4096x4096xf32>>
+        %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<4096x4096xf32>>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_count_x = hal.interface.workgroup.count[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_pooling.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_pooling.mlir
index 9c43800..fb3e57c 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_pooling.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_pooling.mlir
@@ -4,11 +4,9 @@
 
 #config = #iree_codegen.lowering_config<tile_sizes = [[0, 2, 2, 8], [0, 1, 1, 4], [0, 0, 0, 0, 1, 1], [0, 1, 0, 0]]>
 #translation = #iree_codegen.translation_info<SPIRVBaseVectorize>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @pooling_nhwc_sum_f32 {
@@ -28,8 +26,8 @@
         %c8 = arith.constant 8 : index
         %c0 = arith.constant 0 : index
         %cst = arith.constant 0.000000e+00 : f32
-        %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x24x24x8xf32>>
-        %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x8xf32>>
+        %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1x24x24x8xf32>>
+        %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<1x2x2x8xf32>>
         %2 = tensor.empty() : tensor<12x12xf32>
         %workgroup_id_x = hal.interface.workgroup.id[0] : index
         %workgroup_id_y = hal.interface.workgroup.id[1] : index
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_to_cooperative_ops.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_to_cooperative_ops.mlir
index a27be5d..cc56261 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_to_cooperative_ops.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/tile_and_vectorize_to_cooperative_ops.mlir
@@ -2,14 +2,12 @@
 // RUN:   --pass-pipeline='builtin.module(func.func(iree-spirv-tile-to-cooperative-ops, iree-codegen-generic-vectorization, iree-spirv-vectorize-to-cooperative-ops, iree-codegen-optimize-tensor-insert-extract-slices, canonicalize, cse))' \
 // RUN:   %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32], [16, 16], [0, 0, 32], [16, 16, 16]]>
 #translation = #iree_codegen.translation_info<SPIRVCooperativeMatrixVectorize workgroup_size = [32, 1, 1]>
@@ -23,11 +21,11 @@
   %2 = gpu.thread_id  z
   %alloc = memref.alloc() : memref<32x32xf16, 3>
   %alloc_0 = memref.alloc() : memref<32x32xf16, 3>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<256x1024xf16>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<1024x128xf16>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<256x128xf16>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<256x128xf16>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : memref<256x128xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<256x1024xf16>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<1024x128xf16>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<256x128xf16>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<256x128xf16>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(4) alignment(64) offset(%c0) : memref<256x128xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %8 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_y]
@@ -134,13 +132,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[1, 32, 32], [1, 16, 16], [0, 0, 0, 32], [1, 16, 16, 16]]>
 #translation = #iree_codegen.translation_info<SPIRVCooperativeMatrixVectorize workgroup_size=[32, 1, 1]>
@@ -155,13 +151,13 @@
   %2 = gpu.thread_id  z
   %alloc = memref.alloc() : memref<1x32x32xf16, 3>
   %alloc_0 = memref.alloc() : memref<1x32x32xf16, 3>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<16x128x512xf16>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<16x128x512xf16>
   memref.assume_alignment %3, 64 : memref<16x128x512xf16>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<16x512x256xf16>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<16x512x256xf16>
   memref.assume_alignment %4, 64 : memref<16x512x256xf16>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<16x128x256xf16>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<16x128x256xf16>
   memref.assume_alignment %5, 64 : memref<16x128x256xf16>
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : memref<16x128x256xf16>
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : memref<16x128x256xf16>
   memref.assume_alignment %6, 64 : memref<16x128x256xf16>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -272,12 +268,10 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #config = #iree_codegen.lowering_config<tile_sizes = [[32, 32], [16, 16], [0, 0, 32], [16, 16, 16]]>
 #translation = #iree_codegen.translation_info<SPIRVCooperativeMatrixVectorize workgroup_size=[32, 1, 1]>
@@ -292,9 +286,9 @@
   %2 = gpu.thread_id  z
   %alloc = memref.alloc() : memref<32x32xi8, 3>
   %alloc_0 = memref.alloc() : memref<32x32xi8, 3>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<256x1024xi8>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<1024x128xi8>
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(4) alignment(64) offset(%c0) : memref<256x128xi32>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<256x1024xi8>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<1024x128xi8>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<256x128xi32>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_id_y = hal.interface.workgroup.id[1] : index
   %8 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%workgroup_id_y]
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/trim_executable_target_env.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/trim_executable_target_env.mlir
index 07a831e..9c2477a 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/trim_executable_target_env.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/trim_executable_target_env.mlir
@@ -2,14 +2,18 @@
 
 #executable_target_vulkan_spirv_fb = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {
   spirv.target_env = #spirv.target_env<#spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8, GroupNonUniformArithmetic],
-                                      [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class]>,
-                                      api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>}>
-
+                                       [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class]>,
+                                       api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>
+}>
 
 // CHECK-DAG: #[[$TARGET0:.+]] = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]>, #spirv.resource_limits<>>}>
 // CHECK-DAG: #[[$TARGET1:.+]] = #hal.executable.target<"vulkan-spirv", "vulkan-spirv-fb", {spirv.target_env = #spirv.target_env<#spirv.vce<v1.3, [Shader, GroupNonUniformArithmetic], [SPV_KHR_storage_buffer_storage_class]>, #spirv.resource_limits<>>}>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
 
 hal.executable private @predict_dispatch_0 {
   // CHECK-LABEL: hal.executable.variant public @vulkan_spirv_fb0
@@ -22,10 +26,12 @@
       hal.return %c2, %c1, %c1 : index, index, index
     }
     // CHECK-NOT: spirv.target_env
-    builtin.module attributes {spirv.target_env = #spirv.target_env<
+    builtin.module attributes {
+      spirv.target_env = #spirv.target_env<
         #spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8],
         [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class]>,
-        api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>} {
+        api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>
+    } {
       spirv.module Logical GLSL450 requires #spirv.vce<v1.0, [Shader], [SPV_KHR_storage_buffer_storage_class]> {
         spirv.func @predict_dispatch_0_vecmat_128x784_f32() "None" {
           spirv.Return
@@ -48,10 +54,12 @@
       hal.return %c10, %c1, %c1 : index, index, index
     }
     // CHECK-NOT: spirv.target_env
-    builtin.module attributes {spirv.target_env = #spirv.target_env<
+    builtin.module attributes {
+      spirv.target_env = #spirv.target_env<
         #spirv.vce<v1.6, [Shader, Float64, Float16, Int64, Int16, Int8],
         [SPV_KHR_16bit_storage, SPV_KHR_8bit_storage, SPV_KHR_storage_buffer_storage_class]>,
-        api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>} {
+        api=Vulkan, AMD:DiscreteGPU, #spirv.resource_limits<>>
+    } {
       spirv.module Logical GLSL450 requires #spirv.vce<v1.3, [Shader, GroupNonUniformArithmetic], [SPV_KHR_storage_buffer_storage_class]> {
         spirv.func @predict_dispatch_1_vecmat_10x128_f32() "None" {
           spirv.Return
diff --git a/compiler/src/iree/compiler/Codegen/SPIRV/test/vectorize_load_store.mlir b/compiler/src/iree/compiler/Codegen/SPIRV/test/vectorize_load_store.mlir
index 0a2fb48..93a861d 100644
--- a/compiler/src/iree/compiler/Codegen/SPIRV/test/vectorize_load_store.mlir
+++ b/compiler/src/iree/compiler/Codegen/SPIRV/test/vectorize_load_store.mlir
@@ -49,23 +49,21 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy()
-//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<4096x1024xvector<4xf32>>
-//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4096x1024xvector<4xf32>>
+//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<4096x1024xvector<4xf32>>
+//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4096x1024xvector<4xf32>>
 //     CHECK: %[[V:.+]] = memref.load %[[A]][%{{.*}}, %{{.*}}] : memref<4096x1024xvector<4xf32>>
 //     CHECK: memref.store %[[V]], %[[B]][%{{.*}}, %{{.*}}] : memref<4096x1024xvector<4xf32>>
 func.func @resource_copy() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4096x4096xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4096x4096xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4096x4096xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4096x4096xf32>
   %v = vector.transfer_read %0[%c0, %c0], %cst : memref<4096x4096xf32>, vector<4xf32>
   vector.transfer_write %v, %1[%c0, %c0] : vector<4xf32>, memref<4096x4096xf32>
   return
@@ -73,24 +71,22 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy_with_offset()
-//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) offset(%{{.*}}) : memref<2048x4096x1024xvector<4xf32>, strided<[4194304, 1024, 1], offset: ?>>
-//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4096x1024xvector<4xf32>>
+//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) offset(%{{.*}}) : memref<2048x4096x1024xvector<4xf32>, strided<[4194304, 1024, 1], offset: ?>>
+//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4096x1024xvector<4xf32>>
 //     CHECK: %[[V:.+]] = memref.load %[[A]][%{{.*}}, %{{.*}}, %{{.*}}] : memref<2048x4096x1024xvector<4xf32>, strided<[4194304, 1024, 1], offset: ?>>
 //     CHECK: memref.store %[[V]], %[[B]][%{{.*}}, %{{.*}}] : memref<4096x1024xvector<4xf32>>
 func.func @resource_copy_with_offset() {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %offset = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%offset) : memref<2048x4096x4096xf32, strided<[16777216, 4096, 1], offset: ?>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4096x4096xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%offset) : memref<2048x4096x4096xf32, strided<[16777216, 4096, 1], offset: ?>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4096x4096xf32>
   %v = vector.transfer_read %0[%c0, %c0, %c0], %cst : memref<2048x4096x4096xf32, strided<[16777216, 4096, 1], offset: ?>>, vector<4xf32>
   vector.transfer_write %v, %1[%c0, %c0] : vector<4xf32>, memref<4096x4096xf32>
   return
@@ -98,23 +94,21 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy_f16
-//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<4096x1024xvector<4xf16>>
-//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4096x1024xvector<4xf16>>
+//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<4096x1024xvector<4xf16>>
+//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4096x1024xvector<4xf16>>
 //     CHECK: %[[V:.+]] = memref.load %[[A]][%{{.*}}, %{{.*}}] : memref<4096x1024xvector<4xf16>>
 //     CHECK: memref.store %[[V]], %[[B]][%{{.*}}, %{{.*}}] : memref<4096x1024xvector<4xf16>>
 func.func @resource_copy_f16() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4096x4096xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4096x4096xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4096x4096xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4096x4096xf16>
   %v = vector.transfer_read %0[%c0, %c0], %cst : memref<4096x4096xf16>, vector<4xf16>
   vector.transfer_write %v, %1[%c0, %c0] : vector<4xf16>, memref<4096x4096xf16>
   return
@@ -122,23 +116,21 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy_8xf16
-//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<4096x512xvector<4xf32>>
-//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4096x512xvector<4xf32>>
+//     CHECK: %[[A:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<4096x512xvector<4xf32>>
+//     CHECK: %[[B:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4096x512xvector<4xf32>>
 //     CHECK: %[[V:.+]] = memref.load %[[A]][%{{.*}}, %{{.*}}] : memref<4096x512xvector<4xf32>>
 //     CHECK: memref.store %[[V]], %[[B]][%{{.*}}, %{{.*}}] : memref<4096x512xvector<4xf32>>
 func.func @resource_copy_8xf16() {
   %cst = arith.constant 0.000000e+00 : f16
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4096x4096xf16>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4096x4096xf16>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4096x4096xf16>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4096x4096xf16>
   %v = vector.transfer_read %0[%c0, %c0], %cst : memref<4096x4096xf16>, vector<8xf16>
   vector.transfer_write %v, %1[%c0, %c0] : vector<8xf16>, memref<4096x4096xf16>
   return
@@ -146,11 +138,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy_dynamic_shape()
@@ -162,10 +152,10 @@
   %dim0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
   %dim1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : index
 
-  // CHECK: %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<?x8x?x32xvector<4xf32>>{%[[DIM0]], %[[DIM1]]}
-  // CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<?x8x?x32xvector<4xf32>>{%[[DIM0]], %[[DIM1]]}
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x8x?x128xf32>{%dim0, %dim1}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x8x?x128xf32>{%dim0, %dim1}
+  // CHECK: %[[INPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<?x8x?x32xvector<4xf32>>{%[[DIM0]], %[[DIM1]]}
+  // CHECK: %[[OUTPUT:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<?x8x?x32xvector<4xf32>>{%[[DIM0]], %[[DIM1]]}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x8x?x128xf32>{%dim0, %dim1}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x8x?x128xf32>{%dim0, %dim1}
 
   // CHECK: %[[VAL:.+]] = memref.load %[[INPUT]]
   // CHECK: memref.store %[[VAL]], %[[OUTPUT]]
@@ -177,11 +167,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @resource_copy_dynamic_last_dim()
@@ -189,10 +177,10 @@
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
   %dim = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  // CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<4096x?xf32>
-  // CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(1) : memref<4096x?xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4096x?xf32>{%dim}
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4096x?xf32>{%dim}
+  // CHECK: hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<4096x?xf32>
+  // CHECK: hal.interface.binding.subspan layout({{.+}}) binding(1) : memref<4096x?xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4096x?xf32>{%dim}
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4096x?xf32>{%dim}
   %v = vector.transfer_read %0[%c0, %c0], %cst : memref<4096x?xf32>, vector<4xf32>
   vector.transfer_write %v, %1[%c0, %c0] : vector<4xf32>, memref<4096x?xf32>
   return
@@ -200,11 +188,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @dont_vectorize_odd_vector_size
@@ -213,10 +199,10 @@
   %c0 = arith.constant 0 : index
   // CHECK: hal.interface.binding.subspan
   // CHECK-SAME: memref<4x3xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4x3xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4x3xf32>
   // CHECK: hal.interface.binding.subspan
   // CHECK-SAME: memref<4x3xf32>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<4x3xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<4x3xf32>
   %v = vector.transfer_read %0[%c0, %c0], %cst : memref<4x3xf32>, vector<3xf32>
   vector.transfer_write %v, %1[%c0, %c0] : vector<3xf32>, memref<4x3xf32>
   return
@@ -224,11 +210,9 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @scalarize_vector_transfer_op
@@ -236,8 +220,8 @@
   %c0 = arith.constant 0: index
   %c3 = arith.constant 3: index
   %f0 = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<20xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<20xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<20xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<20xf32>
   // CHECK-DAG: %[[INDEX0:.+]] = arith.constant 3 : index
   // CHECK-DAG: %[[INDEX1:.+]] = arith.constant 4 : index
   // CHECK-DAG: %[[INDEX2:.+]] = arith.constant 5 : index
@@ -289,17 +273,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @scalarize_non_minor_identity_transfer_write
 //  CHECK-SAME: (%[[VALUE:.+]]: vector<4xf32>, %[[I1:.+]]: index, %[[I2:.+]]: index)
 func.func @scalarize_non_minor_identity_transfer_write(%value: vector<4xf32>, %i1: index, %i2: index) {
   %c0 = arith.constant 0: index
-  %buffer = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<1x130x130x64xf32>
+  %buffer = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<1x130x130x64xf32>
   vector.transfer_write %value, %buffer[%c0, %i1, %i2, %c0] {in_bounds = [true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d2)>} : vector<4xf32>, memref<1x130x130x64xf32>
   return
 }
@@ -346,16 +328,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @scalarize_indivisible_vector_transfer_read_op
 func.func @scalarize_indivisible_vector_transfer_read_op(%i: index) -> vector<4xf32> {
   %f0 = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<10xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<10xf32>
   %1 = vector.transfer_read %0[%i], %f0 : memref<10xf32>, vector<4xf32>
   return %1: vector<4xf32>
 }
@@ -366,16 +346,14 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @scalarize_indivisible_vector_transfer_write_op
 func.func @scalarize_indivisible_vector_transfer_write_op(%value: vector<4xf32>, %i: index) {
   %f0 = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<10xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<10xf32>
   vector.transfer_write %value, %0[%i] : vector<4xf32>, memref<10xf32>
   return
 }
@@ -434,17 +412,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @scalarize_vector_load_op
 //  CHECK-SAME: (%[[ARG0:.+]]: index)
 func.func @scalarize_vector_load_op(%i: index) -> vector<4xi32> {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<10x10xi32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<10x10xi32>
   %1 = vector.load %0[%c0, %i] : memref<10x10xi32>, vector<4xi32>
   return %1: vector<4xi32>
 }
@@ -469,35 +445,31 @@
 
 // Test that the memref is not vectorized if the element type is a complex type.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @complex_memref
 func.func @complex_memref(%x: index, %y: index) -> complex<f32> {
-  // CHECK: hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<8x32xcomplex<f32>>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<8x32xcomplex<f32>>
+  // CHECK: hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<8x32xcomplex<f32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<8x32xcomplex<f32>>
   %1 = memref.load %0[%x, %y] : memref<8x32xcomplex<f32>>
   return %1: complex<f32>
 }
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: func.func @vectorize_mma_load_store_non_identity_memref
 //  CHECK-SAME: (%[[I0:.+]]: index, %[[I1:.+]]: index)
 func.func @vectorize_mma_load_store_non_identity_memref(%i0: index, %i1: index) {
   %c0 = arith.constant 0 : index
-  %span0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>>
-  %span1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>>
+  %span0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>>
+  %span1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>>
   %val = gpu.subgroup_mma_load_matrix %span0[%i0, %i1] {leadDimension = 1280 : index} : memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>> -> !gpu.mma_matrix<16x16xf16, "COp">
   gpu.subgroup_mma_store_matrix %val, %span1[%i0, %i1] {leadDimension = 1280 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<32x1280xf16, strided<[1280, 1], offset: 11840>, #hal.descriptor_type<storage_buffer>>
   return
@@ -512,22 +484,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @transfer_read_i4_memref_vector8(%x: index) -> vector<8xi4> {
   %c0_i4 = arith.constant 0 : i4
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048xi4>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048xi4>
   %1 = vector.transfer_read %0[%x], %c0_i4 {in_bounds = [true]} : memref<2048xi4>, vector<8xi4>
   return %1: vector<8xi4>
 }
 
 // CHECK-LABEL: func.func @transfer_read_i4_memref_vector8
 //  CHECK-SAME: (%[[ARG:.+]]: index)
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<256xvector<1xi32>>
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<256xvector<1xi32>>
 //       CHECK:   %[[INDEX:.+]] = affine.apply affine_map<()[s0] -> (s0 floordiv 8)>()[%[[ARG]]]
 //       CHECK:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[INDEX]]] : memref<256xvector<1xi32>>
 //       CHECK:   %[[CAST:.+]] = vector.bitcast %[[LOAD]] : vector<1xi32> to vector<8xi4>
@@ -537,14 +507,14 @@
 
 // func.func @transfer_read_i4_memref_vector4(%x: index) -> vector<4xi4> {
 //   %c0_i4 = arith.constant 0 : i4
-//   %0 = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<2048xi4>
+//   %0 = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<2048xi4>
 //   %1 = vector.transfer_read %0[%x], %c0_i4 {in_bounds = [true]} : memref<2048xi4>, vector<4xi4>
 //   return %1: vector<4xi4>
 // }
 
 // XXXXX-LABEL: func.func @transfer_read_i4_memref_vector4
 //  XXXXX-SAME: (%[[ARG:.+]]: index)
-//       XXXXX:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<512xvector<2xi8>>
+//       XXXXX:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<512xvector<2xi8>>
 //       XXXXX:   %[[INDEX:.+]] = affine.apply affine_map<()[s0] -> (s0 floordiv 4)>()[%[[ARG]]]
 //       XXXXX:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[INDEX]]] : memref<512xvector<2xi8>>
 //       XXXXX:   %[[CAST:.+]] = vector.bitcast %[[LOAD]] : vector<2xi8> to vector<4xi4>
@@ -552,22 +522,20 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @transfer_read_i4_memref_vector2(%x: index) -> vector<2xi4> {
   %c0_i4 = arith.constant 0 : i4
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048xi4>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048xi4>
   %1 = vector.transfer_read %0[%x], %c0_i4 {in_bounds = [true]} : memref<2048xi4>, vector<2xi4>
   return %1: vector<2xi4>
 }
 
 // XXXXX-LABEL: func.func @transfer_read_i4_memref_vector2
 //  XXXXX-SAME: (%[[ARG:.+]]: index)
-//       XXXXX:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<1024xvector<1xi8>>
+//       XXXXX:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<1024xvector<1xi8>>
 //       XXXXX:   %[[INDEX:.+]] = affine.apply affine_map<()[s0] -> (s0 floordiv 2)>()[%[[ARG]]]
 //       XXXXX:   %[[LOAD:.+]] = memref.load %[[SUBSPAN]][%[[INDEX]]] : memref<1024xvector<1xi8>>
 //       XXXXX:   %[[CAST:.+]] = vector.bitcast %[[LOAD]] : vector<1xi8> to vector<2xi4>
@@ -575,34 +543,30 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @transfer_read_i3_memref_vector8(%x: index) -> vector<8xi3> {
   %c0_i3 = arith.constant 0 : i3
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048xi3>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048xi3>
   %1 = vector.transfer_read %0[%x], %c0_i3 {in_bounds = [true]} : memref<2048xi3>, vector<8xi3>
   return %1: vector<8xi3>
 }
 
 //   CHECK-LABEL: func.func @transfer_read_i3_memref_vector8
-//         CHECK:   hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<2048xi3>
+//         CHECK:   hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<2048xi3>
 // CHECK-COUNT-8:   memref.load {{.+}} : memref<2048xi3>
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @transfer_read_vector2_vector8(%x: index) -> (vector<2xi32>, vector<8xi32>) {
   %c0 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048xi32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048xi32>
   %1 = vector.transfer_read %0[%x], %c0 {in_bounds = [true]} : memref<2048xi32>, vector<2xi32>
   %2 = vector.transfer_read %0[%x], %c0 {in_bounds = [true]} : memref<2048xi32>, vector<8xi32>
   return %1, %2: vector<2xi32>, vector<8xi32>
@@ -628,15 +592,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @transfer_write_vector2_vector8(%x: index, %val0: vector<2xi32>, %val1: vector<8xi32>) {
   %c0 = arith.constant 0 : i32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<2048xi32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<2048xi32>
   vector.transfer_write %val0, %0[%x] : vector<2xi32>, memref<2048xi32>
   vector.transfer_write %val1, %0[%x] : vector<8xi32>, memref<2048xi32>
   return
@@ -644,7 +606,7 @@
 
 // CHECK-LABEL: func @transfer_write_vector2_vector8
 //  CHECK-SAME: (%[[INDEX:.+]]: index, %[[VAL0:.+]]: vector<2xi32>, %[[VAL1:.+]]: vector<8xi32>)
-//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(0) binding(0) : memref<1024xvector<2xi32>>
+//       CHECK:   %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) binding(0) : memref<1024xvector<2xi32>>
 
 //       CHECK:   %[[OFFSET0:.+]] = affine.apply affine_map<()[s0] -> (s0 floordiv 2)>()[%[[INDEX]]]
 //       CHECK:   memref.store %[[VAL0]], %[[SUBSPAN]][%[[OFFSET0]]]
@@ -663,19 +625,17 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @scalarize_masked_vector_transfer_op(%arg: vector<3xf32>, %mask: vector<3xi1>) -> (vector<3xf32>) {
   %c0 = arith.constant 0: index
   %c3 = arith.constant 3: index
   %f0 = arith.constant 0.0 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<20xf32>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<20xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<20xf32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<20xf32>
   %3 = vector.transfer_read %0[%c3], %f0, %mask : memref<20xf32>, vector<3xf32>
   vector.transfer_write %arg, %2[%c3], %mask : vector<3xf32>, memref<20xf32>
   return %3: vector<3xf32>
@@ -722,17 +682,15 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @extract_vector_transfer_read_mask_bits(%arg: vector<3xf32>, %index: index) -> (vector<3xf32>) {
   %c3 = arith.constant 3: index
   %f0 = arith.constant 0.0 : f32
   %mask = vector.create_mask %index : vector<3xi1>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<20xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<20xf32>
   %1 = vector.transfer_read %0[%c3], %f0, %mask : memref<20xf32>, vector<3xf32>
   return %1: vector<3xf32>
 }
diff --git a/compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp b/compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp
index 337d4ed..3507752 100644
--- a/compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp
+++ b/compiler/src/iree/compiler/Codegen/Transforms/Transforms.cpp
@@ -551,7 +551,7 @@
 
     Value newSubspanOp = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
         subspanOp.getLoc(), newSubspanType, subspanOp.getLayout(),
-        subspanOp.getSet(), subspanOp.getBinding(), subspanOp.getByteOffset(),
+        subspanOp.getBinding(), subspanOp.getByteOffset(),
         subspanOp.getDynamicDims(), subspanOp.getAlignmentAttr(),
         subspanOp.getDescriptorFlagsAttr());
 
@@ -623,7 +623,7 @@
       rewriter.setInsertionPointAfter(subspanOp);
       newSubspanOp = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
           subspanOp.getLoc(), newSubspanType, subspanOp.getLayout(),
-          subspanOp.getSet(), subspanOp.getBinding(), subspanOp.getByteOffset(),
+          subspanOp.getBinding(), subspanOp.getByteOffset(),
           subspanOp.getDynamicDims(), subspanOp.getAlignmentAttr(),
           subspanOp.getDescriptorFlagsAttr());
     }
@@ -759,7 +759,7 @@
       rewriter.setInsertionPointAfter(subspanOp);
       newSubspanOp = rewriter.create<IREE::HAL::InterfaceBindingSubspanOp>(
           subspanOp.getLoc(), newSubspanType, subspanOp.getLayout(),
-          subspanOp.getSet(), subspanOp.getBinding(), subspanOp.getByteOffset(),
+          subspanOp.getBinding(), subspanOp.getByteOffset(),
           subspanOp.getDynamicDims(), subspanOp.getAlignmentAttr(),
           subspanOp.getDescriptorFlagsAttr());
     }
diff --git a/compiler/src/iree/compiler/Codegen/VMVX/test/link_executables.mlir b/compiler/src/iree/compiler/Codegen/VMVX/test/link_executables.mlir
index 2baedf6..e760eda 100644
--- a/compiler/src/iree/compiler/Codegen/VMVX/test/link_executables.mlir
+++ b/compiler/src/iree/compiler/Codegen/VMVX/test/link_executables.mlir
@@ -1,11 +1,9 @@
 // RUN: iree-opt --split-input-file --iree-vmvx-link-executables %s | FileCheck %s
 
 #vmvx_target = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_0 {
@@ -87,9 +85,9 @@
   %dispatch_0_ordinal = hal.executable.export.ordinal target(@dispatch_0::@vmvx::@dispatch_0) : index
   %dispatch_1_ordinal = hal.executable.export.ordinal target(@dispatch_1::@vmvx::@dispatch_1) : index
   %dispatch_2_ordinal = hal.executable.export.ordinal target(@dispatch_2::@vmvx::@dispatch_2) : index
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
   return
 }
 util.initializer {
@@ -104,9 +102,9 @@
   %dispatch_0_ordinal = hal.executable.export.ordinal target(@dispatch_0::@vmvx::@dispatch_0) : index
   %dispatch_1_ordinal = hal.executable.export.ordinal target(@dispatch_1::@vmvx::@dispatch_1) : index
   %dispatch_2_ordinal = hal.executable.export.ordinal target(@dispatch_2::@vmvx::@dispatch_2) : index
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_0_exe : !hal.executable)[%dispatch_0_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_1_exe : !hal.executable)[%dispatch_1_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%dispatch_2_exe : !hal.executable)[%dispatch_2_ordinal] workgroups([%c1, %c1, %c1]) bindings([(%c0 : index)[%c0, %c0]]) flags(None)
   util.return
 }
 
@@ -158,8 +156,8 @@
 // CHECK-DAG:     %[[DISPATCH_1_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_vmvx::@vmvx_bytecode_fb::@dispatch_1)
 // CHECK-DAG:     %[[DISPATCH_2_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_vmvx::@vmvx_bytecode_fb::@dispatch_2)
 // CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_0_EXE]] : !hal.executable)[%[[DISPATCH_0_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
 //
 // CHECK:       util.initializer
 // CHECK-DAG:     %[[DISPATCH_0_EXE:.+]] = hal.executable.lookup device(%{{.+}}) executable(@link_executables_linked_vmvx) : !hal.executable
@@ -169,17 +167,15 @@
 // CHECK-DAG:     %[[DISPATCH_1_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_vmvx::@vmvx_bytecode_fb::@dispatch_1)
 // CHECK-DAG:     %[[DISPATCH_2_ORDINAL:.+]] = hal.executable.export.ordinal target(@link_executables_linked_vmvx::@vmvx_bytecode_fb::@dispatch_2)
 // CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_0_EXE]] : !hal.executable)[%[[DISPATCH_0_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
-// CHECK-NEXT:    hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_1_EXE]] : !hal.executable)[%[[DISPATCH_1_ORDINAL]]] workgroups([%c1, %c1, %c1])
+// CHECK:         hal.command_buffer.dispatch<%cmd : !hal.command_buffer> target(%[[DISPATCH_2_EXE]] : !hal.executable)[%[[DISPATCH_2_ORDINAL]]] workgroups([%c1, %c1, %c1])
 
 // -----
 
 #vmvx_target = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @dispatch_0 {
diff --git a/compiler/src/iree/compiler/Codegen/VMVX/test/pipeline.mlir b/compiler/src/iree/compiler/Codegen/VMVX/test/pipeline.mlir
index 38cc29a..9f18092 100644
--- a/compiler/src/iree/compiler/Codegen/VMVX/test/pipeline.mlir
+++ b/compiler/src/iree/compiler/Codegen/VMVX/test/pipeline.mlir
@@ -2,11 +2,9 @@
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {ukernels = "all"}>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 #map = affine_map<(d0, d1, d2) -> (d0, d2)>
@@ -23,15 +21,15 @@
   %0:2 = iree_codegen.query_tile_sizes tensor<16x16xi8, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2]>> -> index, index
   %1 = affine.apply #map3()[%0#0]
   %2 = affine.apply #map3()[%0#1]
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%1, %2, %0#0, %0#1}
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%1, %2, %0#0, %0#1}
   %4:2 = iree_codegen.query_tile_sizes tensor<16x16xi8, #iree_encoding.encoding<operand_index = 1, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2]>> -> index, index
   %5 = affine.apply #map3()[%4#0]
   %6 = affine.apply #map3()[%4#1]
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c256) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%5, %6, %4#0, %4#1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c256) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%5, %6, %4#0, %4#1}
   %8:2 = iree_codegen.query_tile_sizes tensor<16x16xi32, #iree_encoding.encoding<operand_index = 2, op_type = matmul, element_types = [i8, i8, i32], user_indexing_maps = [#map, #map1, #map2]>> -> index, index
   %9 = affine.apply #map3()[%8#0]
   %10 = affine.apply #map3()[%8#1]
-  %11 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c512) : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xi32>>{%9, %10, %8#0, %8#1}
+  %11 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c512) : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xi32>>{%9, %10, %8#0, %8#1}
   %12 = flow.dispatch.tensor.load %3, offsets = [0, 0, 0, 0], sizes = [%1, %2, %0#0, %0#1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%1, %2, %0#0, %0#1} -> tensor<?x?x?x?xi8>
   %13 = flow.dispatch.tensor.load %7, offsets = [0, 0, 0, 0], sizes = [%5, %6, %4#0, %4#1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x?x?xi8>>{%5, %6, %4#0, %4#1} -> tensor<?x?x?x?xi8>
   %14 = flow.dispatch.tensor.load %11, offsets = [0, 0, 0, 0], sizes = [%9, %10, %8#0, %8#1], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readwrite:tensor<?x?x?x?xi32>>{%9, %10, %8#0, %8#1} -> tensor<?x?x?x?xi32>
diff --git a/compiler/src/iree/compiler/Codegen/VMVX/test/select_lowering_strategy.mlir b/compiler/src/iree/compiler/Codegen/VMVX/test/select_lowering_strategy.mlir
index 5a2f640..34bcdaa 100644
--- a/compiler/src/iree/compiler/Codegen/VMVX/test/select_lowering_strategy.mlir
+++ b/compiler/src/iree/compiler/Codegen/VMVX/test/select_lowering_strategy.mlir
@@ -1,18 +1,16 @@
 // RUN: iree-opt -pass-pipeline='builtin.module(iree-vmvx-select-lowering-strategy)' -split-input-file %s | FileCheck %s
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @matmul_static() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
   %cst = arith.constant 0.000000e+00 : f32
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readonly:tensor<384x512xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readonly:tensor<512x128xf32>>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) : !flow.dispatch.tensor<writeonly:tensor<384x128xf32>>
   %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [384, 512], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<384x512xf32>> -> tensor<384x512xf32>
   %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [512, 128], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<512x128xf32>> -> tensor<512x128xf32>
   %5 = tensor.empty() : tensor<384x128xf32>
@@ -32,11 +30,9 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 6, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d0, d1)>
 func.func @copy_op_dynamic() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
@@ -46,8 +42,8 @@
   %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : index
   %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : index
   %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : index
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<?x?xi32>{%0, %1}
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<?x?xi32>{%2, %3}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<?x?xi32>{%0, %1}
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<?x?xi32>{%2, %3}
   %subview = memref.subview %7[%4, %5] [%0, %1] [1, 1] : memref<?x?xi32> to memref<?x?xi32, strided<[?, 1], offset: ?>>
   linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel"]} ins(%6 : memref<?x?xi32>) outs(%subview : memref<?x?xi32, strided<[?, 1], offset: ?>>) {
   ^bb0(%in: i32, %out: i32):
@@ -66,19 +62,17 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @static_1d_fft_stage2() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
   %c0 = arith.constant 0 : index
   %c2 = arith.constant 2 : index
   %cst = arith.constant dense<[1.000000e+00, 6.12323426E-17]> : tensor<2xf32>
   %cst_0 = arith.constant dense<[-0.000000e+00, -1.000000e+00]> : tensor<2xf32>
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : !flow.dispatch.tensor<readwrite:tensor<32xf32>>
   %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %3 = flow.dispatch.tensor.load %1, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readwrite:tensor<32xf32>> -> tensor<32xf32>
   %4:2 = iree_linalg_ext.fft ins(%c2, %cst, %cst_0 : index, tensor<2xf32>, tensor<2xf32>) outs(%2, %3 : tensor<32xf32>, tensor<32xf32>) : tensor<32xf32>, tensor<32xf32>
@@ -97,13 +91,11 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>,
-    #hal.descriptor_set.binding<3, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<(d0, d1) -> (d1)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>
@@ -117,11 +109,11 @@
   %c0 = arith.constant 0 : index
   %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
   %1 = arith.index_castui %0 : i32 to index
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3360x32xi8>>
-  %3 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32xi32>>
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c107520) : !flow.dispatch.tensor<readonly:tensor<32xi32>>
-  %5 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x3360xi8>>{%1}
-  %6 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x32xi8>>{%1}
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3360x32xi8>>
+  %3 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<32xi32>>
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c107520) : !flow.dispatch.tensor<readonly:tensor<32xi32>>
+  %5 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x3360xi8>>{%1}
+  %6 = hal.interface.binding.subspan layout(#pipeline_layout) binding(3) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x32xi8>>{%1}
   %7 = flow.dispatch.tensor.load %5, offsets = [0, 0], sizes = [%1, 3360], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<?x3360xi8>>{%1} -> tensor<?x3360xi8>
   %8 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [3360, 32], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<3360x32xi8>> -> tensor<3360x32xi8>
   %9 = flow.dispatch.tensor.load %3, offsets = [0], sizes = [32], strides = [1] : !flow.dispatch.tensor<readonly:tensor<32xi32>> -> tensor<32xi32>
@@ -158,11 +150,9 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @unpack_outer_dynamic() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
   %c131072 = arith.constant 131072 : index
@@ -175,8 +165,8 @@
   %5 = arith.index_castui %1 : i32 to index
   %6 = arith.index_castui %2 : i32 to index
   %7 = arith.index_castui %3 : i32 to index
-  %8 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
-  %9 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
+  %8 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5}
+  %9 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c131072) : !flow.dispatch.tensor<writeonly:tensor<?x?xi32>>{%6, %7}
   %10 = flow.dispatch.tensor.load %8, offsets = [0, 0, 0, 0], sizes = [%4, %5, 32, 16], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<?x?x32x16xi32>>{%4, %5} -> tensor<?x?x32x16xi32>
   %11 = tensor.empty(%6, %7) : tensor<?x?xi32>
   %unpack = tensor.unpack %10 inner_dims_pos = [0, 1] inner_tiles = [32, 16] into %11 : tensor<?x?x32x16xi32> -> tensor<?x?xi32>
@@ -194,11 +184,9 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {ukernels = true}>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 #map = affine_map<()[s0] -> (1024 ceildiv s0)>
 #map1 = affine_map<()[s0] -> (2048 ceildiv s0)>
@@ -206,11 +194,11 @@
 func.func @elem_pack_ukernels() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
   %cst = arith.constant 0.000000e+00 : f32
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x2048xf32>>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<1024x2048xf32>>
   %1:2 = iree_codegen.query_tile_sizes tensor<1024x2048xf32, #iree_encoding.encoding<operand_index = 0, op_type = matmul, element_types = [f32, f32, f32], original_type = tensor<1024x2048xf32>>> -> index, index
   %2 = affine.apply #map()[%1#0]
   %3 = affine.apply #map1()[%1#1]
-  %4 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%2, %3, %1#0, %1#1}
+  %4 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<?x?x?x?xf32>>{%2, %3, %1#0, %1#1}
   %5 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [1024, 2048], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x2048xf32>> -> tensor<1024x2048xf32>
   %6 = tensor.empty() : tensor<1024x2048xf32>
   %7 = linalg.generic {indexing_maps = [#map2, #map2], iterator_types = ["parallel", "parallel"]} ins(%5 : tensor<1024x2048xf32>) outs(%6 : tensor<1024x2048xf32>) {
@@ -240,10 +228,8 @@
 // -----
 
 #executable_target_vmvx_bytecode_fb = #hal.executable.target<"vmvx", "vmvx-bytecode-fb", {ukernels = "none"}>
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 func.func @copy_cst() attributes {hal.executable.target = #executable_target_vmvx_bytecode_fb} {
   %cst = arith.constant dense<4.200000e-01> : tensor<5x19x8x4xf32>
@@ -255,7 +241,7 @@
   %4 = arith.shli %3, %c32_i64 : i64
   %5 = arith.ori %2, %4 : i64
   %6 = arith.index_castui %5 : i64 to index
-  %7 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%6) : !flow.dispatch.tensor<writeonly:tensor<5x19x8x4xf32>>
+  %7 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%6) : !flow.dispatch.tensor<writeonly:tensor<5x19x8x4xf32>>
   flow.dispatch.tensor.store %cst, %7, offsets = [0, 0, 0, 0], sizes = [5, 19, 8, 4], strides = [1, 1, 1, 1] : tensor<5x19x8x4xf32> -> !flow.dispatch.tensor<writeonly:tensor<5x19x8x4xf32>>
   return
 }
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/BUILD.bazel b/compiler/src/iree/compiler/Codegen/WGSL/BUILD.bazel
index 0fb2528..70cd7d8 100644
--- a/compiler/src/iree/compiler/Codegen/WGSL/BUILD.bazel
+++ b/compiler/src/iree/compiler/Codegen/WGSL/BUILD.bazel
@@ -47,7 +47,6 @@
     name = "WGSL",
     srcs = [
         "Passes.cpp",
-        "WGSLReplacePushConstants.cpp",
     ],
     hdrs = [
         "Passes.h",
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/CMakeLists.txt b/compiler/src/iree/compiler/Codegen/WGSL/CMakeLists.txt
index 5b3c46f..a2f9f66 100644
--- a/compiler/src/iree/compiler/Codegen/WGSL/CMakeLists.txt
+++ b/compiler/src/iree/compiler/Codegen/WGSL/CMakeLists.txt
@@ -44,7 +44,6 @@
     "Passes.h"
   SRCS
     "Passes.cpp"
-    "WGSLReplacePushConstants.cpp"
   DEPS
     ::PassHeaders
     ::PassesIncGen
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/Passes.td b/compiler/src/iree/compiler/Codegen/WGSL/Passes.td
index 8a672dd..8746b0b 100644
--- a/compiler/src/iree/compiler/Codegen/WGSL/Passes.td
+++ b/compiler/src/iree/compiler/Codegen/WGSL/Passes.td
@@ -13,11 +13,4 @@
 // WGSL passes (keep alphabetical)
 //===---------------------------------------------------------------------===//
 
-def WGSLReplacePushConstantsPass :
-    InterfacePass<"iree-wgsl-replace-push-constants", "mlir::FunctionOpInterface"> {
-  let summary =
-      "Replaces push constant loads with binding loads for when using "
-      "WGSL without push constant support";
-}
-
 #endif // IREE_CODEGEN_WGSL_PASSES
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/WGSLReplacePushConstants.cpp b/compiler/src/iree/compiler/Codegen/WGSL/WGSLReplacePushConstants.cpp
deleted file mode 100644
index 378a560..0000000
--- a/compiler/src/iree/compiler/Codegen/WGSL/WGSLReplacePushConstants.cpp
+++ /dev/null
@@ -1,200 +0,0 @@
-// Copyright 2022 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/compiler/Codegen/WGSL/Passes.h"
-#include "iree/compiler/Dialect/Flow/IR/FlowOps.h"
-#include "iree/compiler/Dialect/HAL/IR/HALDialect.h"
-#include "iree/compiler/Dialect/HAL/IR/HALOps.h"
-#include "mlir/Dialect/Arith/IR/Arith.h"
-#include "mlir/Dialect/Tensor/IR/Tensor.h"
-#include "mlir/Dialect/Vector/IR/VectorOps.h"
-#include "mlir/IR/Attributes.h"
-#include "mlir/IR/Builders.h"
-#include "mlir/Pass/Pass.h"
-
-namespace mlir::iree_compiler {
-
-#define GEN_PASS_DEF_WGSLREPLACEPUSHCONSTANTSPASS
-#include "iree/compiler/Codegen/WGSL/Passes.h.inc"
-
-namespace {
-
-// These must match what the runtime uses.
-#define IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX 3
-#define IREE_HAL_WEBGPU_PARAMS_BINDING_INDEX 0
-
-static Value convertOpTypeFromI32(IREE::HAL::InterfaceConstantLoadOp loadOp,
-                                  vector::ExtractElementOp extractElementOp) {
-  OpBuilder builder(loadOp);
-
-  auto loc = loadOp.getLoc();
-  auto opType = loadOp.getType();
-
-  // Index
-  if (opType.isIndex()) {
-    return builder.create<arith::IndexCastOp>(loc, opType, extractElementOp);
-  }
-
-  unsigned sourceBitWidth = 32;
-  unsigned destBitWidth = IREE::Util::getTypeBitWidth(opType);
-
-  // AnySignlessInteger
-  if (llvm::isa<IntegerType>(opType)) {
-    if (sourceBitWidth > destBitWidth) {
-      return builder.create<arith::TruncIOp>(loc, opType, extractElementOp);
-    } else if (sourceBitWidth < destBitWidth) {
-      return builder.create<arith::ExtUIOp>(loc, opType, extractElementOp);
-    } else {
-      return extractElementOp.getResult();
-    }
-  }
-
-  // AnyFloat
-  Value resizedValue = extractElementOp.getResult();
-  if (sourceBitWidth > destBitWidth) {
-    return builder.create<arith::TruncFOp>(loc, opType, resizedValue);
-  } else if (sourceBitWidth < destBitWidth) {
-    return builder.create<arith::ExtFOp>(loc, opType, resizedValue);
-  }
-  return builder.create<arith::BitcastOp>(loc, opType, resizedValue);
-}
-
-static void replaceConstantLoadOp(IREE::Flow::DispatchTensorLoadOp loadOp,
-                                  IREE::HAL::InterfaceConstantLoadOp op) {
-  OpBuilder builder(op);
-
-  // tensor.extract -> vector<4xi32>
-  uint64_t vec4Index = op.getOrdinal().getZExtValue() / 4;
-  auto tensorOffsetValue =
-      builder.createOrFold<arith::ConstantIndexOp>(op.getLoc(), vec4Index);
-  auto tensorExtractOp = builder.createOrFold<tensor::ExtractOp>(
-      op.getLoc(), loadOp, tensorOffsetValue);
-
-  // vector<4xi32> -> i32
-  uint64_t elementIndex = op.getOrdinal().getZExtValue() % 4;
-  auto vectorOffsetValue =
-      builder.createOrFold<arith::ConstantIndexOp>(op.getLoc(), elementIndex);
-  auto vectorExtractElementOp = builder.create<vector::ExtractElementOp>(
-      op.getLoc(), tensorExtractOp, vectorOffsetValue);
-
-  // i32 -> original type
-  auto convertedTypeResult = convertOpTypeFromI32(op, vectorExtractElementOp);
-  op.replaceAllUsesWith(convertedTypeResult);
-
-  op.erase();
-}
-
-// Adds set 3 with the emulated push descriptor binding, if needed.
-static IREE::HAL::PipelineLayoutAttr
-addSet3IfNeeded(IREE::HAL::PipelineLayoutAttr originalAttr) {
-  for (auto setLayoutAttr : originalAttr.getSetLayouts()) {
-    if (setLayoutAttr.getOrdinal() == 3) {
-      return originalAttr;
-    }
-  }
-  SmallVector<IREE::HAL::DescriptorSetLayoutAttr> setLayoutAttrs(
-      originalAttr.getSetLayouts());
-  SmallVector<IREE::HAL::DescriptorSetBindingAttr> bindingAttrs;
-  bindingAttrs.push_back(IREE::HAL::DescriptorSetBindingAttr::get(
-      originalAttr.getContext(), 0, IREE::HAL::DescriptorType::UniformBuffer,
-      IREE::HAL::DescriptorFlags::None));
-  setLayoutAttrs.push_back(IREE::HAL::DescriptorSetLayoutAttr::get(
-      originalAttr.getContext(), 3, bindingAttrs, std::nullopt));
-  return IREE::HAL::PipelineLayoutAttr::get(originalAttr.getContext(),
-                                            originalAttr.getPushConstants(),
-                                            setLayoutAttrs);
-}
-
-class WGSLReplacePushConstantsPass final
-    : public impl::WGSLReplacePushConstantsPassBase<
-          WGSLReplacePushConstantsPass> {
-  void getDependentDialects(DialectRegistry &registry) const override {
-    registry.insert<mlir::arith::ArithDialect, mlir::tensor::TensorDialect,
-                    mlir::vector::VectorDialect, IREE::Flow::FlowDialect,
-                    IREE::HAL::HALDialect>();
-  }
-
-  void runOnOperation() override {
-    auto funcOp = getOperation();
-    auto loc = funcOp.getLoc();
-    auto constantLoadOps = llvm::to_vector(
-        funcOp.getFunctionBody().getOps<IREE::HAL::InterfaceConstantLoadOp>());
-    if (constantLoadOps.empty())
-      return;
-
-    OpBuilder builder(funcOp);
-    builder.setInsertionPointToStart(&funcOp.getBlocks().front());
-
-    // Group all push constants into a single `hal.interface.binding.subspan`
-    // and load from it once using `flow.dispatch.tensor.load`, then extract
-    // individual push constants with `tensor.extract`.
-
-    // Get the pipeline layout from the first constant load. It should be
-    // uniform across all constants. Add set 3 so we can use it.
-    IREE::HAL::PipelineLayoutAttr layoutAttr =
-        addSet3IfNeeded(constantLoadOps.front().getLayout());
-
-    // Inspect the alignment values. These are just hints, so if all are equal
-    // then use the value, otherwise drop the alignment hint.
-    SmallVector<uint64_t> alignmentValues;
-    bool missingAlignmentValue = false;
-    for (auto constantLoadOp : constantLoadOps) {
-      auto alignmentAttr = constantLoadOp.getAlignmentAttr();
-      if (alignmentAttr) {
-        uint64_t alignmentValue = alignmentAttr.getValue().getZExtValue();
-        alignmentValues.push_back(alignmentValue);
-      } else {
-        missingAlignmentValue = true;
-      }
-    }
-    mlir::IntegerAttr alignmentAttr = nullptr;
-    // TODO(scotttodd): try llvm::all_equal with attrs directly
-    if (!missingAlignmentValue && llvm::all_equal(alignmentValues)) {
-      alignmentAttr = constantLoadOps[0].getAlignmentAttr();
-    }
-
-    // We could store into a tensor<Nxi32>, but vec4s are better supported, so
-    // we'll use tensor<Nxvector<4xi32>> instead.
-    // Compute how many vec4s to use, i.e.
-    //   max index 0 -> 1 vec4
-    //   max index 3 -> 1 vec4
-    //   max index 4 -> 2 vec4s
-    int64_t numberOfVec4s = layoutAttr.getPushConstants() / 4 + 1;
-
-    // hal.interface.binding.subspan ->
-    // !flow.dispatch.tensor<readonly:tensor<Nxvector<4xi32>>>
-    //   * Group all push constants into a single tensor<Nxvector<4xi32>>
-    //   * If individual data types differ, they'll be bitcast when extracted
-    auto v4i32Type = VectorType::get({4}, builder.getI32Type());
-    auto dispatchTensorType = IREE::Flow::DispatchTensorType::get(
-        IREE::Flow::TensorAccess::ReadOnly,
-        {static_cast<int64_t>(numberOfVec4s)}, v4i32Type);
-    SmallVector<Value> dynamicDims;
-    // Note: we're ignoring all potential 'values' hints (if provided) on ops -
-    // InterfaceBindingSubspanOp has no matching concept and we assume that any
-    // analysis using the hint should have been performed by earlier passes.
-    auto zero = builder.create<arith::ConstantIndexOp>(loc, 0);
-    auto subspanOp = builder.create<IREE::HAL::InterfaceBindingSubspanOp>(
-        loc, dispatchTensorType, layoutAttr,
-        /*set=*/APInt(64, IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX),
-        /*binding=*/APInt(64, IREE_HAL_WEBGPU_PARAMS_BINDING_INDEX),
-        /*byte_offset=*/zero, dynamicDims, alignmentAttr, nullptr);
-
-    // flow.dispatch.tensor.load -> tensor<Nxvector<4xi32>>
-    auto tensorType =
-        RankedTensorType::get({(int64_t)numberOfVec4s}, v4i32Type);
-    auto loadOp = builder.create<IREE::Flow::DispatchTensorLoadOp>(
-        loc, tensorType, subspanOp, dynamicDims);
-
-    // The grouped subspan and load are complete - now extract each constant.
-    for (auto constantLoadOp : constantLoadOps) {
-      replaceConstantLoadOp(loadOp, constantLoadOp);
-    }
-  }
-};
-
-} // namespace
-} // namespace mlir::iree_compiler
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/test/BUILD.bazel b/compiler/src/iree/compiler/Codegen/WGSL/test/BUILD.bazel
index 669aae9..26b0d8f 100644
--- a/compiler/src/iree/compiler/Codegen/WGSL/test/BUILD.bazel
+++ b/compiler/src/iree/compiler/Codegen/WGSL/test/BUILD.bazel
@@ -16,7 +16,6 @@
     name = "lit",
     srcs = enforce_glob(
         [
-            "replace_push_constants.mlir",
         ],
         include = ["*.mlir"],
     ),
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/test/CMakeLists.txt b/compiler/src/iree/compiler/Codegen/WGSL/test/CMakeLists.txt
index 3b78b10..e65bcd6 100644
--- a/compiler/src/iree/compiler/Codegen/WGSL/test/CMakeLists.txt
+++ b/compiler/src/iree/compiler/Codegen/WGSL/test/CMakeLists.txt
@@ -13,8 +13,6 @@
 iree_lit_test_suite(
   NAME
     lit
-  SRCS
-    "replace_push_constants.mlir"
   TOOLS
     FileCheck
     iree-opt
diff --git a/compiler/src/iree/compiler/Codegen/WGSL/test/replace_push_constants.mlir b/compiler/src/iree/compiler/Codegen/WGSL/test/replace_push_constants.mlir
deleted file mode 100644
index 15b2328..0000000
--- a/compiler/src/iree/compiler/Codegen/WGSL/test/replace_push_constants.mlir
+++ /dev/null
@@ -1,180 +0,0 @@
-// RUN: iree-opt --split-input-file --pass-pipeline="builtin.module(func.func(iree-wgsl-replace-push-constants))" %s | FileCheck %s
-
-// CHECK-LABEL: @emptyFunctionNoOp
-func.func @emptyFunctionNoOp() {
-  // CHECK-NEXT: return
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadIndex
-func.func @constantLoadIndex() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>> -> tensor<1xvector<4xi32>>
-  // CHECK: %[[TENSOR_EXTRACT:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<1xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT:.+]] = vector.extractelement %[[TENSOR_EXTRACT]][%c0{{.*}}] : vector<4xi32>
-  // CHECK: %[[CAST:.+]] = arith.index_cast %[[VECTOR_EXTRACT]] : i32 to index
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : index
-  // CHECK: = arith.index_cast %[[CAST]] : index to i32
-  %1 = arith.index_cast %0 : index to i32
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadI32
-func.func @constantLoadI32() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>> -> tensor<1xvector<4xi32>>
-  // CHECK: %[[TENSOR_EXTRACT:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<1xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT:.+]] = vector.extractelement %[[TENSOR_EXTRACT]][%c0{{.*}}] : vector<4xi32>
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT]] : i32
-  %1 = math.absi %0 : i32
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadI16
-func.func @constantLoadI16() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>> -> tensor<1xvector<4xi32>>
-  // CHECK: %[[TENSOR_EXTRACT:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<1xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT:.+]] = vector.extractelement %[[TENSOR_EXTRACT]][%c0{{.*}}] : vector<4xi32>
-  // CHECK: %[[TRUNC:.+]] = arith.trunci %[[VECTOR_EXTRACT]] : i32 to i16
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i16
-  // CHECK: = math.absi %[[TRUNC]] : i16
-  %1 = math.absi %0 : i16
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadF32
-func.func @constantLoadF32() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [1], strides = [1] : !flow.dispatch.tensor<readonly:tensor<1xvector<4xi32>>> -> tensor<1xvector<4xi32>>
-  // CHECK: %[[TENSOR_EXTRACT:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<1xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT:.+]] = vector.extractelement %[[TENSOR_EXTRACT]][%c0{{.*}}] : vector<4xi32>
-  // CHECK: %[[CAST:.+]] = arith.bitcast %[[VECTOR_EXTRACT]] : i32 to f32
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : f32
-  // CHECK: = math.absf %[[CAST]] : f32
-  %1 = math.absf %0 : f32
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 6, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadWithIndexAndAlignment
-func.func @constantLoadWithIndexAndAlignment() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) alignment(16) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2xvector<4xi32>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [2], strides = [1] : !flow.dispatch.tensor<readonly:tensor<2xvector<4xi32>>> -> tensor<2xvector<4xi32>>
-  // CHECK: %[[TENSOR_EXTRACT:.+]] = tensor.extract %[[LOAD]][%c1{{.*}}] : tensor<2xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT:.+]] = vector.extractelement %[[TENSOR_EXTRACT]][%c1{{.*}}] : vector<4xi32>
-  // CHECK: %[[CAST:.+]] = arith.index_cast %[[VECTOR_EXTRACT]] : i32 to index
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) alignment(16) : index
-  // CHECK: = arith.index_cast %[[CAST]] : index to i32
-  %1 = arith.index_cast %0 : index to i32
-  return
-}
-
-// -----
-
-#pipeline_layout = #hal.pipeline.layout<push_constants = 9, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
-]>
-
-// CHECK-LABEL: @constantLoadMultiple
-func.func @constantLoadMultiple() {
-  // CHECK: %[[SUBSPAN:.+]] = hal.interface.binding.subspan layout({{.+}}) set(3) binding(0) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<3xvector<4xi32>>>
-  // CHECK: %[[LOAD:.+]] = flow.dispatch.tensor.load %[[SUBSPAN]], offsets = [0], sizes = [3], strides = [1] : !flow.dispatch.tensor<readonly:tensor<3xvector<4xi32>>> -> tensor<3xvector<4xi32>>
-
-  // Extracting 8 i32s from tensor<3xvector<4xi32>:
-  //   [0 1 2 3][4 5 6 7][8 9 10 11]
-  //    ^-----------------^
-  // 0-3 use the first vec4 (tensor extract 0 then vector extract 0-3)
-  // 4-7 use the second vec4 (tensor extract 1 then vector extract 0-3)
-  // 8 uses the third vec4 (tensor extract 2 then vector extract 0)
-
-  // CHECK: %[[TENSOR_EXTRACT_0:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_0:.+]] = vector.extractelement %[[TENSOR_EXTRACT_0]][%c0{{.*}}] : vector<4xi32>
-  %0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
-  // CHECK: %[[TENSOR_EXTRACT_1:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_1:.+]] = vector.extractelement %[[TENSOR_EXTRACT_1]][%c1{{.*}}] : vector<4xi32>
-  %1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
-  // CHECK: %[[TENSOR_EXTRACT_2:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_2:.+]] = vector.extractelement %[[TENSOR_EXTRACT_2]][%c2{{.*}}] : vector<4xi32>
-  %2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : i32
-  // CHECK: %[[TENSOR_EXTRACT_3:.+]] = tensor.extract %[[LOAD]][%c0{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_3:.+]] = vector.extractelement %[[TENSOR_EXTRACT_3]][%c3{{.*}}] : vector<4xi32>
-  %3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : i32
-  // CHECK: %[[TENSOR_EXTRACT_4:.+]] = tensor.extract %[[LOAD]][%c1{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_4:.+]] = vector.extractelement %[[TENSOR_EXTRACT_4]][%c0{{.*}}] : vector<4xi32>
-  %4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : i32
-  // CHECK: %[[TENSOR_EXTRACT_5:.+]] = tensor.extract %[[LOAD]][%c1{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_5:.+]] = vector.extractelement %[[TENSOR_EXTRACT_5]][%c1{{.*}}] : vector<4xi32>
-  %5 = hal.interface.constant.load layout(#pipeline_layout) ordinal(5) : i32
-  // CHECK: %[[TENSOR_EXTRACT_6:.+]] = tensor.extract %[[LOAD]][%c1{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_6:.+]] = vector.extractelement %[[TENSOR_EXTRACT_6]][%c2{{.*}}] : vector<4xi32>
-  %6 = hal.interface.constant.load layout(#pipeline_layout) ordinal(6) : i32
-  // CHECK: %[[TENSOR_EXTRACT_7:.+]] = tensor.extract %[[LOAD]][%c1{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_7:.+]] = vector.extractelement %[[TENSOR_EXTRACT_7]][%c3{{.*}}] : vector<4xi32>
-  %7 = hal.interface.constant.load layout(#pipeline_layout) ordinal(7) : i32
-  // CHECK: %[[TENSOR_EXTRACT_8:.+]] = tensor.extract %[[LOAD]][%c2{{.*}}] : tensor<3xvector<4xi32>>
-  // CHECK: %[[VECTOR_EXTRACT_8:.+]] = vector.extractelement %[[TENSOR_EXTRACT_8]][%c0{{.*}}] : vector<4xi32>
-  %8 = hal.interface.constant.load layout(#pipeline_layout) ordinal(8) : i32
-
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_0]] : i32
-  %abs_0 = math.absi %0 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_1]] : i32
-  %abs_1 = math.absi %1 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_2]] : i32
-  %abs_2 = math.absi %2 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_3]] : i32
-  %abs_3 = math.absi %3 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_4]] : i32
-  %abs_4 = math.absi %4 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_5]] : i32
-  %abs_5 = math.absi %5 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_6]] : i32
-  %abs_6 = math.absi %6 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_7]] : i32
-  %abs_7 = math.absi %7 : i32
-  // CHECK: = math.absi %[[VECTOR_EXTRACT_8]] : i32
-  %abs_8 = math.absi %8 : i32
-  return
-}
diff --git a/compiler/src/iree/compiler/Dialect/Flow/Transforms/OutlineDispatchExterns.cpp b/compiler/src/iree/compiler/Dialect/Flow/Transforms/OutlineDispatchExterns.cpp
index 41acaf2..751f784 100644
--- a/compiler/src/iree/compiler/Dialect/Flow/Transforms/OutlineDispatchExterns.cpp
+++ b/compiler/src/iree/compiler/Dialect/Flow/Transforms/OutlineDispatchExterns.cpp
@@ -110,9 +110,6 @@
         dispatchExternOp.getSubgroupSizeAttr(),
         dispatchExternOp.getWorkgroupLocalMemoryAttr());
     exportOp->setDialectAttrs(dispatchExternOp->getDialectAttrs());
-    if (auto bindingsAttr = dispatchExternOp.getBindingsAttr()) {
-      exportOp->setAttr("hal.interface.bindings", bindingsAttr);
-    }
     if (!dispatchExternOp.getWorkgroupCount().empty()) {
       IRMapping mapper;
       dispatchExternOp.getWorkgroupCount().cloneInto(
diff --git a/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/deduplicate_executables.mlir b/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/deduplicate_executables.mlir
index 8ce0a36..3d25684 100644
--- a/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/deduplicate_executables.mlir
+++ b/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/deduplicate_executables.mlir
@@ -386,11 +386,9 @@
       hal.return %selected : i1
     }
     hal.executable.export public @dispatch ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 0, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) {
     ^bb0(%device: !hal.device, %workload: index):
       hal.return %workload, %workload, %workload : index, index, index
@@ -405,11 +403,9 @@
       hal.return %selected : i1
     }
     hal.executable.export public @dispatch ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 0, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) {
     ^bb0(%device: !hal.device, %workload: index):
       hal.return %workload, %workload, %workload : index, index, index
diff --git a/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/outline_dispatch_externs.mlir b/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/outline_dispatch_externs.mlir
index 4e73fe4..d4ff9aa 100644
--- a/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/outline_dispatch_externs.mlir
+++ b/compiler/src/iree/compiler/Dialect/Flow/Transforms/test/outline_dispatch_externs.mlir
@@ -4,8 +4,7 @@
 // CHECK-NEXT:   hal.executable.variant public @a target(<"llvm-cpu", "a">)
 // CHECK-SAME:       objects([#hal.executable.object<{path = "a.o"}>])
 // CHECK-NEXT:     hal.executable.export public @main ordinal(100)
-// CHECK-SAME:         layout(#hal.pipeline.layout<push_constants = 1, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>)
-// CHECK-SAME:         hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>]
+// CHECK-SAME:         layout(#hal.pipeline.layout<constants = 1, bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer>]>)
 // CHECK-NEXT:     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index):
 // CHECK-NEXT:       %ok, %value = hal.device.query<%arg0 : !hal.device> key("some" :: "value") : i1, i32
 // CHECK-NEXT:       %0 = arith.index_cast %value : i32 to index
@@ -16,8 +15,7 @@
 // CHECK-NEXT:       %ok, %value = hal.device.query<%arg0 : !hal.device> key("some" :: "feature") : i1, i32
 // CHECK-NEXT:       hal.return %ok : i1
 //      CHECK:     hal.executable.export public @main ordinal(200)
-// CHECK-SAME:         layout(#hal.pipeline.layout<push_constants = 1, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>)
-// CHECK-SAME:         hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>]
+// CHECK-SAME:         layout(#hal.pipeline.layout<constants = 1, bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer>]>)
 // CHECK-NEXT:     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index):
 
 // Demonstrates the full functionality of an extern dispatch op.
@@ -41,17 +39,10 @@
       hal.return %x_capture, %y_capture, %z : index, index, index
     }
     // Must match the external definition.
-    layout(#hal.pipeline.layout<push_constants = 1, sets = [
-      <0, bindings = [
-          <0, storage_buffer, ReadOnly>,
-          <1, storage_buffer>
-      ]>
+    layout(#hal.pipeline.layout<constants = 1, bindings = [
+      #hal.pipeline.binding<storage_buffer, ReadOnly>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
-    // Optional, automatically inferred if omitted.
-    bindings([
-      #hal.interface.binding<0, 0>,
-      #hal.interface.binding<0, 1>
-    ])
     // Can have object references for multiple targets or configurations.
     objects({
       #hal.executable.target<"llvm-cpu", "a"> ordinal(100) = [#hal.executable.object<{path = "a.o"}>],
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.cpp b/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.cpp
index 39b5ec8..14790bf 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.cpp
@@ -19,20 +19,15 @@
 
 void PipelineLayout::print(llvm::raw_ostream &os) const {
   os << "PipelineLayout:\n";
-  os << "  push constants: " << pushConstantCount << "\n";
-  os << "  sets:\n";
-  for (auto &setLayout : setLayouts) {
-    os << "    set[" << setLayout.ordinal
-       << "]: " << stringifyDescriptorSetLayoutFlags(setLayout.flags) << "\n";
-    for (auto &binding : setLayout.bindings) {
-      os << "      binding[" << binding.ordinal
-         << "]: " << stringifyDescriptorType(binding.type) << "\n";
-    }
+  os << "  constants: " << constantCount << "\n";
+  os << "  bindings:\n";
+  for (auto &binding : bindings) {
+    os << "    binding[" << binding.ordinal
+       << "]: " << stringifyDescriptorType(binding.type) << "\n";
   }
   os << "  resource map:\n";
-  for (auto setBinding : llvm::enumerate(resourceMap)) {
-    os << "    resource[" << setBinding.index() << "]: set "
-       << setBinding.value().first << " binding " << setBinding.value().second
+  for (auto ordinal : llvm::enumerate(resourceMap)) {
+    os << "    resource[" << ordinal.index() << "]: binding " << ordinal.value()
        << "\n";
   }
 }
@@ -41,33 +36,18 @@
 static PipelineLayout
 assumeExportLayout(IREE::HAL::PipelineLayoutAttr layoutAttr) {
   PipelineLayout pipelineLayout;
-  pipelineLayout.pushConstantCount = layoutAttr.getPushConstants();
+  pipelineLayout.constantCount = layoutAttr.getConstants();
 
-  auto setLayoutAttrs = layoutAttr.getSetLayouts();
-  int64_t bindingCount = 0;
-  for (auto setLayoutAttr : setLayoutAttrs) {
-    bindingCount += setLayoutAttr.getBindings().size();
-  }
-
-  pipelineLayout.setLayouts.resize(setLayoutAttrs.size());
+  size_t bindingCount = layoutAttr.getBindings().size();
+  pipelineLayout.bindings.resize(bindingCount);
   pipelineLayout.resourceMap.resize(bindingCount);
-  for (auto setLayoutAttr : setLayoutAttrs) {
-    DescriptorSetLayout setLayout;
-    setLayout.ordinal = setLayoutAttr.getOrdinal();
-    setLayout.flags = setLayoutAttr.getFlags().value_or(
-        IREE::HAL::DescriptorSetLayoutFlags::None);
-    auto bindingAttrs = setLayoutAttr.getBindings();
-    setLayout.bindings.resize(bindingAttrs.size());
-    for (auto bindingAttr : bindingAttrs) {
-      DescriptorSetLayoutBinding setBinding;
-      setBinding.ordinal = bindingAttr.getOrdinal();
-      setBinding.type = bindingAttr.getType();
-      setBinding.flags = bindingAttr.getFlags();
-      setLayout.bindings[setBinding.ordinal] = setBinding;
-      pipelineLayout.resourceMap.emplace_back(setLayout.ordinal,
-                                              setBinding.ordinal);
-    }
-    pipelineLayout.setLayouts[setLayout.ordinal] = setLayout;
+  for (auto [i, bindingAttr] : llvm::enumerate(layoutAttr.getBindings())) {
+    PipelineLayoutBinding binding;
+    binding.ordinal = i;
+    binding.type = bindingAttr.getType();
+    binding.flags = bindingAttr.getFlags();
+    pipelineLayout.bindings[binding.ordinal] = binding;
+    pipelineLayout.resourceMap[i] = binding.ordinal;
   }
 
   return pipelineLayout;
@@ -174,30 +154,25 @@
   }
 
   PipelineLayout pipelineLayout;
-  pipelineLayout.pushConstantCount = operandCount;
+  pipelineLayout.constantCount = operandCount;
   pipelineLayout.resourceMap.resize(bindingCount);
 
-  // TODO(#18154): simplify binding setup.
-  DescriptorSetLayout setLayout;
-  setLayout.ordinal = 0;
-  setLayout.flags = IREE::HAL::DescriptorSetLayoutFlags::None;
-  setLayout.bindings.reserve(bindingCount);
+  IREE::HAL::PipelineLayoutFlags layoutFlags =
+      IREE::HAL::PipelineLayoutFlags::None;
   for (unsigned i = 0; i < bindingCount; ++i) {
     const auto &descriptorInfo = descriptorInfos[i];
     if (allEnumBitsSet(descriptorInfo.flags,
                        IREE::HAL::DescriptorFlags::Indirect)) {
-      setLayout.flags =
-          setLayout.flags | IREE::HAL::DescriptorSetLayoutFlags::Indirect;
+      layoutFlags = layoutFlags | IREE::HAL::PipelineLayoutFlags::Indirect;
     }
-    DescriptorSetLayoutBinding setBinding;
-    setBinding.ordinal = setLayout.bindings.size();
-    setBinding.type = IREE::HAL::DescriptorType::StorageBuffer;
-    setBinding.flags = descriptorInfo.flags;
-    setLayout.bindings.push_back(setBinding);
-    pipelineLayout.resourceMap[i] =
-        std::make_pair(setLayout.ordinal, setBinding.ordinal);
+    PipelineLayoutBinding binding;
+    binding.ordinal = i;
+    binding.type = IREE::HAL::DescriptorType::StorageBuffer;
+    binding.flags = descriptorInfo.flags;
+    pipelineLayout.bindings.push_back(binding);
+    pipelineLayout.resourceMap[i] = binding.ordinal;
   }
-  pipelineLayout.setLayouts.push_back(setLayout);
+  pipelineLayout.flags = layoutFlags;
 
   LLVM_DEBUG({
     auto executableOp = exportOp->getParentOfType<IREE::Stream::ExecutableOp>();
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.h b/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.h
index 7d08959..50f17d9 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.h
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/BindingLayout.h
@@ -16,35 +16,28 @@
 
 namespace mlir::iree_compiler::IREE::HAL {
 
-struct DescriptorSetLayoutBinding {
+struct PipelineLayoutBinding {
-  // Ordinal of the descriptor within its parent set layout.
+  // Ordinal of the binding within the pipeline layout.
-  unsigned ordinal;
+  unsigned ordinal = 0;
+  unsigned ordinal = 0;
   // Storage type of the descriptor resource.
-  IREE::HAL::DescriptorType type;
+  IREE::HAL::DescriptorType type = IREE::HAL::DescriptorType::StorageBuffer;
   // Flags defining how the descriptor behaves.
-  IREE::HAL::DescriptorFlags flags;
+  IREE::HAL::DescriptorFlags flags = IREE::HAL::DescriptorFlags::None;
 };
 
-struct DescriptorSetLayout {
-  // Ordinal of the set within the parent pipeline layout.
-  unsigned ordinal;
-  // Usage of the descriptor set (such as whether it is persistent or push).
-  IREE::HAL::DescriptorSetLayoutFlags flags;
-  // Bindings within the layout. Ordinals may be sparse.
-  SmallVector<DescriptorSetLayoutBinding> bindings;
-};
-
-using PipelineResourceMap = SmallVector<std::pair<unsigned, unsigned>>;
+using PipelineResourceMap = SmallVector<unsigned>;
 
 struct PipelineLayout {
-  // Total number of 32-bit push constants allocated. Not all dispatchable
-  // functions using this layout will use all constants.
+  // Total number of 32-bit constants allocated. Not all dispatchable
+  // functions using this layout will use all constants.
-  int64_t pushConstantCount;
-  // Sets bound in the layout. Ordinals may be sparse.
-  SmallVector<DescriptorSetLayout> setLayouts;
-  // Mapping of flattened source resource bindings into the descriptor sets.
-  // Matches 1:1 with the IREE::Stream::CmdDispatchOp::resources.
+  int64_t constantCount;
+  // Bindings within the layout; ordinals are dense and match list order.
+  SmallVector<PipelineLayoutBinding> bindings;
+  // Mapping of flattened source resource bindings into the pipeline layout
+  // bindings. Matches 1:1 with the IREE::Stream::CmdDispatchOp::resources.
   PipelineResourceMap resourceMap;
+  // Flags defining behavior of the pipeline.
+  IREE::HAL::PipelineLayoutFlags flags = IREE::HAL::PipelineLayoutFlags::None;
 
   void print(llvm::raw_ostream &os) const;
 };
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.cpp b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.cpp
index 6779531..1af7cce 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.cpp
@@ -110,6 +110,18 @@
   return globalOps;
 }
 
+std::optional<DeviceSet> DeviceAnalysis::lookupDeviceTargets(
+    IREE::Util::GlobalOpInterface deviceGlobalOp) {
+  return lookupDeviceTargets(FlatSymbolRefAttr::get(deviceGlobalOp));
+}
+
+std::optional<DeviceSet>
+DeviceAnalysis::lookupDeviceTargets(SymbolRefAttr deviceGlobalAttr) {
+  SetVector<IREE::HAL::DeviceTargetAttr> resultSet;
+  gatherDeviceTargets(deviceGlobalAttr, explorer.getRootOp(), resultSet);
+  return DeviceSet(resultSet.getArrayRef());
+}
+
 std::optional<DeviceSet>
 DeviceAnalysis::lookupDeviceTargets(Value deviceValue) {
   auto valuePVS = solver.lookupElementFor<DeviceTargetValuePVS>(
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.h b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.h
index 88e6a62..9136678 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.h
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceAnalysis.h
@@ -51,6 +51,15 @@
   std::optional<SmallVector<IREE::Util::GlobalOpInterface>>
   lookupDeviceGlobals(Value deviceValue);
 
+  // Returns a set of possible targets of the given `!hal.device` global, if
+  // analyzed.
+  std::optional<DeviceSet>
+  lookupDeviceTargets(IREE::Util::GlobalOpInterface deviceGlobalOp);
+
+  // Returns a set of possible targets of the given `!hal.device` global, if
+  // analyzed.
+  std::optional<DeviceSet> lookupDeviceTargets(SymbolRefAttr deviceGlobalAttr);
+
   // Returns a set of possible targets of the given `!hal.device` value, if
   // analyzed.
   std::optional<DeviceSet> lookupDeviceTargets(Value deviceValue);
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.cpp b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.cpp
index b60e5e7..251b621 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.cpp
@@ -19,6 +19,12 @@
   }
 }
 
+DeviceSet::DeviceSet(ArrayRef<IREE::HAL::DeviceTargetAttr> targetAttrs) {
+  for (auto targetAttr : targetAttrs) {
+    this->targetAttrs.insert(targetAttr);
+  }
+}
+
 DeviceSet::DeviceSet(const DenseSet<IREE::HAL::DeviceTargetAttr> &targetAttrs)
     : targetAttrs(targetAttrs) {}
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.h b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.h
index 18f7d9f..58e1dfe 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.h
+++ b/compiler/src/iree/compiler/Dialect/HAL/Analysis/DeviceSet.h
@@ -20,6 +20,7 @@
 public:
   DeviceSet() = default;
   explicit DeviceSet(ArrayAttr targetsAttr);
+  explicit DeviceSet(ArrayRef<IREE::HAL::DeviceTargetAttr> targetAttrs);
   explicit DeviceSet(const DenseSet<IREE::HAL::DeviceTargetAttr> &targetAttrs);
   ~DeviceSet();
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToHAL/test/pseudo_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToHAL/test/pseudo_ops.mlir
index 8533479..4cbccfb 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToHAL/test/pseudo_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToHAL/test/pseudo_ops.mlir
@@ -1,9 +1,7 @@
 // RUN: iree-opt --split-input-file --iree-hal-conversion %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @ex {
   hal.executable.variant public @variant target(#hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">) {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertCommandBufferOps.cpp b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertCommandBufferOps.cpp
index 72716cb..508e08f 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertCommandBufferOps.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertCommandBufferOps.cpp
@@ -320,125 +320,20 @@
   mutable IREE::VM::ImportOp importOp;
 };
 
-class CommandBufferPushDescriptorSetOpConversion
-    : public OpConversionPattern<IREE::HAL::CommandBufferPushDescriptorSetOp> {
+class CommandBufferDispatchOpConversion
+    : public OpConversionPattern<IREE::HAL::CommandBufferDispatchOp> {
 public:
-  CommandBufferPushDescriptorSetOpConversion(MLIRContext *context,
-                                             SymbolTable &importSymbols,
-                                             TypeConverter &typeConverter,
-                                             StringRef importName)
-      : OpConversionPattern(context) {
-    importOp = importSymbols.lookup<IREE::VM::ImportOp>(importName);
-    assert(importOp);
-  }
-
-  LogicalResult
-  matchAndRewrite(IREE::HAL::CommandBufferPushDescriptorSetOp op,
-                  OpAdaptor adaptor,
-                  ConversionPatternRewriter &rewriter) const override {
-    auto importType = importOp.getFunctionType();
-
-    auto i32Type = rewriter.getI32Type();
-    auto i64Type = rewriter.getI64Type();
-
-    SmallVector<Value, 8> callOperands = {
-        adaptor.getCommandBuffer(),
-        adaptor.getPipelineLayout(),
-        castToImportType(adaptor.getSet(), i32Type, rewriter),
-    };
-    SmallVector<int16_t, 5> segmentSizes = {
-        /*command_buffer=*/-1,
-        /*pipeline_layout=*/-1,
-        /*set=*/-1,
-        /*bindings=*/
-        static_cast<int16_t>(adaptor.getBindingOrdinals().size()),
-    };
-    for (size_t i = 0; i < adaptor.getBindingOrdinals().size(); ++i) {
-      callOperands.push_back(
-          castToImportType(adaptor.getBindingOrdinals()[i], i32Type, rewriter));
-      auto [bindingBufferSlot, bindingBuffer] = splitBufferSlot(
-          op.getLoc(), adaptor.getBindingBuffers()[i], rewriter);
-      callOperands.push_back(bindingBufferSlot);
-      callOperands.push_back(bindingBuffer);
-      callOperands.push_back(
-          castToImportType(adaptor.getBindingOffsets()[i], i64Type, rewriter));
-      callOperands.push_back(
-          castToImportType(adaptor.getBindingLengths()[i], i64Type, rewriter));
-    }
-
-    auto callOp = rewriter.replaceOpWithNewOp<IREE::VM::CallVariadicOp>(
-        op, SymbolRefAttr::get(importOp), importType.getResults(), segmentSizes,
-        importType.getInputs(), callOperands);
-    copyImportAttrs(importOp, callOp);
-    return success();
-  }
-
-private:
-  mutable IREE::VM::ImportOp importOp;
-};
-
-class CommandBufferDispatchIndirectOpConversion
-    : public OpConversionPattern<IREE::HAL::CommandBufferDispatchIndirectOp> {
-public:
-  CommandBufferDispatchIndirectOpConversion(MLIRContext *context,
-                                            SymbolTable &importSymbols,
-                                            TypeConverter &typeConverter,
-                                            StringRef importName)
+  CommandBufferDispatchOpConversion(MLIRContext *context,
+                                    SymbolTable &importSymbols,
+                                    TypeConverter &typeConverter,
+                                    StringRef importName)
       : OpConversionPattern(typeConverter, context) {
     importOp = importSymbols.lookup<IREE::VM::ImportOp>(importName);
     assert(importOp);
   }
 
   LogicalResult
-  matchAndRewrite(IREE::HAL::CommandBufferDispatchIndirectOp op,
-                  OpAdaptor adaptor,
-                  ConversionPatternRewriter &rewriter) const override {
-    auto importType = importOp.getFunctionType();
-    auto [workgroupsBufferSlot, workgroupsBuffer] =
-        splitBufferSlot(op.getLoc(), adaptor.getWorkgroupsBuffer(), rewriter);
-    auto flags = adaptor.getFlagsAttr()
-                     ? rewriter
-                           .create<IREE::VM::ConstI64Op>(
-                               op.getLoc(), adaptor.getFlagsAttr().getInt())
-                           .getResult()
-                     : rewriter.create<IREE::VM::ConstI64ZeroOp>(op.getLoc())
-                           .getResult();
-    SmallVector<Value, 8> callOperands = {
-        adaptor.getCommandBuffer(),
-        adaptor.getExecutable(),
-        castToImportType(adaptor.getEntryPoint(), rewriter.getI32Type(),
-                         rewriter),
-        workgroupsBufferSlot,
-        workgroupsBuffer,
-        castToImportType(adaptor.getWorkgroupsOffset(), rewriter.getI64Type(),
-                         rewriter),
-        flags,
-    };
-    auto callOp = rewriter.replaceOpWithNewOp<IREE::VM::CallOp>(
-        op, SymbolRefAttr::get(importOp), importType.getResults(),
-        callOperands);
-    copyImportAttrs(importOp, callOp);
-    return success();
-  }
-
-private:
-  mutable IREE::VM::ImportOp importOp;
-};
-
-class CommandBufferDispatch2OpConversion
-    : public OpConversionPattern<IREE::HAL::CommandBufferDispatch2Op> {
-public:
-  CommandBufferDispatch2OpConversion(MLIRContext *context,
-                                     SymbolTable &importSymbols,
-                                     TypeConverter &typeConverter,
-                                     StringRef importName)
-      : OpConversionPattern(typeConverter, context) {
-    importOp = importSymbols.lookup<IREE::VM::ImportOp>(importName);
-    assert(importOp);
-  }
-
-  LogicalResult
-  matchAndRewrite(IREE::HAL::CommandBufferDispatch2Op op, OpAdaptor adaptor,
+  matchAndRewrite(IREE::HAL::CommandBufferDispatchOp op, OpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override {
     auto importType = importOp.getFunctionType();
 
@@ -501,20 +396,20 @@
   mutable IREE::VM::ImportOp importOp;
 };
 
-class CommandBufferDispatch2IndirectOpConversion
-    : public OpConversionPattern<IREE::HAL::CommandBufferDispatch2IndirectOp> {
+class CommandBufferDispatchIndirectOpConversion
+    : public OpConversionPattern<IREE::HAL::CommandBufferDispatchIndirectOp> {
 public:
-  CommandBufferDispatch2IndirectOpConversion(MLIRContext *context,
-                                             SymbolTable &importSymbols,
-                                             TypeConverter &typeConverter,
-                                             StringRef importName)
+  CommandBufferDispatchIndirectOpConversion(MLIRContext *context,
+                                            SymbolTable &importSymbols,
+                                            TypeConverter &typeConverter,
+                                            StringRef importName)
       : OpConversionPattern(typeConverter, context) {
     importOp = importSymbols.lookup<IREE::VM::ImportOp>(importName);
     assert(importOp);
   }
 
   LogicalResult
-  matchAndRewrite(IREE::HAL::CommandBufferDispatch2IndirectOp op,
+  matchAndRewrite(IREE::HAL::CommandBufferDispatchIndirectOp op,
                   OpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override {
 
@@ -612,23 +507,11 @@
       context, importSymbols, typeConverter, "hal.command_buffer.copy_buffer");
   patterns.insert<CommandBufferCollectiveOpConversion>(
       context, importSymbols, typeConverter, "hal.command_buffer.collective");
-  patterns
-      .insert<VMImportOpConversion<IREE::HAL::CommandBufferPushConstantsOp>>(
-          context, importSymbols, typeConverter,
-          "hal.command_buffer.push_constants");
-  patterns.insert<CommandBufferPushDescriptorSetOpConversion>(
-      context, importSymbols, typeConverter,
-      "hal.command_buffer.push_descriptor_set");
-  patterns.insert<VMImportOpConversion<IREE::HAL::CommandBufferDispatchOp>>(
+  patterns.insert<CommandBufferDispatchOpConversion>(
       context, importSymbols, typeConverter, "hal.command_buffer.dispatch");
   patterns.insert<CommandBufferDispatchIndirectOpConversion>(
       context, importSymbols, typeConverter,
       "hal.command_buffer.dispatch.indirect");
-  patterns.insert<CommandBufferDispatch2OpConversion>(
-      context, importSymbols, typeConverter, "hal.command_buffer.dispatch2");
-  patterns.insert<CommandBufferDispatch2IndirectOpConversion>(
-      context, importSymbols, typeConverter,
-      "hal.command_buffer.dispatch2.indirect");
 }
 
 } // namespace mlir::iree_compiler
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertExecutableOps.cpp b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertExecutableOps.cpp
index 7b0372d..5f5fe0f 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertExecutableOps.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/ConvertExecutableOps.cpp
@@ -120,73 +120,6 @@
     auto constantBuffer = createPackedConstantBuffer(
         createOp.getLoc(), adaptor.getConstants(), rewriter);
 
-    SmallVector<int16_t, 5> segmentSizes = {
-        /*device=*/-1,
-        /*executable_format=*/-1,
-        /*executable_data=*/-1,
-        /*constants=*/-1,
-        /*pipeline_layouts=*/
-        static_cast<int16_t>(llvm::size(adaptor.getLayouts())),
-    };
-    SmallVector<Value, 8> callOperands = {
-        adaptor.getDevice(),
-        executableFormatStr,
-        rodataOp,
-        constantBuffer,
-    };
-    callOperands.append(adaptor.getLayouts().begin(),
-                        adaptor.getLayouts().end());
-
-    auto importType = importOp.getFunctionType();
-    auto callOp = rewriter.replaceOpWithNewOp<IREE::VM::CallVariadicOp>(
-        createOp, SymbolRefAttr::get(importOp), importType.getResults(),
-        segmentSizes, importType.getInputs(), callOperands);
-    copyImportAttrs(importOp, callOp);
-
-    return success();
-  }
-
-private:
-  mutable IREE::VM::ImportOp importOp;
-};
-
-class ExecutableCreate2OpConversion
-    : public OpConversionPattern<IREE::HAL::ExecutableCreate2Op> {
-public:
-  ExecutableCreate2OpConversion(MLIRContext *context,
-                                SymbolTable &importSymbols,
-                                TypeConverter &typeConverter,
-                                StringRef importName)
-      : OpConversionPattern(context) {
-    importOp = importSymbols.lookup<IREE::VM::ImportOp>(importName);
-    assert(importOp);
-  }
-
-  LogicalResult
-  matchAndRewrite(IREE::HAL::ExecutableCreate2Op createOp, OpAdaptor adaptor,
-                  ConversionPatternRewriter &rewriter) const override {
-    // Materialize vm.rodata for the binary.
-    auto executableBinaryOp =
-        SymbolTable::lookupNearestSymbolFrom<IREE::HAL::ExecutableBinaryOp>(
-            createOp, createOp.getExecutableTarget());
-    auto executableOp = executableBinaryOp.getOperation()
-                            ->getParentOfType<IREE::HAL::ExecutableOp>();
-    std::string rodataName = sanitizeSymbolName(
-        (executableOp.getName() + "_" + executableBinaryOp.getName()).str());
-    auto rodataOp = rewriter.create<IREE::VM::RodataInlineOp>(
-        executableBinaryOp.getLoc(),
-        IREE::VM::RefType::get(rewriter.getType<IREE::VM::BufferType>()),
-        rewriter.getStringAttr(rodataName), executableBinaryOp.getData(),
-        rewriter.getI64IntegerAttr(16), executableBinaryOp.getMimeTypeAttr());
-
-    // Get format string as a rodata blob.
-    auto executableFormatStr = rewriter.create<IREE::VM::RodataInlineOp>(
-        createOp.getLoc(), executableBinaryOp.getFormatAttr());
-
-    // Pack constants, if any.
-    auto constantBuffer = createPackedConstantBuffer(
-        createOp.getLoc(), adaptor.getConstants(), rewriter);
-
     SmallVector<Value, 8> callOperands = {
         adaptor.getDevice(),
         executableFormatStr,
@@ -218,14 +151,6 @@
 
   patterns.insert<ExecutableCreateOpConversion>(
       context, importSymbols, typeConverter, "hal.executable.create");
-  patterns.insert<ExecutableCreate2OpConversion>(
-      context, importSymbols, typeConverter, "hal.executable.create2");
-
-  patterns.insert<VMImportOpConversion<IREE::HAL::DescriptorSetLayoutCreateOp>>(
-      context, importSymbols, typeConverter,
-      "hal.descriptor_set_layout.create");
-  patterns.insert<VMImportOpConversion<IREE::HAL::PipelineLayoutCreateOp>>(
-      context, importSymbols, typeConverter, "hal.pipeline_layout.create");
 }
 
 } // namespace mlir::iree_compiler
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/command_buffer_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/command_buffer_ops.mlir
index f005985..faff006 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/command_buffer_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/command_buffer_ops.mlir
@@ -292,118 +292,12 @@
 
 // -----
 
-// CHECK-LABEL: @command_buffer_push_descriptor_set
-//  CHECK-SAME: (%[[CMD:.+]]: !vm.ref<!hal.command_buffer>,
-//  CHECK-SAME:  %[[LAYOUT:.+]]: !vm.ref<!hal.pipeline_layout>,
-//  CHECK-SAME:  %[[BUFFER:.+]]: !vm.ref<!hal.buffer>,
-//  CHECK-SAME:  %[[SLOT:.+]]: i32)
-util.func public @command_buffer_push_descriptor_set(
-    %cmd: !hal.command_buffer,
-    %layout: !hal.pipeline_layout,
-    %buffer: !hal.buffer,
-    %slot: index
-  ) {
-  %c0 = arith.constant 0 : index
-  %c1 = arith.constant 1 : index
-  %c4 = arith.constant 4 : index
-  %c4096 = arith.constant 4096 : index
-  %c8000 = arith.constant 8000 : index
-  // CHECK: %[[C0:.+]] = vm.const.i32.zero
-  // CHECK: %[[C1:.+]] = vm.const.i32 1
-  // CHECK: %[[NULL:.+]] = vm.const.ref.zero : !vm.ref<!hal.buffer>
-  // CHECK: vm.call.variadic @hal.command_buffer.push_descriptor_set
-  // CHECK-SAME: (%[[CMD]], %[[LAYOUT]], %c1, [
-  // CHECK-SAME:   (%[[C0]], %[[C0]], %[[BUFFER]], %c4096, %c8000),
-  // CHECK-SAME:   (%[[C1]], %[[SLOT]], %[[NULL]], %c4, %c4096)
-  // CHECK-SAME: ]) : (!vm.ref<!hal.command_buffer>, !vm.ref<!hal.pipeline_layout>, i32, tuple<i32, i32, !vm.ref<!hal.buffer>, i64, i64> ...)
-  hal.command_buffer.push_descriptor_set<%cmd : !hal.command_buffer>
-      layout(%layout : !hal.pipeline_layout)[%c1]
-      bindings([
-        %c0 = (%buffer : !hal.buffer)[%c4096, %c8000],
-        %c1 = (%slot : index)[%c4, %c4096]
-      ])
-  util.return
-}
-
-// -----
-
 // CHECK-LABEL: @command_buffer_dispatch
 //  CHECK-SAME: (%[[CMD:.+]]: !vm.ref<!hal.command_buffer>,
-//  CHECK-SAME:  %[[EXECUTABLE:.+]]: !vm.ref<!hal.executable>)
-util.func public @command_buffer_dispatch(
-  %cmd: !hal.command_buffer,
-  %executable: !hal.executable
-) {
-  // CHECK-DAG: %[[ORDINAL:.+]] = vm.const.i32 123
-  %ordinal = arith.constant 123 : index
-  %c100 = arith.constant 100 : index
-  %c200 = arith.constant 200 : index
-  %c300 = arith.constant 300 : index
-  // CHECK-DAG: %[[FLAGS:.+]] = vm.const.i64.zero
-  // CHECK: vm.call @hal.command_buffer.dispatch(%[[CMD]], %[[EXECUTABLE]], %[[ORDINAL]], %c100, %c200, %c300, %[[FLAGS]])
-  hal.command_buffer.dispatch<%cmd : !hal.command_buffer>
-      target(%executable : !hal.executable)[%ordinal]
-      workgroups([%c100, %c200, %c300])
-      flags(None)
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @command_buffer_dispatch_indirect
-//  CHECK-SAME: (%[[CMD:.+]]: !vm.ref<!hal.command_buffer>,
-//  CHECK-SAME:  %[[EXECUTABLE:.+]]: !vm.ref<!hal.executable>,
-//  CHECK-SAME:  %[[BUFFER:.+]]: !vm.ref<!hal.buffer>)
-util.func public @command_buffer_dispatch_indirect(
-  %cmd: !hal.command_buffer,
-  %executable: !hal.executable,
-  %buffer: !hal.buffer
-) {
-  // CHECK-DAG: %[[ORDINAL:.+]] = vm.const.i32 123
-  %ordinal = arith.constant 123 : index
-  %c100 = arith.constant 100 : index
-  // CHECK-DAG: %[[UNUSED_SLOT:.+]] = vm.const.i32.zero
-  // CHECK-DAG: %[[FLAGS:.+]] = vm.const.i64.zero
-  // CHECK: vm.call @hal.command_buffer.dispatch.indirect(%[[CMD]], %[[EXECUTABLE]], %[[ORDINAL]], %[[UNUSED_SLOT]], %[[BUFFER]], %c100, %[[FLAGS]])
-  hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
-      target(%executable : !hal.executable)[%ordinal]
-      workgroups(%buffer : !hal.buffer)[%c100]
-      flags(None)
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @command_buffer_dispatch_indirect_indirect
-//  CHECK-SAME: (%[[CMD:.+]]: !vm.ref<!hal.command_buffer>,
-//  CHECK-SAME:  %[[EXECUTABLE:.+]]: !vm.ref<!hal.executable>,
-//  CHECK-SAME:  %[[BUFFER_SLOT:.+]]: i32)
-util.func public @command_buffer_dispatch_indirect_indirect(
-  %cmd: !hal.command_buffer,
-  %executable: !hal.executable,
-  %buffer_slot: index
-) {
-  // CHECK-DAG: %[[ORDINAL:.+]] = vm.const.i32 123
-  %ordinal = arith.constant 123 : index
-  %c100 = arith.constant 100 : index
-  // CHECK-DAG: %[[NULL_BUFFER:.+]] = vm.const.ref.zero : !vm.ref<!hal.buffer>
-  // CHECK-DAG: %[[FLAGS:.+]] = vm.const.i64.zero
-  // CHECK: vm.call @hal.command_buffer.dispatch.indirect(%[[CMD]], %[[EXECUTABLE]], %[[ORDINAL]], %[[BUFFER_SLOT]], %[[NULL_BUFFER]], %c100, %[[FLAGS]])
-  hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
-      target(%executable : !hal.executable)[%ordinal]
-      workgroups(%buffer_slot : index)[%c100]
-      flags(None)
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @command_buffer_dispatch2
-//  CHECK-SAME: (%[[CMD:.+]]: !vm.ref<!hal.command_buffer>,
 //  CHECK-SAME:  %[[EXECUTABLE:.+]]: !vm.ref<!hal.executable>,
 //  CHECK-SAME:  %[[BUFFER:.+]]: !vm.ref<!hal.buffer>,
 //  CHECK-SAME:  %[[SLOT:.+]]: i32)
-util.func public @command_buffer_dispatch2(
+util.func public @command_buffer_dispatch(
   %cmd: !hal.command_buffer,
   %executable: !hal.executable,
   %buffer: !hal.buffer,
@@ -427,7 +321,7 @@
   %c8000 = arith.constant 8000 : index
   // CHECK-DAG: %[[NULL_BUFFER:.+]] = vm.const.ref.zero : !vm.ref<!hal.buffer>
   // CHECK-DAG: %[[FLAGS:.+]] = vm.const.i64.zero
-  // CHECK: vm.call.variadic @hal.command_buffer.dispatch2
+  // CHECK: vm.call.variadic @hal.command_buffer.dispatch
   // CHECK-SAME: %[[CMD]],
   // CHECK-SAME: %[[EXECUTABLE]], %[[ORDINAL]],
   // CHECK-SAME: %[[X]], %[[Y]], %[[Z]],
@@ -435,7 +329,7 @@
   // CHECK-SAME: [%[[CONSTANT0]], %[[CONSTANT1]]],
   // CHECK-SAME: [(%[[C0]], %[[C0]], %[[BUFFER]], %c4096, %c8000),
   // CHECK-SAME:  (%[[C0]], %[[SLOT]], %[[NULL_BUFFER]], %c4, %c4096)]
-  hal.command_buffer.dispatch2<%cmd : !hal.command_buffer>
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer>
       target(%executable : !hal.executable)[%ordinal]
       workgroups([%x, %y, %z])
       constants([%constant0, %constant1])
@@ -449,13 +343,13 @@
 
 // -----
 
-// CHECK-LABEL: vm.func private @command_buffer_dispatch2
+// CHECK-LABEL: vm.func private @command_buffer_dispatch
 //  CHECK-SAME: (%[[CMD:[a-z0-9]+]]: !vm.ref<!hal.command_buffer>,
 //  CHECK-SAME:  %[[EXECUTABLE:[a-z0-9]+]]: !vm.ref<!hal.executable>,
 //  CHECK-SAME:  %[[WORKGROUPS_SLOT:[a-z0-9]+]]: i32,
 //  CHECK-SAME:  %[[BUFFER:[a-z0-9]+]]: !vm.ref<!hal.buffer>,
 //  CHECK-SAME:  %[[SLOT:[a-z0-9]+]]: i32)
-util.func public @command_buffer_dispatch2(
+util.func public @command_buffer_dispatch(
   %cmd: !hal.command_buffer,
   %executable: !hal.executable,
   %workgroups_slot: index,
@@ -476,7 +370,7 @@
   %c8000 = arith.constant 8000 : index
   // CHECK-DAG: %[[NULL_BUFFER:.+]] = vm.const.ref.zero : !vm.ref<!hal.buffer>
   // CHECK-DAG: %[[FLAGS:.+]] = vm.const.i64.zero
-  // CHECK: vm.call.variadic @hal.command_buffer.dispatch2.indirect
+  // CHECK: vm.call.variadic @hal.command_buffer.dispatch.indirect
   // CHECK-SAME: %[[CMD]],
   // CHECK-SAME: %[[EXECUTABLE]], %[[ORDINAL]],
   // CHECK-SAME: %[[WORKGROUPS_SLOT]], %[[NULL_BUFFER]], %[[WORKGROUPS_OFFSET]],
@@ -484,7 +378,7 @@
   // CHECK-SAME: [%[[CONSTANT0]], %[[CONSTANT1]]],
   // CHECK-SAME: [(%[[C0]], %[[C0]], %[[BUFFER]], %c4096, %c8000),
   // CHECK-SAME:  (%[[C0]], %[[SLOT]], %[[NULL_BUFFER]], %c4, %c4096)]
-  hal.command_buffer.dispatch2.indirect<%cmd : !hal.command_buffer>
+  hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
       target(%executable : !hal.executable)[%ordinal]
       workgroups(%workgroups_slot : index)[%workgroups_offset]
       constants([%constant0, %constant1])
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/executable_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/executable_ops.mlir
index 292cb47..87fab3f 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/executable_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/HALToVM/test/executable_ops.mlir
@@ -13,49 +13,6 @@
 
 // CHECK-LABEL: @executableCreate
 util.func public @executableCreate(
-    // CHECK-SAME: %[[DEV:.+]]: !vm.ref<!hal.device>
-    %device: !hal.device,
-    // CHECK-SAME: %[[LAYOUT0:.+]]: !vm.ref<!hal.pipeline_layout>,
-    %layout0: !hal.pipeline_layout,
-    // CHECK-SAME: %[[LAYOUT1:.+]]: !vm.ref<!hal.pipeline_layout>
-    %layout1: !hal.pipeline_layout
-  ) -> (!hal.executable, !hal.executable) {
-
-  // CHECK-DAG: %[[FORMAT1:.+]] = vm.rodata.inline "_utf8_format1_
-  // CHECK-DAG: %[[BINARY1:.+]] = vm.rodata.inline "exe_binary1" {alignment = 16 : i64} : !vm.buffer = dense<[0, 1, 2, 3]> : vector<4xi8>
-  // CHECK-DAG: %[[NULL1:.+]] = vm.const.ref.zero : !vm.buffer
-  // CHECK: %[[EXE1:.+]] = vm.call.variadic @hal.executable.create(
-  // CHECK-SAME: %[[DEV]], %[[FORMAT1]], %[[BINARY1]], %[[NULL1]], [%[[LAYOUT0]], %[[LAYOUT1]]]
-  // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer, !vm.ref<!hal.pipeline_layout> ...) -> !vm.ref<!hal.executable>
-  %0 = hal.executable.create device(%device : !hal.device) target(@exe::@binary1) layouts([%layout0, %layout1]) : !hal.executable
-
-  // CHECK-DAG: %[[FORMAT2:.+]] = vm.rodata.inline "_utf8_format2_
-  // CHECK-DAG: %[[BINARY2:.+]] = vm.rodata.inline "exe_binary2" {alignment = 16 : i64} : !vm.buffer = dense<[4, 5, 6, 7]> : vector<4xi8>
-  // CHECK-DAG: %[[NULL2:.+]] = vm.const.ref.zero : !vm.buffer
-  // CHECK: %[[EXE2:.+]] = vm.call.variadic @hal.executable.create(
-  // CHECK-SAME: %[[DEV]], %[[FORMAT2]], %[[BINARY2]], %[[NULL2]], [%[[LAYOUT1]], %[[LAYOUT0]]]
-  // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer, !vm.ref<!hal.pipeline_layout> ...) -> !vm.ref<!hal.executable>
-  %1 = hal.executable.create device(%device : !hal.device) target(@exe::@binary2) layouts([%layout1, %layout0]) : !hal.executable
-
-  // CHECK: vm.return %[[EXE1]], %[[EXE2]]
-  util.return %0, %1 : !hal.executable, !hal.executable
-}
-
-// -----
-
-hal.executable @exe {
-  hal.executable.binary @binary1 attributes {
-    data = dense<[0, 1, 2, 3]> : vector<4xi8>,
-    format = "format1"
-  }
-  hal.executable.binary @binary2 attributes {
-    data = dense<[4, 5, 6, 7]> : vector<4xi8>,
-    format = "format2"
-  }
-}
-
-// CHECK-LABEL: @executableCreate2
-util.func public @executableCreate2(
   // CHECK-SAME: %[[DEV:.+]]: !vm.ref<!hal.device>
   %device: !hal.device
 ) -> (!hal.executable, !hal.executable) {
@@ -63,18 +20,18 @@
   // CHECK-DAG: %[[FORMAT1:.+]] = vm.rodata.inline "_utf8_format1_
   // CHECK-DAG: %[[BINARY1:.+]] = vm.rodata.inline "exe_binary1" {alignment = 16 : i64} : !vm.buffer = dense<[0, 1, 2, 3]> : vector<4xi8>
   // CHECK-DAG: %[[NULL1:.+]] = vm.const.ref.zero : !vm.buffer
-  // CHECK: %[[EXE1:.+]] = vm.call @hal.executable.create2(
+  // CHECK: %[[EXE1:.+]] = vm.call @hal.executable.create(
   // CHECK-SAME: %[[DEV]], %[[FORMAT1]], %[[BINARY1]], %[[NULL1]]
   // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer) -> !vm.ref<!hal.executable>
-  %0 = hal.executable.create2 device(%device : !hal.device) target(@exe::@binary1) : !hal.executable
+  %0 = hal.executable.create device(%device : !hal.device) target(@exe::@binary1) : !hal.executable
 
   // CHECK-DAG: %[[FORMAT2:.+]] = vm.rodata.inline "_utf8_format2_
   // CHECK-DAG: %[[BINARY2:.+]] = vm.rodata.inline "exe_binary2" {alignment = 16 : i64} : !vm.buffer = dense<[4, 5, 6, 7]> : vector<4xi8>
   // CHECK-DAG: %[[NULL2:.+]] = vm.const.ref.zero : !vm.buffer
-  // CHECK: %[[EXE2:.+]] = vm.call @hal.executable.create2(
+  // CHECK: %[[EXE2:.+]] = vm.call @hal.executable.create(
   // CHECK-SAME: %[[DEV]], %[[FORMAT2]], %[[BINARY2]], %[[NULL2]]
   // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer) -> !vm.ref<!hal.executable>
-  %1 = hal.executable.create2 device(%device : !hal.device) target(@exe::@binary2) : !hal.executable
+  %1 = hal.executable.create device(%device : !hal.device) target(@exe::@binary2) : !hal.executable
 
   // CHECK: vm.return %[[EXE1]], %[[EXE2]]
   util.return %0, %1 : !hal.executable, !hal.executable
@@ -97,16 +54,14 @@
 
 // CHECK-LABEL: @multipleExecutables
 util.func public @multipleExecutables(
-    %device: !hal.device,
-    %layout0: !hal.pipeline_layout,
-    %layout1: !hal.pipeline_layout
+    %device: !hal.device
   ) -> (!hal.executable, !hal.executable) {
   // CHECK-DAG: %[[FORMAT1:.+]] = vm.rodata.inline "_utf8_format_
   // CHECK-DAG: %[[BINARY1:.+]] = vm.rodata.inline "exe1_binary1" {alignment = 16 : i64} : !vm.buffer = dense<[0, 1, 2, 3]> : vector<4xi8>
-  %0 = hal.executable.create device(%device : !hal.device) target(@exe1::@binary1) layouts([%layout0, %layout1]) : !hal.executable
+  %0 = hal.executable.create device(%device : !hal.device) target(@exe1::@binary1) : !hal.executable
   // CHECK-DAG: %[[FORMAT2:.+]] = vm.rodata.inline "_utf8_format_
   // CHECK-DAG: %[[BINARY2:.+]] = vm.rodata.inline "exe2_binary2" {alignment = 16 : i64} : !vm.buffer = dense<[4, 5, 6, 7]> : vector<4xi8>
-  %1 = hal.executable.create device(%device : !hal.device) target(@exe2::@binary2) layouts([%layout1, %layout0]) : !hal.executable
+  %1 = hal.executable.create device(%device : !hal.device) target(@exe2::@binary2) : !hal.executable
   util.return %0, %1 : !hal.executable, !hal.executable
 }
 
@@ -123,8 +78,6 @@
 util.func public @executableConstants(
     // CHECK-SAME: %[[DEV:.+]]: !vm.ref<!hal.device>
     %device: !hal.device,
-    // CHECK-SAME: %[[LAYOUT:.+]]: !vm.ref<!hal.pipeline_layout>
-    %layout: !hal.pipeline_layout,
     // CHECK-SAME: %[[CONSTANT0:.+]]: i32, %[[CONSTANT1:.+]]: i32
     %constant0: i32, %constant1: i32
   ) -> !hal.executable {
@@ -141,13 +94,12 @@
   // CHECK-DAG: %[[INDEX2:.+]] = vm.const.i64 2
   // CHECK-DAG: vm.buffer.store.i32 %[[CONSTANT1]], %[[CONSTANTS]][%[[INDEX2]]] : i32 -> !vm.buffer
 
-  // CHECK: %[[EXE:.+]] = vm.call.variadic @hal.executable.create(
-  // CHECK-SAME: %[[DEV]], %[[FORMAT]], %[[BINARY]], %[[CONSTANTS]], [%[[LAYOUT]]]
-  // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer, !vm.ref<!hal.pipeline_layout> ...) -> !vm.ref<!hal.executable>
+  // CHECK: %[[EXE:.+]] = vm.call @hal.executable.create(
+  // CHECK-SAME: %[[DEV]], %[[FORMAT]], %[[BINARY]], %[[CONSTANTS]]
+  // CHECK-SAME: ) {nosideeffects} : (!vm.ref<!hal.device>, !vm.buffer, !vm.buffer, !vm.buffer) -> !vm.ref<!hal.executable>
   %0 = hal.executable.create
       device(%device : !hal.device)
       target(@exe::@binary)
-      layouts([%layout])
       constants([%constant0, %c0, %constant1]) : !hal.executable
 
   // CHECK: vm.return %[[EXE]]
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/Patterns.cpp b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/Patterns.cpp
index 109701a..8270b21 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/Patterns.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/Patterns.cpp
@@ -30,13 +30,6 @@
     llvm::cl::init(false),
 };
 
-// TODO(#18154): switch default to true and then remove.
-static llvm::cl::opt<bool> clExperimentalDispatch2{
-    "iree-hal-experimental-dispatch2",
-    llvm::cl::desc("Whether to emit iree_hal_command_buffer_dispatch2 ops."),
-    llvm::cl::init(false),
-};
-
 struct ContextResolveOpPattern
     : public StreamConversionPattern<IREE::Stream::ContextResolveOp> {
   using StreamConversionPattern::StreamConversionPattern;
@@ -670,7 +663,6 @@
   }
 };
 
-// TODO(#18154): switch to dispatch2.
 struct CmdDispatchOpPattern
     : public StreamConversionPattern<IREE::Stream::CmdDispatchOp> {
   using StreamConversionPattern::StreamConversionPattern;
@@ -708,188 +700,6 @@
       caseExportOps.push_back(std::make_pair(entryPointAttr, exportOp));
     });
 
-    auto recordDispatch = [&](SymbolRefAttr entryPointAttr,
-                              IREE::HAL::ExecutableExportOp exportOp,
-                              OpBuilder &builder) {
-      // Record push constants and buffer bindings.
-      recordParameters(loc, affinityAttr, device, commandBufferMapping,
-                       exportOp, dispatchOp, adaptor, builder);
-
-      // Dispatch with a target-specific workgroup count.
-      auto workgroupCount = exportOp.calculateWorkgroupCount(
-          loc, device, adaptor.getWorkload(), builder);
-      Value executable = builder.create<IREE::HAL::ExecutableLookupOp>(
-          loc, builder.getType<IREE::HAL::ExecutableType>(), device,
-          entryPointAttr.getRootReference().getValue());
-      Value ordinal = builder.create<IREE::HAL::ExecutableExportOrdinalOp>(
-          loc, builder.getIndexType(), entryPointAttr);
-      auto flags = builder.getAttr<IREE::HAL::DispatchFlagsAttr>(
-          IREE::HAL::DispatchFlags::None);
-      return builder.create<IREE::HAL::CommandBufferDispatchOp>(
-          loc, commandBufferMapping.getHandle(), executable, ordinal,
-          workgroupCount[0], workgroupCount[1], workgroupCount[2], flags);
-    };
-
-    // If there is only one variant we can emit that directly without a
-    // conditional check. The same result should occur later on but it saves
-    // a lot of IR during generation if we know we can avoid it.
-    if (caseExportOps.size() == 1) {
-      auto [entryPointAttr, exportOp] = caseExportOps.front();
-      rewriter.replaceOp(dispatchOp,
-                         recordDispatch(entryPointAttr, exportOp, rewriter));
-    } else {
-      // Select the variant index.
-      Value selectedIndex = buildIfElseTree(
-          loc, caseExportOps.size(),
-          [&](Location loc, size_t i, OpBuilder &builder) {
-            auto exportOp = caseExportOps[i].second;
-            auto variantOp =
-                exportOp->getParentOfType<IREE::HAL::ExecutableVariantOp>();
-            return variantOp.buildCondition(device, rewriter);
-          },
-          rewriter);
-
-      // Allow each variant to define how it is dispatched.
-      auto switchOp = rewriter.create<scf::IndexSwitchOp>(
-          loc, TypeRange{}, selectedIndex, caseIndices, caseIndices.size());
-      for (size_t i = 0; i < caseExportOps.size(); ++i) {
-        auto [entryPointAttr, exportOp] = caseExportOps[i];
-        auto &caseBlock = switchOp.getCaseRegions()[i].emplaceBlock();
-        auto caseBuilder = OpBuilder::atBlockBegin(&caseBlock);
-        recordDispatch(entryPointAttr, exportOp, caseBuilder);
-        caseBuilder.create<scf::YieldOp>(loc);
-      }
-
-      // Fallback for no available variant. Today we just no-op as executable
-      // loading should have already failed.
-      auto &defaultBlock = switchOp.getDefaultRegion().emplaceBlock();
-      auto defaultBuilder = OpBuilder::atBlockBegin(&defaultBlock);
-      defaultBuilder.create<scf::YieldOp>(loc);
-
-      rewriter.replaceOp(dispatchOp, switchOp);
-    }
-
-    return success();
-  }
-
-  void recordParameters(Location loc, IREE::Stream::AffinityAttr affinityAttr,
-                        Value device,
-                        CommandBufferConversionMapping &commandBufferMapping,
-                        IREE::HAL::ExecutableExportOp exportOp,
-                        IREE::Stream::CmdDispatchOp dispatchOp,
-                        OpAdaptor adaptor, OpBuilder &builder) const {
-    auto layoutAttr = exportOp.getLayout();
-    auto pipelineLayout =
-        builder
-            .create<IREE::HAL::PipelineLayoutLookupOp>(
-                loc, IREE::HAL::PipelineLayoutType::get(loc.getContext()),
-                device, layoutAttr)
-            .getResult();
-
-    // Push constant values.
-    // TODO(#5322): symbolic push constant names on the hal.interface so we can
-    // sparsely pack these.
-    if (!adaptor.getUniformOperands().empty()) {
-      int pushConstantBase = 0; // always 0 today
-      SmallVector<Value> pushConstants;
-      for (auto operand : adaptor.getUniformOperands()) {
-        assert(
-            operand.getType().isInteger(32) &&
-            "expected only i32 values after iree-hal-pack-dispatch-operands");
-        pushConstants.push_back(operand);
-      }
-      builder.create<IREE::HAL::CommandBufferPushConstantsOp>(
-          loc, commandBufferMapping.getHandle(), pipelineLayout,
-          builder.getIndexAttr(pushConstantBase), pushConstants);
-    }
-
-    // Push descriptor bindings set by set.
-    // We build a table of all sets in the layout then populate the bindings as
-    // we walk the flattened/unordered resource list. After we've collected all
-    // of the bindings we issue the command for that set.
-    auto bindingAttrs = IREE::HAL::getInterfaceBindingAttrs(
-        exportOp, dispatchOp.getResources().size());
-    int64_t maxSet = llvm::max_element(bindingAttrs, [](auto lhs, auto rhs) {
-                       return lhs.getSet() < rhs.getSet();
-                     })->getSet();
-    SmallVector<SmallVector<IREE::HAL::DescriptorSetBindingValue>> setBindings;
-    setBindings.resize(maxSet + 1);
-    for (auto [i, bindingAttr] : llvm::enumerate(bindingAttrs)) {
-      auto setLayoutFlags =
-          layoutAttr.getSetLayout(bindingAttr.getSet())
-              .getFlags()
-              .value_or(IREE::HAL::DescriptorSetLayoutFlags::None);
-      IREE::HAL::DescriptorSetBindingValue binding;
-      binding.ordinal =
-          builder.create<arith::ConstantIndexOp>(loc, bindingAttr.getBinding());
-      if (bitEnumContainsAll(setLayoutFlags,
-                             IREE::HAL::DescriptorSetLayoutFlags::Indirect)) {
-        // Indirect binding resolved through the cached command buffer binding
-        // table. The buffer recorded in the descriptor is a slot ordinal into
-        // the binding table. Note that the range may be adjusted based on the
-        // range bound to the slot in the table.
-        auto resolvedBinding = commandBufferMapping.resolveBinding(
-            loc, dispatchOp.getResources()[i], adaptor.getResources()[i],
-            adaptor.getResourceOffsets()[i], adaptor.getResourceLengths()[i],
-            builder);
-        binding.buffer = resolvedBinding.buffer;
-        binding.byteOffset = resolvedBinding.byteOffset;
-        binding.byteLength = resolvedBinding.byteLength;
-      } else {
-        // Direct binding referencing the buffer and range provided on the op.
-        binding.buffer = adaptor.getResources()[i];
-        binding.byteOffset = adaptor.getResourceOffsets()[i];
-        binding.byteLength = adaptor.getResourceLengths()[i];
-      }
-      setBindings[bindingAttr.getSet()].push_back(binding);
-    }
-    for (auto [set, bindings] : llvm::enumerate(setBindings)) {
-      if (!bindings.empty()) {
-        builder.create<IREE::HAL::CommandBufferPushDescriptorSetOp>(
-            loc, commandBufferMapping.getHandle(), pipelineLayout, set,
-            bindings);
-      }
-    }
-  }
-};
-
-struct CmdDispatch2OpPattern
-    : public StreamConversionPattern<IREE::Stream::CmdDispatchOp> {
-  using StreamConversionPattern::StreamConversionPattern;
-  LogicalResult
-  matchAndRewrite(IREE::Stream::CmdDispatchOp dispatchOp, OpAdaptor adaptor,
-                  ConversionPatternRewriter &rewriter) const override {
-    auto loc = dispatchOp.getLoc();
-    auto commandBufferMapping = mapping->lookupCommandBufferFor(dispatchOp);
-
-    // TODO(multi-device): reusable command buffers done at the stream level may
-    // make this difficult. For now we assume each stream region being lowered
-    // has a singular affinity that may itself reference multiple devices in the
-    // future but currently uniquely identifies a device.
-    auto affinityAttr = IREE::Stream::AffinityAttr::lookupOrDefault(dispatchOp);
-
-    // Get the device handle we're executing against in this execution region.
-    // Note that this is a dynamic value: we have to treat the device as unknown
-    // here.
-    Value device = rewriter.create<IREE::HAL::CommandBufferDeviceOp>(
-        loc, rewriter.getType<IREE::HAL::DeviceType>(),
-        commandBufferMapping.getHandle());
-
-    // Prepare for variant switch table by gathering the conditions selecting
-    // each variant.
-    SmallVector<int64_t> caseIndices;
-    SmallVector<std::pair<SymbolRefAttr, IREE::HAL::ExecutableExportOp>>
-        caseExportOps;
-    dispatchOp.forEachEntryPointAttr([&](SymbolRefAttr entryPointAttr) {
-      // NOTE: slow lookup!
-      auto exportOp =
-          SymbolTable::lookupNearestSymbolFrom<IREE::HAL::ExecutableExportOp>(
-              dispatchOp, entryPointAttr);
-      assert(exportOp && "dispatch target export not found");
-      caseIndices.push_back(caseIndices.size());
-      caseExportOps.push_back(std::make_pair(entryPointAttr, exportOp));
-    });
-
     // If there is only one variant we can emit that directly without a
     // conditional check. The same result should occur later on but it saves
     // a lot of IR during generation if we know we can avoid it.
@@ -952,15 +762,10 @@
     Value ordinal = builder.create<IREE::HAL::ExecutableExportOrdinalOp>(
         loc, builder.getIndexType(), entryPointAttr);
 
-    // TODO(#18154): simplify bindings by removing descriptor sets.
     auto layoutAttr = exportOp.getLayout();
-    auto bindingAttrs = IREE::HAL::getInterfaceBindingAttrs(
-        exportOp, dispatchOp.getResources().size());
     SmallVector<IREE::HAL::BindingValue> bindings;
-    for (auto [i, bindingAttr] : llvm::enumerate(bindingAttrs)) {
-      auto descriptorFlags = layoutAttr.getSetLayout(bindingAttr.getSet())
-                                 .getBinding(i)
-                                 .getFlags();
+    for (auto [i, bindingAttr] : llvm::enumerate(layoutAttr.getBindings())) {
+      auto descriptorFlags = bindingAttr.getFlags();
       IREE::HAL::BindingValue binding;
       if (bitEnumContainsAll(descriptorFlags,
                              IREE::HAL::DescriptorFlags::Indirect)) {
@@ -986,7 +791,7 @@
 
     auto flags = IREE::HAL::DispatchFlags::None;
 
-    return builder.create<IREE::HAL::CommandBufferDispatch2Op>(
+    return builder.create<IREE::HAL::CommandBufferDispatchOp>(
         loc, commandBufferMapping.getHandle(), executable, ordinal,
         workgroupCount, adaptor.getUniformOperands(), bindings, flags);
   }
@@ -1555,15 +1360,9 @@
   patterns
       .insert<CmdFlushOpPattern, CmdInvalidateOpPattern, CmdDiscardOpPattern,
               CmdFillOpPattern, CmdCopyOpPattern, CmdCollectiveOpPattern,
-              CmdFuncOpPattern, CmdCallOpPattern, CmdExecuteOpPattern,
-              CmdSerialOpPattern, CmdConcurrentOpPattern>(
+              CmdDispatchOpPattern, CmdFuncOpPattern, CmdCallOpPattern,
+              CmdExecuteOpPattern, CmdSerialOpPattern, CmdConcurrentOpPattern>(
           mapping, typeConverter, context);
-  // TODO(#18154): drop existing pattern.
-  if (clExperimentalDispatch2) {
-    patterns.insert<CmdDispatch2OpPattern>(mapping, typeConverter, context);
-  } else {
-    patterns.insert<CmdDispatchOpPattern>(mapping, typeConverter, context);
-  }
   patterns.insert<TimepointImmediateOpPattern, TimepointImportOpPattern,
                   TimepointExportOpPattern, TimepointChainExternalOpPattern,
                   TimepointJoinOpPattern, TimepointBarrierOpPattern,
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/BUILD.bazel b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/BUILD.bazel
index 12dc520..2d6f777 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/BUILD.bazel
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/BUILD.bazel
@@ -17,7 +17,6 @@
     srcs = enforce_glob(
         [
             "channel_ops.mlir",
-            "cmd_dispatch2_ops.mlir",
             "cmd_ops.mlir",
             "context_ops.mlir",
             "debug_ops.mlir",
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/CMakeLists.txt b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/CMakeLists.txt
index 0aeea90..b273190 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/CMakeLists.txt
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/CMakeLists.txt
@@ -15,7 +15,6 @@
     lit
   SRCS
     "channel_ops.mlir"
-    "cmd_dispatch2_ops.mlir"
     "cmd_ops.mlir"
     "context_ops.mlir"
     "debug_ops.mlir"
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_dispatch2_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_dispatch2_ops.mlir
deleted file mode 100644
index ce9a4ad..0000000
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_dispatch2_ops.mlir
+++ /dev/null
@@ -1,114 +0,0 @@
-// RUN: iree-opt --split-input-file --iree-hal-conversion --cse --iree-hal-indirect-command-buffers=true --iree-hal-experimental-dispatch2=true %s | FileCheck %s
-
-#executable_target_aarch64 = #hal.executable.target<"llvm-cpu", "embedded-elf-aarch64">
-#executable_target_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer, Indirect>
-  ]>
-]>
-hal.executable private @ex {
-  hal.executable.variant public @aarch64 target(#executable_target_aarch64) {
-    hal.executable.condition(%device: !hal.device) -> i1 {
-      %ok, %selected = hal.device.query<%device : !hal.device> key("some" :: "feature") : i1, i1
-      hal.return %selected : i1
-    }
-    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) attributes {
-      translation_info = #iree_codegen.translation_info<CPUDefault>
-    } {
-    ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
-      %c1 = arith.constant 1 : index
-      %0 = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%arg0]
-      hal.return %0, %c1, %c1 : index, index, index
-    }
-    builtin.module {
-      // Opaque at this point (in some target-specific dialects).
-    }
-  }
-  hal.executable.variant public @x86_64 target(#executable_target_x86_64) {
-    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) attributes {
-      translation_info = #iree_codegen.translation_info<CPUDefault>
-    } {
-    ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
-      %c1 = arith.constant 1 : index
-      %0 = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%arg0]
-      hal.return %0, %c1, %c1 : index, index, index
-    }
-    builtin.module {
-      // Opaque at this point (in some target-specific dialects).
-    }
-  }
-}
-
-util.global private @device : !hal.device
-util.global private @constant_resource : !stream.resource<constant>
-util.global private @constant_size : index
-
-// CHECK-LABEL: @cmdDispatch
-//  CHECK-SAME: (%[[ARG_RESOURCE:.+]]: !hal.buffer, %[[ARG_SIZE:.+]]: index)
-util.func public @cmdDispatch(%arg_resource: !stream.resource<external>, %arg_size: index) -> !stream.timepoint {
-  %c0 = arith.constant 0 : index
-  %c1 = arith.constant 1 : index
-  %c2 = arith.constant 2 : index
-  %c3 = arith.constant 3 : index
-  %c4_i32 = arith.constant 4 : i32
-  %c5_i32 = arith.constant 5 : i32
-  %c128 = arith.constant 128 : index
-  // CHECK-DAG: %[[CONSTANT_RESOURCE:.+]] = util.global.load immutable @constant_resource
-  %constant_resource = util.global.load immutable @constant_resource : !stream.resource<constant>
-  %constant_size = util.global.load immutable @constant_size : index
-  // CHECK-DAG: %[[DEVICE:.+]] = util.global.load immutable @device
-  // CHECK: %[[MEMOIZED_CMD:.+]] = hal.device.memoize
-  // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%constant_resource as %constant_capture: !stream.resource<constant>{%constant_size}, %arg_resource as %arg_capture: !stream.resource<external>{%arg_size}) {
-    // Switch for each executable variant by checking conditions and ranking:
-    // CHECK: %[[CMD_DEVICE:.+]] = hal.command_buffer.device<%[[CMD]] : !hal.command_buffer>
-    //  CHECK-DAG: %{{.+}}, %[[AARCH64_FORMAT:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-aarch64")
-    //  CHECK-DAG: %[[AARCH64_FEATURE:.+]] = scf.execute_region -> i1 {
-    // CHECK-NEXT:   %{{.+}}, %[[FEATURE:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("some" :: "feature")
-    // CHECK-NEXT:   scf.yield %[[FEATURE]]
-    // CHECK-NEXT: }
-    //  CHECK-DAG: %[[AARCH64_SELECTED:.+]] = arith.andi %[[AARCH64_FORMAT]], %[[AARCH64_FEATURE]]
-    //  CHECK-DAG: %{{.+}}, %[[X86_64_SELECTED:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-x86_64")
-    // CHECK: %[[VARIANT1:.+]] = arith.select %[[X86_64_SELECTED]], %c1
-    // CHECK: %[[VARIANT0:.+]] = arith.select %[[AARCH64_SELECTED]], %c0, %[[VARIANT1]]
-    // CHECK: scf.index_switch %[[VARIANT0]]
-    // CHECK-NEXT: case 0 {
-
-    // Inlined workgroup count calculation:
-    // CHECK: %[[X:.+]] = affine.apply #map()[%c1]
-
-    // Target executable/export:
-    //  CHECK-DAG: %[[EXECUTABLE_0:.+]] = hal.executable.lookup
-    // CHECK-SAME:     device(%[[CMD_DEVICE]] : !hal.device)
-    // CHECK-SAME:     executable(@ex) : !hal.executable
-    //  CHECK-DAG: %[[ORDINAL_0:.+]] = hal.executable.export.ordinal
-    // CHECK-SAME:     target(@ex::@aarch64::@dispatch) : index
-
-    // Dispatch:
-    // CHECK: hal.command_buffer.dispatch2<%[[CMD]]
-    // CHECK-SAME: target(%[[EXECUTABLE_0]] : !hal.executable)[%[[ORDINAL_0]]]
-    // CHECK-SAME: workgroups([%[[X]], %c1, %c1])
-    // CHECK-SAME: constants([%c4_i32, %c5_i32])
-    // CHECK-SAME: bindings([
-    // CHECK-NEXT:   (%[[CONSTANT_RESOURCE]] : !hal.buffer)[%c0, %c128],
-    // CHECK-NEXT:   (%c0 : index)[%c0, %c128]
-
-    // Other variant, when selected:
-    // CHECK: case 1 {
-    // CHECK-DAG: %[[ORDINAL_1:.+]] = hal.executable.export.ordinal target(@ex::@x86_64::@dispatch)
-    // CHECK: hal.command_buffer.dispatch2<%[[CMD]]
-    // CHECK-SAME: target({{.+}})[%[[ORDINAL_1]]]
-    stream.cmd.dispatch {@ex::@aarch64::@dispatch, @ex::@x86_64::@dispatch}[%c1, %c2, %c3](%c4_i32, %c5_i32 : i32, i32) {
-      ro %constant_capture[%c0 for %c128] : !stream.resource<constant>{%constant_size},
-      wo %arg_capture[%c0 for %c128] : !stream.resource<external>{%arg_size}
-    }
-    // CHECK: hal.command_buffer.execution_barrier<%[[CMD]]
-  } => !stream.timepoint
-  // CHECK-NEXT: hal.command_buffer.finalize<%[[CMD]]
-  //      CHECK: hal.device.queue.execute.indirect<%[[DEVICE]] : !hal.device> {{.+}} commands(%[[MEMOIZED_CMD]]) bindings([
-  // CHECK-NEXT:   (%[[ARG_RESOURCE]] : !hal.buffer)[%c0, %[[ARG_SIZE]]]
-  // CHECK-NEXT: ])
-  util.return %0 : !stream.timepoint
-}
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_ops.mlir
index 941c15b..ece8202 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Conversion/StreamToHAL/test/cmd_ops.mlir
@@ -1,4 +1,4 @@
-// RUN: iree-opt --split-input-file --allow-unregistered-dialect --iree-hal-conversion %s | FileCheck %s
+// RUN: iree-opt --split-input-file --allow-unregistered-dialect --iree-hal-conversion --cse --iree-hal-indirect-command-buffers=true %s | FileCheck %s
 
 // Today all memory control operations are ignored and we're just left with
 // the normal sequential execution barriers.
@@ -32,7 +32,7 @@
   %c128 = arith.constant 128 : index
   %c255_i32 = arith.constant 255 : i32
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%arg0 as %arg2: !stream.resource<transient>{%arg1}) {
+  %0 = stream.cmd.execute once on(#hal.device.affinity<@device>) with(%arg0 as %arg2: !stream.resource<transient>{%arg1}) {
     // CHECK-NEXT: hal.command_buffer.fill_buffer<%[[CMD]] : !hal.command_buffer>
     // CHECK-SAME: target(%arg0 : !hal.buffer)[%c0, %c128]
     // CHECK-SAME: pattern(%c255_i32 : i32)
@@ -52,7 +52,7 @@
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%arg0 as %arg4: !stream.resource<transient>{%arg1}, %arg2 as %arg5: !stream.resource<staging>{%arg3}) {
+  %0 = stream.cmd.execute once on(#hal.device.affinity<@device>) with(%arg0 as %arg4: !stream.resource<transient>{%arg1}, %arg2 as %arg5: !stream.resource<staging>{%arg3}) {
     // CHECK-NEXT: hal.command_buffer.copy_buffer<%[[CMD]] : !hal.command_buffer>
     // CHECK-SAME: source(%arg0 : !hal.buffer)[%c0]
     // CHECK-SAME: target(%arg2 : !hal.buffer)[%c0]
@@ -73,7 +73,7 @@
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<transient>{%arg3}) {
+  %0 = stream.cmd.execute once on(#hal.device.affinity<@device>) with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<transient>{%arg3}) {
 
     // Out-of-place all-reduce:
     // CHECK-NEXT: hal.command_buffer.collective
@@ -142,7 +142,7 @@
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) await(%arg4) => with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<staging>{%arg3}) {
+  %0 = stream.cmd.execute once on(#hal.device.affinity<@device>) await(%arg4) => with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<staging>{%arg3}) {
     stream.cmd.concurrent {
       // CHECK-NEXT: hal.command_buffer.copy_buffer<%[[CMD]]
       stream.cmd.copy %arg5[%c0], %arg6[%c0], %c128 : !stream.resource<transient>{%arg1} -> !stream.resource<staging>{%arg3}
@@ -176,13 +176,9 @@
 
 #executable_target_aarch64 = #hal.executable.target<"llvm-cpu", "embedded-elf-aarch64">
 #executable_target_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer, Indirect>
 ]>
 hal.executable private @ex {
   hal.executable.variant public @aarch64 target(#executable_target_aarch64) {
@@ -191,10 +187,6 @@
       hal.return %selected : i1
     }
     hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) attributes {
-      hal.interface.bindings = [
-        #hal.interface.binding<0, 4>,
-        #hal.interface.binding<1, 5>
-      ],
       translation_info = #iree_codegen.translation_info<CPUDefault>
     } {
     ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
@@ -208,10 +200,6 @@
   }
   hal.executable.variant public @x86_64 target(#executable_target_x86_64) {
     hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) attributes {
-      hal.interface.bindings = [
-        #hal.interface.binding<0, 4>,
-        #hal.interface.binding<1, 5>
-      ],
       translation_info = #iree_codegen.translation_info<CPUDefault>
     } {
     ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
@@ -226,9 +214,12 @@
 }
 
 util.global private @device : !hal.device
+util.global private @constant_resource : !stream.resource<constant>
+util.global private @constant_size : index
 
 // CHECK-LABEL: @cmdDispatch
-util.func public @cmdDispatch(%arg0: !stream.resource<transient>, %arg1: index, %arg2: !stream.resource<external>, %arg3: index) -> !stream.timepoint {
+//  CHECK-SAME: (%[[ARG_RESOURCE:.+]]: !hal.buffer, %[[ARG_SIZE:.+]]: index)
+util.func public @cmdDispatch(%arg_resource: !stream.resource<external>, %arg_size: index) -> !stream.timepoint {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
@@ -236,46 +227,33 @@
   %c4_i32 = arith.constant 4 : i32
   %c5_i32 = arith.constant 5 : i32
   %c128 = arith.constant 128 : index
+  // CHECK-DAG: %[[CONSTANT_RESOURCE:.+]] = util.global.load immutable @constant_resource
+  %constant_resource = util.global.load immutable @constant_resource : !stream.resource<constant>
+  %constant_size = util.global.load immutable @constant_size : index
+  // CHECK-DAG: %[[DEVICE:.+]] = util.global.load immutable @device
+  // CHECK: %[[MEMOIZED_CMD:.+]] = hal.device.memoize
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%arg0 as %arg4: !stream.resource<transient>{%arg1}, %arg2 as %arg5: !stream.resource<external>{%arg3}) {
+  %0 = stream.cmd.execute on(#hal.device.affinity<@device>) with(%constant_resource as %constant_capture: !stream.resource<constant>{%constant_size}, %arg_resource as %arg_capture: !stream.resource<external>{%arg_size}) {
     // Switch for each executable variant by checking conditions and ranking:
-    // CHECK: %[[DEVICE:.+]] = hal.command_buffer.device<%[[CMD]] : !hal.command_buffer>
-    //  CHECK-DAG: %{{.+}}, %[[AARCH64_FORMAT:.+]] = hal.device.query<%[[DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-aarch64")
+    // CHECK: %[[CMD_DEVICE:.+]] = hal.command_buffer.device<%[[CMD]] : !hal.command_buffer>
+    //  CHECK-DAG: %{{.+}}, %[[AARCH64_FORMAT:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-aarch64")
     //  CHECK-DAG: %[[AARCH64_FEATURE:.+]] = scf.execute_region -> i1 {
-    // CHECK-NEXT:   %{{.+}}, %[[FEATURE:.+]] = hal.device.query<%[[DEVICE]] : !hal.device> key("some" :: "feature")
+    // CHECK-NEXT:   %{{.+}}, %[[FEATURE:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("some" :: "feature")
     // CHECK-NEXT:   scf.yield %[[FEATURE]]
     // CHECK-NEXT: }
     //  CHECK-DAG: %[[AARCH64_SELECTED:.+]] = arith.andi %[[AARCH64_FORMAT]], %[[AARCH64_FEATURE]]
-    //  CHECK-DAG: %{{.+}}, %[[X86_64_SELECTED:.+]] = hal.device.query<%[[DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-x86_64")
+    //  CHECK-DAG: %{{.+}}, %[[X86_64_SELECTED:.+]] = hal.device.query<%[[CMD_DEVICE]] : !hal.device> key("hal.executable.format" :: "embedded-elf-x86_64")
     // CHECK: %[[VARIANT1:.+]] = arith.select %[[X86_64_SELECTED]], %c1
-    // CHECK: %[[VARIANT0:.+]] = arith.select %[[AARCH64_SELECTED]], %c0{{.+}}, %[[VARIANT1]]
+    // CHECK: %[[VARIANT0:.+]] = arith.select %[[AARCH64_SELECTED]], %c0, %[[VARIANT1]]
     // CHECK: scf.index_switch %[[VARIANT0]]
     // CHECK-NEXT: case 0 {
 
-    // Cache queries:
-    //  CHECK-DAG:   %[[LAYOUT:.+]] = hal.pipeline_layout.lookup {{.+}} layout(#pipeline_layout)
-
-    // Push constants:
-    //  CHECK-DAG:   hal.command_buffer.push_constants<%[[CMD]]
-    // CHECK-SAME:       layout(%[[LAYOUT]] : !hal.pipeline_layout)
-    // CHECK-SAME:       offset(0)
-    // CHECK-SAME:       values([%c4_i32, %c5_i32]) : i32, i32
-
-    // Descriptor sets:
-    //  CHECK-DAG:   hal.command_buffer.push_descriptor_set<%[[CMD]]
-    // CHECK-SAME:       layout(%[[LAYOUT]] : !hal.pipeline_layout)[%c0
-    // CHECK-NEXT:     %c4 = (%arg0 : !hal.buffer)[%c0, %c128]
-    //  CHECK-DAG:   hal.command_buffer.push_descriptor_set<%[[CMD]]
-    // CHECK-SAME:       layout(%[[LAYOUT]] : !hal.pipeline_layout)[%c1
-    // CHECK-NEXT:     %c5 = (%arg2 : !hal.buffer)[%c0, %c128]
-
     // Inlined workgroup count calculation:
-    // CHECK: %[[YZ:.+]] = arith.constant 1 : index
-    // CHECK-NEXT: %[[X:.+]] = affine.apply #map()[%c1]
+    // CHECK: %[[X:.+]] = affine.apply #map()[%c1]
 
     // Target executable/export:
     //  CHECK-DAG: %[[EXECUTABLE_0:.+]] = hal.executable.lookup
-    // CHECK-SAME:     device(%[[DEVICE]] : !hal.device)
+    // CHECK-SAME:     device(%[[CMD_DEVICE]] : !hal.device)
     // CHECK-SAME:     executable(@ex) : !hal.executable
     //  CHECK-DAG: %[[ORDINAL_0:.+]] = hal.executable.export.ordinal
     // CHECK-SAME:     target(@ex::@aarch64::@dispatch) : index
@@ -283,7 +261,11 @@
     // Dispatch:
     // CHECK: hal.command_buffer.dispatch<%[[CMD]]
     // CHECK-SAME: target(%[[EXECUTABLE_0]] : !hal.executable)[%[[ORDINAL_0]]]
-    // CHECK-SAME: workgroups([%[[X]], %[[YZ]], %[[YZ]]])
+    // CHECK-SAME: workgroups([%[[X]], %c1, %c1])
+    // CHECK-SAME: constants([%c4_i32, %c5_i32])
+    // CHECK-SAME: bindings([
+    // CHECK-NEXT:   (%[[CONSTANT_RESOURCE]] : !hal.buffer)[%c0, %c128],
+    // CHECK-NEXT:   (%c0 : index)[%c0, %c128]
 
     // Other variant, when selected:
     // CHECK: case 1 {
@@ -291,12 +273,15 @@
     // CHECK: hal.command_buffer.dispatch<%[[CMD]]
     // CHECK-SAME: target({{.+}})[%[[ORDINAL_1]]]
     stream.cmd.dispatch {@ex::@aarch64::@dispatch, @ex::@x86_64::@dispatch}[%c1, %c2, %c3](%c4_i32, %c5_i32 : i32, i32) {
-      ro %arg4[%c0 for %c128] : !stream.resource<transient>{%arg1},
-      wo %arg5[%c0 for %c128] : !stream.resource<external>{%arg3}
+      ro %constant_capture[%c0 for %c128] : !stream.resource<constant>{%constant_size},
+      wo %arg_capture[%c0 for %c128] : !stream.resource<external>{%arg_size}
     }
     // CHECK: hal.command_buffer.execution_barrier<%[[CMD]]
   } => !stream.timepoint
   // CHECK-NEXT: hal.command_buffer.finalize<%[[CMD]]
+  //      CHECK: hal.device.queue.execute.indirect<%[[DEVICE]] : !hal.device> {{.+}} commands(%[[MEMOIZED_CMD]]) bindings([
+  // CHECK-NEXT:   (%[[ARG_RESOURCE]] : !hal.buffer)[%c0, %[[ARG_SIZE]]]
+  // CHECK-NEXT: ])
   util.return %0 : !stream.timepoint
 }
 
@@ -341,7 +326,7 @@
   %c0 = arith.constant 0 : index
   %c128 = arith.constant 128 : index
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
-  %0 = stream.cmd.execute on(#hal.device.affinity<@device, [0, 1]>) await(%arg4) => with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<staging>{%arg3}) {
+  %0 = stream.cmd.execute once on(#hal.device.affinity<@device, [0, 1]>) await(%arg4) => with(%arg0 as %arg5: !stream.resource<transient>{%arg1}, %arg2 as %arg6: !stream.resource<staging>{%arg3}) {
     stream.cmd.copy %arg5[%c0], %arg6[%c0], %c128 : !stream.resource<transient>{%arg1} -> !stream.resource<staging>{%arg3}
   } => !stream.timepoint
   // CHECK: hal.device.queue.execute
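The `(%c0 : index)` binding and the `hal.device.queue.execute.indirect ... bindings([...])` CHECK lines above show the indirect command buffer flow: a memoized command buffer records some bindings as slot indices, and the actual buffers are supplied per submission. A loose Python model of that resolution step (not the runtime API; names are illustrative):

```python
def resolve_bindings(bindings, binding_table):
    """Recorded bindings are either direct buffers or integer slots into a
    per-submission binding table (the `(%c0 : index)[...]` form in the test).
    Slots are resolved against the table at queue-execute time, so one
    memoized command buffer can be replayed with different buffers."""
    resolved = []
    for buffer_or_slot, offset, length in bindings:
        if isinstance(buffer_or_slot, int):
            buffer_or_slot = binding_table[buffer_or_slot]
        resolved.append((buffer_or_slot, offset, length))
    return resolved

# One direct buffer plus one table slot 0, as in the dispatch above:
recorded = [("constant_buffer", 0, 128), (0, 0, 128)]
print(resolve_bindings(recorded, binding_table=["arg_buffer"]))
# -> [('constant_buffer', 0, 128), ('arg_buffer', 0, 128)]
```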
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.cpp b/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.cpp
index 4869771..95cf53e 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.cpp
@@ -86,45 +86,13 @@
 }
 
 //===----------------------------------------------------------------------===//
-// hal.descriptor_set.layout<*>
+// #hal.pipeline.layout<*>
 //===----------------------------------------------------------------------===//
 
-DescriptorSetBindingAttr
-DescriptorSetLayoutAttr::getBinding(int64_t ordinal) const {
-  for (auto binding : getBindings()) {
-    if (binding.getOrdinal() == ordinal) {
-      return binding;
-    }
-  }
-  return {};
-}
-
-//===----------------------------------------------------------------------===//
-// hal.pipeline.layout<*>
-//===----------------------------------------------------------------------===//
-
-DescriptorSetLayoutAttr
-PipelineLayoutAttr::getSetLayout(int64_t ordinal) const {
-  for (auto setLayout : getSetLayouts()) {
-    if (setLayout.getOrdinal() == ordinal) {
-      return setLayout;
-    }
-  }
-  return {};
-}
-
-int64_t PipelineLayoutAttr::getFlatBindingIndex(int64_t set,
-                                                int64_t binding) const {
-  int64_t flatIndex = 0;
-  for (auto setLayoutAttr : getSetLayouts()) {
-    if (setLayoutAttr.getOrdinal() == set) {
-      flatIndex += binding;
-      break;
-    } else {
-      flatIndex += setLayoutAttr.getBindings().size();
-    }
-  }
-  return flatIndex;
+PipelineBindingAttr PipelineLayoutAttr::getBinding(int64_t ordinal) const {
+  assert(ordinal >= 0 && ordinal < getBindings().size() &&
+         "binding ordinal out of bounds");
+  return getBindings()[ordinal];
 }
 
 //===----------------------------------------------------------------------===//
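The removed `getFlatBindingIndex` walked set layouts in ordinal order and summed binding counts; the new `getBinding` is a direct index into the flat list. A small model (outside MLIR, names illustrative) showing why the two agree when sets are concatenated in ordinal order, using the example from the removed comment (layout `[set(bindings[4]), set(bindings[2])]`, set 1 binding 0 → 4):

```python
def flat_binding_index(set_layouts, set_ordinal, binding):
    """Old scheme: sum binding counts of preceding sets, then add the
    binding index within the target set."""
    flat = 0
    for ordinal, num_bindings in set_layouts:
        if ordinal == set_ordinal:
            return flat + binding
        flat += num_bindings
    return flat

set_layouts = [(0, 4), (1, 2)]
print(flat_binding_index(set_layouts, 1, 0))  # -> 4

# New scheme: bindings already form one flat list, so lookup is indexing.
flat_bindings = ["b0", "b1", "b2", "b3", "b4", "b5"]
print(flat_bindings[4])  # -> b4
```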
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.td b/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.td
index 78fdad3..976a4ca 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.td
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALAttrs.td
@@ -178,12 +178,12 @@
   let cppNamespace = "::mlir::iree_compiler::IREE::HAL";
 }
 
-def HAL_DescriptorSetLayoutFlags_None : I32BitEnumAttrCase<"None", 0x0000>;
-def HAL_DescriptorSetLayoutFlags_Indirect : I32BitEnumAttrCase<"Indirect", 0x0001>;
-def HAL_DescriptorSetLayoutFlagsAttr :
-    I32BitEnumAttr<"DescriptorSetLayoutFlags", "valid DescriptorSetLayout flags", [
-      HAL_DescriptorSetLayoutFlags_None,
-      HAL_DescriptorSetLayoutFlags_Indirect,
+def HAL_PipelineLayoutFlags_None : I32BitEnumAttrCase<"None", 0x0000>;
+def HAL_PipelineLayoutFlags_Indirect : I32BitEnumAttrCase<"Indirect", 0x0001>;
+def HAL_PipelineLayoutFlagsAttr :
+    I32BitEnumAttr<"PipelineLayoutFlags", "valid PipelineLayout flags", [
+      HAL_PipelineLayoutFlags_None,
+      HAL_PipelineLayoutFlags_Indirect,
     ]> {
   let cppNamespace = "::mlir::iree_compiler::IREE::HAL";
 }
@@ -376,57 +376,22 @@
 }
 
 //===----------------------------------------------------------------------===//
-// hal.descriptor_set.binding<*>
+// hal.pipeline.binding<*>
 //===----------------------------------------------------------------------===//
 
-def HAL_DescriptorSetBindingAttr :
-    AttrDef<HAL_Dialect, "DescriptorSetBinding", []> {
-  let mnemonic = "descriptor_set.binding";
-  let summary = [{descriptor set binding specification}];
+def HAL_PipelineBindingAttr :
+    AttrDef<HAL_Dialect, "PipelineBinding", []> {
+  let mnemonic = "pipeline.binding";
+  let summary = [{pipeline binding specification}];
   let description = [{
-    Specifies a single binding within a descriptor set layout.
+    Specifies a single binding within a pipeline layout.
   }];
   let parameters = (ins
-    AttrParameter<"int64_t", "">:$ordinal,
     AttrParameter<"DescriptorType", "">:$type,
     OptionalParameter<"DescriptorFlags", "DescriptorFlags::None">:$flags
   );
   let assemblyFormat = [{
-    `<` $ordinal `,` $type (`,` $flags^)? `>`
-  }];
-}
-
-def HAL_DescriptorSetLayoutBindingArrayAttr :
-    TypedArrayAttrBase<HAL_DescriptorSetBindingAttr,
-                       "HAL descriptor set layout binding array attribute">;
-
-//===----------------------------------------------------------------------===//
-// hal.descriptor_set.layout<*>
-//===----------------------------------------------------------------------===//
-
-def HAL_DescriptorSetLayoutAttr :
-    AttrDef<HAL_Dialect, "DescriptorSetLayout", []> {
-  let mnemonic = "descriptor_set.layout";
-  let summary = [{descriptor set layout specification}];
-  let description = [{
-    Specifies the layout information of a single set of descriptors used within
-    an pipeline layout. Multiple of these sets may be used by a single entry
-    point to allow for bindings with similar update frequencies to be grouped.
-  }];
-  let parameters = (ins
-    AttrParameter<"int64_t", "">:$ordinal,
-    ArrayRefParameter<"DescriptorSetBindingAttr", "">:$bindings,
-    OptionalParameter<"std::optional<DescriptorSetLayoutFlags>">:$flags
-  );
-  let assemblyFormat = [{
-    `<`
-    $ordinal `,`
-    `bindings` `=` `[` $bindings `]`
-    (`,` `flags` `=` $flags^)?
-    `>`
-  }];
-  let extraClassDeclaration = [{
-    DescriptorSetBindingAttr getBinding(int64_t ordinal) const;
+    `<` $type (`,` $flags^)? `>`
   }];
 }
 
@@ -444,56 +409,26 @@
     lower-level target-specific argument passing behavior.
   }];
   let parameters = (ins
-    AttrParameter<"int64_t", "">:$pushConstants,
-    ArrayRefParameter<"DescriptorSetLayoutAttr", "">:$setLayouts
+    ArrayRefParameter<"PipelineBindingAttr", "">:$bindings,
+    OptionalParameter<"int64_t", "0">:$constants,
+    OptionalParameter<"std::optional<PipelineLayoutFlags>">:$flags
   );
   let assemblyFormat = [{
     `<`
-    `push_constants` `=` $pushConstants `,`
-    `sets` `=` `[` $setLayouts `]`
+    (`constants` `=` $constants^ `,` ` `)?
+    `bindings` `=` `[` qualified($bindings) `]`
+    (`,` `flags` `=` $flags^)?
     `>`
   }];
   let extraClassDeclaration = [{
-    DescriptorSetLayoutAttr getSetLayout(int64_t ordinal) const;
-
-    // Returns the binding index in a flattened list of all sets and bindings.
-    // For example, if the layout is [set(bindings[4]), set(bindings[2])] then
-    // a query for set 1 binding 0 would return 4.
-    int64_t getFlatBindingIndex(int64_t set, int64_t binding) const;
+    IREE::HAL::PipelineBindingAttr getBinding(int64_t ordinal) const;
+    IREE::HAL::PipelineBindingAttr getBinding(APInt ordinal) const {
+      return getBinding(ordinal.getSExtValue());
+    }
   }];
 }
 
 //===----------------------------------------------------------------------===//
-// hal.interface.binding<*>
-//===----------------------------------------------------------------------===//
-
-def HAL_InterfaceBindingAttr :
-    AttrDef<HAL_Dialect, "InterfaceBinding", []> {
-  let mnemonic = "interface.binding";
-  let summary = [{interface binding specification}];
-  let description = [{
-    Specifies the descriptor set and binding ordinal of a particular layout
-    binding.
-
-    Example:
-    ```mlir
-    #hal.interface.binding<0, 1>
-    ```
-  }];
-  let parameters = (ins
-    AttrParameter<"int64_t", "">:$set,
-    AttrParameter<"int64_t", "">:$binding
-  );
-  let assemblyFormat = [{
-    `<` $set `,` $binding `>`
-  }];
-}
-
-def HAL_InterfaceBindingArrayAttr :
-    TypedArrayAttrBase<HAL_InterfaceBindingAttr,
-                       "HAL binding array attribute">;
-
-//===----------------------------------------------------------------------===//
 // #hal.executable.target<*>
 //===----------------------------------------------------------------------===//
 
@@ -744,7 +679,7 @@
 
     Example:
     ```mlir
-    #hal.device.target<"llvm-cpu", {
+    #hal.device.target<"local", {
       device_configuration = ...
     }, [
       #hal.executable.target<"llvm-cpu", "embedded-elf-arm_32">,
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALBase.td b/compiler/src/iree/compiler/Dialect/HAL/IR/HALBase.td
index 4dd2be8..2b2f23c 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALBase.td
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALBase.td
@@ -77,16 +77,6 @@
   let builderCall = "$_builder.getType<IREE::HAL::CommandBufferType>()";
 }
 
-def HAL_DescriptorSetLayout : DialectType<
-    HAL_Dialect,
-    CPred<"isa<IREE::HAL::DescriptorSetLayoutType>($_self)">,
-    "descriptor_set_layout"> {
-  let description = [{
-    Descriptor set layout.
-  }];
-  let builderCall = "$_builder.getType<IREE::HAL::DescriptorSetLayoutType>()";
-}
-
 def HAL_Device : DialectType<
     HAL_Dialect,
     CPred<"isa<IREE::HAL::DeviceType>($_self)">,
@@ -140,28 +130,16 @@
   let builderCall = "$_builder.getType<IREE::HAL::FileType>()";
 }
 
-def HAL_PipelineLayout : DialectType<
-    HAL_Dialect,
-    CPred<"isa<IREE::HAL::PipelineLayoutType>($_self)">,
-    "pipeline_layout"> {
-  let description = [{
-    A pipeline layout describing the descriptor sets and push constants used.
-  }];
-  let builderCall = "$_builder.getType<IREE::HAL::PipelineLayoutType>()";
-}
-
 def HAL_ObjectType : AnyTypeOf<[
   HAL_Allocator,
   HAL_Buffer,
   HAL_BufferView,
   HAL_CommandBuffer,
-  HAL_DescriptorSetLayout,
   HAL_Device,
   HAL_Event,
   HAL_Executable,
   HAL_Fence,
   HAL_File,
-  HAL_PipelineLayout,
 ]>;
 
 def HAL_BufferType : AnyTypeOf<[
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALDialect.cpp b/compiler/src/iree/compiler/Dialect/HAL/IR/HALDialect.cpp
index 811789c..00c2c6e 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALDialect.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALDialect.cpp
@@ -100,9 +100,7 @@
     MLIRContext *context = attr.getContext();
     // TODO(benvanik): remove this interface or make it an attr interface.
     if (auto bindingAttr =
-            llvm::dyn_cast<IREE::HAL::DescriptorSetBindingAttr>(attr)) {
-      fn(IntegerAttr::get(IndexType::get(context),
-                          APInt(64, bindingAttr.getOrdinal())));
+            llvm::dyn_cast<IREE::HAL::PipelineBindingAttr>(attr)) {
       fn(IREE::HAL::DescriptorTypeAttr::get(context, bindingAttr.getType()));
       fn(IREE::HAL::DescriptorFlagsAttr::get(context, bindingAttr.getFlags()));
       return success();
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOpFolders.cpp b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOpFolders.cpp
index bf596ce..45dbb15 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOpFolders.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOpFolders.cpp
@@ -188,7 +188,7 @@
 /// the same scope.
 struct SkipCommandBufferDeviceOp
     : public OpRewritePattern<CommandBufferDeviceOp> {
-  using OpRewritePattern<CommandBufferDeviceOp>::OpRewritePattern;
+  using OpRewritePattern::OpRewritePattern;
 
   LogicalResult matchAndRewrite(CommandBufferDeviceOp op,
                                 PatternRewriter &rewriter) const override {
@@ -342,13 +342,13 @@
 
 namespace {
 
-/// Folds hal.buffer.subspans into push descriptor bindings.
+/// Folds hal.buffer.subspans into dispatch bindings.
 /// The binding range is always equal to or a subset of the subspan.
-struct FoldCommandBufferPushDescriptorSetBufferSubspan
-    : public OpRewritePattern<CommandBufferPushDescriptorSetOp> {
-  using OpRewritePattern::OpRewritePattern;
+template <typename OpT>
+struct FoldCommandBufferDispatchBufferSubspan : public OpRewritePattern<OpT> {
+  using OpRewritePattern<OpT>::OpRewritePattern;
 
-  LogicalResult matchAndRewrite(CommandBufferPushDescriptorSetOp op,
+  LogicalResult matchAndRewrite(OpT op,
                                 PatternRewriter &rewriter) const override {
     auto ip = rewriter.saveInsertionPoint();
     rewriter.setInsertionPoint(op);
@@ -383,9 +383,51 @@
 
 } // namespace
 
-void CommandBufferPushDescriptorSetOp::getCanonicalizationPatterns(
+void CommandBufferDispatchOp::getCanonicalizationPatterns(
     RewritePatternSet &results, MLIRContext *context) {
-  results.insert<FoldCommandBufferPushDescriptorSetBufferSubspan>(context);
+  results
+      .insert<FoldCommandBufferDispatchBufferSubspan<CommandBufferDispatchOp>>(
+          context);
+}
+
+namespace {
+
+/// Folds hal.buffer.subspans into the indirect dispatch workgroup count.
+/// The binding range is always equal to or a subset of the subspan.
+struct FoldCommandBufferDispatchIndirectBufferSubspan
+    : public OpRewritePattern<CommandBufferDispatchIndirectOp> {
+  using OpRewritePattern::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(CommandBufferDispatchIndirectOp op,
+                                PatternRewriter &rewriter) const override {
+    Value workgroupsBuffer = op.getWorkgroupsBuffer();
+    auto *definingOp = workgroupsBuffer.getDefiningOp();
+    if (!definingOp)
+      return failure();
+    Value workgroupsOffset = op.getWorkgroupsOffset();
+    if (auto subspanOp = dyn_cast<IREE::HAL::BufferSubspanOp>(definingOp)) {
+      workgroupsBuffer = subspanOp.getSourceBuffer();
+      workgroupsOffset = rewriter.createOrFold<arith::AddIOp>(
+          subspanOp.getLoc(), subspanOp.getSourceOffset(), workgroupsOffset);
+    } else {
+      return failure();
+    }
+    rewriter.modifyOpInPlace(op, [&]() {
+      op.getWorkgroupsBufferMutable().set(workgroupsBuffer);
+      op.getWorkgroupsOffsetMutable().set(workgroupsOffset);
+    });
+    return success();
+  }
+};
+
+} // namespace
+
+void CommandBufferDispatchIndirectOp::getCanonicalizationPatterns(
+    RewritePatternSet &results, MLIRContext *context) {
+  results.insert<FoldCommandBufferDispatchIndirectBufferSubspan>(context);
+  results.insert<
+      FoldCommandBufferDispatchBufferSubspan<CommandBufferDispatchIndirectOp>>(
+      context);
 }
 
 //===----------------------------------------------------------------------===//
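The subspan-folding canonicalizations above rewrite a binding (or indirect workgroup-count reference) that goes through `hal.buffer.subspan` to use the source buffer directly, adding the subspan's source offset into the binding offset. A minimal sketch of that arithmetic (the real pattern folds one subspan per rewrite; the loop here just models repeated canonicalization to a fixed point):

```python
def fold_subspan(binding, subspans):
    """If the binding's buffer is a subspan, rebind to the source buffer and
    fold the subspan's source offset into the binding offset. The length is
    unchanged: the binding range is always within the subspan."""
    buffer, offset, length = binding
    while buffer in subspans:
        source, source_offset = subspans[buffer]
        buffer, offset = source, offset + source_offset
    return (buffer, offset, length)

# %sub = hal.buffer.subspan %base[%c256, ...]; binding (%sub)[%c64, %c128]
subspans = {"sub": ("base", 256)}
print(fold_subspan(("sub", 64, 128), subspans))  # -> ('base', 320, 128)
```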
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.cpp b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.cpp
index 6b787a7..4485011 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.cpp
@@ -47,14 +47,14 @@
 }
 
 //===----------------------------------------------------------------------===//
-// custom<DescriptorSetBindings>($binding_ordinals,
+// custom<PipelineBindings>($binding_ordinals,
 //                               $binding_buffers,
 //                               type($binding_buffers),
 //                               $binding_offsets,
 //                               $binding_lengths)
 //===----------------------------------------------------------------------===//
 
-static ParseResult parseDescriptorSetBindings(
+static ParseResult parsePipelineBindings(
     OpAsmParser &parser,
     SmallVectorImpl<OpAsmParser::UnresolvedOperand> &ordinals,
     SmallVectorImpl<OpAsmParser::UnresolvedOperand> &buffers,
@@ -86,11 +86,11 @@
   return success();
 }
 
-static void printDescriptorSetBindings(OpAsmPrinter &p, Operation *op,
-                                       ValueRange ordinals, ValueRange buffers,
-                                       TypeRange bufferTypes,
-                                       ValueRange bufferOffsets,
-                                       ValueRange bufferLengths) {
+static void printPipelineBindings(OpAsmPrinter &p, Operation *op,
+                                  ValueRange ordinals, ValueRange buffers,
+                                  TypeRange bufferTypes,
+                                  ValueRange bufferOffsets,
+                                  ValueRange bufferLengths) {
   llvm::interleaveComma(llvm::zip_equal(ordinals, buffers, bufferTypes,
                                         bufferOffsets, bufferLengths),
                         p,
@@ -1076,49 +1076,15 @@
 }
 
 //===----------------------------------------------------------------------===//
-// hal.command_buffer.push_descriptor_set
+// hal.command_buffer.dispatch + .indirect
 //===----------------------------------------------------------------------===//
 
-void CommandBufferPushDescriptorSetOp::build(
-    OpBuilder &builder, OperationState &state, Value commandBuffer,
-    Value pipelineLayout, int64_t set,
-    ArrayRef<DescriptorSetBindingValue> bindings) {
-  build(builder, state, commandBuffer, pipelineLayout,
-        builder.createOrFold<arith::ConstantIndexOp>(state.location, set),
-        bindings);
-}
-
-void CommandBufferPushDescriptorSetOp::build(
-    OpBuilder &builder, OperationState &state, Value commandBuffer,
-    Value pipelineLayout, Value set,
-    ArrayRef<DescriptorSetBindingValue> bindings) {
-  state.addOperands({commandBuffer, pipelineLayout, set});
-  SmallVector<Value> bindingOrdinals;
-  SmallVector<Value> bindingBuffers;
-  SmallVector<Value> bindingOffsets;
-  SmallVector<Value> bindingLengths;
-  for (auto binding : bindings) {
-    bindingOrdinals.push_back(binding.ordinal);
-    bindingBuffers.push_back(binding.buffer);
-    bindingOffsets.push_back(binding.byteOffset);
-    bindingLengths.push_back(binding.byteLength);
-  }
-  state.addOperands(bindingOrdinals);
-  state.addOperands(bindingBuffers);
-  state.addOperands(bindingOffsets);
-  state.addOperands(bindingLengths);
-}
-
-//===----------------------------------------------------------------------===//
-// hal.command_buffer.dispatch2 + .indirect
-//===----------------------------------------------------------------------===//
-
-void CommandBufferDispatch2Op::build(OpBuilder &builder, OperationState &state,
-                                     Value commandBuffer, Value executable,
-                                     Value entryPoint, ValueRange workgroups,
-                                     ValueRange constants,
-                                     ArrayRef<BindingValue> bindings,
-                                     IREE::HAL::DispatchFlags flags) {
+void CommandBufferDispatchOp::build(OpBuilder &builder, OperationState &state,
+                                    Value commandBuffer, Value executable,
+                                    Value entryPoint, ValueRange workgroups,
+                                    ValueRange constants,
+                                    ArrayRef<BindingValue> bindings,
+                                    IREE::HAL::DispatchFlags flags) {
   state.addOperands({commandBuffer, executable, entryPoint});
   state.addOperands(workgroups);
   state.addOperands(constants);
@@ -1150,7 +1116,7 @@
                      }));
 }
 
-void CommandBufferDispatch2IndirectOp::build(
+void CommandBufferDispatchIndirectOp::build(
     OpBuilder &builder, OperationState &state, Value commandBuffer,
     Value executable, Value entryPoint, Value workgroupsBuffer,
     Value workgroupsOffset, ValueRange constants,
@@ -1185,10 +1151,10 @@
                      }));
 }
 
-static LogicalResult verifyDispatch2Bindings(Operation *op,
-                                             ValueRange bindingBuffers,
-                                             ValueRange bindingOffsets,
-                                             ValueRange bindingLengths) {
+static LogicalResult verifyDispatchBindings(Operation *op,
+                                            ValueRange bindingBuffers,
+                                            ValueRange bindingOffsets,
+                                            ValueRange bindingLengths) {
   if (bindingBuffers.size() != bindingOffsets.size() ||
       bindingBuffers.size() != bindingLengths.size()) {
     return op->emitOpError() << "requires that binding fields all have the "
@@ -1197,27 +1163,16 @@
   return success();
 }
 
-LogicalResult CommandBufferDispatch2Op::verify() {
-  CommandBufferDispatch2Op op = *this;
-  return verifyDispatch2Bindings(op, op.getBindingBuffers(),
-                                 op.getBindingOffsets(),
-                                 op.getBindingLengths());
+LogicalResult CommandBufferDispatchOp::verify() {
+  CommandBufferDispatchOp op = *this;
+  return verifyDispatchBindings(op, op.getBindingBuffers(),
+                                op.getBindingOffsets(), op.getBindingLengths());
 }
 
-LogicalResult CommandBufferDispatch2IndirectOp::verify() {
-  CommandBufferDispatch2IndirectOp op = *this;
-  return verifyDispatch2Bindings(op, op.getBindingBuffers(),
-                                 op.getBindingOffsets(),
-                                 op.getBindingLengths());
-}
-
-//===----------------------------------------------------------------------===//
-// hal.descriptor_set_layout.create
-//===----------------------------------------------------------------------===//
-
-void DescriptorSetLayoutCreateOp::getAsmResultNames(
-    function_ref<void(Value, StringRef)> setNameFn) {
-  setNameFn(getResult(), "descriptor_set_layout");
+LogicalResult CommandBufferDispatchIndirectOp::verify() {
+  CommandBufferDispatchIndirectOp op = *this;
+  return verifyDispatchBindings(op, op.getBindingBuffers(),
+                                op.getBindingOffsets(), op.getBindingLengths());
 }
 
 //===----------------------------------------------------------------------===//
@@ -1902,16 +1857,6 @@
 void ExecutableCreateOp::getAsmResultNames(
     function_ref<void(Value, StringRef)> setNameFn) {
   // TODO(benvanik): name after sanitized symbol.
-  setNameFn(getResult(), StringRef("exe"));
-}
-
-//===----------------------------------------------------------------------===//
-// hal.executable.create2
-//===----------------------------------------------------------------------===//
-
-void ExecutableCreate2Op::getAsmResultNames(
-    function_ref<void(Value, StringRef)> setNameFn) {
-  // TODO(benvanik): name after sanitized symbol.
   setNameFn(getResult(), StringRef("executable"));
 }
 
@@ -1942,7 +1887,7 @@
 LogicalResult InterfaceConstantLoadOp::verify() {
   InterfaceConstantLoadOp op = *this;
   auto layoutAttr = op.getLayout();
-  if (op.getOrdinal().getZExtValue() >= layoutAttr.getPushConstants()) {
+  if (op.getOrdinal().getZExtValue() >= layoutAttr.getConstants()) {
     return op.emitOpError("push constant ordinal out of bounds");
   }
   return success();
@@ -1952,18 +1897,20 @@
 // hal.interface.binding.subspan
 //===----------------------------------------------------------------------===//
 
-void InterfaceBindingSubspanOp::build(
-    OpBuilder &builder, OperationState &result, Type resultType,
-    IREE::HAL::PipelineLayoutAttr layout, APInt set, APInt binding,
-    Value byte_offset, ValueRange dynamic_dims, IntegerAttr alignment,
-    std::optional<DescriptorFlags> flags) {
+void InterfaceBindingSubspanOp::build(OpBuilder &builder,
+                                      OperationState &result, Type resultType,
+                                      IREE::HAL::PipelineLayoutAttr layout,
+                                      APInt binding, Value byte_offset,
+                                      ValueRange dynamic_dims,
+                                      IntegerAttr alignment,
+                                      std::optional<DescriptorFlags> flags) {
   IREE::HAL::DescriptorFlagsAttr descriptorAttr;
   if (flags.has_value()) {
     descriptorAttr = IREE::HAL::DescriptorFlagsAttr::get(builder.getContext(),
                                                          flags.value());
   }
-  build(builder, result, resultType, layout, set, binding, byte_offset,
-        dynamic_dims, alignment, descriptorAttr);
+  build(builder, result, resultType, layout, binding, byte_offset, dynamic_dims,
+        alignment, descriptorAttr);
 }
 
 LogicalResult InterfaceBindingSubspanOp::verify() {
@@ -1976,58 +1923,24 @@
              << " associated dimension SSA values";
     }
   }
-  int64_t set = op.getSet().getSExtValue();
-  int64_t binding = op.getBinding().getSExtValue();
-  bool foundSet = false;
-  bool foundBinding = false;
-  for (auto setLayoutAttr : op.getLayout().getSetLayouts()) {
-    if (setLayoutAttr.getOrdinal() == set) {
-      foundSet = true;
-      for (auto bindingAttr : setLayoutAttr.getBindings()) {
-        if (bindingAttr.getOrdinal() == binding) {
-          foundBinding = true;
-          break;
-        }
-      }
-    }
-  }
-  if (!foundSet) {
-    return op.emitOpError("set ordinal ")
-           << set << " not present in pipeline layout";
-  } else if (!foundBinding) {
+  uint64_t binding = op.getBinding().getZExtValue();
+  if (binding >= op.getLayout().getBindings().size()) {
     return op.emitOpError("binding ordinal ")
-           << binding << " not present in descriptor set layout";
+           << binding << " out of bounds in layout " << op.getLayout();
   }
   return success();
 }
 
-IREE::HAL::DescriptorSetBindingAttr
-InterfaceBindingSubspanOp::getDescriptorSetBindingAttr() {
-  int64_t set = getSet().getSExtValue();
-  int64_t binding = getBinding().getSExtValue();
-  for (auto setLayoutAttr : getLayout().getSetLayouts()) {
-    if (setLayoutAttr.getOrdinal() == set) {
-      for (auto bindingAttr : setLayoutAttr.getBindings()) {
-        if (bindingAttr.getOrdinal() == binding) {
-          return bindingAttr;
-        }
-      }
-    }
-  }
-  return {};
+IREE::HAL::PipelineBindingAttr
+InterfaceBindingSubspanOp::getPipelineBindingAttr() {
+  return getLayout().getBinding(getBinding());
 }
 
 IREE::HAL::DescriptorType InterfaceBindingSubspanOp::getDescriptorType() {
-  auto bindingAttr = getDescriptorSetBindingAttr();
+  auto bindingAttr = getPipelineBindingAttr();
   return bindingAttr.getType();
 }
 
-int64_t InterfaceBindingSubspanOp::getFlatBindingIndex() {
-  int64_t set = getSet().getSExtValue();
-  int64_t binding = getBinding().getSExtValue();
-  return getLayout().getFlatBindingIndex(set, binding);
-}
-
 llvm::MaybeAlign InterfaceBindingSubspanOp::getBaseAlignment() {
   if (auto baseAlignmentInt = getAlignment()) {
     return llvm::MaybeAlign(baseAlignmentInt.value().getZExtValue());
@@ -2108,24 +2021,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-// hal.pipeline_layout.create
-//===----------------------------------------------------------------------===//
-
-void PipelineLayoutCreateOp::getAsmResultNames(
-    function_ref<void(Value, StringRef)> setNameFn) {
-  setNameFn(getResult(), "pipeline_layout");
-}
-
-//===----------------------------------------------------------------------===//
-// hal.pipeline_layout.lookup
-//===----------------------------------------------------------------------===//
-
-void PipelineLayoutLookupOp::getAsmResultNames(
-    function_ref<void(Value, StringRef)> setNameFn) {
-  setNameFn(getResult(), "pipeline_layout");
-}
-
-//===----------------------------------------------------------------------===//
 // hal.fence.*
 //===----------------------------------------------------------------------===//
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.td b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.td
index a6fe1f2..fdd43b7 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.td
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALOps.td
@@ -429,7 +429,6 @@
     OptionalAttr<HAL_WorkgroupSizeAttr>:$workgroup_size,
     OptionalAttr<HAL_SubgroupSizeAttr>:$subgroup_size,
     OptionalAttr<IndexAttr>:$workgroup_local_memory,
-    OptionalAttr<HAL_InterfaceBindingArrayAttr>:$bindings,
     OptionalAttr<Util_TiedOpStorageAttr>:$tied_operands
   );
   let results = (outs
@@ -451,7 +450,6 @@
                                $tied_operands)
     `count` `` custom<WorkgroupCountRegion>($workgroup_count)
     `layout` `(` $layout `)`
-    (`bindings` `(` $bindings^ `)`)?
     `objects` `(` `{` custom<TargetConditionObjects>($targets,
                                                      $target_ordinals,
                                                      $target_objects,
@@ -1469,140 +1467,7 @@
   }];
 }
 
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-def HAL_CommandBufferPushConstantsOp : HAL_Op<"command_buffer.push_constants"> {
-  let summary = [{command buffer push constants operation}];
-  let description = [{
-    Pushes an inline set of constants that can be accessed by subsequent
-    dispatches using a compatible pipeline layout.
-
-    Push constants are always 4-byte values and treated as opaque, meaning that
-    they may be bit-casted floats, bit-packed booleans, etc.
-  }];
-
-  let arguments = (ins
-    HAL_CommandBuffer:$command_buffer,
-    HAL_PipelineLayout:$pipeline_layout,
-    IndexAttr:$offset,
-    Variadic<I32>:$values
-  );
-
-  let assemblyFormat = [{
-    `<` $command_buffer `:` type($command_buffer) `>`
-    `layout` `(` $pipeline_layout `:` type($pipeline_layout) `)`
-    `offset` `(` $offset `)`
-    `values` `(` `[` $values `]` `)`
-    `:` type($values)
-    attr-dict-with-keyword
-  }];
-}
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-def HAL_CommandBufferPushDescriptorSetOp : HAL_Op<"command_buffer.push_descriptor_set", [
-  SameVariadicOperandSize,
-]> {
-  let summary = [{command buffer descriptor set push binding operation}];
-  let description = [{
-    Pushes an inline-defined descriptor set to the command buffer.
-    The provided buffers may either be HAL buffers or indirect references into
-    the command buffer binding table.
-  }];
-
-  let arguments = (ins
-    HAL_CommandBuffer:$command_buffer,
-    HAL_PipelineLayout:$pipeline_layout,
-    Index:$set,
-    Variadic<Index>:$binding_ordinals,
-    Variadic<AnyTypeOf<[Index, HAL_BufferType]>>:$binding_buffers,
-    Variadic<HAL_DeviceSize>:$binding_offsets,
-    Variadic<HAL_DeviceSize>:$binding_lengths
-  );
-
-  let assemblyFormat = [{
-    `<` $command_buffer `:` type($command_buffer) `>`
-    `layout` `(` $pipeline_layout `:` type($pipeline_layout) `)`
-    `` `[` $set `]`
-    `bindings` `(` `[`
-    custom<DescriptorSetBindings>($binding_ordinals,
-                                  $binding_buffers,
-                                  type($binding_buffers),
-                                  $binding_offsets,
-                                  $binding_lengths)
-    `]` `)`
-    attr-dict-with-keyword
-  }];
-
-  let skipDefaultBuilders = 1;
-  let builders = [
-    OpBuilder<(ins "Value":$commandBuffer, "Value":$pipelineLayout,
-      "int64_t":$set, "ArrayRef<DescriptorSetBindingValue>":$bindings)>,
-    OpBuilder<(ins "Value":$commandBuffer, "Value":$pipelineLayout,
-      "Value":$set, "ArrayRef<DescriptorSetBindingValue>":$bindings)>,
-  ];
-
-  let hasCanonicalizer = 1;
-}
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-def HAL_CommandBufferDispatchOp : HAL_Op<"command_buffer.dispatch"> {
-  let summary = [{command buffer dispatch recording operation}];
-  let description = [{
-    Dispatches an execution request.
-  }];
-
-  let arguments = (ins
-    HAL_CommandBuffer:$command_buffer,
-    HAL_Executable:$executable,
-    HAL_Ordinal:$entry_point,
-    HAL_Dim:$workgroup_x,
-    HAL_Dim:$workgroup_y,
-    HAL_Dim:$workgroup_z,
-    HAL_DispatchFlagsAttr:$flags
-  );
-
-  let assemblyFormat = [{
-    `<` $command_buffer `:` type($command_buffer) `>`
-    `target` `(` $executable `:` type($executable) `)`
-    `` `[` $entry_point `]`
-    `workgroups` `(` `[`
-        $workgroup_x `,`
-        $workgroup_y `,`
-        $workgroup_z
-    `]` `)`
-    `flags` `(` $flags `)`
-    attr-dict-with-keyword
-  }];
-}
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-def HAL_CommandBufferDispatchIndirectOp : HAL_Op<"command_buffer.dispatch.indirect"> {
-  let summary = [{command buffer indirect dispatch recording operation}];
-  let description = [{
-    Dispatches an execution request with the dispatch parameters loaded from the
-    given buffer.
-  }];
-
-  let arguments = (ins
-    HAL_CommandBuffer:$command_buffer,
-    HAL_Executable:$executable,
-    HAL_Ordinal:$entry_point,
-    AnyTypeOf<[Index, HAL_BufferType]>:$workgroups_buffer,
-    HAL_DeviceSize:$workgroups_offset,
-    HAL_DispatchFlagsAttr:$flags
-  );
-
-  let assemblyFormat = [{
-    `<` $command_buffer `:` type($command_buffer) `>`
-    `target` `(` $executable `:` type($executable) `)`
-    `` `[` $entry_point `]`
-    `workgroups` `(` $workgroups_buffer `:` type($workgroups_buffer) `)`
-    `` `[` $workgroups_offset `]`
-    `flags` `(` $flags `)`
-    attr-dict-with-keyword
-  }];
-}
-
-def HAL_CommandBufferDispatch2Op : HAL_Op<"command_buffer.dispatch2", [
+def HAL_CommandBufferDispatchOp : HAL_Op<"command_buffer.dispatch", [
   AttrSizedOperandSegments,
 ]> {
   let summary = [{command buffer dispatch recording operation}];
@@ -1665,10 +1530,11 @@
     )>,
   ];
 
+  let hasCanonicalizer = 1;
   let hasVerifier = 1;
 }
 
-def HAL_CommandBufferDispatch2IndirectOp : HAL_Op<"command_buffer.dispatch2.indirect", [
+def HAL_CommandBufferDispatchIndirectOp : HAL_Op<"command_buffer.dispatch.indirect", [
   AttrSizedOperandSegments,
 ]> {
   let summary = [{command buffer indirect dispatch recording operation}];
@@ -1732,56 +1598,13 @@
     )>,
   ];
 
+  let hasCanonicalizer = 1;
   let hasVerifier = 1;
 }
 
 } // OpGroupCommandBufferOps
 
 //===----------------------------------------------------------------------===//
-// !hal.descriptor_set_layout / iree_hal_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-def OpGroupDescriptorSetLayoutOps : OpDocGroup {
-  let summary = "Descriptor set layout ops";
-  let description = [{
-    Ops for `!hal.descriptor_set_layout` / `iree_hal_descriptor_set_layout_t`.
-  }];
-}
-
-let opDocGroup = OpGroupDescriptorSetLayoutOps in {
-
-def HAL_DescriptorSetLayoutCreateOp : HAL_PureOp<"descriptor_set_layout.create", [
-  DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
-]> {
-  let summary = [{creates a descriptor set layout}];
-  let description = [{
-    Creates a descriptor set layout that defines the bindings used within a set.
-    The same descriptor set layout may be shared with many different executable
-    layouts and by doing so some runtime binding overhead when switching between
-    executables that use the same set layouts can be reduced.
-  }];
-
-  let arguments = (ins
-    HAL_Device:$device,
-    HAL_DescriptorSetLayoutFlagsAttr:$flags,
-    HAL_DescriptorSetLayoutBindingArrayAttr:$bindings
-  );
-  let results = (outs
-    HAL_DescriptorSetLayout:$result
-  );
-
-  let assemblyFormat = [{
-    `device` `(` $device `:` type($device) `)`
-    `flags` `(` $flags `)`
-    `bindings` `(` $bindings `)`
-    `:` type($result)
-    attr-dict-with-keyword
-  }];
-}
-
-} // OpGroupDescriptorSetLayoutOps
-
-//===----------------------------------------------------------------------===//
 // !hal.device / iree_hal_device_t
 //===----------------------------------------------------------------------===//
 
@@ -2800,48 +2623,8 @@
   ];
 }
 
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
 def HAL_ExecutableCreateOp : HAL_PureOp<"executable.create", [
   DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
-  AttrSizedOperandSegments,
-]> {
-  let summary = [{creates an executable}];
-  let description = [{
-    Creates a target-dependent executable cached on the provided device. Entry
-    points contained within the executable can be dispatched using the resulting
-    executable handle.
-
-    Depending on the driver creation may take a non-trivial amount of time
-    (such as when JITing/etc). As the cache is internally synchronized callers
-    can issue preparation requests from multiple threads - even for the same
-    executables - and calls will block until preparation completes.
-
-    Optional constants provide for specialization of the executable based on
-    runtime-derived parameters.
-  }];
-
-  let arguments = (ins
-    HAL_Device:$device,
-    SymbolRefAttr:$executable_target,
-    Variadic<HAL_PipelineLayout>:$layouts,
-    Variadic<I32>:$constants
-  );
-  let results = (outs
-    HAL_Executable:$result
-  );
-
-  let assemblyFormat = [{
-    `device` `(` $device `:` type($device) `)`
-    `target` `(` $executable_target `)`
-    `layouts` `(` `[` $layouts `]` `)`
-    (`constants` `(` `[` $constants^ `]` `)`)?
-    `:` type($result)
-    attr-dict-with-keyword
-  }];
-}
-
-def HAL_ExecutableCreate2Op : HAL_PureOp<"executable.create2", [
-  DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
 ]> {
   let summary = [{creates an executable}];
   let description = [{
@@ -3276,7 +3059,6 @@
 
   let arguments = (ins
     HAL_PipelineLayoutAttr:$layout,
-    IndexAttr:$set,
     IndexAttr:$binding,
     Optional<HAL_DeviceSize>:$byte_offset,
     HAL_ShapeDynamicDims:$dynamic_dims,
@@ -3289,7 +3071,6 @@
 
   let assemblyFormat = [{
     `layout` `(` $layout `)`
-    `set` `(` $set `)`
     `binding` `(` $binding `)`
     (`alignment` `(` $alignment^ `)`)?
     (`offset` `(` $byte_offset^ `)`)?
@@ -3301,7 +3082,6 @@
     OpBuilder<(ins
       "Type":$resultType,
       "IREE::HAL::PipelineLayoutAttr":$layout,
-      "APInt":$set,
       "APInt":$binding,
       "Value":$byte_offset,
       "ValueRange":$dynamic_dims,
@@ -3317,16 +3097,11 @@
     ValueRange getResultDynamicDims(unsigned idx) { return getDynamicDims(); }
 
-    // Returns the descriptor set binding metadata for the given set/binding.
+    // Returns the pipeline binding metadata for this binding ordinal.
-    IREE::HAL::DescriptorSetBindingAttr getDescriptorSetBindingAttr();
+    IREE::HAL::PipelineBindingAttr getPipelineBindingAttr();
 
     // Returns the type of the descriptor this binding references.
     IREE::HAL::DescriptorType getDescriptorType();
 
-    // Returns the binding index in a flattened list of all sets and bindings.
-    // For example, if the layout is [set(bindings[4]), set(bindings[2])] then
-    // a query for set 1 binding 0 would return 4.
-    int64_t getFlatBindingIndex();
-
     // Returns the alignment of the base buffer pointer (before offset).
     llvm::MaybeAlign getBaseAlignment();
 
@@ -3341,79 +3116,6 @@
 } // OpGroupInterfaceOps
 
 //===----------------------------------------------------------------------===//
-// !hal.pipeline_layout / iree_hal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-def OpGroupPipelineLayoutOps : OpDocGroup {
-  let summary = "Pipeline layout ops";
-  let description = [{
-    Ops for `!hal.pipeline_layout` / `iree_hal_pipeline_layout_t`.
-  }];
-}
-
-let opDocGroup = OpGroupPipelineLayoutOps in {
-
-def HAL_PipelineLayoutCreateOp : HAL_PureOp<"pipeline_layout.create", [
-  DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
-]> {
-  let summary = [{creates an pipeline layout}];
-  let description = [{
-    Creates an pipeline layout from the given descriptor sets and push
-    constant required size. Pipeline layouts can be shared across any
-    executable that uses the same layout and push constant information. Sharing
-    the layout between executables will reduce runtime binding overhead and it
-    is often worth the cost to allow a small number of unused bindings in one
-    executable such that it can share layouts with others that will be scheduled
-    adjacent to it.
-  }];
-
-  let arguments = (ins
-    HAL_Device:$device,
-    IndexAttr:$push_constants,
-    Variadic<HAL_DescriptorSetLayout>:$set_layouts
-  );
-  let results = (outs
-    HAL_PipelineLayout:$result
-  );
-
-  // TODO(benvanik): include descriptor set layout types.
-  let assemblyFormat = [{
-    `device` `(` $device `:` type($device) `)`
-    `push_constants` `(` $push_constants `)`
-    `layouts` `(` `[` $set_layouts `]` `)`
-    `:` type($result)
-    attr-dict-with-keyword
-  }];
-}
-
-def HAL_PipelineLayoutLookupOp : HAL_PureOp<"pipeline_layout.lookup", [
-  DeclareOpInterfaceMethods<OpAsmOpInterface, ["getAsmResultNames"]>,
-]> {
-  let summary = [{pipeline layout cache lookup pseudo-op}];
-  let description = [{
-    Used during conversion to provide a placeholder for a globally cached and
-    possibly lazy-initialized pipeline layout.
-  }];
-
-  let arguments = (ins
-    HAL_Device:$device,
-    HAL_PipelineLayoutAttr:$layout
-  );
-  let results = (outs
-    HAL_PipelineLayout:$result
-  );
-
-  let assemblyFormat = [{
-    `device` `(` $device `:` type($device) `)`
-    `layout` `(` $layout `)`
-    `:` type($result)
-    attr-dict-with-keyword
-  }];
-}
-
-} // OpGroupPipelineLayoutOps
-
-//===----------------------------------------------------------------------===//
 // !hal.fence / iree_hal_fence_t
 //===----------------------------------------------------------------------===//
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.cpp b/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.cpp
index afc2932..9e31af7 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.cpp
@@ -90,26 +90,6 @@
 // Utilities
 //===----------------------------------------------------------------------===//
 
-SmallVector<IREE::HAL::InterfaceBindingAttr>
-getInterfaceBindingAttrs(Operation *op, size_t resourceCount) {
-  // It'd be nice if we had something typed here but this is just used for
-  // spooky action at a distance or user overrides. If the attribute is not
-  // found (not set by MaterializeInterfaces or the user) we construct one by
-  // convention (dense set 0 bindings for each resource).
-  auto bindingAttrs = op->getAttrOfType<ArrayAttr>("hal.interface.bindings");
-  if (bindingAttrs) {
-    return llvm::to_vector(
-        bindingAttrs.getAsRange<IREE::HAL::InterfaceBindingAttr>());
-  }
-  SmallVector<IREE::HAL::InterfaceBindingAttr> bindings;
-  for (size_t i = 0; i < resourceCount; ++i) {
-    bindings.push_back(IREE::HAL::InterfaceBindingAttr::get(op->getContext(),
-                                                            /*set=*/0,
-                                                            /*binding=*/i));
-  }
-  return bindings;
-}
-
 //===----------------------------------------------------------------------===//
 // Dialect registration
 //===----------------------------------------------------------------------===//
@@ -119,32 +99,27 @@
 
 void HALDialect::registerTypes() {
   addTypes<AllocatorType, BufferType, BufferViewType, ChannelType,
-           CommandBufferType, DescriptorSetLayoutType, DeviceType, EventType,
-           ExecutableType, FenceType, FileType, PipelineLayoutType,
-           SemaphoreType>();
+           CommandBufferType, DeviceType, EventType, ExecutableType, FenceType,
+           FileType, SemaphoreType>();
 }
 
 Type HALDialect::parseType(DialectAsmParser &parser) const {
   StringRef typeKind;
   if (parser.parseKeyword(&typeKind))
     return {};
-  auto type =
-      llvm::StringSwitch<Type>(typeKind)
-          .Case("allocator", AllocatorType::get(getContext()))
-          .Case("buffer", BufferType::get(getContext()))
-          .Case("buffer_view", BufferViewType::get(getContext()))
-          .Case("channel", ChannelType::get(getContext()))
-          .Case("command_buffer", CommandBufferType::get(getContext()))
-          .Case("descriptor_set_layout",
-                DescriptorSetLayoutType::get(getContext()))
-          .Case("device", DeviceType::get(getContext()))
-          .Case("event", EventType::get(getContext()))
-          .Case("executable", ExecutableType::get(getContext()))
-          .Case("fence", FenceType::get(getContext()))
-          .Case("file", FileType::get(getContext()))
-          .Case("pipeline_layout", PipelineLayoutType::get(getContext()))
-          .Case("semaphore", SemaphoreType::get(getContext()))
-          .Default(nullptr);
+  auto type = llvm::StringSwitch<Type>(typeKind)
+                  .Case("allocator", AllocatorType::get(getContext()))
+                  .Case("buffer", BufferType::get(getContext()))
+                  .Case("buffer_view", BufferViewType::get(getContext()))
+                  .Case("channel", ChannelType::get(getContext()))
+                  .Case("command_buffer", CommandBufferType::get(getContext()))
+                  .Case("device", DeviceType::get(getContext()))
+                  .Case("event", EventType::get(getContext()))
+                  .Case("executable", ExecutableType::get(getContext()))
+                  .Case("fence", FenceType::get(getContext()))
+                  .Case("file", FileType::get(getContext()))
+                  .Case("semaphore", SemaphoreType::get(getContext()))
+                  .Default(nullptr);
   if (!type) {
     parser.emitError(parser.getCurrentLocation())
         << "unknown HAL type: " << typeKind;
@@ -163,8 +138,6 @@
     p << "channel";
   } else if (llvm::isa<CommandBufferType>(type)) {
     p << "command_buffer";
-  } else if (llvm::isa<DescriptorSetLayoutType>(type)) {
-    p << "descriptor_set_layout";
   } else if (llvm::isa<DeviceType>(type)) {
     p << "device";
   } else if (llvm::isa<EventType>(type)) {
@@ -175,8 +148,6 @@
     p << "fence";
   } else if (llvm::isa<FileType>(type)) {
     p << "file";
-  } else if (llvm::isa<PipelineLayoutType>(type)) {
-    p << "pipeline_layout";
   } else if (llvm::isa<SemaphoreType>(type)) {
     p << "semaphore";
   } else {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.h b/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.h
index cdb29e0..fefe68f 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.h
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/HALTypes.h
@@ -111,13 +111,6 @@
   static constexpr StringLiteral name = "hal.command_buffer";
 };
 
-struct DescriptorSetLayoutType
-    : public Type::TypeBase<DescriptorSetLayoutType, Type, TypeStorage> {
-  using Base::Base;
-
-  static constexpr StringLiteral name = "hal.descriptor_set_layout";
-};
-
 struct DeviceType
     : public Type::TypeBase<DeviceType, Type, TypeStorage,
                             mlir::OpTrait::IREE::Util::ImplicitlyCaptured,
@@ -158,13 +151,6 @@
   static constexpr StringLiteral name = "hal.file";
 };
 
-struct PipelineLayoutType
-    : public Type::TypeBase<PipelineLayoutType, Type, TypeStorage> {
-  using Base::Base;
-
-  static constexpr StringLiteral name = "hal.pipeline_layout";
-};
-
 struct SemaphoreType : public Type::TypeBase<SemaphoreType, Type, TypeStorage> {
   using Base::Base;
 
@@ -178,7 +164,7 @@
-// A tuple containing runtime values for a descriptor set binding.
+// A tuple containing runtime values for a pipeline binding.
 // The buffer specified may be either a !hal.buffer or an index of a binding
 // table slot to source the buffer from.
-struct DescriptorSetBindingValue {
+struct PipelineBindingValue {
   Value ordinal;
   Value buffer;
   Value byteOffset;
@@ -234,14 +220,14 @@
 
 template <>
 struct FieldParser<
-    std::optional<mlir::iree_compiler::IREE::HAL::DescriptorSetLayoutFlags>> {
-  static FailureOr<mlir::iree_compiler::IREE::HAL::DescriptorSetLayoutFlags>
+    std::optional<mlir::iree_compiler::IREE::HAL::PipelineLayoutFlags>> {
+  static FailureOr<mlir::iree_compiler::IREE::HAL::PipelineLayoutFlags>
   parse(AsmParser &parser) {
     std::string value;
     if (parser.parseKeywordOrString(&value))
       return failure();
     auto result = mlir::iree_compiler::IREE::HAL::symbolizeEnum<
-        mlir::iree_compiler::IREE::HAL::DescriptorSetLayoutFlags>(value);
+        mlir::iree_compiler::IREE::HAL::PipelineLayoutFlags>(value);
     if (!result.has_value())
       return failure();
     return result.value();
@@ -249,8 +235,7 @@
 };
 static inline AsmPrinter &operator<<(
     AsmPrinter &printer,
-    std::optional<mlir::iree_compiler::IREE::HAL::DescriptorSetLayoutFlags>
-        param) {
+    std::optional<mlir::iree_compiler::IREE::HAL::PipelineLayoutFlags> param) {
   printer << (param.has_value()
                   ? mlir::iree_compiler::IREE::HAL::stringifyEnum(param.value())
                   : StringRef{""});
@@ -301,11 +286,6 @@
 
 namespace mlir::iree_compiler::IREE::HAL {
 
-// Returns the assigned bindings via the `hal.interface.bindings` attribute
-// on the operation or default bindings in set 0 with bindings [0, count).
-SmallVector<IREE::HAL::InterfaceBindingAttr>
-getInterfaceBindingAttrs(Operation *op, size_t resourceCount);
-
 } // namespace mlir::iree_compiler::IREE::HAL
 
 #endif // IREE_COMPILER_DIALECT_HAL_IR_HALTYPES_H_
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/BUILD.bazel b/compiler/src/iree/compiler/Dialect/HAL/IR/test/BUILD.bazel
index 8c88e20..a111296 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/BUILD.bazel
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/BUILD.bazel
@@ -24,7 +24,6 @@
             "channel_ops.mlir",
             "command_buffer_folding.mlir",
             "command_buffer_ops.mlir",
-            "descriptor_set_ops.mlir",
             "device_folding.mlir",
             "device_ops.mlir",
             "devices_ops.mlir",
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/CMakeLists.txt b/compiler/src/iree/compiler/Dialect/HAL/IR/test/CMakeLists.txt
index f1fb349..e2d654b 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/CMakeLists.txt
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/CMakeLists.txt
@@ -22,7 +22,6 @@
     "channel_ops.mlir"
     "command_buffer_folding.mlir"
     "command_buffer_ops.mlir"
-    "descriptor_set_ops.mlir"
     "device_folding.mlir"
     "device_ops.mlir"
     "devices_ops.mlir"
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/attributes.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/attributes.mlir
index d04c7d6..0e7dd9e 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/attributes.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/attributes.mlir
@@ -3,30 +3,26 @@
 
 // CHECK-LABEL: descriptor_set_layout_binding.basic
 "descriptor_set_layout_binding.basic"() {
-  // CHECK: dslb0 = #hal.descriptor_set.binding<0, uniform_buffer>
-  dslb0 = #hal.descriptor_set.binding<0, uniform_buffer>,
-  // CHECK: dslb1 = #hal.descriptor_set.binding<1, storage_buffer, "ReadOnly|Indirect">
-  dslb1 = #hal.descriptor_set.binding<1, storage_buffer, "ReadOnly|Indirect">
+  // CHECK: dslb0 = #hal.pipeline.binding<uniform_buffer>
+  dslb0 = #hal.pipeline.binding<uniform_buffer>,
+  // CHECK: dslb1 = #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">
+  dslb1 = #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">
 } : () -> ()
 
 // -----
 
 // CHECK-LABEL: pipeline_layout.basic
 "pipeline_layout.basic"() {
-  // CHECK: layout0 = #hal.pipeline.layout<push_constants = 4, sets = [
-  // CHECK-SAME: <0, bindings = [
-  // CHECK-SAME:   <0, storage_buffer>
-  // CHECK-SAME:   <1, storage_buffer>
-  // CHECK-SAME: <1, bindings = [
-  // CHECK-SAME:   <0, uniform_buffer>
-  layout0 = #hal.pipeline.layout<push_constants = 4, sets = [
-    #hal.descriptor_set.layout<0, bindings = [
-      #hal.descriptor_set.binding<0, storage_buffer>,
-      #hal.descriptor_set.binding<1, storage_buffer>
-    ]>,
-    #hal.descriptor_set.layout<1, bindings = [
-      #hal.descriptor_set.binding<0, uniform_buffer>
-    ]>
+  // CHECK: layout0 = #hal.pipeline.layout<
+  // CHECK-SAME: constants = 4
+  // CHECK-SAME: bindings = [
+  // CHECK-SAME:   #hal.pipeline.binding<storage_buffer>,
+  // CHECK-SAME:   #hal.pipeline.binding<storage_buffer>,
+  // CHECK-SAME:   #hal.pipeline.binding<uniform_buffer>
+  layout0 = #hal.pipeline.layout<constants = 4, bindings = [
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<storage_buffer>,
+    #hal.pipeline.binding<uniform_buffer>
   ]>
 } : () -> ()
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_folding.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_folding.mlir
index 3adbce8..279511e 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_folding.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_folding.mlir
@@ -98,13 +98,13 @@
 
 // -----
 
-// CHECK-LABEL: @fold_buffer_subspan_into_push_descriptor_set
+// CHECK-LABEL: @fold_buffer_subspan_into_dispatch
 //  CHECK-SAME: %[[CMD:.+]]: !hal.command_buffer,
-//  CHECK-SAME: %[[LAYOUT:.+]]: !hal.pipeline_layout,
+//  CHECK-SAME: %[[EXECUTABLE:.+]]: !hal.executable,
 //  CHECK-SAME: %[[BASE_BUFFER:.+]]: !hal.buffer
-util.func public @fold_buffer_subspan_into_push_descriptor_set(
+util.func public @fold_buffer_subspan_into_dispatch(
     %cmd: !hal.command_buffer,
-    %layout: !hal.pipeline_layout,
+    %executable: !hal.executable,
     %buffer: !hal.buffer
   ) {
   %c0 = arith.constant 0 : index
@@ -116,20 +116,52 @@
   %c262140 = arith.constant 262140 : index
   %c262144 = arith.constant 262144 : index
   %subspan = hal.buffer.subspan<%buffer : !hal.buffer>[%c4096, %c262144] : !hal.buffer
-  //      CHECK: hal.command_buffer.push_descriptor_set
+  //      CHECK: hal.command_buffer.dispatch
   // CHECK-SAME:   bindings([
-  hal.command_buffer.push_descriptor_set<%cmd : !hal.command_buffer>
-      layout(%layout : !hal.pipeline_layout)[%c0]
+  hal.command_buffer.dispatch<%cmd : !hal.command_buffer>
+      target(%executable: !hal.executable)[%c0]
+      workgroups([%c1, %c1, %c1])
       bindings([
         // 0 + 4096:
-        // CHECK-NEXT: %c0 = (%[[BASE_BUFFER]] : !hal.buffer)[%c4096, %c8000]
-        %c0 = (%subspan : !hal.buffer)[%c0, %c8000],
+        // CHECK-NEXT: (%[[BASE_BUFFER]] : !hal.buffer)[%c4096, %c8000]
+        (%subspan : !hal.buffer)[%c0, %c8000],
         // 4096 + 4:
-        // CHECK-NEXT: %c1 = (%[[BASE_BUFFER]] : !hal.buffer)[%c4100, %c262140]
-        %c1 = (%subspan : !hal.buffer)[%c4, %c262140],
+        // CHECK-NEXT: (%[[BASE_BUFFER]] : !hal.buffer)[%c4100, %c262140]
+        (%subspan : !hal.buffer)[%c4, %c262140],
         // No change:
-        // CHECK-NEXT: %c2 = (%[[BASE_BUFFER]] : !hal.buffer)[%c4096, %c262144]
-        %c2 = (%buffer : !hal.buffer)[%c4096, %c262144]
+        // CHECK-NEXT: (%[[BASE_BUFFER]] : !hal.buffer)[%c4096, %c262144]
+        (%buffer : !hal.buffer)[%c4096, %c262144]
       ])
+      flags("None")
+  util.return
+}
+
+// -----
+
+// CHECK-LABEL: @fold_buffer_subspan_into_dispatch_indirect
+//  CHECK-SAME: %[[CMD:.+]]: !hal.command_buffer,
+//  CHECK-SAME: %[[EXECUTABLE:.+]]: !hal.executable,
+//  CHECK-SAME: %[[BASE_BUFFER:.+]]: !hal.buffer
+util.func public @fold_buffer_subspan_into_dispatch_indirect(
+    %cmd: !hal.command_buffer,
+    %executable: !hal.executable,
+    %buffer: !hal.buffer
+  ) {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %c4 = arith.constant 4 : index
+  %c4096 = arith.constant 4096 : index
+  %c262144 = arith.constant 262144 : index
+  %subspan = hal.buffer.subspan<%buffer : !hal.buffer>[%c4096, %c262144] : !hal.buffer
+  // CHECK: hal.command_buffer.dispatch.indirect
+  hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
+      target(%executable: !hal.executable)[%c0]
+      // 4096 + 4:
+      // CHECK-SAME: workgroups(%[[BASE_BUFFER]] : !hal.buffer)[%c4100]
+      workgroups(%subspan : !hal.buffer)[%c4]
+      bindings([
+        (%buffer : !hal.buffer)[%c0, %c1]
+      ])
+      flags("None")
   util.return
 }
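The subspan-folding tests above check a simple rebasing rule: a binding that references a `hal.buffer.subspan` can instead reference the parent buffer directly by adding the subspan's offset to the binding's offset, leaving the length unchanged. A minimal sketch of that arithmetic (the helper name `fold_subspan_binding` is illustrative, not from the codebase):

```python
def fold_subspan_binding(subspan_offset: int,
                         binding_offset: int,
                         binding_length: int) -> tuple[int, int]:
    """Rebase a (offset, length) binding from a subspan onto its parent buffer.

    The subspan is itself a window into the parent buffer starting at
    subspan_offset, so the folded binding starts at the sum of the two
    offsets; the length is untouched.
    """
    return (subspan_offset + binding_offset, binding_length)
```

This matches the expected values in the test: a binding at offset 0 in a subspan starting at 4096 folds to offset 4096, and a binding at offset 4 folds to 4100.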
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_ops.mlir
index 77d56e3..2e14141 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/command_buffer_ops.mlir
@@ -213,44 +213,11 @@
 
 // -----
 
-// CHECK-LABEL: @command_buffer_push_descriptor_set
-//  CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer,
-//  CHECK-SAME:  %[[LAYOUT:.+]]: !hal.pipeline_layout,
-//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer,
-//  CHECK-SAME:  %[[SLOT:.+]]: index)
-util.func public @command_buffer_push_descriptor_set(
-    %cmd: !hal.command_buffer,
-    %layout: !hal.pipeline_layout,
-    %buffer: !hal.buffer,
-    %slot: index) {
-  %c0 = arith.constant 0 : index
-  %c1 = arith.constant 1 : index
-  %c4 = arith.constant 4 : index
-  %c4096 = arith.constant 4096 : index
-  %c8000 = arith.constant 8000 : index
-  // CHECK: hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer>
-  hal.command_buffer.push_descriptor_set<%cmd : !hal.command_buffer>
-      // CHECK-SAME: layout(%[[LAYOUT]] : !hal.pipeline_layout)[%c1]
-      layout(%layout : !hal.pipeline_layout)[%c1]
-      // CHECK-SAME: bindings([
-      bindings([
-        // CHECK-NEXT: %c0 = (%[[BUFFER]] : !hal.buffer)[%c4096, %c8000]
-        %c0 = (%buffer : !hal.buffer)[%c4096, %c8000],
-        // CHECK-NEXT: %c1 = (%[[SLOT]] : index)[%c4, %c4096]
-        %c1 = (%slot : index)[%c4, %c4096]
-      ])
-  util.return
-}
-
-// -----
-
 hal.executable @ex {
   hal.executable.variant @backend target(<"backend", "format">) {
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
@@ -258,18 +225,34 @@
 // CHECK-LABEL: @command_buffer_dispatch
 //  CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer,
 //  CHECK-SAME:  %[[EXECUTABLE:.+]]: !hal.executable, %[[ORDINAL:[a-z0-9]+]]: index,
-//  CHECK-SAME:  %[[X:[a-z0-9]+]]: index, %[[Y:[a-z0-9]+]]: index, %[[Z:[a-z0-9]+]]: index)
+//  CHECK-SAME:  %[[X:[a-z0-9]+]]: index, %[[Y:[a-z0-9]+]]: index, %[[Z:[a-z0-9]+]]: index,
+//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer,
+//  CHECK-SAME:  %[[SLOT:.+]]: index)
 util.func public @command_buffer_dispatch(
     %cmd: !hal.command_buffer,
     %executable: !hal.executable, %ordinal: index,
-    %x: index, %y: index, %z: index) {
-  //      CHECK: hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer>
-  // CHECK-SAME:   target(%[[EXECUTABLE]] : !hal.executable)[%[[ORDINAL]]
-  // CHECK-SAME:   workgroups([%[[X]], %[[Y]], %[[Z]]])
-  // CHECK-SAME:   flags("None")
+    %x: index, %y: index, %z: index,
+    %buffer: !hal.buffer,
+    %slot: index) {
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
+  %c4 = arith.constant 4 : index
+  %c4096 = arith.constant 4096 : index
+  %c8000 = arith.constant 8000 : index
+  // CHECK: hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer>
   hal.command_buffer.dispatch<%cmd : !hal.command_buffer>
+      // CHECK-SAME:   target(%[[EXECUTABLE]] : !hal.executable)[%[[ORDINAL]]
       target(%executable: !hal.executable)[%ordinal]
+      // CHECK-SAME: workgroups([%[[X]], %[[Y]], %[[Z]]])
       workgroups([%x, %y, %z])
+      // CHECK-SAME: bindings([
+      bindings([
+        // CHECK-NEXT: (%[[BUFFER]] : !hal.buffer)[%c4096, %c8000]
+        (%buffer : !hal.buffer)[%c4096, %c8000],
+        // CHECK-NEXT: (%[[SLOT]] : index)[%c4, %c4096]
+        (%slot : index)[%c4, %c4096]
+      ])
+      // CHECK-NEXT: flags("None")
       flags("None")
   util.return
 }
@@ -278,30 +261,33 @@
 
 hal.executable @ex {
   hal.executable.variant @backend target(<"backend", "format">) {
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
 
 // CHECK-LABEL: @command_buffer_dispatch_indirect
 //  CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer,
-//  CHECK-SAME:  %[[EXECUTABLE:.+]]: !hal.executable, %[[ORDINAL:.+]]: index,
-//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer, %[[OFFSET:.+]]: index)
+//  CHECK-SAME:  %[[EXECUTABLE:.+]]: !hal.executable, %[[ORDINAL:[a-z0-9]+]]: index,
+//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer, %[[OFFSET:.+]]: index, %[[LENGTH:.+]]: index)
 util.func public @command_buffer_dispatch_indirect(
     %cmd: !hal.command_buffer,
     %executable: !hal.executable, %ordinal: index,
-    %buffer: !hal.buffer, %offset: index) {
+    %buffer: !hal.buffer, %offset: index, %length: index) {
   //      CHECK: hal.command_buffer.dispatch.indirect<%[[CMD]] : !hal.command_buffer>
   // CHECK-SAME:   target(%[[EXECUTABLE]] : !hal.executable)[%[[ORDINAL]]
   // CHECK-SAME:   workgroups(%[[BUFFER]] : !hal.buffer)[%[[OFFSET]]]
-  // CHECK-SAME:   flags("None")
+  // CHECK-SAME:   bindings([
+  // CHECK-NEXT:     (%[[BUFFER]] : !hal.buffer)[%[[OFFSET]], %[[LENGTH]]]
+  // CHECK-NEXT:   ]) flags("None")
   hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
       target(%executable: !hal.executable)[%ordinal]
       workgroups(%buffer : !hal.buffer)[%offset]
+      bindings([
+        (%buffer : !hal.buffer)[%offset, %length]
+      ])
       flags("None")
   util.return
 }
@@ -310,11 +296,9 @@
 
 hal.executable @ex {
   hal.executable.variant @backend target(<"backend", "format">) {
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
@@ -322,18 +306,23 @@
 // CHECK-LABEL: @command_buffer_dispatch_indirect_indirect
 //  CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer,
 //  CHECK-SAME:  %[[EXECUTABLE:[a-z0-9]+]]: !hal.executable, %[[ORDINAL:[a-z0-9]+]]: index,
-//  CHECK-SAME:  %[[BUFFER_SLOT:[a-z0-9]+]]: index, %[[OFFSET:[a-z0-9]+]]: index)
+//  CHECK-SAME:  %[[BUFFER_SLOT:[a-z0-9]+]]: index, %[[OFFSET:[a-z0-9]+]]: index, %[[LENGTH:[a-z0-9]+]]: index)
 util.func public @command_buffer_dispatch_indirect_indirect(
     %cmd: !hal.command_buffer,
     %executable: !hal.executable, %ordinal: index,
-    %buffer_slot: index, %offset: index) {
+    %buffer_slot: index, %offset: index, %length: index) {
   //      CHECK: hal.command_buffer.dispatch.indirect<%[[CMD]] : !hal.command_buffer>
   // CHECK-SAME:   target(%[[EXECUTABLE]] : !hal.executable)[%[[ORDINAL]]
   // CHECK-SAME:   workgroups(%[[BUFFER_SLOT]] : index)[%[[OFFSET]]]
-  // CHECK-SAME:   flags("None")
+  // CHECK-SAME:   bindings([
+  // CHECK-NEXT:     (%[[BUFFER_SLOT]] : index)[%[[OFFSET]], %[[LENGTH]]]
+  // CHECK-NEXT:   ]) flags("None")
   hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
       target(%executable: !hal.executable)[%ordinal]
       workgroups(%buffer_slot : index)[%offset]
+      bindings([
+        (%buffer_slot : index)[%offset, %length]
+      ])
       flags("None")
   util.return
 }
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/descriptor_set_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/descriptor_set_ops.mlir
deleted file mode 100644
index 86180ac..0000000
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/descriptor_set_ops.mlir
+++ /dev/null
@@ -1,20 +0,0 @@
-// RUN: iree-opt --split-input-file %s | iree-opt --split-input-file | FileCheck %s
-
-// CHECK-LABEL: @descriptor_set_layout_create
-// CHECK-SAME: (%[[DEVICE:.+]]: !hal.device)
-util.func public @descriptor_set_layout_create(%device: !hal.device) {
-  //      CHECK: = hal.descriptor_set_layout.create
-  // CHECK-SAME:     device(%[[DEVICE]] : !hal.device)
-  // CHECK-SAME:     flags("None")
-  // CHECK-SAME:     bindings([
-  // CHECK-SAME:       #hal.descriptor_set.binding<0, storage_buffer>,
-  // CHECK-SAME:       #hal.descriptor_set.binding<1, storage_buffer>
-  // CHECK-SAME:     ]) : !hal.descriptor_set_layout
-  %0 = hal.descriptor_set_layout.create device(%device : !hal.device)
-                                        flags("None")
-                                        bindings([
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]) : !hal.descriptor_set_layout
-  util.return
-}
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/executable_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/executable_ops.mlir
index c3ee554..3e69574 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/executable_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/executable_ops.mlir
@@ -13,11 +13,9 @@
   ]) {
     // CHECK-DAG: hal.executable.export public @entry0 ordinal(0) layout(#pipeline_layout) attributes {
     // CHECK-SAME:     workgroup_size = [4 : index, 1 : index, 1 : index]
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>) attributes {
       workgroup_size = [4 : index, 1 : index, 1 : index]
     }
@@ -42,11 +40,9 @@
     // CHECK-DAG: hal.executable.export public @entry0 ordinal(0) layout(#pipeline_layout) attributes {
     // CHECK-SAME:     subgroup_size = 64 : index
     // CHECK-SAME:     workgroup_size = [4 : index, 1 : index, 1 : index]
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>) attributes {
       subgroup_size = 64 : index,
       workgroup_size = [4 : index, 1 : index, 1 : index]
@@ -83,11 +79,9 @@
     // CHECK-DAG: hal.executable.export public @entry0 ordinal(0) layout(#pipeline_layout) attributes {
     // CHECK-SAME:     subgroup_size = 64 : index
     // CHECK-SAME:     workgroup_size = [4 : index, 1 : index, 1 : index]
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>) attributes {
       subgroup_size = 64 : index,
       workgroup_size = [4 : index, 1 : index, 1 : index]
@@ -141,68 +135,26 @@
 // -----
 
 // CHECK-LABEL: @executable_create
-// CHECK-SAME: %[[DEVICE:.+]]: !hal.device,
-// CHECK-SAME: %[[LAYOUT0:.+]]: !hal.pipeline_layout,
-// CHECK-SAME: %[[LAYOUT1:.+]]: !hal.pipeline_layout
-util.func public @executable_create(
-    %device: !hal.device,
-    %layout0: !hal.pipeline_layout,
-    %layout1: !hal.pipeline_layout) {
-  //      CHECK: = hal.executable.create
-  // CHECK-SAME:     device(%[[DEVICE]] : !hal.device)
-  // CHECK-SAME:     target(@exe::@binary1)
-  // CHECK-SAME:    layouts([%[[LAYOUT0]], %[[LAYOUT1]]]) : !hal.executable
-  %0 = hal.executable.create device(%device : !hal.device)
-                             target(@exe::@binary1)
-                            layouts([%layout0, %layout1]) : !hal.executable
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @executable_create2
 // CHECK-SAME: %[[DEVICE:.+]]: !hal.device
-util.func public @executable_create2(%device: !hal.device) {
+util.func public @executable_create(%device: !hal.device) {
   //      CHECK: = hal.executable.create
   // CHECK-SAME:     device(%[[DEVICE]] : !hal.device)
   // CHECK-SAME:     target(@exe::@binary1) : !hal.executable
-  %0 = hal.executable.create2 device(%device : !hal.device)
+  %0 = hal.executable.create device(%device : !hal.device)
                               target(@exe::@binary1) : !hal.executable
   util.return
 }
 
 // -----
 
-// CHECK-LABEL: @pipeline_layout_create
-// CHECK-SAME: %[[DEVICE:.+]]: !hal.device,
-// CHECK-SAME: %[[LAYOUT0:.+]]: !hal.descriptor_set_layout,
-// CHECK-SAME: %[[LAYOUT1:.+]]: !hal.descriptor_set_layout
-util.func public @pipeline_layout_create(
-    %device: !hal.device,
-    %layout0: !hal.descriptor_set_layout,
-    %layout1: !hal.descriptor_set_layout) {
-  // CHECK: hal.pipeline_layout.create
-  // CHECK-SAME:          device(%[[DEVICE]] : !hal.device)
-  // CHECK-SAME:  push_constants(1)
-  // CHECK-SAME:         layouts([%[[LAYOUT0]], %[[LAYOUT1]]]) : !hal.pipeline_layout
-  %0 = hal.pipeline_layout.create device(%device : !hal.device)
-                          push_constants(1)
-                                 layouts([%layout0, %layout1]) : !hal.pipeline_layout
-  util.return
-}
-
-// -----
-
 // CHECK-LABEL: @unresolved_workload_ex
 hal.executable @unresolved_workload_ex {
   // CHECK: hal.executable.variant public @backend
   hal.executable.variant @backend target(#hal.executable.target<"backend", "format">) {
     // CHECK: hal.executable.export public @entry0
-    hal.executable.export public @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export public @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>) {
     ^bb0(%device: !hal.device, %arg0: index):
       hal.return %arg0, %arg0, %arg0 : index, index, index
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/interface_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/interface_ops.mlir
index 7800b2a..b87c6bc 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/interface_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/interface_ops.mlir
@@ -13,8 +13,8 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  <0, bindings = [<0, storage_buffer>]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: @interface_io_constant
@@ -26,9 +26,11 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  <0, bindings = [<0, storage_buffer, ReadOnly>]>,
-  <1, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-LABEL: @interface_io_subspan
@@ -36,26 +38,26 @@
 func.func @interface_io_subspan(%dim0: index, %dim2: index) {
   %c8 = arith.constant 8 : index
 
-  // CHECK: = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%[[DIM0]], %[[DIM2]]}
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%dim0, %dim2}
+  // CHECK: = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%[[DIM0]], %[[DIM2]]}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%dim0, %dim2}
 
-  // CHECK: = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) alignment(16) : memref<16xi8>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(1) binding(2) alignment(16) : memref<16xi8>
+  // CHECK: = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(16) : memref<16xi8>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(16) : memref<16xi8>
 
   return
 }
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  <0, bindings = [<0, storage_buffer, ReadOnly>]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>
 ]>
 
 func.func @interface_io_subspan_wrong_dynamic_dim(%dim: index) {
   %c8 = arith.constant 8 : index
 
   // expected-error @+1{{result type 'memref<?x4x?x16xi8>' has 2 dynamic dimensions but 1 associated dimension SSA values}}
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%dim}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) offset(%c8) : memref<?x4x?x16xi8>{%dim}
 
   return
 }
diff --git a/compiler/src/iree/compiler/Dialect/HAL/IR/test/tensor_ops.mlir b/compiler/src/iree/compiler/Dialect/HAL/IR/test/tensor_ops.mlir
index 5dd1ea7..59145c5 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/IR/test/tensor_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/IR/test/tensor_ops.mlir
@@ -80,19 +80,11 @@
       hal.return %x_capture, %y_capture, %z : index, index, index
     }
     // Must match the external definition.
-    // CHECK: layout(<push_constants = 1, sets =
-    layout(#hal.pipeline.layout<push_constants = 1, sets = [
-      <0, bindings = [
-          <0, storage_buffer, ReadOnly>,
-          <1, storage_buffer>
-      ]>
+    // CHECK: layout(<constants = 1, bindings =
+    layout(#hal.pipeline.layout<constants = 1, bindings = [
+      #hal.pipeline.binding<storage_buffer, ReadOnly>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
-    // Optional, automatically inferred if omitted.
-    // CHECK: bindings([#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>])
-    bindings([
-      #hal.interface.binding<0, 0>,
-      #hal.interface.binding<0, 1>
-    ])
     // Can have object references for multiple targets or configurations.
     // CHECK: objects({
     objects({
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Target/Devices/LocalDevice.cpp b/compiler/src/iree/compiler/Dialect/HAL/Target/Devices/LocalDevice.cpp
index e482a49..bd0ab7c 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Target/Devices/LocalDevice.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Target/Devices/LocalDevice.cpp
@@ -36,11 +36,12 @@
 IREE::HAL::DeviceTargetAttr LocalDevice::getDefaultDeviceTarget(
     MLIRContext *context, const TargetRegistry &targetRegistry) const {
   Builder b(context);
-  SmallVector<NamedAttribute> configItems;
 
-  // TODO(benvanik): flags for common queries.
+  SmallVector<NamedAttribute> deviceConfigAttrs;
+  auto deviceConfigAttr = b.getDictionaryAttr(deviceConfigAttrs);
 
-  auto configAttr = b.getDictionaryAttr(configItems);
+  SmallVector<NamedAttribute> executableConfigAttrs;
+  auto executableConfigAttr = b.getDictionaryAttr(executableConfigAttrs);
 
   SmallVector<IREE::HAL::ExecutableTargetAttr> executableTargetAttrs;
   for (auto backendName : options.defaultTargetBackends) {
@@ -50,23 +51,25 @@
                    << "\n";
       return {};
     }
-    targetBackend->getDefaultExecutableTargets(context, "local", configAttr,
-                                               executableTargetAttrs);
+    targetBackend->getDefaultExecutableTargets(
+        context, "local", executableConfigAttr, executableTargetAttrs);
   }
 
   return IREE::HAL::DeviceTargetAttr::get(context, b.getStringAttr("local"),
-                                          configAttr, executableTargetAttrs);
+                                          deviceConfigAttr,
+                                          executableTargetAttrs);
 }
 
 std::optional<IREE::HAL::DeviceTargetAttr>
 LocalDevice::getHostDeviceTarget(MLIRContext *context,
                                  const TargetRegistry &targetRegistry) const {
   Builder b(context);
-  SmallVector<NamedAttribute> configItems;
 
-  // TODO(benvanik): flags for overrides or ask LLVM for info about the host.
+  SmallVector<NamedAttribute> deviceConfigAttrs;
+  auto deviceConfigAttr = b.getDictionaryAttr(deviceConfigAttrs);
 
-  auto configAttr = b.getDictionaryAttr(configItems);
+  SmallVector<NamedAttribute> executableConfigAttrs;
+  auto executableConfigAttr = b.getDictionaryAttr(executableConfigAttrs);
 
   SmallVector<IREE::HAL::ExecutableTargetAttr> executableTargetAttrs;
   for (auto backendName : options.defaultHostBackends) {
@@ -76,12 +79,13 @@
                    << "\n";
       return std::nullopt;
     }
-    targetBackend->getHostExecutableTargets(context, "local", configAttr,
-                                            executableTargetAttrs);
+    targetBackend->getHostExecutableTargets(
+        context, "local", executableConfigAttr, executableTargetAttrs);
   }
 
   return IREE::HAL::DeviceTargetAttr::get(context, b.getStringAttr("local"),
-                                          configAttr, executableTargetAttrs);
+                                          deviceConfigAttr,
+                                          executableTargetAttrs);
 }
 
 Value LocalDevice::buildDeviceTargetMatch(
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.cpp b/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.cpp
index a668bd5..c576369 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Target/TargetBackend.cpp
@@ -52,7 +52,7 @@
       llvm::join_items(llvm::sys::path::get_separator(), path, fileName);
   auto filePath = llvm::sys::path::convert_to_slash(fileParts);
   std::string error;
-  auto file = mlir::openOutputFile(filePath, &error);
+  auto file = mlir::openOutputFile(path == "-" ? path : filePath, &error);
   if (!file) {
     llvm::errs() << "Unable to dump debug output to " << filePath << "\n";
     return;
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/ConvertToHAL.cpp b/compiler/src/iree/compiler/Dialect/HAL/Transforms/ConvertToHAL.cpp
index eedb427..decde18 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/ConvertToHAL.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/ConvertToHAL.cpp
@@ -91,15 +91,6 @@
 
     // Cleanup conversion attributes used for spooky action at a distance.
     moduleOp->removeAttr("stream.affinity.default");
-    for (auto executableOp : moduleOp.getOps<IREE::HAL::ExecutableOp>()) {
-      for (auto variantOp :
-           executableOp.getOps<IREE::HAL::ExecutableVariantOp>()) {
-        for (auto exportOp :
-             variantOp.getOps<IREE::HAL::ExecutableExportOp>()) {
-          exportOp->removeAttr("hal.interface.bindings");
-        }
-      }
-    }
   }
 };
 
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/DumpExecutableBenchmarks.cpp b/compiler/src/iree/compiler/Dialect/HAL/Transforms/DumpExecutableBenchmarks.cpp
index 7ad1f4a..9d7cc63 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/DumpExecutableBenchmarks.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/DumpExecutableBenchmarks.cpp
@@ -41,7 +41,6 @@
 using Vec3 = std::tuple<unsigned, unsigned, unsigned>;
 
 struct Binding {
-  unsigned set = 0;
   unsigned binding = 0;
   int64_t size = 0;
 };
@@ -119,15 +118,9 @@
 
       // Work around needing a mutable key for the set; C++ was a mistake.
       dispatchOp.forEachEntryPointAttr([&](SymbolRefAttr entryPointAttr) {
-        auto exportOp =
-            symbolTable.lookupNearestSymbolFrom<IREE::HAL::ExecutableExportOp>(
-                dispatchOp, entryPointAttr);
-        auto bindingAttrs = IREE::HAL::getInterfaceBindingAttrs(
-            exportOp, dispatchOp.getResources().size());
-
         SmallVector<Binding> bindings;
-        for (auto [bindingAttr, resourceLength] :
-             llvm::zip_equal(bindingAttrs, dispatchOp.getResourceLengths())) {
+        for (auto [i, resourceLength] :
+             llvm::enumerate(dispatchOp.getResourceLengths())) {
           APInt resourceLengthInt;
           if (!matchPattern(resourceLength,
                             m_ConstantInt(&resourceLengthInt))) {
@@ -136,9 +129,7 @@
                                     << "` (non-constant resource length)\n";);
             return;
           }
-          bindings.push_back({(unsigned)bindingAttr.getSet(),
-                              (unsigned)bindingAttr.getBinding(),
-                              resourceLengthInt.getSExtValue()});
+          bindings.push_back({(unsigned)i, resourceLengthInt.getSExtValue()});
         }
 
         auto &dispatchParamsSet = map[entryPointAttr];
@@ -297,47 +288,24 @@
               /*binding_capacity=*/Value{})
           .getResult();
 
-  // Get the layout required to set up the dispatches.
+  // Constant values.
   auto layoutAttr = exportOp.getLayoutAttr();
-  auto pipelineLayout =
-      funcBuilder
-          .create<IREE::HAL::PipelineLayoutLookupOp>(
-              loc, IREE::HAL::PipelineLayoutType::get(loc.getContext()), device,
-              layoutAttr)
-          .getResult();
-
-  // Push constant values.
-  if (int64_t pushConstantCount = layoutAttr.getPushConstants()) {
-    int pushConstantBase = 0; // always 0 today
-    SmallVector<Value> pushConstants;
-    pushConstants.reserve(pushConstantCount);
+  SmallVector<Value> constantValues;
+  if (int64_t pushConstantCount = layoutAttr.getConstants()) {
+    constantValues.reserve(pushConstantCount);
     for (int64_t i = 0; i < pushConstantCount; ++i) {
-      pushConstants.push_back(funcBuilder.create<arith::ConstantOp>(
+      constantValues.push_back(funcBuilder.create<arith::ConstantOp>(
           loc, dispatchParams.uniformOperands[i]));
     }
-    funcBuilder.create<IREE::HAL::CommandBufferPushConstantsOp>(
-        loc, commandBuffer, pipelineLayout,
-        funcBuilder.getIndexAttr(pushConstantBase), pushConstants);
   }
 
-  // Push descriptor sets.
+  // Binding values.
   Value buffer =
       bufferGlobalOp.createLoadOp(loc, funcBuilder).getLoadedGlobalValue();
-  int64_t currentSet = -1;
-  SmallVector<IREE::HAL::DescriptorSetBindingValue> bindingValues;
-  auto flushSet = [&]() {
-    funcBuilder.create<IREE::HAL::CommandBufferPushDescriptorSetOp>(
-        loc, commandBuffer, pipelineLayout, currentSet, bindingValues);
-    bindingValues.clear();
-  };
+  SmallVector<BindingValue> bindingValues;
   int64_t bufferOffset = 0;
   for (auto binding : dispatchParams.bindings) {
-    if (currentSet != -1 && currentSet != binding.set)
-      flushSet();
-    currentSet = binding.set;
-    IREE::HAL::DescriptorSetBindingValue bindingValue;
-    bindingValue.ordinal =
-        funcBuilder.create<arith::ConstantIndexOp>(loc, binding.binding);
+    BindingValue bindingValue;
     bindingValue.buffer = buffer;
     bindingValue.byteOffset = indexSet.get(bufferOffset);
     bindingValue.byteLength = indexSet.get(binding.size);
@@ -345,8 +313,6 @@
     bufferOffset =
         IREE::Util::align(bufferOffset + binding.size, kBufferAlignment);
   }
-  if (currentSet != -1)
-    flushSet();
 
   // @executable::@variant::@export
   auto exportRefAttr =
@@ -378,12 +344,10 @@
       loc, indexSet.get(0), batchSizeArg, indexSet.get(1), ValueRange{},
       [&](OpBuilder &forBuilder, Location loc, Value iv, ValueRange iters) {
         // Dispatch.
-        auto flags = forBuilder.getAttr<IREE::HAL::DispatchFlagsAttr>(
-            IREE::HAL::DispatchFlags::None);
         forBuilder.create<IREE::HAL::CommandBufferDispatchOp>(
             loc, commandBuffer, executable, ordinal,
-            workgroupCountOp.getWorkgroupX(), workgroupCountOp.getWorkgroupY(),
-            workgroupCountOp.getWorkgroupZ(), flags);
+            workgroupCountOp.getResults(), constantValues, bindingValues,
+            IREE::HAL::DispatchFlags::None);
 
         // Barrier following the dispatch to block the next dispatch.
         auto sourceStage = IREE::HAL::ExecutionStageBitfield::CommandRetire |
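The binding loop above packs every dispatch binding into one shared buffer, bumping a running byte offset that is re-aligned after each binding. A standalone sketch of that packing math (the `align` helper and `kBufferAlignment` value here are illustrative stand-ins for the IREE utilities, not the real API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed alignment for illustration; the pass uses its own kBufferAlignment.
static constexpr int64_t kBufferAlignment = 64;

// Round |value| up to the next multiple of |alignment|.
static int64_t align(int64_t value, int64_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Returns the byte offset of each binding within the shared buffer, matching
// the loop in the lowering: offset, then advance by size and re-align.
static std::vector<int64_t> packBindings(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> offsets;
  offsets.reserve(sizes.size());
  int64_t bufferOffset = 0;
  for (int64_t size : sizes) {
    offsets.push_back(bufferOffset);
    bufferOffset = align(bufferOffset + size, kBufferAlignment);
  }
  return offsets;
}
```

For example, sizes `{16, 100, 8}` land at offsets `{0, 64, 192}`: each binding starts on a 64-byte boundary regardless of the previous binding's size.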
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/ElideRedundantCommands.cpp b/compiler/src/iree/compiler/Dialect/HAL/Transforms/ElideRedundantCommands.cpp
index 2d22a86..f496b19 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/ElideRedundantCommands.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/ElideRedundantCommands.cpp
@@ -24,36 +24,7 @@
 
 namespace {
 
-struct DescriptorState {
-  Value buffer;
-  Value offset;
-  Value length;
-};
-
-struct DescriptorSetState {
-  Value pipelineLayout;
-  SmallVector<DescriptorState, 32> descriptors;
-
-  DescriptorState &getDescriptor(int64_t index) {
-    if (index >= descriptors.size()) {
-      descriptors.resize(index + 1);
-    }
-    return descriptors[index];
-  }
-
-  void clear() {
-    pipelineLayout = {};
-    descriptors.clear();
-  }
-};
-
 struct CommandBufferState {
-  // Push constants can only be reused with compatible layouts.
-  Value pushConstantLayout;
-  SmallVector<Value, 32> pushConstants;
-
-  SmallVector<DescriptorSetState, 4> descriptorSets;
-
   // Set after we know a full barrier has been issued; any subsequent barrier
   // until a real operation is redundant. We could track more fine-grained state
   // here such as which stages are being waited on.
@@ -61,26 +32,6 @@
   // been passed as a function/branch argument and we don't have visibility.
   // We need to use IPO to track that.
   IREE::HAL::CommandBufferExecutionBarrierOp previousFullBarrier;
-
-  Value &getPushConstant(int64_t index) {
-    if (index >= pushConstants.size()) {
-      pushConstants.resize(index + 1);
-    }
-    return pushConstants[index];
-  }
-
-  DescriptorSetState *getDescriptorSet(Value set) {
-    APInt setInt;
-    if (!matchPattern(set, m_ConstantInt(&setInt))) {
-      // Dynamic set value; not analyzable with this approach.
-      return nullptr;
-    }
-    int64_t index = setInt.getSExtValue();
-    if (index >= descriptorSets.size()) {
-      descriptorSets.resize(index + 1);
-    }
-    return &descriptorSets[index];
-  }
 };
 
 using CommandBufferStateMap = DenseMap<Value, CommandBufferState>;
@@ -110,98 +61,6 @@
   }
 }
 
-static LogicalResult processOp(IREE::HAL::CommandBufferPushConstantsOp op,
-                               CommandBufferState &state) {
-  // Push constant state is only shared with the same layout.
-  if (state.pushConstantLayout != op.getPipelineLayout()) {
-    state.pushConstantLayout = op.getPipelineLayout();
-    state.pushConstants.clear();
-  }
-
-  // Today we only eat constants from the beginning or end of the range
-  // (hopefully removing the entire op). Sparse constant sets aren't worth it.
-  int64_t baseIndex = op.getOffset().getSExtValue();
-  llvm::BitVector redundantIndices(op.getValues().size());
-  for (auto value : llvm::enumerate(op.getValues())) {
-    auto &stateValue = state.getPushConstant(baseIndex + value.index());
-    if (value.value() == stateValue) {
-      // Redundant value.
-      redundantIndices.set(value.index());
-    } else {
-      stateValue = value.value();
-    }
-  }
-  if (redundantIndices.none())
-    return success(); // no-op
-
-  // If all bits are set we can just kill the op.
-  if (redundantIndices.all()) {
-    op.erase();
-    return success();
-  }
-
-  int lastRedundant = redundantIndices.find_last();
-  int lastNonRedundant = redundantIndices.find_last_unset();
-  if (lastRedundant != -1 && lastRedundant > lastNonRedundant) {
-    // Eat the last few constants.
-    int redundantCount = redundantIndices.size() - lastRedundant;
-    op.getValuesMutable().erase(lastRedundant, redundantCount);
-  }
-
-  int firstRedundant = redundantIndices.find_first();
-  int firstNonRedundant = redundantIndices.find_first_unset();
-  if (firstRedundant != -1 && firstRedundant < firstNonRedundant) {
-    // Eat the first few constants by adjusting the offset and slicing out the
-    // values.
-    op.setOffsetAttr(Builder(op).getIndexAttr(baseIndex + firstRedundant + 1));
-    op.getValuesMutable().erase(0, firstRedundant + 1);
-  }
-
-  assert(op.getValues().size() > 0 && "should not have removed all");
-  return success();
-}
-
-static LogicalResult processOp(IREE::HAL::CommandBufferPushDescriptorSetOp op,
-                               CommandBufferState &state) {
-  auto *setState = state.getDescriptorSet(op.getSet());
-  if (!setState)
-    return failure();
-
-  bool isLayoutEqual = setState->pipelineLayout == op.getPipelineLayout();
-  setState->pipelineLayout = op.getPipelineLayout();
-
-  int64_t descriptorCount = op.getBindingBuffers().size();
-  llvm::BitVector redundantIndices(descriptorCount);
-  for (int64_t index = 0; index < descriptorCount; ++index) {
-    auto &descriptor = setState->getDescriptor(index);
-    auto buffer = op.getBindingBuffers()[index];
-    auto offset = op.getBindingOffsets()[index];
-    auto length = op.getBindingLengths()[index];
-    if (descriptor.buffer == buffer && descriptor.offset == offset &&
-        descriptor.length == length) {
-      // Redundant descriptor.
-      redundantIndices.set(index);
-    } else {
-      descriptor.buffer = buffer;
-      descriptor.offset = offset;
-      descriptor.length = length;
-    }
-  }
-
-  // Bail early if no redundant bindings.
-  if (isLayoutEqual && redundantIndices.none()) {
-    return success(); // no-op
-  }
-
-  // If all bits are set we can just kill the op.
-  if (isLayoutEqual && redundantIndices.all()) {
-    op.erase();
-    return success();
-  }
-
-  return success();
-}
-
 //===----------------------------------------------------------------------===//
 // --iree-hal-elide-redundant-commands
 //===----------------------------------------------------------------------===//
@@ -241,18 +100,6 @@
               .Case([&](IREE::HAL::CommandBufferExecutionBarrierOp op) {
                 processOp(op, stateMap[op.getCommandBuffer()]);
               })
-              .Case([&](IREE::HAL::CommandBufferPushConstantsOp op) {
-                resetCommandBufferBarrierBit(op);
-                if (failed(processOp(op, stateMap[op.getCommandBuffer()]))) {
-                  invalidateState(op.getCommandBuffer());
-                }
-              })
-              .Case([&](IREE::HAL::CommandBufferPushDescriptorSetOp op) {
-                resetCommandBufferBarrierBit(op);
-                if (failed(processOp(op, stateMap[op.getCommandBuffer()]))) {
-                  invalidateState(op.getCommandBuffer());
-                }
-              })
               .Case<IREE::HAL::CommandBufferDeviceOp,
                     IREE::HAL::CommandBufferBeginDebugGroupOp,
                     IREE::HAL::CommandBufferEndDebugGroupOp,
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeInterfaces.cpp b/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeInterfaces.cpp
index 37d9603..9f3bee7 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeInterfaces.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeInterfaces.cpp
@@ -263,19 +263,14 @@
 makePipelineLayoutAttr(const PipelineLayout &pipelineLayout,
                        IREE::HAL::ExecutableTargetAttr targetAttr,
                        OpBuilder &builder) {
-  SmallVector<IREE::HAL::DescriptorSetLayoutAttr> setLayoutAttrs;
-  for (const auto &setLayout : pipelineLayout.setLayouts) {
-    SmallVector<IREE::HAL::DescriptorSetBindingAttr> bindingAttrs;
-    for (const auto &binding : setLayout.bindings) {
-      bindingAttrs.push_back(IREE::HAL::DescriptorSetBindingAttr::get(
-          builder.getContext(), binding.ordinal, binding.type, binding.flags));
-    }
-    setLayoutAttrs.push_back(IREE::HAL::DescriptorSetLayoutAttr::get(
-        builder.getContext(), setLayout.ordinal, bindingAttrs,
-        setLayout.flags));
+  SmallVector<IREE::HAL::PipelineBindingAttr> bindingAttrs;
+  for (const auto &binding : pipelineLayout.bindings) {
+    bindingAttrs.push_back(IREE::HAL::PipelineBindingAttr::get(
+        builder.getContext(), binding.type, binding.flags));
   }
-  return IREE::HAL::PipelineLayoutAttr::get(
-      builder.getContext(), pipelineLayout.pushConstantCount, setLayoutAttrs);
+  return IREE::HAL::PipelineLayoutAttr::get(builder.getContext(), bindingAttrs,
+                                            pipelineLayout.constantCount,
+                                            pipelineLayout.flags);
 }
 
 // Converts the usage of the given primitive |arg| to interface methods.
@@ -297,8 +292,8 @@
 static void
 convertBindingUsage(mlir::FunctionOpInterface sourceFuncOp, BlockArgument arg,
                     IREE::HAL::PipelineLayoutAttr pipelineLayoutAttr,
-                    IREE::HAL::DescriptorSetLayoutAttr setLayoutAttr,
-                    IREE::HAL::DescriptorSetBindingAttr bindingAttr) {
+                    int64_t bindingOrdinal,
+                    IREE::HAL::PipelineBindingAttr bindingAttr) {
   if (arg.use_empty())
     return; // no-op
   for (auto &use : llvm::make_early_inc_range(arg.getUses())) {
@@ -309,8 +304,7 @@
         arg.getArgNumber(), "stream.alignment");
     auto newOp = builder.create<IREE::HAL::InterfaceBindingSubspanOp>(
         oldOp.getLoc(), oldOp.getType(), pipelineLayoutAttr,
-        APInt(64, setLayoutAttr.getOrdinal()),
-        APInt(64, bindingAttr.getOrdinal()), oldOp.getByteOffset(),
+        APInt(64, bindingOrdinal), oldOp.getByteOffset(),
         oldOp.getDynamicDims(), alignmentAttr, bindingAttr.getFlags());
     oldOp.replaceAllUsesWith(newOp.getResult());
     oldOp.erase();
@@ -347,13 +341,10 @@
     if (!llvm::isa<IREE::Stream::BindingType>(arg.getType())) {
       continue; // unhandled arg type (primitive/etc)
     }
-    auto setBinding = resourceMap[resourceIdx++];
-    auto setLayoutAttr = layoutAttr.getSetLayout(setBinding.first);
-    assert(setLayoutAttr && "layout must be consistent");
-    auto bindingAttr = setLayoutAttr.getBinding(setBinding.second);
+    auto binding = resourceMap[resourceIdx++];
+    auto bindingAttr = layoutAttr.getBinding(binding);
     assert(bindingAttr && "layout must be consistent");
-    convertBindingUsage(sourceFuncOp, arg, layoutAttr, setLayoutAttr,
-                        bindingAttr);
+    convertBindingUsage(sourceFuncOp, arg, layoutAttr, binding, bindingAttr);
   }
 
   // Remove all arguments now that we've turned them into lookup ops.
@@ -396,7 +387,7 @@
         exportOp->getAttrOfType<IREE::HAL::PipelineLayoutAttr>(
             "hal.interface.layout");
     const auto &pipelineLayout = layoutAnalysis.getPipelineLayout(exportOp);
-    const PipelineResourceMap &resourceMap = pipelineLayout.resourceMap;
+    const auto &resourceMap = pipelineLayout.resourceMap;
 
     // Clone the updated function declaration into each variant.
     ExportExpansions exportExpansions;
@@ -444,17 +435,6 @@
       exportExpansions[oldRefAttr].push_back(
           std::make_pair(newRefAttr, variantOp.getTargetAttr()));
 
-      // Annotate the export with the a mapping of the resources to the
-      // interface bindings. This is used during conversion.
-      SmallVector<Attribute> bindingAttrs;
-      for (auto setBinding : resourceMap) {
-        bindingAttrs.push_back(IREE::HAL::InterfaceBindingAttr::get(
-            newExportOp.getContext(), setBinding.first, setBinding.second));
-      }
-      newExportOp->setAttr(
-          "hal.interface.bindings",
-          ArrayAttr::get(newExportOp.getContext(), bindingAttrs));
-
       // Clone the workgroup count calculation function.
       if (!exportOp.getWorkgroupCount().empty()) {
         mlir::IRMapping mapper;
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeResourceCaches.cpp b/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeResourceCaches.cpp
index 16c57e2..63b49fd 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeResourceCaches.cpp
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/MaterializeResourceCaches.cpp
@@ -32,37 +32,10 @@
 
 namespace {
 
-// TODO(#18154): switch default to true and then remove.
-static llvm::cl::opt<bool> clExperimentalExecutableCreate2{
-    "iree-hal-experimental-executable-create2",
-    llvm::cl::desc("Whether to emit iree_hal_executable_create2 ops."),
-    llvm::cl::init(false),
-};
-
 //===----------------------------------------------------------------------===//
 // --iree-hal-materialize-resource-caches
 //===----------------------------------------------------------------------===//
 
-struct DescriptorSetLayout {
-  // All locations that use the layout.
-  SetVector<Location> locs;
-  // Value within the initializer once materialized.
-  Value initializerValue;
-};
-using DescriptorSetLayoutKey =
-    std::pair<ArrayAttr, IREE::HAL::DescriptorSetLayoutFlags>;
-
-struct PipelineLayout {
-  // All locations that use the layout.
-  SetVector<Location> locs;
-  // Lookup ops for this layout.
-  SmallVector<IREE::HAL::PipelineLayoutLookupOp> lookupOps;
-  // Global once materialized.
-  IREE::Util::GlobalOpInterface globalOp;
-  // Value within the initializer once materialized.
-  Value initializerValue;
-};
-
 struct Executable {
   // All locations that use the executable.
   SetVector<Location> locs;
@@ -76,22 +49,22 @@
 
 struct DeviceResources {
   DeviceResources() = default;
-  explicit DeviceResources(IREE::Util::GlobalOpInterface deviceOp)
-      : deviceOp(deviceOp) {}
+  explicit DeviceResources(IREE::Util::GlobalOpInterface deviceOp,
+                           DeviceAnalysis &deviceAnalysis)
+      : deviceOp(deviceOp),
+        deviceSet(deviceAnalysis.lookupDeviceTargets(deviceOp).value_or(
+            DeviceSet())) {}
 
   // Global !hal.device.
   IREE::Util::GlobalOpInterface deviceOp;
 
+  // Analyzed device targets.
+  DeviceSet deviceSet;
+
   // Fallback devices that should be checked for resources.
   // These are derived from the transitive set of #hal.device.fallback attrs.
   SetVector<DeviceResources *> fallbackDeviceResources;
 
-  // Descriptor set layouts used on the device, keyed by [bindingAttrs, flags].
-  llvm::MapVector<DescriptorSetLayoutKey, DescriptorSetLayout>
-      descriptorSetLayouts;
-  // Pipeline layouts used on the device, keyed by layout attr.
-  llvm::MapVector<IREE::HAL::PipelineLayoutAttr, PipelineLayout>
-      pipelineLayouts;
   // Executables used on the device, keyed by name.
   llvm::MapVector<StringAttr, Executable> executables;
 };
@@ -106,41 +79,6 @@
   return prefixedName.str();
 }
 
-static void declareDevicePipelineLayout(IREE::Util::GlobalOpInterface deviceOp,
-                                        PipelineLayout &pipelineLayout,
-                                        size_t pipelineLayoutIndex,
-                                        OpBuilder &moduleBuilder) {
-  // Create global in the module.
-  auto symbolName = getDeviceNamePrefix(deviceOp) + "_pipeline_layout_" +
-                    std::to_string(pipelineLayoutIndex);
-  LLVM_DEBUG(DBGS() << "+ creating device `"
-                    << deviceOp.getGlobalName().getValue()
-                    << "` pipeline global `" << symbolName << "`\n");
-  auto layoutType = moduleBuilder.getType<PipelineLayoutType>();
-  auto globalOp = moduleBuilder.create<IREE::Util::GlobalOp>(
-      moduleBuilder.getFusedLoc(llvm::to_vector(pipelineLayout.locs)),
-      symbolName,
-      /*isMutable=*/false, layoutType);
-  globalOp.setPrivate();
-  pipelineLayout.globalOp = globalOp;
-
-  // Replace lookups with the global.
-  for (auto lookupOp : pipelineLayout.lookupOps) {
-    LLVM_DEBUG({
-      DBGS() << "  - replacing lookup: ";
-      lookupOp.print(llvm::dbgs());
-      llvm::dbgs() << "\n";
-    });
-    OpBuilder lookupBuilder(lookupOp);
-    auto loadedValue =
-        pipelineLayout.globalOp.createLoadOp(lookupOp.getLoc(), lookupBuilder)
-            .getLoadedGlobalValue();
-    lookupOp.replaceAllUsesWith(loadedValue);
-    lookupOp.erase();
-  }
-  pipelineLayout.lookupOps.clear();
-}
-
 static void declareDeviceExecutable(IREE::Util::GlobalOpInterface deviceOp,
                                     Executable &executable,
                                     size_t executableIndex,
@@ -177,17 +115,6 @@
   executable.lookupOps.clear();
 }
 
-static DescriptorSetLayoutKey
-getDescriptorSetLayoutKey(IREE::HAL::DescriptorSetLayoutAttr setLayoutAttr) {
-  auto bindingAttrs =
-      llvm::to_vector_of<Attribute>(setLayoutAttr.getBindings());
-  return DescriptorSetLayoutKey{
-      ArrayAttr::get(setLayoutAttr.getContext(), bindingAttrs),
-      setLayoutAttr.getFlags().value_or(
-          IREE::HAL::DescriptorSetLayoutFlags::None),
-  };
-}
-
 // Inlines a constant block as a function in |moduleBuilder| and then inserts
 // a call to it in |callerBuilder|.
 static SmallVector<Value> inlineConstantBlockOp(
@@ -268,31 +195,13 @@
           blockName, blockOp, moduleBuilder, caseBuilder, initializerDevice));
     }
 
-    Value executableValue;
-    if (clExperimentalExecutableCreate2) {
-      executableValue =
-          caseBuilder.createOrFold<IREE::HAL::ExecutableCreate2Op>(
-              loc, executableType, initializerDevice,
-              SymbolRefAttr::get(
-                  executable.executableOp.getSymNameAttr(),
-                  {SymbolRefAttr::get(variantOp.getSymNameAttr())}),
-              constantValues);
-    } else {
-      // Gather each of the pipeline layouts needed for each entry point in
-      // the executable.
-      SmallVector<Value> pipelineLayoutValues;
-      for (auto exportOp : variantOp.getExportOps()) {
-        auto &pipelineLayout =
-            deviceResources.pipelineLayouts[exportOp.getLayoutAttr()];
-        pipelineLayoutValues.push_back(pipelineLayout.initializerValue);
-      }
-
-      executableValue = caseBuilder.createOrFold<IREE::HAL::ExecutableCreateOp>(
-          loc, executableType, initializerDevice,
-          SymbolRefAttr::get(executable.executableOp.getSymNameAttr(),
-                             {SymbolRefAttr::get(variantOp.getSymNameAttr())}),
-          pipelineLayoutValues, constantValues);
-    }
+    Value executableValue =
+        caseBuilder.createOrFold<IREE::HAL::ExecutableCreateOp>(
+            loc, executableType, initializerDevice,
+            SymbolRefAttr::get(
+                executable.executableOp.getSymNameAttr(),
+                {SymbolRefAttr::get(variantOp.getSymNameAttr())}),
+            constantValues);
 
     caseBuilder.create<scf::YieldOp>(loc, executableValue);
   }
@@ -327,39 +236,6 @@
                                       OpBuilder &moduleBuilder,
                                       Value initializerDevice,
                                       OpBuilder &initializerBuilder) {
-  // Initialize all descriptor set layouts for use by the pipeline layouts.
-  auto setLayoutType = initializerBuilder.getType<DescriptorSetLayoutType>();
-  for (auto [i, it] : llvm::enumerate(deviceResources.descriptorSetLayouts)) {
-    auto [bindingAttrs, flags] = it.first;
-    auto &descriptorSetLayout = it.second;
-    descriptorSetLayout.initializerValue =
-        initializerBuilder.createOrFold<IREE::HAL::DescriptorSetLayoutCreateOp>(
-            initializerBuilder.getFusedLoc(
-                llvm::to_vector(descriptorSetLayout.locs)),
-            setLayoutType, initializerDevice, flags, bindingAttrs);
-  }
-
-  // Initialize all pipeline layouts required for executable creation.
-  auto pipelineLayoutType = initializerBuilder.getType<PipelineLayoutType>();
-  for (auto [i, it] : llvm::enumerate(deviceResources.pipelineLayouts)) {
-    auto &[layoutAttr, pipelineLayout] = it;
-    SmallVector<Value> setLayoutValues;
-    for (auto setLayoutAttr : layoutAttr.getSetLayouts()) {
-      auto key = getDescriptorSetLayoutKey(setLayoutAttr);
-      setLayoutValues.push_back(
-          deviceResources.descriptorSetLayouts[key].initializerValue);
-    }
-    pipelineLayout.initializerValue =
-        initializerBuilder.createOrFold<IREE::HAL::PipelineLayoutCreateOp>(
-            pipelineLayout.globalOp.getLoc(), pipelineLayoutType,
-            initializerDevice,
-            initializerBuilder.getIndexAttr(layoutAttr.getPushConstants()),
-            setLayoutValues);
-    pipelineLayout.globalOp.createStoreOp(pipelineLayout.globalOp.getLoc(),
-                                          pipelineLayout.initializerValue,
-                                          initializerBuilder);
-  }
-
   // Initialize all executables.
   for (auto [i, it] : llvm::enumerate(deviceResources.executables)) {
     auto &[executableName, executable] = it;
@@ -375,20 +251,6 @@
                                          DeviceResources &fallbackResources,
                                          Value initializerDevice,
                                          OpBuilder &initializerBuilder) {
-  // Load fallback pipeline layouts for all required by this device.
-  for (auto &[layoutAttr, pipelineLayout] : deviceResources.pipelineLayouts) {
-    auto fallbackGlobalOp =
-        fallbackResources.pipelineLayouts[layoutAttr].globalOp;
-    assert(fallbackGlobalOp && "should have created global");
-    Value fallbackPipelineLayout =
-        fallbackGlobalOp
-            .createLoadOp(pipelineLayout.globalOp.getLoc(), initializerBuilder)
-            .getLoadedGlobalValue();
-    pipelineLayout.globalOp.createStoreOp(pipelineLayout.globalOp.getLoc(),
-                                          fallbackPipelineLayout,
-                                          initializerBuilder);
-  }
-
   // Load fallback executables for all required by this device.
   for (auto &[executableName, executable] : deviceResources.executables) {
     auto fallbackGlobalOp =
@@ -501,7 +363,7 @@
                       << deviceOp.getGlobalName().getValue()
                       << "` resources...\n");
     allDeviceResources.try_emplace(deviceOp.getGlobalName(),
-                                   DeviceResources(deviceOp));
+                                   DeviceResources(deviceOp, deviceAnalysis));
   }
 
   // Link fallbacks between the resources.
@@ -538,31 +400,7 @@
     for (auto &block : funcOp.getFunctionBody()) {
       if (block
               .walk([&](Operation *op) -> WalkResult {
-                if (auto lookupOp = dyn_cast<PipelineLayoutLookupOp>(op)) {
-                  auto *deviceResources =
-                      tryGetDeviceResources(lookupOp, lookupOp.getDevice());
-                  if (!deviceResources) {
-                    return WalkResult::interrupt();
-                  }
-                  auto layoutAttr = lookupOp.getLayoutAttr();
-                  LLVM_DEBUG(DBGS()
-                             << "+ requiring pipeline layout from lookup: `"
-                             << layoutAttr << "`\n");
-                  auto &pipelineLayout =
-                      deviceResources->pipelineLayouts[layoutAttr];
-                  pipelineLayout.locs.insert(lookupOp.getLoc());
-                  pipelineLayout.lookupOps.push_back(lookupOp);
-                  for (auto setLayoutAttr : layoutAttr.getSetLayouts()) {
-                    LLVM_DEBUG(
-                        DBGS()
-                        << "+ requiring descriptor set layout from lookup: `"
-                        << setLayoutAttr << "`\n");
-                    auto key = getDescriptorSetLayoutKey(setLayoutAttr);
-                    auto &setLayout =
-                        deviceResources->descriptorSetLayouts[key];
-                    setLayout.locs.insert(lookupOp.getLoc());
-                  }
-                } else if (auto lookupOp = dyn_cast<ExecutableLookupOp>(op)) {
+                if (auto lookupOp = dyn_cast<ExecutableLookupOp>(op)) {
                   auto *deviceResources =
                       tryGetDeviceResources(lookupOp, lookupOp.getDevice());
                   if (!deviceResources) {
@@ -589,24 +427,6 @@
     for (auto &[executableName, executable] : deviceResources.executables) {
       executable.executableOp =
           symbolTable.lookup<IREE::HAL::ExecutableOp>(executableName);
-      for (auto variantOp :
-           executable.executableOp.getOps<IREE::HAL::ExecutableVariantOp>()) {
-        for (auto exportOp : variantOp.getExportOps()) {
-          auto layoutAttr = exportOp.getLayoutAttr();
-          LLVM_DEBUG(DBGS() << "+ requiring pipeline layout from export: `"
-                            << layoutAttr << "`\n");
-          auto &pipelineLayout = deviceResources.pipelineLayouts[layoutAttr];
-          pipelineLayout.locs.insert(exportOp.getLoc());
-          for (auto setLayoutAttr : layoutAttr.getSetLayouts()) {
-            LLVM_DEBUG(DBGS()
-                       << "+ requiring descriptor set layout from export: `"
-                       << setLayoutAttr << "`\n");
-            auto key = getDescriptorSetLayoutKey(setLayoutAttr);
-            auto &setLayout = deviceResources.descriptorSetLayouts[key];
-            setLayout.locs.insert(exportOp.getLoc());
-          }
-        }
-      }
     }
   }
 
@@ -622,19 +442,6 @@
           DBGS() << "-> requiring fallback resources from device `"
                  << fallbackResources->deviceOp.getGlobalName().getValue()
                  << "`\n");
-      for (auto [setKey, setLayout] : deviceResources.descriptorSetLayouts) {
-        auto &fallbackSetLayout =
-            fallbackResources->descriptorSetLayouts[setKey];
-        fallbackSetLayout.locs.insert(setLayout.locs.begin(),
-                                      setLayout.locs.end());
-      }
-      for (auto [layoutAttr, pipelineLayout] :
-           deviceResources.pipelineLayouts) {
-        auto &fallbackPipelineLayout =
-            fallbackResources->pipelineLayouts[layoutAttr];
-        fallbackPipelineLayout.locs.insert(pipelineLayout.locs.begin(),
-                                           pipelineLayout.locs.end());
-      }
       for (auto [executableName, executable] : deviceResources.executables) {
         auto &fallbackExecutable =
             fallbackResources->executables[executableName];
@@ -676,8 +483,7 @@
                         << deviceResources.deviceOp.getGlobalName().getValue()
                         << "` resources...\n");
       // Skip devices with no resources.
-      if (deviceResources.pipelineLayouts.empty() &&
-          deviceResources.executables.empty()) {
+      if (deviceResources.executables.empty()) {
         LLVM_DEBUG(DBGS() << "~ skipping device with no resources\n");
         continue;
       }
@@ -696,11 +502,6 @@
       // lookup ops to reference them.
       OpBuilder moduleBuilder(moduleOp);
       moduleBuilder.setInsertionPointAfter(deviceResources.deviceOp);
-      for (auto [i, it] : llvm::enumerate(deviceResources.pipelineLayouts)) {
-        auto &[layoutAttr, pipelineLayout] = it;
-        declareDevicePipelineLayout(deviceResources.deviceOp, pipelineLayout, i,
-                                    moduleBuilder);
-      }
       for (auto [i, it] : llvm::enumerate(deviceResources.executables)) {
         auto &[executableName, executable] = it;
         declareDeviceExecutable(deviceResources.deviceOp, executable, i,
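After this pass change, each device's resource cache holds only executables keyed by name; the pipeline layout and descriptor set layout maps are gone because their metadata now ships inside each executable. A minimal sketch of the remaining cache shape (types and names here are illustrative placeholders, not the HAL runtime API):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct Executable {
  std::string name;
};

// Per-device cache: one entry per executable name, created once in an
// initializer so later lookups are plain global loads.
class DeviceResourceCache {
 public:
  const Executable &getOrCreate(const std::string &name) {
    auto it = executables.find(name);
    if (it == executables.end())
      it = executables.emplace(name, Executable{name}).first;
    return it->second;
  }
  std::size_t size() const { return executables.size(); }

 private:
  std::map<std::string, Executable> executables;
};
```

Note the dedup tradeoff the branch description calls out: this cache dedupes executables per device, but any layout metadata embedded inside two different executables is no longer shared between them; only linking executables together restores that sharing.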
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/capture_executable_sources.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/capture_executable_sources.mlir
index e7838d6..c5e799a 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/capture_executable_sources.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/capture_executable_sources.mlir
@@ -1,12 +1,10 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(iree-hal-capture-executable-sources{stage=configured})' %s | FileCheck %s
 
 #executable_target = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK-DAG: #[[EX0_VARIANT0_LOC:.+]] = loc("module_ex0_variant0.configured.mlir"
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/convert_to_hal.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/convert_to_hal.mlir
index 1de9b90..6d443f5 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/convert_to_hal.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/convert_to_hal.mlir
@@ -7,35 +7,20 @@
 #executable_target_embedded_elf_aarch64 = #hal.executable.target<"llvm-cpu", "embedded-elf-aarch64">
 #executable_target_embedded_elf_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 
-// CHECK: #[[PIPELINE_LAYOUT_ATTR_0:.+]] = #hal.pipeline.layout
-#pipeline_layout_0 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    // CHECK-SAME: <0, storage_buffer>
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    // CHECK-SAME: <1, storage_buffer>
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    // CHECK-SAME: <2, storage_buffer>
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
-]>
-// CHECK: #[[PIPELINE_LAYOUT_ATTR_1:.+]] = #hal.pipeline.layout
-#pipeline_layout_1 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    // CHECK-SAME: <4, storage_buffer>
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    // CHECK-SAME: <5, storage_buffer>
-    #hal.descriptor_set.binding<5, storage_buffer>,
-    // CHECK-SAME: <6, storage_buffer>
-    #hal.descriptor_set.binding<6, storage_buffer>
-  ]>
+// CHECK: #[[PIPELINE_LAYOUT_ATTR:.+]] = #hal.pipeline.layout
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  // CHECK-SAME: #hal.pipeline.binding<storage_buffer>
+  #hal.pipeline.binding<storage_buffer>,
+  // CHECK-SAME: #hal.pipeline.binding<storage_buffer>
+  #hal.pipeline.binding<storage_buffer>,
+  // CHECK-SAME: #hal.pipeline.binding<storage_buffer>
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK: hal.executable private @ex
 hal.executable private @ex {
   hal.executable.variant public @embedded_elf_aarch64 target(#executable_target_embedded_elf_aarch64) {
-    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout_0) {
+    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) {
     ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
       %c1 = arith.constant 1 : index
       %0 = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%arg0]
@@ -46,15 +31,7 @@
     }
   }
   hal.executable.variant public @embedded_elf_x86_64 target(#executable_target_embedded_elf_x86_64) {
-    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout_1) attributes {
-      // Override the bindings. The other variant uses the default ones.
-      // CHECK-NOT: hal.interface.bindings
-      hal.interface.bindings = [
-        #hal.interface.binding<0, 4>,
-        #hal.interface.binding<1, 5>,
-        #hal.interface.binding<1, 6>
-      ]
-    } {
+    hal.executable.export public @dispatch ordinal(0) layout(#pipeline_layout) {
     ^bb0(%device: !hal.device, %arg0: index, %arg1: index, %arg2: index):  // no predecessors
       %c1 = arith.constant 1 : index
       %0 = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%arg0]
@@ -108,9 +85,8 @@
 
   // CHECK: %[[CMD:.+]] = hal.command_buffer.create
   // CHECK-SAME: device(%[[DEVICE]] : !hal.device)
-  // CHECK-SAME: mode("OneShot|AllowInlineExecution")
   // CHECK-SAME: categories("Transfer|Dispatch")
-  %timepoint = stream.cmd.execute
+  %timepoint = stream.cmd.execute once
       with(%arg0_resource as %arg0_capture: !stream.resource<external>{%c16},
             %arg1_resource as %arg1_capture: !stream.resource<external>{%c16},
             %result_resource as %result_capture: !stream.resource<external>{%c16}) {
@@ -121,42 +97,28 @@
     // CHECK-DAG: %[[SWITCH0:.+]] = arith.select %[[FORMAT_AARCH64]], %c0, %[[SWITCH1]]
     // CHECK: scf.index_switch %[[SWITCH0]]
     // CHECK: case 0 {
-    // CHECK:   %[[PIPELINE_LAYOUT_0:.+]] = hal.pipeline_layout.lookup
-    // CHECK-SAME: device(%[[DEVICE]] : !hal.device)
-    // CHECK-SAME: layout(#[[PIPELINE_LAYOUT_ATTR_0]]) : !hal.pipeline_layout
-    // CHECK:   hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer>
-    // CHECK-SAME: layout(%[[PIPELINE_LAYOUT_0]] : !hal.pipeline_layout)[%c0]
-    // CHECK-SAME: bindings([
-    // CHECK:     %c0 = (%[[ARG0_BUFFER]] : !hal.buffer)[%c0, %c16],
-    // CHECK:     %c1 = (%[[ARG1_BUFFER]] : !hal.buffer)[%c0, %c16],
-    // CHECK:     %c2 = (%[[RESULT_BUFFER]] : !hal.buffer)[%c0, %c16]
-    // CHECK:   ])
     // CHECK-DAG: %[[EXECUTABLE_0:.+]] = hal.executable.lookup device(%[[DEVICE]] : !hal.device) executable(@ex) : !hal.executable
     // CHECK-DAG: %[[ORDINAL_0:.+]] = hal.executable.export.ordinal target(@ex::@embedded_elf_aarch64::@dispatch) : index
     // CHECK:   hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer>
     // CHECK-SAME: target(%[[EXECUTABLE_0]] : !hal.executable)[%[[ORDINAL_0]]]
     // CHECK-SAME: workgroups([%c1, %c1, %c1])
+    // CHECK-SAME: bindings([
+    // CHECK-NEXT:   (%[[ARG0_BUFFER]] : !hal.buffer)[%c0, %c16],
+    // CHECK-NEXT:   (%[[ARG1_BUFFER]] : !hal.buffer)[%c0, %c16],
+    // CHECK-NEXT:   (%[[RESULT_BUFFER]] : !hal.buffer)[%c0, %c16]
+    // CHECK-NEXT: ])
     // CHECK:   scf.yield
     // CHECK: }
     // CHECK: case 1 {
-    // CHECK:   %[[PIPELINE_LAYOUT_1:.+]] = hal.pipeline_layout.lookup
-    // CHECK-SAME: device(%[[DEVICE]] : !hal.device)
-    // CHECK-SAME: layout(#[[PIPELINE_LAYOUT_ATTR_1]]) : !hal.pipeline_layout
-    // CHECK:   hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer>
-    // CHECK-SAME: layout(%[[PIPELINE_LAYOUT_1]] : !hal.pipeline_layout)[%c0]
-    // CHECK-SAME: bindings([
-    // CHECK:     %c4 = (%[[ARG0_BUFFER]] : !hal.buffer)[%c0, %c16]
-    // CHECK:   ])
-    // CHECK:   hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer>
-    // CHECK-SAME: layout(%[[PIPELINE_LAYOUT_1]] : !hal.pipeline_layout)[%c1]
-    // CHECK-SAME: bindings([
-    // CHECK:     %c5 = (%[[ARG1_BUFFER]] : !hal.buffer)[%c0, %c16],
-    // CHECK:     %c6 = (%[[RESULT_BUFFER]] : !hal.buffer)[%c0, %c16]
-    // CHECK:   ])
     // CHECK-DAG: %[[EXECUTABLE_1:.+]] = hal.executable.lookup device(%[[DEVICE]] : !hal.device) executable(@ex) : !hal.executable
     // CHECK-DAG: %[[ORDINAL_1:.+]] = hal.executable.export.ordinal target(@ex::@embedded_elf_x86_64::@dispatch) : index
     // CHECK:   hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer>
     // CHECK-SAME: target(%[[EXECUTABLE_1]] : !hal.executable)[%[[ORDINAL_1]]]
+    // CHECK-SAME: bindings([
+    // CHECK-NEXT:   (%[[ARG0_BUFFER]] : !hal.buffer)[%c0, %c16],
+    // CHECK-NEXT:   (%[[ARG1_BUFFER]] : !hal.buffer)[%c0, %c16],
+    // CHECK-NEXT:   (%[[RESULT_BUFFER]] : !hal.buffer)[%c0, %c16]
+    // CHECK-NEXT: ])
     // CHECK:   scf.yield
     // CHECK: }
     stream.cmd.dispatch {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_benchmarks.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_benchmarks.mlir
index bd6d630..d91f19e 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_benchmarks.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_benchmarks.mlir
@@ -6,22 +6,18 @@
 // Ensure devices are copied and made available:
 #executable_target_embedded_elf_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
 // CHECK: util.global private @device
-util.global private @device = #hal.device.target<"llvm-cpu", [
+util.global private @device = #hal.device.target<"local", [
   #executable_target_embedded_elf_x86_64
 ]> : !hal.device
 
-#pipeline_layout_0 = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout_0 = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
-#pipeline_layout_1 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout_1 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // Executable should be dumped:
@@ -74,14 +70,7 @@
 // Create command buffer:
 // CHECK: %[[CMD:.+]] = hal.command_buffer.create
 
-// Setup dispatch constants and bindings:
-// CHECK: hal.command_buffer.push_constants<%[[CMD]] : !hal.command_buffer> layout(%{{.+}} : !hal.pipeline_layout) offset(0) values([%c100_i32, %c200_i32]) : i32, i32
 // CHECK: %[[BUFFER:.+]] = util.global.load @ex0_embedded_elf_x86_64_dispatch0_512_buffer
-// CHECK: hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer> layout(%{{.+}} : !hal.pipeline_layout)[%c0] bindings([
-// CHECK-NEXT:    %c0 = (%[[BUFFER]] : !hal.buffer)[%c0, %c32],
-// CHECK-NEXT:    %c1 = (%[[BUFFER]] : !hal.buffer)[%c256, %c32],
-// CHECK-NEXT:    %c2 = (%[[BUFFER]] : !hal.buffer)[%c512, %c32]
-// CHECK-NEXT:  ])
 
 // Calculate the workgroup count, which we leave symbolic until after
 // translation:
@@ -96,7 +85,15 @@
 
 // Dispatch up to batch size dispatches:
 // CHECK: scf.for %{{.+}} = %c0 to %[[BATCH_SIZE]] step %c1 {
-// CHECK-NEXT: hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer> target(%[[EXECUTABLE:.+]] : !hal.executable)[%[[ORDINAL_0]]] workgroups([%[[WORKGROUP_X]], %[[WORKGROUP_Y]], %[[WORKGROUP_Z]]])
+// CHECK-NEXT: hal.command_buffer.dispatch<%[[CMD]] : !hal.command_buffer>
+// CHECK-SAME:   target(%[[EXECUTABLE:.+]] : !hal.executable)[%[[ORDINAL_0]]]
+// CHECK-SAME:   workgroups([%[[WORKGROUP_X]], %[[WORKGROUP_Y]], %[[WORKGROUP_Z]]])
+// CHECK-SAME:   constants([%c100_i32, %c200_i32])
+// CHECK-SAME:   bindings([
+// CHECK-NEXT:     (%[[BUFFER]] : !hal.buffer)[%c0, %c32],
+// CHECK-NEXT:     (%[[BUFFER]] : !hal.buffer)[%c256, %c32],
+// CHECK-NEXT:     (%[[BUFFER]] : !hal.buffer)[%c512, %c32]
+// CHECK-NEXT:   ])
 // CHECK-NEXT: hal.command_buffer.execution_barrier
 // CHECK-NEXT: }
 
@@ -174,17 +171,15 @@
 
 #executable_target_embedded_elf_aarch64 = #hal.executable.target<"llvm-cpu", "embedded-elf-aarch64">
 #executable_target_embedded_elf_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
-util.global private @device_a = #hal.device.target<"llvm-cpu", [
+util.global private @device_a = #hal.device.target<"local", [
   #executable_target_embedded_elf_aarch64
 ]> : !hal.device
-util.global private @device_b = #hal.device.target<"llvm-cpu", [
+util.global private @device_b = #hal.device.target<"local", [
   #executable_target_embedded_elf_x86_64
 ]> : !hal.device
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable private @ex_0 {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_sources.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_sources.mlir
index 5de1c9c..d22f7a1 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_sources.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/dump_executable_sources.mlir
@@ -4,12 +4,10 @@
 // but this is much easier to test with lit.
 
 #executable_target_embedded_elf_x86_64 = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK: hal.executable public @ex0
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/elide_redundant_commands.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/elide_redundant_commands.mlir
index 861d4e0..1b9070d 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/elide_redundant_commands.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/elide_redundant_commands.mlir
@@ -3,127 +3,21 @@
 // Tests that redundant barriers are elided but barriers guarding ops are not.
 
 // CHECK-LABEL: @elideRedundantBarriers
-// CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer, %[[LAYOUT:.+]]: !hal.pipeline_layout)
-util.func public @elideRedundantBarriers(%cmd: !hal.command_buffer, %pipeline_layout: !hal.pipeline_layout) {
+// CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer, %[[BUFFER:.+]]: !hal.buffer)
+util.func public @elideRedundantBarriers(%cmd: !hal.command_buffer, %buffer: !hal.buffer) {
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
-  %c42_i32 = arith.constant 42 : i32
   // CHECK: hal.command_buffer.execution_barrier
   hal.command_buffer.execution_barrier<%cmd : !hal.command_buffer> source("Dispatch|Transfer|CommandRetire") target("CommandIssue|Dispatch|Transfer") flags("None")
   // CHECK-NOT: hal.command_buffer.execution_barrier
   hal.command_buffer.execution_barrier<%cmd : !hal.command_buffer> source("Dispatch|Transfer|CommandRetire") target("CommandIssue|Dispatch|Transfer") flags("None")
-  // CHECK: hal.command_buffer.push_constants
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer> layout(%pipeline_layout : !hal.pipeline_layout) offset(0) values([%c42_i32]) : i32
+  // CHECK: hal.command_buffer.copy_buffer
+  hal.command_buffer.copy_buffer<%cmd : !hal.command_buffer>
+      source(%buffer : !hal.buffer)[%c0]
+      target(%buffer : !hal.buffer)[%c0]
+      length(%c1)
   // CHECK: hal.command_buffer.execution_barrier
   hal.command_buffer.execution_barrier<%cmd : !hal.command_buffer> source("Dispatch|Transfer|CommandRetire") target("CommandIssue|Dispatch|Transfer") flags("None")
   // CHECK: util.return
   util.return
 }
-
-// -----
-
-// CHECK-LABEL: @elidePushConstants
-util.func public @elidePushConstants(%cmd: !hal.command_buffer, %pipeline_layout: !hal.pipeline_layout) {
-  // CHECK-DAG: %[[C0:.+]] = arith.constant 0
-  %c0 = arith.constant 0 : i32
-  // CHECK-DAG: %[[C1:.+]] = arith.constant 1
-  %c1 = arith.constant 1 : i32
-  // CHECK: hal.command_buffer.push_constants{{.+}} offset(0) values([%[[C0]], %[[C1]]])
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0, %c1]) : i32, i32
-  // CHECK-NOT: hal.command_buffer.push_constants
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0, %c1]) : i32, i32
-  // CHECK-NOT: hal.command_buffer.push_constants
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0, %c1]) : i32, i32
-  // CHECK: util.return
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @elidePushConstantsPrefix
-util.func public @elidePushConstantsPrefix(%cmd: !hal.command_buffer, %pipeline_layout: !hal.pipeline_layout) {
-  // CHECK-DAG: %[[C0:.+]] = arith.constant 0
-  %c0 = arith.constant 0 : i32
-  // CHECK-DAG: %[[C1:.+]] = arith.constant 1
-  %c1 = arith.constant 1 : i32
-  // CHECK: hal.command_buffer.push_constants{{.+}} offset(0) values([%[[C0]]])
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0]) : i32
-  // CHECK: hal.command_buffer.push_constants{{.+}} offset(1) values([%[[C1]]])
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0, %c1]) : i32, i32
-  // CHECK-NOT: hal.command_buffer.push_constants
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(1)
-      values([%c1]) : i32
-  // CHECK: util.return
-  util.return
-}
-
-// -----
-
-// CHECK-LABEL: @elidePushConstantsSuffix
-util.func public @elidePushConstantsSuffix(%cmd: !hal.command_buffer, %pipeline_layout: !hal.pipeline_layout) {
-  // CHECK-DAG: %[[C0:.+]] = arith.constant 0
-  %c0 = arith.constant 0 : i32
-  // CHECK-DAG: %[[C1:.+]] = arith.constant 1
-  %c1 = arith.constant 1 : i32
-  // CHECK-DAG: %[[C2:.+]] = arith.constant 2
-  %c2 = arith.constant 2 : i32
-  // CHECK: hal.command_buffer.push_constants{{.+}} offset(0) values([%[[C0]], %[[C1]], %[[C2]]])
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(0)
-      values([%c0, %c1, %c2]) : i32, i32, i32
-  // CHECK: hal.command_buffer.push_constants{{.+}} offset(1) values([%[[C0]]])
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
-      layout(%pipeline_layout : !hal.pipeline_layout)
-      offset(1)
-      values([%c0, %c2]) : i32, i32
-  // CHECK: util.return
-  util.return
-}
-
-// -----
-
-// NOTE: today we just check for complete equality.
-
-// CHECK-LABEL: @elidePushDescriptorSet
-// CHECK-SAME: (%[[CMD:.+]]: !hal.command_buffer, %[[LAYOUT:.+]]: !hal.pipeline_layout, %[[BUFFER0:.+]]: !hal.buffer, %[[BUFFER1:.+]]: !hal.buffer)
-util.func public @elidePushDescriptorSet(%cmd: !hal.command_buffer, %pipeline_layout: !hal.pipeline_layout, %buffer0: !hal.buffer, %buffer1: !hal.buffer) {
-  %c0 = arith.constant 0 : index
-  %c1 = arith.constant 1 : index
-  // CHECK-DAG: %[[SIZE0:.+]] = arith.constant 100
-  %size0 = arith.constant 100 : index
-  // CHECK-DAG: %[[SIZE1:.+]] = arith.constant 101
-  %size1 = arith.constant 101 : index
-  //      CHECK: hal.command_buffer.push_descriptor_set<%[[CMD]] : !hal.command_buffer> layout(%[[LAYOUT]] : !hal.pipeline_layout)[%c0] bindings([
-  // CHECK-NEXT:   %c0 = (%[[BUFFER0]] : !hal.buffer)[%c0, %[[SIZE0]]],
-  // CHECK-NEXT:   %c1 = (%[[BUFFER1]] : !hal.buffer)[%c0, %[[SIZE1]]]
-  // CHECK-NEXT: ])
-  hal.command_buffer.push_descriptor_set<%cmd : !hal.command_buffer> layout(%pipeline_layout : !hal.pipeline_layout)[%c0] bindings([
-    %c0 = (%buffer0 : !hal.buffer)[%c0, %size0],
-    %c1 = (%buffer1 : !hal.buffer)[%c0, %size1]
-  ])
-  // CHECK-NOT: hal.command_buffer.push_descriptor_set
-  hal.command_buffer.push_descriptor_set<%cmd : !hal.command_buffer> layout(%pipeline_layout : !hal.pipeline_layout)[%c0] bindings([
-    %c0 = (%buffer0 : !hal.buffer)[%c0, %size0],
-    %c1 = (%buffer1 : !hal.buffer)[%c0, %size1]
-  ])
-  // CHECK: util.return
-  util.return
-}
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_dispatch_instrumentation.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_dispatch_instrumentation.mlir
index e86f141..17f0da0 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_dispatch_instrumentation.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_dispatch_instrumentation.mlir
@@ -1,7 +1,7 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(iree-hal-materialize-dispatch-instrumentation{buffer-size=64mib})' %s | FileCheck %s
 
 module attributes {hal.device.targets = [
-  #hal.device.target<"llvm-cpu", [
+  #hal.device.target<"local", [
     #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64">,
     #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64">
   ]> : !hal.device
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_interfaces.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_interfaces.mlir
index 5623e7f..2fb1a66 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_interfaces.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_interfaces.mlir
@@ -10,17 +10,15 @@
 ]> : !hal.device
 
 // CHECK: #pipeline_layout = #hal.pipeline.layout<
-// CHECK-SAME: push_constants = 1
-// CHECK-SAME: sets = [
-// CHECK-SAME:   <0, bindings = [
-// CHECK-SAME:     <0, storage_buffer, "ReadOnly|Indirect">
-// CHECK-SAME:     <1, storage_buffer, "ReadOnly|Indirect">
-// CHECK-SAME:     <2, storage_buffer, Indirect>
+// CHECK-SAME: constants = 1
+// CHECK-SAME: bindings = [
+// CHECK-SAME:   #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">
+// CHECK-SAME:   #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">
+// CHECK-SAME:   #hal.pipeline.binding<storage_buffer, Indirect>
 
 // CHECK: hal.executable private @ex
 // CHECK:   hal.executable.variant public @arm_64 target(#executable_target_arm_64
 // CHECK:     hal.executable.export public @entry ordinal(0) layout(#pipeline_layout)
-// CHECK-SAME:   hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>, #hal.interface.binding<0, 2>]
 // CHECK-NEXT: ^bb0(%[[DEVICE:.+]]: !hal.device, %[[ARG0:.+]]: index, %[[ARG1:.+]]: index):
 // CHECK-NEXT:   hal.return %[[ARG0]], %[[ARG1]], %[[ARG0]] : index, index, index
 // CHECK-NEXT: }
@@ -29,7 +27,6 @@
 // CHECK-NEXT:  func.func @entry
 // CHECK:   hal.executable.variant public @x86_64 target(#executable_target_x86_64
 // CHECK:     hal.executable.export public @entry ordinal(0) layout(#pipeline_layout)
-// CHECK-SAME:   hal.interface.bindings = [#hal.interface.binding<0, 0>, #hal.interface.binding<0, 1>, #hal.interface.binding<0, 2>]
 // CHECK-NEXT: ^bb0(%[[DEVICE:.+]]: !hal.device, %[[ARG0:.+]]: index, %[[ARG1:.+]]: index):
 // CHECK-NEXT:   hal.return %[[ARG0]], %[[ARG1]], %[[ARG0]] : index, index, index
 // CHECK-NEXT: }
@@ -159,10 +156,8 @@
 // CHECK:   hal.executable.variant public @riscv_32
 // CHECK:   hal.executable.variant public @x86_64
 hal.executable.source private @ex {
-  hal.executable.export public @entry layout(#hal.pipeline.layout<push_constants = 0, sets = [
-    #hal.descriptor_set.layout<0, bindings = [
-      #hal.descriptor_set.binding<0, storage_buffer>
-    ]>
+  hal.executable.export public @entry layout(#hal.pipeline.layout<bindings = [
+    #hal.pipeline.binding<storage_buffer>
   ]>)
   builtin.module {
     func.func @entry() {
@@ -227,10 +222,8 @@
   // CHECK:   hal.executable.variant public @riscv_32
   // CHECK:   hal.executable.variant public @x86_64
   hal.executable.source public @ex {
-    hal.executable.export public @entry layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>
-      ]>
+    hal.executable.export public @entry layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
     ]>)
     builtin.module {
       func.func @entry() {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_resource_caches.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_resource_caches.mlir
index 4e562f6..4a24e78 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_resource_caches.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/materialize_resource_caches.mlir
@@ -1,51 +1,13 @@
 // RUN: iree-opt --split-input-file --iree-hal-materialize-resource-caches %s | FileCheck %s
 
-// CHECK: util.global private @device = #hal.device.ordinal<0>
-util.global private @device = #hal.device.ordinal<0> : !hal.device
-//      CHECK: util.global private @__device_pipeline_layout_0 : !hal.pipeline_layout
-// CHECK-NEXT: util.initializer {
-//  CHECK-DAG:   %[[DEVICE:.+]] = util.global.load @device
-//  CHECK-DAG:   %[[SET_LAYOUT_0:.+]] = hal.descriptor_set_layout.create
-// CHECK-SAME:       device(%[[DEVICE]] : !hal.device)
-// CHECK-SAME:       flags("None")
-// CHECK-SAME:       bindings([
-// CHECK-SAME:         #hal.descriptor_set.binding<0, storage_buffer>,
-// CHECK-SAME:         #hal.descriptor_set.binding<1, storage_buffer>
-// CHECK-SAME:       ]) : !hal.descriptor_set_layout
-// CHECK-NEXT:   %[[PIPELINE_LAYOUT:.+]] = hal.pipeline_layout.create
-// CHECK-SAME:       device(%[[DEVICE]] : !hal.device)
-// CHECK-SAME:       push_constants(1)
-// CHECK-SAME:       layouts([%[[SET_LAYOUT_0]]]) : !hal.pipeline_layout
-// CHECK-NEXT:   util.global.store %[[PIPELINE_LAYOUT]], @__device_pipeline_layout_0 : !hal.pipeline_layout
-
-// CHECK-LABEL: @exeLayoutLookup
-util.func public @exeLayoutLookup() -> !hal.pipeline_layout {
-  %device = util.global.load @device : !hal.device
-  // CHECK: %[[LOADED_LAYOUT:.+]] = util.global.load @__device_pipeline_layout_0 : !hal.pipeline_layout
-  %0 = hal.pipeline_layout.lookup device(%device : !hal.device) layout(#hal.pipeline.layout<push_constants = 1, sets = [
-    #hal.descriptor_set.layout<0, bindings = [
-      #hal.descriptor_set.binding<0, storage_buffer>,
-      #hal.descriptor_set.binding<1, storage_buffer>
-    ]>
-  ]>) : !hal.pipeline_layout
-  // CHECK-NEXT: util.return %[[LOADED_LAYOUT]]
-  util.return %0 : !hal.pipeline_layout
-}
-
-// -----
-
-#pipeline_layout_0 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout_0 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
-#pipeline_layout_1 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout_1 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK: hal.executable private @exe
@@ -83,22 +45,12 @@
 util.global private @device = #hal.device.ordinal<0> : !hal.device
 
 // Cached resources for the device.
-// CHECK: util.global private @__device_pipeline_layout_0 : !hal.pipeline_layout
-// CHECK: util.global private @__device_pipeline_layout_1 : !hal.pipeline_layout
 // CHECK: util.global private @__device_executable_0_exe : !hal.executable
 
 // Device initializer for all resources used with the device:
 // CHECK: util.initializer
 // CHECK:   %[[DEVICE:.+]] = util.global.load @device
 
-// Create pipeline layouts (and required descriptor set layouts):
-// CHECK:   %[[SET_LAYOUT_0:.+]] = hal.descriptor_set_layout.create device(%[[DEVICE]] : !hal.device)
-// CHECK:   %[[SET_LAYOUT_1:.+]] = hal.descriptor_set_layout.create device(%[[DEVICE]] : !hal.device)
-// CHECK:   %[[PIPELINE_LAYOUT_0:.+]] = hal.pipeline_layout.create device(%[[DEVICE]] : !hal.device) push_constants(0) layouts([%[[SET_LAYOUT_0]]]) : !hal.pipeline_layout
-// CHECK:   util.global.store %[[PIPELINE_LAYOUT_0]], @__device_pipeline_layout_0
-// CHECK:   %[[PIPELINE_LAYOUT_1:.+]] = hal.pipeline_layout.create device(%device : !hal.device) push_constants(0) layouts([%[[SET_LAYOUT_1]]]) : !hal.pipeline_layout
-// CHECK:   util.global.store %[[PIPELINE_LAYOUT_1]], @__device_pipeline_layout_1
-
 // Switch on the supported formats:
 // CHECK:   %{{.+}}, %[[FORMAT_VMVX:.+]] = hal.device.query<%[[DEVICE]] : !hal.device> key("hal.executable.format" :: "vmvx-bytecode-fb")
 // CHECK:   %[[VMVX_CONDITION:.+]] = scf.execute_region -> i1 {
@@ -120,7 +72,6 @@
 // CHECK:     %[[EXE:.+]] = hal.executable.create
 // CHECK-SAME:  device(%[[DEVICE]] : !hal.device)
 // CHECK-SAME:  target(@exe::@vmvx)
-// CHECK-SAME:  layouts([%[[PIPELINE_LAYOUT_0]], %[[PIPELINE_LAYOUT_0]], %[[PIPELINE_LAYOUT_1]]])
 // CHECK-SAME:  constants([%[[CONST_01]]#0, %[[CONST_01]]#1, %[[CONST_2]]])
 // CHECK-SAME:  : !hal.executable
 
@@ -172,11 +123,9 @@
       %ok, %selected = hal.device.query<%device : !hal.device> key("some" :: "feature") : i1, i1
       hal.return %selected : i1
     }
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
     // CHECK-NOT: hal.executable.constant.block
     hal.executable.constant.block() -> (i32, i32) as ("foo", "bar") {
@@ -189,13 +138,9 @@
 
 // CHECK: util.global private @primary_device
 util.global private @primary_device = #hal.device.ordinal<0> : !hal.device
-// CHECK-NEXT: util.global private @__primary_device_pipeline_layout_0
 // CHECK-NEXT: util.global private @__primary_device_executable_0_exe
 // CHECK-NEXT: util.initializer
 //      CHECK:   util.global.load @primary_device
-//      CHECK:   hal.descriptor_set_layout.create
-//      CHECK:   hal.pipeline_layout.create
-//      CHECK:   util.global.store {{.+}}, @__primary_device_pipeline_layout_0
 //      CHECK:   hal.executable.create
 //      CHECK:   util.global.store {{.+}}, @__primary_device_executable_0_exe
 //      CHECK: util.func private @__primary_device_executable_0_exe_constant_block_0
@@ -205,7 +150,6 @@
   #hal.device.ordinal<1> : !hal.device,
   #hal.device.fallback<@primary_device> : !hal.device
 ]> : !hal.device
-// CHECK-NEXT: util.global private @__optional_device_pipeline_layout_0
 // CHECK-NEXT: util.global private @__optional_device_executable_0_exe
 // CHECK-NEXT: util.initializer
 //  CHECK-DAG:   %[[OPTIONAL_DEVICE:.+]] = util.global.load @optional_device
@@ -214,14 +158,9 @@
 //  CHECK-DAG:   %[[INDEX:.+]] = arith.select %[[DEVICE_EQ]]
 //  CHECK-DAG:   scf.index_switch %[[INDEX]]
 //      CHECK:   case 0
-//      CHECK:     %[[PRIMARY_LAYOUT:.+]] = util.global.load @__primary_device_pipeline_layout_0
-//      CHECK:     util.global.store %[[PRIMARY_LAYOUT]], @__optional_device_pipeline_layout_0
 //      CHECK:     %[[PRIMARY_EXE:.+]] = util.global.load @__primary_device_executable_0_exe
 //      CHECK:     util.global.store %[[PRIMARY_EXE]], @__optional_device_executable_0_exe
 //      CHECK:   default
-//      CHECK:     hal.descriptor_set_layout.create
-//      CHECK:     hal.pipeline_layout.create
-//      CHECK:     util.global.store {{.+}}, @__optional_device_pipeline_layout_0
 //      CHECK:     hal.executable.create
 //      CHECK:     util.global.store {{.+}}, @__optional_device_executable_0_exe
 //      CHECK: util.func private @__optional_device_executable_0_exe_constant_block_0
@@ -248,23 +187,17 @@
 
 hal.executable private @exe {
   hal.executable.variant @vmvx target(<"vmvx", "vmvx-bytecode-fb">) {
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
 
 // CHECK-LABEL: util.global private @primary_device
 util.global private @primary_device = #hal.device.ordinal<0> : !hal.device
-// CHECK-NEXT: util.global private @__primary_device_pipeline_layout_0
 // CHECK-NEXT: util.global private @__primary_device_executable_0_exe
 // CHECK-NEXT: util.initializer
 //      CHECK:   util.global.load @primary_device
-//      CHECK:   hal.descriptor_set_layout.create
-//      CHECK:   hal.pipeline_layout.create
-//      CHECK:   util.global.store {{.+}}, @__primary_device_pipeline_layout_0
 //      CHECK:   hal.executable.create
 //      CHECK:   util.global.store {{.+}}, @__primary_device_executable_0_exe
 
@@ -273,7 +206,6 @@
   #hal.device.ordinal<1> : !hal.device,
   #hal.device.fallback<@primary_device> : !hal.device
 ]> : !hal.device
-// CHECK-NEXT: util.global private @__optional_device_0_pipeline_layout_0
 // CHECK-NEXT: util.global private @__optional_device_0_executable_0_exe
 // CHECK-NEXT: util.initializer
 //  CHECK-DAG:   %[[OPTIONAL_DEVICE_0:.+]] = util.global.load @optional_device_0
@@ -281,8 +213,6 @@
 //  CHECK-DAG:   %[[DEVICE_EQ:.+]] = util.cmp.eq %[[OPTIONAL_DEVICE_0]], %[[PRIMARY_DEVICE]]
 //  CHECK-DAG:   %[[INDEX:.+]] = arith.select %[[DEVICE_EQ]]
 //  CHECK-DAG:   scf.index_switch %[[INDEX]]
-//      CHECK:     util.global.load @__primary_device_pipeline_layout_0
-//      CHECK:     util.global.store {{.+}}, @__optional_device_0_pipeline_layout_0
 //      CHECK:     util.global.load @__primary_device_executable_0_exe
 //      CHECK:     util.global.store {{.+}}, @__optional_device_0_executable_0_exe
 
@@ -291,7 +221,6 @@
   #hal.device.ordinal<2> : !hal.device,
   #hal.device.fallback<@optional_device_0> : !hal.device
 ]> : !hal.device
-// CHECK-NEXT: util.global private @__optional_device_1_pipeline_layout_0
 // CHECK-NEXT: util.global private @__optional_device_1_executable_0_exe
 // CHECK-NEXT: util.initializer
 //  CHECK-DAG:   %[[OPTIONAL_DEVICE_1:.+]] = util.global.load @optional_device_1
@@ -299,8 +228,6 @@
 //  CHECK-DAG:   %[[DEVICE_EQ:.+]] = util.cmp.eq %[[OPTIONAL_DEVICE_1]], %[[OPTIONAL_DEVICE_0]]
 //  CHECK-DAG:   %[[INDEX:.+]] = arith.select %[[DEVICE_EQ]]
 //  CHECK-DAG:   scf.index_switch %[[INDEX]]
-//      CHECK:     util.global.load @__optional_device_0_pipeline_layout_0
-//      CHECK:     util.global.store {{.+}}, @__optional_device_1_pipeline_layout_0
 //      CHECK:     util.global.load @__optional_device_0_executable_0_exe
 //      CHECK:     util.global.store {{.+}}, @__optional_device_1_executable_0_exe
 
@@ -322,34 +249,13 @@
 // could rework the pass to support only materializing what's required based on
 // what resources are looked up.
 
-#pipeline_layout_0 = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout_0 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 util.global private @device : !hal.device
 
-util.global private @_descriptor_set_layout_0 : !hal.descriptor_set_layout
-util.initializer {
-  %c0 = arith.constant 0 : index
-  %device = hal.devices.get %c0 : !hal.device
-  %descriptor_set_layout = hal.descriptor_set_layout.create device(%device : !hal.device) flags("None") bindings([#hal.descriptor_set.binding<0, storage_buffer>, #hal.descriptor_set.binding<1, storage_buffer>]) : !hal.descriptor_set_layout
-  util.global.store %descriptor_set_layout, @_descriptor_set_layout_0 : !hal.descriptor_set_layout
-  util.return
-}
-
-util.global private @_pipeline_layout_0 : !hal.pipeline_layout
-util.initializer {
-  %_descriptor_set_layout_0 = util.global.load @_descriptor_set_layout_0 : !hal.descriptor_set_layout
-  %c0 = arith.constant 0 : index
-  %device = hal.devices.get %c0 : !hal.device
-  %pipeline_layout = hal.pipeline_layout.create device(%device : !hal.device) push_constants(0) layouts([%_descriptor_set_layout_0]) : !hal.pipeline_layout
-  util.global.store %pipeline_layout, @_pipeline_layout_0 : !hal.pipeline_layout
-  util.return
-}
-
 util.global private @_executable_exe : !hal.executable
 util.initializer {
   %c0 = arith.constant 0 : index
@@ -359,8 +265,7 @@
   %variant = arith.select %format_supported, %c0, %c-1 : index
   %selected = scf.index_switch %variant -> !hal.executable
   case 0 {
-    %_pipeline_layout_0 = util.global.load @_pipeline_layout_0 : !hal.pipeline_layout
-    %exe = hal.executable.create device(%device : !hal.device) target(@exe0::@vmvx) layouts([%_pipeline_layout_0]) : !hal.executable
+    %exe = hal.executable.create device(%device : !hal.device) target(@exe0::@vmvx) : !hal.executable
     scf.yield %exe : !hal.executable
   }
   default {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/outline_memoize_regions.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/outline_memoize_regions.mlir
index 6f1f799..8dd0e55 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/outline_memoize_regions.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/outline_memoize_regions.mlir
@@ -194,6 +194,9 @@
 //       CHECK: hal.command_buffer.dispatch.indirect<%[[CMD]] : !hal.command_buffer>
 //  CHECK-SAME:   target(%[[APPLY_EXECUTABLE]] : !hal.executable)
 //  CHECK-SAME:   workgroups(%[[APPLY_BUFFER]] : !hal.buffer)
+//  CHECK-SAME:   bindings([
+//  CHECK-NEXT:     (%[[APPLY_BUFFER]] : !hal.buffer)[%c0, %c1]
+//  CHECK-NEXT:   ])
 //       CHECK: hal.command_buffer.execution_barrier
 //       CHECK: hal.command_buffer.finalize
 //       CHECK: util.return %[[CMD]]
@@ -216,6 +219,8 @@
   %affinity = arith.constant 2 : i64
   %executable = util.global.load immutable @executable : !hal.executable
   %buffer = util.global.load immutable @buffer : !hal.buffer
+  %c0 = arith.constant 0 : index
+  %c1 = arith.constant 1 : index
   // CHECK-NOT: hal.device.memoize
   // CHECK: %[[CMD:.+]] = util.call @__memoize_command_buffer_memoize_lookup
   %result = hal.device.memoize<%device : !hal.device> affinity(%affinity) -> !hal.command_buffer {
@@ -225,6 +230,9 @@
     hal.command_buffer.dispatch.indirect<%cmd : !hal.command_buffer>
         target(%executable : !hal.executable)[%dispatch_ordinal]
         workgroups(%buffer : !hal.buffer)[%offset]
+        bindings([
+          (%buffer : !hal.buffer)[%c0, %c1]
+        ])
         flags(None)
     hal.command_buffer.execution_barrier<%cmd : !hal.command_buffer>
         source(CommandIssue) target(CommandProcess) flags(None)
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/preprocess_executables.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/preprocess_executables.mlir
index 6742b24..0f4a829 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/preprocess_executables.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/preprocess_executables.mlir
@@ -21,7 +21,9 @@
 hal.executable private @executable_a {
   // CHECK: hal.executable.variant public @variant_a
   hal.executable.variant public @variant_a target(#hal.executable.target<"cuda", "cuda-nvptx-fb", {replace_i64 = 123 : i64}>) {
-    hal.executable.export public @dispatch_a ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch_a ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device, %arg1: index):
       %c1 = arith.constant 1 : index
       hal.return %c1, %c1, %c1 : index, index, index
@@ -37,7 +39,9 @@
   }
   // CHECK: hal.executable.variant public @variant_unmodified
   hal.executable.variant public @variant_unmodified target(#hal.executable.target<"cuda", "cuda-nvptx-fb", {}>) {
-    hal.executable.export public @dispatch_unmodified ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch_unmodified ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device, %arg1: index):
       %c1 = arith.constant 1 : index
       hal.return %c1, %c1, %c1 : index, index, index
@@ -57,7 +61,9 @@
 hal.executable private @executable_b {
   // CHECK: hal.executable.variant public @variant_b
   hal.executable.variant public @variant_b target(#hal.executable.target<"cuda", "cuda-nvptx-fb", {replace_i64 = 456 : i64}>) {
-    hal.executable.export public @dispatch_b ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch_b ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device):
       %c1 = arith.constant 1 : index
       hal.return %c1, %c1, %c1 : index, index, index
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/prune_executables.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/prune_executables.mlir
index 8d2ea70..3c711ef 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/prune_executables.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/prune_executables.mlir
@@ -5,10 +5,8 @@
 // as part of this pass for consistency (after running no executables/variants/
 // exports that are unused exist).
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // Should be removed as there are no uses.
@@ -57,10 +55,8 @@
 
 // Tests that an export with no references is dropped.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @exe {
   hal.executable.variant public @variant target(<"backend", "format">) {
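The layout rewrite repeated throughout these test diffs follows one mechanical rule: concatenate the bindings of each descriptor set in set order and drop the set/binding ordinals, leaving a single flat binding list. A rough Python model of that flattening (function and data shapes here are illustrative, not compiler APIs):

```python
# Illustrative model of the pipeline layout flattening performed by this
# branch: the old nested (set -> ordered bindings) structure becomes a
# single flat, ordered binding list. Names and shapes are hypothetical.

def flatten_pipeline_layout(sets):
    """sets: list of (set_ordinal, [(binding_ordinal, binding_type), ...])."""
    flat = []
    for _, bindings in sorted(sets, key=lambda s: s[0]):
        for _, binding_type in sorted(bindings, key=lambda b: b[0]):
            flat.append(binding_type)
    return flat

# Old form: <push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>,
#                                                        <1, storage_buffer>]>]>
old_layout = [(0, [(0, "storage_buffer"), (1, "storage_buffer")])]
# New form: <bindings = [storage_buffer, storage_buffer]>
print(flatten_pipeline_layout(old_layout))  # ['storage_buffer', 'storage_buffer']
```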
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/repeat_dispatches.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/repeat_dispatches.mlir
index a139ece..164bcbf 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/repeat_dispatches.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/repeat_dispatches.mlir
@@ -1,24 +1,25 @@
 // RUN: iree-opt --split-input-file --pass-pipeline='builtin.module(util.func(iree-hal-repeat-dispatches{count=2}))' %s | FileCheck %s
 
-util.global @_executable : !hal.executable
+util.global @executable : !hal.executable
 
 // CHECK-LABEL: @duplicate_dispatches
-//  CHECK-SAME: (%[[CMD1:.+]]: !hal.command_buffer,
-//  CHECK-SAME:  %[[CMD2:.+]]: !hal.command_buffer)
-util.func public @duplicate_dispatches(%cmd1 : !hal.command_buffer, %cmd2 : !hal.command_buffer) {
-  // CHECK: %[[EXE:.+]] = util.global.load @_executable
-  %exe = util.global.load @_executable : !hal.executable
+//  CHECK-SAME: (%[[CMD1:[a-z0-9]+]]: !hal.command_buffer,
+//  CHECK-SAME:  %[[CMD2:[a-z0-9]+]]: !hal.command_buffer,
+//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer)
+util.func public @duplicate_dispatches(%cmd1: !hal.command_buffer, %cmd2: !hal.command_buffer, %buffer: !hal.buffer) {
+  // CHECK: %[[EXE:.+]] = util.global.load @executable
+  %exe = util.global.load @executable : !hal.executable
 
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   %c2 = arith.constant 2 : index
   %c3 = arith.constant 3 : index
-  hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c0] workgroups([%c1, %c1, %c1]) flags(None)
+  hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c0] workgroups([%c1, %c1, %c1]) bindings([(%buffer : !hal.buffer)[%c0, %c1]]) flags(None)
   hal.command_buffer.execution_barrier<%cmd1 : !hal.command_buffer> source("Dispatch|CommandRetire") target("CommandIssue|Dispatch") flags("None")
-  hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c1] workgroups([%c2, %c2, %c2]) flags(None)
+  hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c1] workgroups([%c2, %c2, %c2]) bindings([(%buffer : !hal.buffer)[%c0, %c1]]) flags(None)
 
-  hal.command_buffer.dispatch<%cmd2 : !hal.command_buffer> target(%exe : !hal.executable)[%c2] workgroups([%c1, %c1, %c1]) flags(None)
-  hal.command_buffer.dispatch<%cmd2 : !hal.command_buffer> target(%exe : !hal.executable)[%c3] workgroups([%c2, %c2, %c2]) flags(None)
+  hal.command_buffer.dispatch<%cmd2 : !hal.command_buffer> target(%exe : !hal.executable)[%c2] workgroups([%c1, %c1, %c1]) bindings([(%buffer : !hal.buffer)[%c0, %c1]]) flags(None)
+  hal.command_buffer.dispatch<%cmd2 : !hal.command_buffer> target(%exe : !hal.executable)[%c3] workgroups([%c2, %c2, %c2]) bindings([(%buffer : !hal.buffer)[%c0, %c1]]) flags(None)
   hal.command_buffer.execution_barrier<%cmd2 : !hal.command_buffer> source("Dispatch|CommandRetire") target("CommandIssue|Dispatch") flags("None")
 
   util.return
@@ -46,20 +47,21 @@
 
 // -----
 
-util.global @_executable : !hal.executable
+util.global @executable : !hal.executable
 
 // CHECK-LABEL: @nested_dispatch
 //  CHECK-SAME: (%[[CMD1:.+]]: !hal.command_buffer,
+//  CHECK-SAME:  %[[BUFFER:.+]]: !hal.buffer,
 //  CHECK-SAME:  %[[IDX:.+]]: index)
-util.func public @nested_dispatch(%cmd1 : !hal.command_buffer, %idx : index) {
-  // CHECK: %[[EXE:.+]] = util.global.load @_executable
-  %exe = util.global.load @_executable : !hal.executable
+util.func public @nested_dispatch(%cmd1: !hal.command_buffer, %buffer: !hal.buffer, %idx: index) {
+  // CHECK: %[[EXE:.+]] = util.global.load @executable
+  %exe = util.global.load @executable : !hal.executable
 
   %c0 = arith.constant 0 : index
   %c1 = arith.constant 1 : index
   scf.index_switch %idx
   case 0 {
-    hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c0] workgroups([%c1, %c1, %c1]) flags(None)
+    hal.command_buffer.dispatch<%cmd1 : !hal.command_buffer> target(%exe : !hal.executable)[%c0] workgroups([%c1, %c1, %c1]) bindings([(%buffer : !hal.buffer)[%c0, %c1]]) flags(None)
     scf.yield
   }
   default {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/resolve_export_ordinals.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/resolve_export_ordinals.mlir
index 5ce9f72..45658e9 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/resolve_export_ordinals.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/resolve_export_ordinals.mlir
@@ -2,19 +2,15 @@
 
 hal.executable @exe0 {
   hal.executable.variant @target target(<"vmvx", "vmvx-bytecode-fb">) {
-    hal.executable.export @entry123 ordinal(123) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>
-      ]>
+    hal.executable.export @entry123 ordinal(123) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
 hal.executable @exe1 {
   hal.executable.variant @target target(<"vmvx", "vmvx-bytecode-fb">) {
-    hal.executable.export @entry456 ordinal(456) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>
-      ]>
+    hal.executable.export @entry456 ordinal(456) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
     ]>)
   }
 }
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/strip_executable_contents.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/strip_executable_contents.mlir
index ef9bff7..7cc2a58 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/strip_executable_contents.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/strip_executable_contents.mlir
@@ -5,11 +5,9 @@
   // CHECK: hal.executable.variant public @backend
   hal.executable.variant @backend target(#hal.executable.target<"backend", "format">) {
     // CHECK: hal.executable.export public @entry0
-    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [
-      #hal.descriptor_set.layout<0, bindings = [
-        #hal.descriptor_set.binding<0, storage_buffer>,
-        #hal.descriptor_set.binding<1, storage_buffer>
-      ]>
+    hal.executable.export @entry0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>,
+      #hal.pipeline.binding<storage_buffer>
     ]>)
     // CHECK-NOT: builtin.module
     builtin.module {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables.mlir
index ba2baad..19b229b 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables.mlir
@@ -7,7 +7,9 @@
 // CHECK: hal.executable private @executable0
 hal.executable private @executable0 {
   hal.executable.variant public @variant target(<"cuda", "cuda-nvptx-fb">) {
-    hal.executable.export public @dispatch0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index):
       // CHECK: arith.constant 123
       %c1 = arith.constant 1 : index
@@ -34,7 +36,9 @@
   // CHECK-SAME:   path = "substitute_executables_replacement.obj",
   // CHECK-SAME:   data = dense<[72, 69, 76, 76, 79, 33,
   hal.executable.variant public @variant target(<"cuda", "cuda-nvptx-fb">) {
-    hal.executable.export public @dispatch1 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch1 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index):
       // CHECK: arith.constant 100 : index
       %c100 = arith.constant 100 : index
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables_replacement.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables_replacement.mlir
index 7c96db9..d12d3d1 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables_replacement.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/substitute_executables_replacement.mlir
@@ -1,7 +1,9 @@
 // Replacement executable for substitute_executables.mlir.
 hal.executable private @executable0 {
   hal.executable.variant public @variant target(<"cuda", "cuda-nvptx-fb">) {
-    hal.executable.export public @dispatch0 ordinal(0) layout(#hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer>]>]>) {
+    hal.executable.export public @dispatch0 ordinal(0) layout(#hal.pipeline.layout<bindings = [
+      #hal.pipeline.binding<storage_buffer>
+    ]>) {
     ^bb0(%arg0: !hal.device, %arg1: index, %arg2: index):
       %c123 = arith.constant 123 : index
       hal.return %c123, %c123, %c123 : index, index, index
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/verify_devices.mlir b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/verify_devices.mlir
index b4e2264..e13f448 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/verify_devices.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/Transforms/test/verify_devices.mlir
@@ -53,7 +53,7 @@
   util.global private @optional = #hal.device.fallback<@device> : !hal.device
   util.global private @ordinal = #hal.device.ordinal<0> : !hal.device
   util.global private @selected = #hal.device.select<[
-    #hal.device.target<"llvm-cpu"> : !hal.device,
+    #hal.device.target<"local"> : !hal.device,
     #hal.device.target<"vmvx"> : !hal.device
   ]> : !hal.device
   util.func private @func() -> () attributes {
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Utils/BUILD.bazel b/compiler/src/iree/compiler/Dialect/HAL/Utils/BUILD.bazel
index 7745815..2a77d86 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Utils/BUILD.bazel
+++ b/compiler/src/iree/compiler/Dialect/HAL/Utils/BUILD.bazel
@@ -13,6 +13,22 @@
 )
 
 iree_compiler_cc_library(
+    name = "ExecutableDebugInfoUtils",
+    srcs = [
+        "ExecutableDebugInfoUtils.cpp",
+    ],
+    hdrs = [
+        "ExecutableDebugInfoUtils.h",
+    ],
+    deps = [
+        "//compiler/src/iree/compiler/Dialect/HAL/IR",
+        "//compiler/src/iree/compiler/Utils",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+        "@llvm-project//mlir:IR",
+    ],
+)
+
+iree_compiler_cc_library(
     name = "LLVMLinkerUtils",
     srcs = [
         "LLVMLinkerUtils.cpp",
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Utils/CMakeLists.txt b/compiler/src/iree/compiler/Dialect/HAL/Utils/CMakeLists.txt
index 696c5f3..22e7732 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/Utils/CMakeLists.txt
+++ b/compiler/src/iree/compiler/Dialect/HAL/Utils/CMakeLists.txt
@@ -12,6 +12,21 @@
 
 iree_cc_library(
   NAME
+    ExecutableDebugInfoUtils
+  HDRS
+    "ExecutableDebugInfoUtils.h"
+  SRCS
+    "ExecutableDebugInfoUtils.cpp"
+  DEPS
+    MLIRIR
+    iree::compiler::Dialect::HAL::IR
+    iree::compiler::Utils
+    iree::schemas::executable_debug_info_c_fbs
+  PUBLIC
+)
+
+iree_cc_library(
+  NAME
     LLVMLinkerUtils
   HDRS
     "LLVMLinkerUtils.h"
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.cpp b/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.cpp
new file mode 100644
index 0000000..b3d04ba
--- /dev/null
+++ b/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.cpp
@@ -0,0 +1,105 @@
+// Copyright 2024 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h"
+
+#include "iree/compiler/Utils/ModuleUtils.h"
+#include "iree/schemas/executable_debug_info_builder.h"
+#include "mlir/IR/DialectResourceBlobManager.h"
+
+namespace mlir::iree_compiler::IREE::HAL {
+
+flatbuffers_vec_ref_t createSourceFilesVec(int debugLevel,
+                                           DictionaryAttr sourcesAttr,
+                                           FlatbufferBuilder &fbb) {
+  if (debugLevel < 1) {
+    // No debug information.
+    return 0;
+  } else if (!sourcesAttr || sourcesAttr.empty()) {
+    // No sources embedded in the IR.
+    return 0;
+  }
+  SmallVector<iree_hal_debug_SourceFileDef_ref_t> sourceFileRefs;
+  for (auto sourceAttr : llvm::reverse(sourcesAttr.getValue())) {
+    if (auto resourceAttr = dyn_cast_if_present<DenseResourceElementsAttr>(
+            sourceAttr.getValue())) {
+      auto filenameRef = fbb.createString(sourceAttr.getName());
+      auto contentRef = fbb.streamUint8Vec([&](llvm::raw_ostream &os) {
+        auto blobData = resourceAttr.getRawHandle().getBlob()->getData();
+        os.write(blobData.data(), blobData.size());
+        return true;
+      });
+      sourceFileRefs.push_back(
+          iree_hal_debug_SourceFileDef_create(fbb, filenameRef, contentRef));
+    }
+  }
+  std::reverse(sourceFileRefs.begin(), sourceFileRefs.end());
+  return fbb.createOffsetVecDestructive(sourceFileRefs);
+}
+
+SmallVector<flatbuffers_ref_t>
+createExportDefs(int debugLevel,
+                 ArrayRef<IREE::HAL::ExecutableExportOp> exportOps,
+                 FlatbufferBuilder &fbb) {
+  SmallVector<flatbuffers_ref_t> exportDefs;
+  exportDefs.resize(exportOps.size(), 0);
+
+  if (debugLevel < 1) {
+    // No debug information.
+    return exportDefs;
+  }
+
+  for (auto exportOp : exportOps) {
+    auto ordinalAttr = exportOp.getOrdinalAttr();
+    assert(ordinalAttr && "ordinals must be assigned");
+    int64_t ordinal = ordinalAttr.getInt();
+
+    flatbuffers_ref_t nameRef = 0;
+    if (debugLevel >= 1) {
+      nameRef = fbb.createString(exportOp.getName());
+    }
+
+    flatbuffers_ref_t locationRef = 0;
+    if (debugLevel >= 1) {
+      if (auto loc = findFirstFileLoc(exportOp.getLoc())) {
+        auto filenameRef = fbb.createString(loc->getFilename());
+        locationRef = iree_hal_debug_FileLineLocDef_create(fbb, filenameRef,
+                                                           loc->getLine());
+      }
+    }
+
+    flatbuffers_vec_ref_t stageLocationsRef = 0;
+    if (debugLevel >= 3) {
+      SmallVector<iree_hal_debug_StageLocationDef_ref_t> stageLocationRefs;
+      if (auto locsAttr = exportOp.getSourceLocsAttr()) {
+        for (auto locAttr : locsAttr.getValue()) {
+          if (auto loc =
+                  findFirstFileLoc(cast<LocationAttr>(locAttr.getValue()))) {
+            auto stageNameRef = fbb.createString(locAttr.getName());
+            auto filenameRef = fbb.createString(loc->getFilename());
+            stageLocationRefs.push_back(iree_hal_debug_StageLocationDef_create(
+                fbb, stageNameRef,
+                iree_hal_debug_FileLineLocDef_create(fbb, filenameRef,
+                                                     loc->getLine())));
+          }
+        }
+      }
+      if (!stageLocationRefs.empty()) {
+        stageLocationsRef = fbb.createOffsetVecDestructive(stageLocationRefs);
+      }
+    }
+
+    iree_hal_debug_ExportDef_start(fbb);
+    iree_hal_debug_ExportDef_name_add(fbb, nameRef);
+    iree_hal_debug_ExportDef_location_add(fbb, locationRef);
+    iree_hal_debug_ExportDef_stage_locations_add(fbb, stageLocationsRef);
+    exportDefs[ordinal] = iree_hal_debug_ExportDef_end(fbb);
+  }
+
+  return exportDefs;
+}
+
+} // namespace mlir::iree_compiler::IREE::HAL
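The debug-level gating in `createExportDefs` above can be summarized apart from the flatbuffers plumbing: export names and file/line locations are emitted at debug level >= 1, per-stage compilation locations only at level >= 3, and each entry is placed at its export ordinal rather than its declaration order. A small Python model of that policy (record shapes are illustrative, not the actual schema):

```python
# Illustrative model of the debug-level policy in createExportDefs: names and
# source locations appear at level >= 1, stage locations at level >= 3, and
# each entry lands at its export ordinal. Record shapes are hypothetical.

def create_export_defs(debug_level, exports):
    """exports: list of dicts with 'ordinal', 'name', 'loc', 'stage_locs'."""
    defs = [None] * len(exports)
    if debug_level < 1:
        return defs  # no debug information
    for export in exports:
        entry = {"name": export["name"], "location": export["loc"]}
        if debug_level >= 3:
            entry["stage_locations"] = export.get("stage_locs", [])
        defs[export["ordinal"]] = entry  # ordinal-indexed, not declaration order
    return defs

exports = [
    {"ordinal": 1, "name": "dispatch_b", "loc": ("b.mlir", 7), "stage_locs": []},
    {"ordinal": 0, "name": "dispatch_a", "loc": ("a.mlir", 3), "stage_locs": []},
]
print(create_export_defs(0, exports))  # [None, None]
print(create_export_defs(1, exports)[0]["name"])  # dispatch_a
```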
diff --git a/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h b/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h
new file mode 100644
index 0000000..0a6cd02
--- /dev/null
+++ b/compiler/src/iree/compiler/Dialect/HAL/Utils/ExecutableDebugInfoUtils.h
@@ -0,0 +1,43 @@
+// Copyright 2024 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#ifndef IREE_COMPILER_DIALECT_HAL_UTILS_EXECUTABLEDEBUGINFOUTILS_H_
+#define IREE_COMPILER_DIALECT_HAL_UTILS_EXECUTABLEDEBUGINFOUTILS_H_
+
+#include "iree/compiler/Dialect/HAL/IR/HALOps.h"
+#include "iree/compiler/Dialect/HAL/IR/HALTypes.h"
+#include "iree/compiler/Utils/FlatbufferUtils.h"
+
+namespace mlir::iree_compiler::IREE::HAL {
+
+// Creates a `[iree.hal.debug.SourceFileDef]` vector from the given sources
+// dictionary (filename keys mapping to resource element contents).
+//
+// |debugLevel| generally corresponds to the gcc-style levels 0-3:
+//   0: no debug information
+//   1: minimal debug information
+//   2: default debug information
+//   3: maximal debug information
+flatbuffers_vec_ref_t createSourceFilesVec(int debugLevel,
+                                           DictionaryAttr sourcesAttr,
+                                           FlatbufferBuilder &fbb);
+
+// Creates one `iree.hal.debug.ExportDef` for every export and returns them in
+// the same order.
+//
+// |debugLevel| generally corresponds to the gcc-style levels 0-3:
+//   0: no debug information
+//   1: minimal debug information
+//   2: default debug information
+//   3: maximal debug information
+SmallVector<flatbuffers_ref_t>
+createExportDefs(int debugLevel,
+                 ArrayRef<IREE::HAL::ExecutableExportOp> exportOps,
+                 FlatbufferBuilder &fbb);
+
+} // namespace mlir::iree_compiler::IREE::HAL
+
+#endif //  IREE_COMPILER_DIALECT_HAL_UTILS_EXECUTABLEDEBUGINFOUTILS_H_
diff --git a/compiler/src/iree/compiler/Dialect/HAL/hal.imports.mlir b/compiler/src/iree/compiler/Dialect/HAL/hal.imports.mlir
index 9cd824d..a935732 100644
--- a/compiler/src/iree/compiler/Dialect/HAL/hal.imports.mlir
+++ b/compiler/src/iree/compiler/Dialect/HAL/hal.imports.mlir
@@ -201,7 +201,7 @@
   %binding_capacity : i32
 ) -> !vm.ref<!hal.command_buffer>
 attributes {
-  minimum_version = 3 : i32  // command buffer API version
+  minimum_version = 5 : i32  // command buffer API version
 }
 
 // Finalizes recording into the command buffer and prepares it for submission.
@@ -286,27 +286,6 @@
   %element_count : i64
 )
 
-// TODO(#18154): remove this in favor of inlined constants.
-//
-// Pushes constants for consumption by dispatches.
-vm.import private @command_buffer.push_constants(
-  %command_buffer : !vm.ref<!hal.command_buffer>,
-  %pipeline_layout : !vm.ref<!hal.pipeline_layout>,
-  %offset : i32,
-  %values : i32 ...
-)
-
-// TODO(#18154): remove this in favor of inlined bindings.
-//
-// Pushes a descriptor set to the given set number.
-vm.import private @command_buffer.push_descriptor_set(
-  %command_buffer : !vm.ref<!hal.command_buffer>,
-  %pipeline_layout : !vm.ref<!hal.pipeline_layout>,
-  %set : i32,
-  // <binding, slot, buffer, offset, length>
-  %bindings : tuple<i32, i32, !vm.ref<!hal.buffer>, i64, i64>...
-)
-
 // Dispatches an execution request.
 vm.import private @command_buffer.dispatch(
   %command_buffer : !vm.ref<!hal.command_buffer>,
@@ -315,7 +294,10 @@
   %workgroup_x : i32,
   %workgroup_y : i32,
   %workgroup_z : i32,
-  %flags : i64
+  %flags : i64,
+  %constants : i32 ...,
+  // <reserved, slot, buffer, offset, length>
+  %bindings : tuple<i32, i32, !vm.ref<!hal.buffer>, i64, i64>...
 )
 
 // Dispatches an execution request with the dispatch parameters loaded from the
@@ -327,60 +309,11 @@
   %workgroups_buffer_slot : i32,
   %workgroups_buffer : !vm.ref<!hal.buffer>,
   %workgroups_offset : i64,
-  %flags : i64
-)
-
-// TODO(#18154): replace @command_buffer.dispatch.
-//
-// Dispatches an execution request.
-vm.import private @command_buffer.dispatch2(
-  %command_buffer : !vm.ref<!hal.command_buffer>,
-  %executable : !vm.ref<!hal.executable>,
-  %entry_point : i32,
-  %workgroup_x : i32,
-  %workgroup_y : i32,
-  %workgroup_z : i32,
   %flags : i64,
   %constants : i32 ...,
   // <reserved, slot, buffer, offset, length>
   %bindings : tuple<i32, i32, !vm.ref<!hal.buffer>, i64, i64>...
 )
-attributes {
-  minimum_version = 4 : i32
-}
-
-// TODO(#18154): replace @command_buffer.dispatch.indirect.
-//
-// Dispatches an execution request with the dispatch parameters loaded from the
-// given buffer.
-vm.import private @command_buffer.dispatch2.indirect(
-  %command_buffer : !vm.ref<!hal.command_buffer>,
-  %executable : !vm.ref<!hal.executable>,
-  %entry_point : i32,
-  %workgroups_buffer_slot : i32,
-  %workgroups_buffer : !vm.ref<!hal.buffer>,
-  %workgroups_offset : i64,
-  %flags : i64,
-  %constants : i32 ...,
-  // <reserved, slot, buffer, offset, length>
-  %bindings : tuple<i32, i32, !vm.ref<!hal.buffer>, i64, i64>...
-)
-attributes {
-  minimum_version = 4 : i32
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a descriptor set layout that defines the bindings used within a set.
-vm.import private @descriptor_set_layout.create(
-  %device : !vm.ref<!hal.device>,
-  %flags : i32,
-  // <binding, type, flags>
-  %bindings : tuple<i32, i32, i32>...
-) -> !vm.ref<!hal.descriptor_set_layout>
-attributes {nosideeffects}
 
 //===----------------------------------------------------------------------===//
 // iree_hal_device_t
@@ -506,21 +439,9 @@
   %device : !vm.ref<!hal.device>,
   %executable_format : !vm.buffer,
   %executable_data : !vm.buffer,
-  %constants : !vm.buffer,
-  %pipeline_layouts : !vm.ref<!hal.pipeline_layout>...
-) -> !vm.ref<!hal.executable>
-attributes {nosideeffects}
-
-// TODO(#18154): replace @executable.create.
-// Creates an executable for use with the specified device.
-vm.import private @executable.create2(
-  %device : !vm.ref<!hal.device>,
-  %executable_format : !vm.buffer,
-  %executable_data : !vm.buffer,
   %constants : !vm.buffer
 ) -> !vm.ref<!hal.executable>
 attributes {
-  minimum_version = 4 : i32,
   nosideeffects
 }
 
@@ -566,17 +487,4 @@
 ) -> i32
 attributes {vm.yield}
 
-//===----------------------------------------------------------------------===//
-// iree_hal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates an pipeline layout from the given descriptor sets and push constant
-// required size.
-vm.import private @pipeline_layout.create(
-  %device : !vm.ref<!hal.device>,
-  %push_constants : i32,
-  %set_layouts : !vm.ref<!hal.descriptor_set_layout>...
-) -> !vm.ref<!hal.pipeline_layout>
-attributes {nosideeffects}
-
 }  // module
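With the `push_constants` and `push_descriptor_set` imports removed, all dispatch state travels in the `@command_buffer.dispatch` call itself, using the `<reserved, slot, buffer, offset, length>` binding tuples from the signature above. A sketch of the per-dispatch argument packing the new import implies (this is an illustration under stated assumptions, not the VM's actual marshaling code; in particular, the choice of `slot` for buffer indirection is hypothetical here):

```python
# Illustrative packing of the new stateless dispatch call: constants and
# bindings are carried per dispatch instead of being pushed onto command
# buffer state beforehand. Tuple layout mirrors the import's comment:
# <reserved, slot, buffer, offset, length>. Names are hypothetical.

def pack_dispatch(entry_point, workgroups, constants, bindings, flags=0):
    packed_bindings = []
    for buffer, offset, length in bindings:
        reserved, slot = 0, 0  # assume direct buffer refs, no slot indirection
        packed_bindings.append((reserved, slot, buffer, offset, length))
    return {
        "entry_point": entry_point,
        "workgroups": workgroups,
        "flags": flags,
        "constants": list(constants),
        "bindings": packed_bindings,
    }

call = pack_dispatch(0, (1, 1, 1), [42], [("buffer0", 0, 1)])
print(call["bindings"])  # [(0, 0, 'buffer0', 0, 1)]
```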
diff --git a/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/ConvertHALToVMVX.cpp b/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/ConvertHALToVMVX.cpp
index 7f17192..2f1aa4f 100644
--- a/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/ConvertHALToVMVX.cpp
+++ b/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/ConvertHALToVMVX.cpp
@@ -224,9 +224,7 @@
         .replaceOpWithNewOp<IREE::Util::ListGetOp>(
             op, bindingType, bindingsArg,
             rewriter.createOrFold<arith::ConstantIndexOp>(
-                op.getLoc(), op.getLayout().getFlatBindingIndex(
-                                 op.getSet().getSExtValue(),
-                                 op.getBinding().getSExtValue())))
+                op.getLoc(), op.getBinding().getSExtValue()))
         .getResult();
     return success();
   }
@@ -249,12 +247,13 @@
     IndexSet indexSet(op.getLoc(), rewriter);
     auto bindingType = llvm::cast<IREE::Util::ListType>(bindingsArg.getType())
                            .getElementType();
-    auto sourceBuffer = rewriter
-                            .create<IREE::Util::ListGetOp>(
-                                op.getLoc(), bindingType, bindingsArg,
-                                rewriter.createOrFold<arith::ConstantIndexOp>(
-                                    op.getLoc(), op.getFlatBindingIndex()))
-                            .getResult();
+    auto sourceBuffer =
+        rewriter
+            .create<IREE::Util::ListGetOp>(
+                op.getLoc(), bindingType, bindingsArg,
+                rewriter.createOrFold<arith::ConstantIndexOp>(
+                    op.getLoc(), op.getBinding().getSExtValue()))
+            .getResult();
 
     if (op.getByteOffset() && !matchPattern(op.getByteOffset(), m_Zero())) {
       // Offsetted binding: replace with a BufferSubspanOp.
diff --git a/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/test/interface_ops.mlir b/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/test/interface_ops.mlir
index dabea33..91a8bb6 100644
--- a/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/test/interface_ops.mlir
+++ b/compiler/src/iree/compiler/Dialect/VMVX/Conversion/HALToVMVX/test/interface_ops.mlir
@@ -1,10 +1,8 @@
 // RUN: iree-opt --split-input-file --iree-vmvx-conversion --canonicalize %s | FileCheck %s
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // CHECK: util.global private @__constant_5xi32 : !util.buffer
@@ -33,9 +31,9 @@
   %c1 = arith.constant 1 : index
   %0 = memref.get_global @__constant_5xi32 : memref<5xi32>
   //      CHECK: %[[BINDING0:.+]] = util.list.get %[[BINDINGS]][%c0] : !util.list<!util.buffer>
-  %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<5xf32>
+  %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<5xf32>
   //      CHECK: %[[BINDING1:.+]] = util.list.get %[[BINDINGS]][%c1] : !util.list<!util.buffer>
-  %2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) : memref<5xi32>
+  %2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) : memref<5xi32>
   %workgroup_size_x = hal.interface.workgroup.size[0] : index
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/compiler/src/iree/compiler/Dialect/VMVX/IR/VMVXOps.td b/compiler/src/iree/compiler/Dialect/VMVX/IR/VMVXOps.td
index 218a38f..6321e1c 100644
--- a/compiler/src/iree/compiler/Dialect/VMVX/IR/VMVXOps.td
+++ b/compiler/src/iree/compiler/Dialect/VMVX/IR/VMVXOps.td
@@ -73,7 +73,6 @@
   }];
   let arguments = (ins
     HAL_PipelineLayoutAttr:$layout,
-    IndexAttr:$set,
     IndexAttr:$binding
   );
   let results = (outs
@@ -81,7 +80,6 @@
   );
   let assemblyFormat = [{
     `layout` `(` $layout `)`
-    `set` `(` $set `)`
     `binding` `(` $binding `)`
     attr-dict
   }];
diff --git a/compiler/src/iree/compiler/Dialect/VMVX/Transforms/ResolveBufferDescriptors.cpp b/compiler/src/iree/compiler/Dialect/VMVX/Transforms/ResolveBufferDescriptors.cpp
index 5d38c4f..cc40323 100644
--- a/compiler/src/iree/compiler/Dialect/VMVX/Transforms/ResolveBufferDescriptors.cpp
+++ b/compiler/src/iree/compiler/Dialect/VMVX/Transforms/ResolveBufferDescriptors.cpp
@@ -315,7 +315,7 @@
         rewriter
             .create<IREE::VMVX::GetRawInterfaceBindingBufferOp>(
                 loc, op.getBaseBuffer().getType(), binding.getLayout(),
-                binding.getSetAttr(), binding.getBindingAttr())
+                binding.getBindingAttr())
             .getResult());
 
     rewriter.eraseOp(op);
diff --git a/compiler/src/iree/compiler/Dialect/VMVX/Transforms/test/resolve_buffer_descriptors.mlir b/compiler/src/iree/compiler/Dialect/VMVX/Transforms/test/resolve_buffer_descriptors.mlir
index f2b1ffc..e810002 100644
--- a/compiler/src/iree/compiler/Dialect/VMVX/Transforms/test/resolve_buffer_descriptors.mlir
+++ b/compiler/src/iree/compiler/Dialect/VMVX/Transforms/test/resolve_buffer_descriptors.mlir
@@ -60,15 +60,13 @@
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @resolve_binding_subspan_zero_offset() -> (!util.buffer, index, index, index, index, index) {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<512x384xf32>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<512x384xf32>
   %base_buffer, %offset, %sizes:2, %strides:2 = vmvx.get_buffer_descriptor %0 : memref<512x384xf32> -> !util.buffer, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : !util.buffer, index, index, index, index, index
 }
@@ -77,19 +75,17 @@
 // CHECK-DAG:   %[[C384:.+]] = arith.constant 384 : index
 // CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
 // CHECK-DAG:   %[[C0:.+]] = arith.constant 0 : index
-//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) set(0) binding(0)
+//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) binding(0)
 //     CHECK:   return %[[CAST]], %[[C0]], %[[C512]], %[[C384]], %[[C384]], %[[C1]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @resolve_binding_subspan_offset_index(%arg0 : index) -> (!util.buffer, index, index, index, index, index) {
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%arg0) : memref<512x384xindex>
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%arg0) : memref<512x384xindex>
   %base_buffer, %offset, %sizes:2, %strides:2 = vmvx.get_buffer_descriptor %0 : memref<512x384xindex> -> !util.buffer, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : !util.buffer, index, index, index, index, index
 }
@@ -100,26 +96,24 @@
 // CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
 // CHECK-DAG:   %[[INDEX_SIZE:.+]] = util.sizeof index
 // CHECK-DAG:   %[[OFFSET:.+]] = affine.apply #map()[%arg0, %[[INDEX_SIZE]]]
-//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) set(0) binding(0)
+//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) binding(0)
 //     CHECK:   return %[[CAST]], %[[OFFSET]], %[[C512]], %[[C384]], %[[C384]], %[[C1]]
 
 // -----
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 func.func @resolve_binding_subspan_dyn_dims(%arg0 : index, %arg1 : index) -> (!util.buffer, index, index, index, index, index) {
   %c0 = arith.constant 0 : index
-  %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?x?xindex>{%arg0, %arg1}
+  %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<?x?xindex>{%arg0, %arg1}
   %base_buffer, %offset, %sizes:2, %strides:2 = vmvx.get_buffer_descriptor %0 : memref<?x?xindex> -> !util.buffer, index, index, index, index, index
   return %base_buffer, %offset, %sizes#0, %sizes#1, %strides#0, %strides#1 : !util.buffer, index, index, index, index, index
 }
 //     CHECK: func @resolve_binding_subspan_dyn_dims(
 // CHECK-DAG:   %[[C1:.+]] = arith.constant 1 : index
-//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) set(0) binding(0)
+//     CHECK:   %[[CAST:.+]] = vmvx.get_raw_interface_binding_buffer layout({{.+}}) binding(0)
 //     CHECK:   return %[[CAST]], %{{.+}}, %arg0, %arg1, %arg1, %[[C1]]
 
 // -----
diff --git a/compiler/src/iree/compiler/DispatchCreation/CollapseDimensions.cpp b/compiler/src/iree/compiler/DispatchCreation/CollapseDimensions.cpp
index 1db69fc..f0b0322 100644
--- a/compiler/src/iree/compiler/DispatchCreation/CollapseDimensions.cpp
+++ b/compiler/src/iree/compiler/DispatchCreation/CollapseDimensions.cpp
@@ -739,7 +739,7 @@
       FailureOr<AffineMap> consumerToProducerMap =
           getConsumerLoopToProducerLoopsMap(*operand);
       if (failed(consumerToProducerMap)) {
-        didChange |= producerInfo.getCollapsibleLoops().size();
+        didChange |= !producerInfo.getCollapsibleLoops().empty();
         producerInfo.clear();
         continue;
       }
diff --git a/compiler/src/iree/compiler/InputConversion/Common/IREEImportPublic.cpp b/compiler/src/iree/compiler/InputConversion/Common/IREEImportPublic.cpp
index 7b950f7..de1f736 100644
--- a/compiler/src/iree/compiler/InputConversion/Common/IREEImportPublic.cpp
+++ b/compiler/src/iree/compiler/InputConversion/Common/IREEImportPublic.cpp
@@ -94,43 +94,34 @@
   }
 }
 
-static IREE::HAL::DescriptorSetBindingAttr
-convertDescriptorSetBinding(IREE::Input::DescriptorSetBindingAttr src) {
-  return IREE::HAL::DescriptorSetBindingAttr::get(
-      src.getContext(), src.getOrdinal(), convertDescriptorType(src.getType()),
+static IREE::HAL::PipelineBindingAttr
+convertPipelineBinding(IREE::Input::PipelineBindingAttr src) {
+  return IREE::HAL::PipelineBindingAttr::get(
+      src.getContext(), convertDescriptorType(src.getType()),
       convertDescriptorFlags(src.getFlags()));
 }
 
-static std::optional<IREE::HAL::DescriptorSetLayoutFlags>
-convertDescriptorSetLayoutFlags(
-    std::optional<IREE::Input::DescriptorSetLayoutFlags> src) {
+static std::optional<IREE::HAL::PipelineLayoutFlags> convertPipelineLayoutFlags(
+    std::optional<IREE::Input::PipelineLayoutFlags> src) {
   if (!src.has_value())
     return std::nullopt;
   switch (*src) {
-  case IREE::Input::DescriptorSetLayoutFlags::None:
-    return IREE::HAL::DescriptorSetLayoutFlags::None;
-  case IREE::Input::DescriptorSetLayoutFlags::Indirect:
-    return IREE::HAL::DescriptorSetLayoutFlags::Indirect;
+  case IREE::Input::PipelineLayoutFlags::None:
+    return IREE::HAL::PipelineLayoutFlags::None;
+  case IREE::Input::PipelineLayoutFlags::Indirect:
+    return IREE::HAL::PipelineLayoutFlags::Indirect;
   default:
     return std::nullopt;
   }
 }
 
-static IREE::HAL::DescriptorSetLayoutAttr
-convertDescriptorSetLayout(IREE::Input::DescriptorSetLayoutAttr src) {
-  return IREE::HAL::DescriptorSetLayoutAttr::get(
-      src.getContext(), src.getOrdinal(),
-      convertAttributes<IREE::HAL::DescriptorSetBindingAttr>(
-          src.getBindings(), convertDescriptorSetBinding),
-      convertDescriptorSetLayoutFlags(src.getFlags()));
-}
-
 static IREE::HAL::PipelineLayoutAttr
 convertPipelineLayout(IREE::Input::PipelineLayoutAttr src) {
   return IREE::HAL::PipelineLayoutAttr::get(
-      src.getContext(), src.getPushConstants(),
-      convertAttributes<IREE::HAL::DescriptorSetLayoutAttr>(
-          src.getSetLayouts(), convertDescriptorSetLayout));
+      src.getContext(),
+      convertAttributes<IREE::HAL::PipelineBindingAttr>(src.getBindings(),
+                                                        convertPipelineBinding),
+      src.getConstants(), convertPipelineLayoutFlags(src.getFlags()));
 }
 
 static IREE::HAL::ExecutableObjectAttr
diff --git a/compiler/src/iree/compiler/InputConversion/Common/test/iree_import_public.mlir b/compiler/src/iree/compiler/InputConversion/Common/test/iree_import_public.mlir
index d9cff44..0c48dbc 100644
--- a/compiler/src/iree/compiler/InputConversion/Common/test/iree_import_public.mlir
+++ b/compiler/src/iree/compiler/InputConversion/Common/test/iree_import_public.mlir
@@ -385,11 +385,11 @@
 // -----
 // CHECK: #[[PTX:.*]] = #hal.executable.target<"cuda", "cuda-nvptx-fb">
 
-// CHECK: #[[LAYOUT:.*]] = #hal.pipeline.layout<push_constants = 1,
-// CHECK-SAME: sets = [<0, bindings = [
-// CHECK-SAME:             <0, storage_buffer, ReadOnly>,
-// CHECK-SAME:             <1, storage_buffer>
-// CHECK-SAME: ]>]>
+// CHECK: #[[LAYOUT:.*]] = #hal.pipeline.layout<constants = 1,
+// CHECK-SAME: bindings = [
+// CHECK-SAME:   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+// CHECK-SAME:   #hal.pipeline.binding<storage_buffer>
+// CHECK-SAME: ]>
 
 // CHECK: hal.executable.source private @executable
 // CHECK-SAME: {objects = #hal.executable.objects<{
@@ -409,11 +409,9 @@
     }>
   } {
     iree_input.executable.export public @add ordinal(0)
-      layout(#iree_input.pipeline.layout<push_constants = 1, sets = [
-        <0, bindings = [
-            <0, storage_buffer, ReadOnly>,
-            <1, storage_buffer>
-        ]>
+      layout(#iree_input.pipeline.layout<constants = 1, bindings = [
+        #iree_input.pipeline.binding<storage_buffer, ReadOnly>,
+        #iree_input.pipeline.binding<storage_buffer>
       ]>) attributes {
       workgroup_size = [64 : index, 1 : index, 1 : index]
     }
diff --git a/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/InlineExecutables.cpp b/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/InlineExecutables.cpp
index a048e79..a4f60ef 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/InlineExecutables.cpp
+++ b/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/InlineExecutables.cpp
@@ -104,17 +104,14 @@
       // Build dispatch function signature that the stream.cmd.dispatch ops will
       // map to.
       auto layoutAttr = exportOp.getLayout();
-      size_t totalBindingCount = 0;
-      for (auto setLayout : layoutAttr.getSetLayouts()) {
-        totalBindingCount += setLayout.getBindings().size();
-      }
+      size_t bindingCount = layoutAttr.getBindings().size();
       SmallVector<Type> inputTypes;
       inputTypes.append(exportOp.getWorkgroupCountBody()->getNumArguments() - 1,
                         indexType); // workload
-      inputTypes.append(layoutAttr.getPushConstants(), i32Type);
-      inputTypes.append(totalBindingCount, bufferType); // buffers
-      inputTypes.append(totalBindingCount, indexType);  // offsets
-      inputTypes.append(totalBindingCount, indexType);  // lengths
+      inputTypes.append(layoutAttr.getConstants(), i32Type);
+      inputTypes.append(bindingCount, bufferType); // buffers
+      inputTypes.append(bindingCount, indexType);  // offsets
+      inputTypes.append(bindingCount, indexType);  // lengths
       auto dispatchFuncType =
           innerModuleBuilder.getFunctionType(inputTypes, {});
 
@@ -136,13 +133,13 @@
         return exportOp.emitOpError("missing body function");
       }
       if (bodyFuncOp.isPublic()) {
-        if (failed(rewriteWorkgroupSignature(layoutAttr, totalBindingCount,
+        if (failed(rewriteWorkgroupSignature(layoutAttr, bindingCount,
                                              bodyFuncOp))) {
           return failure();
         }
         bodyFuncOp.setPrivate(); // so we only do it once
       }
-      buildDispatchFunc(exportOp, layoutAttr, totalBindingCount, bodyFuncOp,
+      buildDispatchFunc(exportOp, layoutAttr, bindingCount, bodyFuncOp,
                         dispatchFuncOp);
 
       // Map from what the stream.cmd.dispatch ops is using to the new function.
@@ -185,7 +182,7 @@
   // about the function signatures.
   LogicalResult
   rewriteWorkgroupSignature(IREE::HAL::PipelineLayoutAttr layoutAttr,
-                            size_t totalBindingCount,
+                            size_t bindingCount,
                             FunctionOpInterface bodyFuncOp) {
     auto *entryBlock = &bodyFuncOp.front();
     auto builder = OpBuilder::atBlockBegin(entryBlock);
@@ -209,10 +206,10 @@
 
     // Expand push constants by replacing buffer accesses with the flattened
     // args.
-    newArgTypes.append(layoutAttr.getPushConstants(), i32Type);
+    newArgTypes.append(layoutAttr.getConstants(), i32Type);
     auto constantBuffer = entryBlock->getArgument(argOffset++);
     SmallVector<Value> constantArgs;
-    for (unsigned i = 0; i < layoutAttr.getPushConstants(); ++i) {
+    for (unsigned i = 0; i < layoutAttr.getConstants(); ++i) {
       constantArgs.push_back(
           entryBlock->addArgument(i32Type, constantBuffer.getLoc()));
     }
@@ -221,10 +218,10 @@
     }
 
     // Expand buffer list by replacing list accesses with the flattened args.
-    newArgTypes.append(totalBindingCount, bufferType);
+    newArgTypes.append(bindingCount, bufferType);
     auto bindingList = entryBlock->getArgument(argOffset++);
     SmallVector<Value> bindingArgs;
-    for (unsigned i = 0; i < totalBindingCount; ++i) {
+    for (unsigned i = 0; i < bindingCount; ++i) {
       bindingArgs.push_back(
           entryBlock->addArgument(bufferType, bindingList.getLoc()));
     }
@@ -329,7 +326,7 @@
   // Builds a function that calls a workgroup body and marshals arguments.
   //
   // Incoming:
-  //   (workload..., push_constants...,
+  //   (workload..., constants...,
   //    binding_buffers..., binding_offsets..., binding_lengths...)
   // Body (as translated):
   //   (local_memory, [constants], [bindings],
@@ -338,8 +335,7 @@
   //    workgroup_count_x, workgroup_count_y, workgroup_count_z)
   void buildDispatchFunc(IREE::HAL::ExecutableExportOp exportOp,
                          IREE::HAL::PipelineLayoutAttr layoutAttr,
-                         size_t totalBindingCount,
-                         FunctionOpInterface bodyFuncOp,
+                         size_t bindingCount, FunctionOpInterface bodyFuncOp,
                          FunctionOpInterface dispatchFuncOp) {
     auto loc = exportOp.getLoc();
     auto builder = OpBuilder::atBlockBegin(dispatchFuncOp.addEntryBlock());
@@ -369,18 +365,18 @@
     workgroupArgs.push_back(localMemory);
 
     // Pass all constants through.
-    for (int64_t i = 0; i < layoutAttr.getPushConstants(); ++i) {
+    for (int64_t i = 0; i < layoutAttr.getConstants(); ++i) {
       workgroupArgs.push_back(dispatchFuncOp.getArgument(argOffset++));
     }
 
     // Pass all buffers through as subspans with the binding offset and length
     // factored in. IPO can propagate the subspans (hopefully).
-    for (size_t i = 0; i < totalBindingCount; ++i) {
+    for (size_t i = 0; i < bindingCount; ++i) {
       auto bindingBuffer = dispatchFuncOp.getArgument(argOffset + i);
       auto bindingOffset =
-          dispatchFuncOp.getArgument(argOffset + totalBindingCount + i);
-      auto bindingLength = dispatchFuncOp.getArgument(
-          argOffset + totalBindingCount + totalBindingCount + i);
+          dispatchFuncOp.getArgument(argOffset + bindingCount + i);
+      auto bindingLength = dispatchFuncOp.getArgument(argOffset + bindingCount +
+                                                      bindingCount + i);
       Value bufferSize =
           builder.create<IREE::Util::BufferSizeOp>(loc, bindingBuffer);
       Value bindingView = builder.create<IREE::Util::BufferSubspanOp>(
diff --git a/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/test/inline_executables.mlir b/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/test/inline_executables.mlir
index 1ee7e25..4dff570 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/test/inline_executables.mlir
+++ b/compiler/src/iree/compiler/Modules/HAL/Inline/Transforms/test/inline_executables.mlir
@@ -8,14 +8,11 @@
 hal.executable private @ex {
   hal.executable.variant public @vmvx_ir target(<"vmvx-inline", "vmvx-ir">) {
     hal.executable.export public @dispatch_0 ordinal(0) layout(
-         #hal.pipeline.layout<push_constants = 2,
-                                sets = [
-                                  <0, bindings = [
-                                    <0, storage_buffer>,
-                                    <1, storage_buffer>,
-                                    <2, storage_buffer>
-                                  ]>
-                                ]>) {
+      #hal.pipeline.layout<constants = 2, bindings = [
+        #hal.pipeline.binding<storage_buffer>,
+        #hal.pipeline.binding<storage_buffer>,
+        #hal.pipeline.binding<storage_buffer>
+      ]>) {
     ^bb0(%arg0: !hal.device, %workload_x: index, %workload_y: index):
       %count_x = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%workload_x]
       %count_y = affine.apply affine_map<()[s0] -> (s0 ceildiv 4)>()[%workload_y]
diff --git a/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/HALLoaderToVM/Patterns.cpp b/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/HALLoaderToVM/Patterns.cpp
index 8574102..687830d 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/HALLoaderToVM/Patterns.cpp
+++ b/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/HALLoaderToVM/Patterns.cpp
@@ -89,19 +89,19 @@
         castToI32(adaptor.getWorkgroupY(), rewriter),
         castToI32(adaptor.getWorkgroupZ(), rewriter),
     };
-    auto pushConstants = adaptor.getPushConstants();
+    auto constants = adaptor.getConstants();
     SmallVector<int16_t, 5> segmentSizes = {
         /*executable=*/-1,
         /*entry_point=*/-1,
         /*workgroup_x=*/-1,
         /*workgroup_y=*/-1,
         /*workgroup_z=*/-1,
-        /*push_constants=*/
-        static_cast<int16_t>(pushConstants.size()),
+        /*constants=*/
+        static_cast<int16_t>(constants.size()),
         /*bindings=*/
         static_cast<int16_t>(adaptor.getBindingBuffers().size()),
     };
-    callOperands.append(pushConstants.begin(), pushConstants.end());
+    callOperands.append(constants.begin(), constants.end());
     for (auto [bindingBuffer, bindingOffset, bindingLength] : llvm::zip_equal(
              adaptor.getBindingBuffers(), adaptor.getBindingOffsets(),
              adaptor.getBindingLengths())) {
diff --git a/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/StreamToHALLoader/test/cmd_ops.mlir b/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/StreamToHALLoader/test/cmd_ops.mlir
index a767492..9370e09 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/StreamToHALLoader/test/cmd_ops.mlir
+++ b/compiler/src/iree/compiler/Modules/HAL/Loader/Conversion/StreamToHALLoader/test/cmd_ops.mlir
@@ -3,13 +3,9 @@
 // NOTE: all other stream.cmd.* ops are handled by the hal_inline conversions.
 
 // Executables are required to translate the dispatch calls.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<4, storage_buffer>
-  ]>,
-  #hal.descriptor_set.layout<1, bindings = [
-    #hal.descriptor_set.binding<5, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable private @ex {
   hal.executable.variant public @variant target(#hal.executable.target<"llvm", "embedded-elf-x86_64">) {
diff --git a/compiler/src/iree/compiler/Modules/HAL/Loader/IR/HALLoaderOps.td b/compiler/src/iree/compiler/Modules/HAL/Loader/IR/HALLoaderOps.td
index aa045b5..ac7bb0e 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Loader/IR/HALLoaderOps.td
+++ b/compiler/src/iree/compiler/Modules/HAL/Loader/IR/HALLoaderOps.td
@@ -144,7 +144,7 @@
     HAL_Dim:$workgroup_x,
     HAL_Dim:$workgroup_y,
     HAL_Dim:$workgroup_z,
-    Variadic<I32>:$push_constants,
+    Variadic<I32>:$constants,
     Variadic<Util_BufferType>:$binding_buffers,
     Variadic<HAL_DeviceSize>:$binding_offsets,
     Variadic<HAL_DeviceSize>:$binding_lengths
@@ -158,7 +158,7 @@
         $workgroup_y `,`
         $workgroup_z
     `]` `)`
-    (`constants` `(` `[` $push_constants^ `]` `)`)?
+    (`constants` `(` `[` $constants^ `]` `)`)?
     `bindings` `(` `[`
     custom<DispatchBindings>($binding_buffers,
                              type($binding_buffers),
diff --git a/compiler/src/iree/compiler/Modules/HAL/Loader/hal_loader.imports.mlir b/compiler/src/iree/compiler/Modules/HAL/Loader/hal_loader.imports.mlir
index d811bc7..a76e2d2 100644
--- a/compiler/src/iree/compiler/Modules/HAL/Loader/hal_loader.imports.mlir
+++ b/compiler/src/iree/compiler/Modules/HAL/Loader/hal_loader.imports.mlir
@@ -32,7 +32,7 @@
   %workgroup_x : i32,
   %workgroup_y : i32,
   %workgroup_z : i32,
-  %push_constants : i32 ...,
+  %constants : i32 ...,
   // <buffer, offset, length>
   %bindings : tuple<!vm.buffer, i64, i64>...
 )
diff --git a/docs/website/docs/community/blog/posts/cuda-backend.md b/docs/website/docs/community/blog/posts/cuda-backend.md
index ab2ee21..7c5adc3 100644
--- a/docs/website/docs/community/blog/posts/cuda-backend.md
+++ b/docs/website/docs/community/blog/posts/cuda-backend.md
@@ -82,7 +82,7 @@
   entry_points:[string];
 
   // Block sizes for each entry point.
-  block_sizes:[CUDABlockSizeDef];
+  block_sizes:[CUDABlockSize];
 
   // PTX string of the module.
   ptx_image:string;
diff --git a/docs/website/docs/community/blog/posts/microkernels.md b/docs/website/docs/community/blog/posts/microkernels.md
index 1473a79..7e14195 100644
--- a/docs/website/docs/community/blog/posts/microkernels.md
+++ b/docs/website/docs/community/blog/posts/microkernels.md
@@ -338,7 +338,7 @@
 [...]
 // -----// IR Dump After Inliner (inline) //----- //
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+sse4a,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vbmi,+avx512ifma,+avx512vpopcntdq,+avx512vbmi2,+gfni,+vpclmulqdq,+avx512vnni,+avx512bitalg,+avx512bf16,+adx,+clflushopt,+clwb,+clzero,+cx16,+cx8,+crc32,+f16c,+fsgsbase,+fxsr,+invpcid,+lzcnt,+movbe,+mwaitx,+pku,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+vaes,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,+evex512", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-unknown-eabi-elf", ukernels = "all"}>
-#device_target_llvm_cpu = #hal.device.target<"llvm-cpu", {executable_targets = [#executable_target_embedded_elf_x86_64_]}> : !hal.device
+#device_target_llvm_cpu = #hal.device.target<"local", {executable_targets = [#executable_target_embedded_elf_x86_64_]}> : !hal.device
 module attributes {hal.device.targets = [#device_target_llvm_cpu]} {
   func.func @matmul_dynamic(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub, iree.reflection = {iree.abi.declaration = "sync func @matmul_dynamic(%input0: tensor<?x?xf32>, %input1: tensor<?x?xf32>, %input2: tensor<?x?xf32>) -> (%output0: tensor<?x?xf32>)"}} {
     %0 = hal.buffer_view.dim<%arg0 : !hal.buffer_view>[0] : index
@@ -367,7 +367,7 @@
 // -----// IR Dump After CSE (cse) //----- //
 #executable_target_embedded_elf_x86_64_ = #hal.executable.target<"llvm-cpu", "embedded-elf-x86_64", {cpu = "znver4", cpu_features = "+mmx,+popcnt,+sse,+sse2,+sse3,+ssse3,+sse4.1,+sse4.2,+avx,+avx2,+sse4a,+fma,+avx512f,+bmi,+bmi2,+aes,+pclmul,+avx512vl,+avx512bw,+avx512dq,+avx512cd,+avx512vbmi,+avx512ifma,+avx512vpopcntdq,+avx512vbmi2,+gfni,+vpclmulqdq,+avx512vnni,+avx512bitalg,+avx512bf16,+adx,+clflushopt,+clwb,+clzero,+cx16,+cx8,+crc32,+f16c,+fsgsbase,+fxsr,+invpcid,+lzcnt,+movbe,+mwaitx,+pku,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+vaes,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,+evex512", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", native_vector_size = 64 : index, target_triple = "x86_64-unknown-unknown-eabi-elf", ukernels = "all"}>
 #map = affine_map<()[s0] -> (s0 ceildiv 16)>
-#device_target_llvm_cpu = #hal.device.target<"llvm-cpu", {executable_targets = [#executable_target_embedded_elf_x86_64_]}> : !hal.device
+#device_target_llvm_cpu = #hal.device.target<"local", {executable_targets = [#executable_target_embedded_elf_x86_64_]}> : !hal.device
 module attributes {hal.device.targets = [#device_target_llvm_cpu]} {
   func.func @matmul_dynamic(%arg0: !hal.buffer_view, %arg1: !hal.buffer_view, %arg2: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub, iree.reflection = {iree.abi.declaration = "sync func @matmul_dynamic(%input0: tensor<?x?xf32>, %input1: tensor<?x?xf32>, %input2: tensor<?x?xf32>) -> (%output0: tensor<?x?xf32>)"}} {
     %cst = arith.constant 0.000000e+00 : f32
@@ -467,9 +467,9 @@
     %53 = arith.shli %52, %c32_i64 : i64
     %54 = arith.ori %51, %53 : i64
     %55 = arith.index_castui %54 : i64 to index
-    %56 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%30, %35}
-    %57 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%20) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%40, %45}
-    %58 = hal.interface.binding.subspan layout(#layout) set(0) binding(1) alignment(64) offset(%25) : !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%50, %55}
+    %56 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%30, %35}
+    %57 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%20) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<?x?x16x1xf32>>{%40, %45}
+    %58 = hal.interface.binding.subspan layout(#layout) binding(1) alignment(64) offset(%25) : !flow.dispatch.tensor<readwrite:tensor<?x?x16x16xf32>>{%50, %55}
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
     %workgroup_id_y = hal.interface.workgroup.id[1] : index
@@ -566,11 +566,11 @@
   %53 = arith.shli %52, %c32_i64 : i64
   %54 = arith.ori %51, %53 : i64
   %55 = arith.index_castui %54 : i64 to index
-  %56 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?x?x16x1xf32, #hal.descriptor_type<storage_buffer>>{%30, %35}
+  %56 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?x?x16x1xf32, #hal.descriptor_type<storage_buffer>>{%30, %35}
   memref.assume_alignment %56, 64 : memref<?x?x16x1xf32, #hal.descriptor_type<storage_buffer>>
-  %57 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%20) flags(ReadOnly) : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>{%40, %45}
+  %57 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%20) flags(ReadOnly) : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>{%40, %45}
   memref.assume_alignment %57, 1 : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
-  %58 = hal.interface.binding.subspan layout(#layout) set(0) binding(1) alignment(64) offset(%25) : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>{%50, %55}
+  %58 = hal.interface.binding.subspan layout(#layout) binding(1) alignment(64) offset(%25) : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>{%50, %55}
   memref.assume_alignment %58, 1 : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>, #hal.descriptor_type<storage_buffer>>
   %workgroup_id_x = hal.interface.workgroup.id[0] : index
   %workgroup_count_x = hal.interface.workgroup.count[0] : index
@@ -657,11 +657,11 @@
     %53 = arith.shli %52, %c32_i64 : i64
     %54 = arith.ori %51, %53 : i64
     %55 = arith.index_castui %54 : i64 to index
-    %56 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?x?x16x1xf32>{%30, %35}
+    %56 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<?x?x16x1xf32>{%30, %35}
     memref.assume_alignment %56, 64 : memref<?x?x16x1xf32>
-    %57 = hal.interface.binding.subspan layout(#layout) set(0) binding(0) alignment(64) offset(%20) flags(ReadOnly) : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>>{%40, %45}
+    %57 = hal.interface.binding.subspan layout(#layout) binding(0) alignment(64) offset(%20) flags(ReadOnly) : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>>{%40, %45}
     memref.assume_alignment %57, 1 : memref<?x?x16x1xf32, strided<[?, 16, 1, 1], offset: ?>>
-    %58 = hal.interface.binding.subspan layout(#layout) set(0) binding(1) alignment(64) offset(%25) : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>>{%50, %55}
+    %58 = hal.interface.binding.subspan layout(#layout) binding(1) alignment(64) offset(%25) : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>>{%50, %55}
     memref.assume_alignment %58, 1 : memref<?x?x16x16xf32, strided<[?, 256, 16, 1], offset: ?>>
     %workgroup_id_x = hal.interface.workgroup.id[0] : index
     %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/docs/website/docs/developers/design-docs/metal-hal-driver.md b/docs/website/docs/developers/design-docs/metal-hal-driver.md
index 855ddad..95ec026 100644
--- a/docs/website/docs/developers/design-docs/metal-hal-driver.md
+++ b/docs/website/docs/developers/design-docs/metal-hal-driver.md
@@ -70,8 +70,6 @@
 [`iree_hal_buffer_t`][hal-buffer]                               | [`MTLBuffer`][mtl-buffer]
 [`iree_hal_executable_t`][hal-executable]                       | [`MTLLibrary`][mtl-library]
 [`iree_hal_executable_cache_t`][hal-executable-cache]           | N/A
-[`iree_hal_descriptor_set_layout_t`][hal-descriptor-set-layout] | N/A
-[`iree_hal_pipeline_layout_t`][hal-pipeline-layout]             | N/A
 
 In the following subsections, we go over each pair to provide more details.
 
@@ -195,7 +193,7 @@
 IREE [`iree_hal_executable_t`][hal-executable] represents a GPU program archive with
 a driver-defined format. It maps naturally to Metal [`MTLLibrary`][mtl-library].
 An entry point in a `MTLLibrary` is a [`MTLFunction`][mtl-function]. We define
-[`iree_hal_metal_kernel_params_t`][metal-kernel-library] to wrap around a
+[`iree_hal_metal_executable_t`][metal-executable] to wrap around a
 `MTLLibrary`, its `MTLFunction`s, and also `MTLComputePipelineState` objects
 constructed from `MTLFunction`s.
 
@@ -268,33 +266,6 @@
 for each entry point and use it as the threadgroup size when later dispatching
 the `MTLFunction` corresponding to the entry point.
 
-### Resource descriptors
-
-A descriptor is an opaque handle pointing to a resource that is accessed in
-the compute kernel. IREE's HAL models several concepts related to GPU resource
-management explicitly:
-
-* [`iree_hal_descriptor_set_layout_t`][hal-descriptor-set-layout]: a schema for
-  describing an array of descriptor bindings. Each descriptor binding specifies
-  the resource type, access mode and other information.
-* [`iree_hal_pipeline_layout_t`][hal-pipeline-layout]: a schema for describing all
-  the resources accessed by a compute pipeline. It includes zero or more
-  `DescriptorSetLayout`s and (optional) push constants.
-
-However, this isn't totally matching Metal's paradigm.
-In the Metal framework, the closest concept to descriptor sets would be [argument
-buffer][mtl-argument-buffer]. There is no direct correspondence to
-descriptor set layout and pipeline layout. Rather, the layout is implicitly
-encoded in Metal shaders as MSL structs. The APIs for creating argument buffers
-do not encourage early creation without pipelines: one typically creates them
-for each `MTLFunction`.
-
-All of this means it's better to defer the creation of the argument buffer
-until the point of compute pipeline creation and dispatch. Therefore, the Metal
-HAL driver's `iree_hal_metal_descriptor_set_layout_t` and
-`iree_hal_metal_pipeline_layout_t` are just containers holding the information
-up for recording [command buffer dispatch](#command-buffer-dispatch).
-
 ### Command buffer dispatch
 
 Metal HAL driver command buffer dispatch recording performs the following steps
@@ -319,8 +290,6 @@
 [hal-allocator]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/allocator.h
 [hal-buffer]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/buffer.h
 [hal-command-buffer]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/command_buffer.h
-[hal-descriptor-set-layout]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/pipeline_layout.h
-[hal-pipeline-layout]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/pipeline_layout.h
 [hal-device]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/device.h
 [hal-driver]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/driver.h
 [hal-executable]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/executable.h
@@ -328,11 +297,10 @@
 [hal-semaphore]: https://github.com/iree-org/iree/blob/main/runtime/src/iree/hal/semaphore.h
 [metal-device]: https://github.com/iree-org/iree/tree/main/experimental/metal/metal_device.h
 [metal-driver]: https://github.com/iree-org/iree/tree/main/experimental/metal/metal_driver.h
-[metal-kernel-library]: https://github.com/iree-org/iree/tree/main/experimental/metal/kernel_library.h
+[metal-executable]: https://github.com/iree-org/iree/tree/main/experimental/metal/executable.h
 [metal-shared-event]: https://github.com/iree-org/iree/tree/main/experimental/metal/shared_event.h
 [metal-spirv-target]: https://github.com/iree-org/iree/tree/main/compiler/plugins/target/MetalSPIRV
 [metal-builtin-kernels]: https://github.com/iree-org/iree/tree/main/runtime/src/iree/hal/drivers/metal/builtin/
-[mtl-argument-buffer]: https://developer.apple.com/documentation/metal/buffers/about_argument_buffers?language=objc
 [mtl-argument-encoder]: https://developer.apple.com/documentation/metal/mtlargumentencoder?language=objc
 [mtl-buffer]: https://developer.apple.com/documentation/metal/mtlbuffer?language=objc
 [mtl-command-buffer]: https://developer.apple.com/documentation/metal/mtlcommandbuffer?language=objc
diff --git a/experimental/webgpu/BUILD.bazel b/experimental/webgpu/BUILD.bazel
index 4e802e6..c7cec08 100644
--- a/experimental/webgpu/BUILD.bazel
+++ b/experimental/webgpu/BUILD.bazel
@@ -53,9 +53,11 @@
         "//runtime/src/iree/hal/drivers/webgpu/platform",
         "//runtime/src/iree/hal/drivers/webgpu/shaders",
         "//runtime/src/iree/hal/utils:buffer_transfer",
+        "//runtime/src/iree/hal/utils:executable_debug_info",
         "//runtime/src/iree/hal/utils:file_transfer",
         "//runtime/src/iree/hal/utils:memory_file",
-        "//runtime/src/iree/schemas:wgsl_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+        "//runtime/src/iree/schemas:webgpu_executable_def_c_fbs",
         "@webgpu_headers",
     ],
 )
diff --git a/experimental/webgpu/CMakeLists.txt b/experimental/webgpu/CMakeLists.txt
index 967e55b..fa71067 100644
--- a/experimental/webgpu/CMakeLists.txt
+++ b/experimental/webgpu/CMakeLists.txt
@@ -50,7 +50,9 @@
     iree::experimental::webgpu::shaders
+    iree::hal::utils::executable_debug_info
     iree::hal::utils::file_transfer
     iree::hal::utils::memory_file
-    iree::schemas::wgsl_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::webgpu_executable_def_c_fbs
   PUBLIC
 )
 
diff --git a/experimental/webgpu/builtins.c b/experimental/webgpu/builtins.c
index cfeced7..de0d306 100644
--- a/experimental/webgpu/builtins.c
+++ b/experimental/webgpu/builtins.c
@@ -10,8 +10,8 @@
 #include "iree/base/api.h"
 
 static const char* iree_hal_webgpu_builtins_find_code(const char* file_name) {
-  const iree_file_toc_t* files = iree_hal_wgsl_builtin_shaders_create();
-  for (size_t i = 0; i < iree_hal_wgsl_builtin_shaders_size(); ++i) {
+  const iree_file_toc_t* files = iree_hal_webgpu_builtin_shaders_create();
+  for (size_t i = 0; i < iree_hal_webgpu_builtin_shaders_size(); ++i) {
     if (strcmp(file_name, files[i].name) == 0) {
       return files[i].data;
     }
diff --git a/experimental/webgpu/command_buffer.c b/experimental/webgpu/command_buffer.c
index 2a7047b..9240320 100644
--- a/experimental/webgpu/command_buffer.c
+++ b/experimental/webgpu/command_buffer.c
@@ -147,12 +147,12 @@
     // Currently open pass - NULL if no open pass.
     WGPUComputePassEncoder compute_pass;
 
-    // All available push constants updated each time push_constants is called.
+    // All available constants updated each time new constant values are set.
     // Reset only with the command buffer and otherwise will maintain its values
-    // during recording to allow for partial push_constants updates.
-    uint32_t push_constants[IREE_HAL_WEBGPU_MAX_PUSH_CONSTANT_COUNT];
+    // during recording to allow for partial constants updates.
+    uint32_t constants[IREE_HAL_WEBGPU_MAX_PUSH_CONSTANT_COUNT];
 
-    // TODO(benvanik): add a push_constants dirty bit so we know if we need to
+    // TODO(benvanik): add a constants dirty bit so we know if we need to
     // upload more. Today we'll stage the same values for each dispatch.
 
     // Snapshot of descriptor sets as populated by push_descriptor_set.
@@ -750,7 +750,7 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_webgpu_command_buffer_push_constants(
+static iree_status_t iree_hal_webgpu_command_buffer_constants(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
     const void* values, iree_host_size_t values_length) {
@@ -758,7 +758,7 @@
       iree_hal_webgpu_command_buffer_cast(base_command_buffer);
 
   if (IREE_UNLIKELY(offset + values_length >=
-                    sizeof(command_buffer->state.push_constants))) {
+                    sizeof(command_buffer->state.constants))) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
                             "push constant range %" PRIhsz " (length=%" PRIhsz
                             ") out of range",
@@ -766,7 +766,7 @@
   }
 
   // NOTE: command buffer state change only; enqueues no tasks.
-  memcpy((uint8_t*)&command_buffer->state.push_constants + offset, values,
+  memcpy((uint8_t*)&command_buffer->state.constants + offset, values,
          values_length);
 
   return iree_ok_status();
@@ -819,14 +819,14 @@
 
   // Upload push constant data - this may incur a segment flush if the staging
   // buffer is exhausted.
-  iree_host_size_t push_constant_count =
-      iree_hal_webgpu_pipeline_layout_push_constant_count(entry_point->layout);
-  iree_const_byte_span_t push_constant_data = iree_make_const_byte_span(
-      command_buffer->state.push_constants,
-      push_constant_count * sizeof(command_buffer->state.push_constants[0]));
+  iree_host_size_t constant_count =
+      iree_hal_webgpu_pipeline_layout_constant_count(entry_point->layout);
+  iree_const_byte_span_t constant_data = iree_make_const_byte_span(
+      command_buffer->state.constants,
+      constant_count * sizeof(command_buffer->state.constants[0]));
   uint32_t params_offset = 0;
   IREE_RETURN_IF_ERROR(iree_hal_webgpu_command_buffer_append_parameters(
-      command_buffer, push_constant_data, &params_offset));
+      command_buffer, constant_data, &params_offset));
 
   // Acquire the compute pass we'll encode the dispatch into - this may be
   // fresh or reused from prior commands.
@@ -835,7 +835,7 @@
       command_buffer, &compute_pass));
   wgpuComputePassEncoderSetPipeline(compute_pass, entry_point->pipeline);
 
-  if (push_constant_count > 0) {
+  if (constant_count > 0) {
     // Bind the push constant emulation bind group at the staging buffer
     // relative offset for this dispatch.
     wgpuComputePassEncoderSetBindGroup(
@@ -872,7 +872,7 @@
     command_buffer->state.bind_groups_empty &= ~(1ull << i);
   }
 
-  if (push_constant_count > 0) {
+  if (constant_count > 0) {
     // Pad up to IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX with empty bind groups.
     WGPUBindGroup empty_handle =
         command_buffer->staging_buffer->empty_bind_group;
@@ -926,7 +926,7 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_webgpu_command_buffer_prepare_dispatch2(
+static iree_status_t iree_hal_webgpu_command_buffer_prepare_dispatch(
     iree_hal_webgpu_command_buffer_t* command_buffer,
     iree_hal_executable_t* executable, uint32_t ordinal,
     iree_const_byte_span_t constants, iree_hal_buffer_ref_list_t bindings,
@@ -994,7 +994,7 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_webgpu_command_buffer_dispatch2(
+static iree_status_t iree_hal_webgpu_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
@@ -1003,7 +1003,7 @@
       iree_hal_webgpu_command_buffer_cast(base_command_buffer);
 
   WGPUComputePassEncoder compute_pass = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_webgpu_command_buffer_prepare_dispatch2(
+  IREE_RETURN_IF_ERROR(iree_hal_webgpu_command_buffer_prepare_dispatch(
       command_buffer, executable, entry_point, constants, bindings, flags,
       &compute_pass));
   wgpuComputePassEncoderDispatchWorkgroups(
@@ -1012,7 +1012,7 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_webgpu_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_webgpu_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -1021,7 +1021,7 @@
       iree_hal_webgpu_command_buffer_cast(base_command_buffer);
 
   WGPUComputePassEncoder compute_pass = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_webgpu_command_buffer_prepare_dispatch2(
+  IREE_RETURN_IF_ERROR(iree_hal_webgpu_command_buffer_prepare_dispatch(
       command_buffer, executable, entry_point, constants, bindings, flags,
       &compute_pass));
   wgpuComputePassEncoderDispatchWorkgroupsIndirect(
@@ -1045,10 +1045,8 @@
     .fill_buffer = iree_hal_webgpu_command_buffer_fill_buffer,
     .update_buffer = iree_hal_webgpu_command_buffer_update_buffer,
     .copy_buffer = iree_hal_webgpu_command_buffer_copy_buffer,
-    .push_constants = iree_hal_webgpu_command_buffer_push_constants,
+    .constants = iree_hal_webgpu_command_buffer_constants,
     .push_descriptor_set = iree_hal_webgpu_command_buffer_push_descriptor_set,
-    .dispatch = iree_hal_webgpu_command_buffer_dispatch,
-    .dispatch_indirect = iree_hal_webgpu_command_buffer_dispatch_indirect,
-    .dispatch2 = iree_hal_webgpu_command_buffer_dispatch2,
-    .dispatch2_indirect = iree_hal_webgpu_command_buffer_dispatch2_indirect,
+    .dispatch = iree_hal_webgpu_command_buffer_dispatch,
+    .dispatch_indirect = iree_hal_webgpu_command_buffer_dispatch_indirect,
 };
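The stateless recording model above still lets callers update constants piecewise before a dispatch: values persist in command buffer state and only the touched byte range is overwritten. A minimal self-contained sketch of that offset-copy-with-bounds-check pattern (hypothetical names, not the IREE API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical mirror of the command buffer's constants storage: values
// persist across dispatches so a recording can update only a sub-range.
#define MAX_CONSTANT_COUNT 64

typedef struct {
  uint32_t constants[MAX_CONSTANT_COUNT];
} recording_state_t;

// Copies |values| into the constants block at byte |offset| after a bounds
// check, following the same pattern iree_hal_webgpu_command_buffer_constants
// uses before its memcpy. Returns 0 on success, -1 if out of range.
static int state_update_constants(recording_state_t* state, size_t offset,
                                  const void* values, size_t values_length) {
  if (offset + values_length > sizeof(state->constants)) return -1;
  memcpy((uint8_t*)state->constants + offset, values, values_length);
  return 0;
}
```

Because the state is only reset with the command buffer, two dispatches that share most constant values need only one small update between them.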
diff --git a/experimental/webgpu/executable.c b/experimental/webgpu/executable.c
index ff38225..9191f5b 100644
--- a/experimental/webgpu/executable.c
+++ b/experimental/webgpu/executable.c
@@ -10,11 +10,14 @@
 
 #include "iree/base/api.h"
 #include "iree/base/internal/inline_array.h"
+#include "iree/hal/utils/executable_debug_info.h"
 
 // flatcc schemas:
 #include "iree/base/internal/flatcc/parsing.h"
-#include "iree/schemas/wgsl_executable_def_reader.h"
-#include "iree/schemas/wgsl_executable_def_verifier.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
+#include "iree/schemas/webgpu_executable_def_reader.h"
+#include "iree/schemas/webgpu_executable_def_verifier.h"
 
 typedef struct iree_hal_webgpu_executable_t {
   iree_hal_resource_t resource;
@@ -46,7 +49,7 @@
   // Run flatcc generated verification. This ensures all pointers are in-bounds
   // and that we can safely walk the file, but not that the actual contents of
   // the flatbuffer meet our expectations.
-  int verify_ret = iree_hal_wgsl_ExecutableDef_verify_as_root(
+  int verify_ret = iree_hal_webgpu_ExecutableDef_verify_as_root(
       flatbuffer_data.data, flatbuffer_data.data_length);
   if (verify_ret != flatcc_verify_ok) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
@@ -54,18 +57,18 @@
                             flatcc_verify_error_string(verify_ret));
   }
 
-  iree_hal_wgsl_ExecutableDef_table_t executable_def =
-      iree_hal_wgsl_ExecutableDef_as_root(flatbuffer_data.data);
+  iree_hal_webgpu_ExecutableDef_table_t executable_def =
+      iree_hal_webgpu_ExecutableDef_as_root(flatbuffer_data.data);
 
-  iree_hal_wgsl_ShaderModuleDef_vec_t shader_modules_vec =
-      iree_hal_wgsl_ExecutableDef_shader_modules_get(executable_def);
+  iree_hal_webgpu_ShaderModuleDef_vec_t shader_modules_vec =
+      iree_hal_webgpu_ExecutableDef_shader_modules_get(executable_def);
   size_t shader_module_count =
-      iree_hal_wgsl_ShaderModuleDef_vec_len(shader_modules_vec);
+      iree_hal_webgpu_ShaderModuleDef_vec_len(shader_modules_vec);
   for (size_t i = 0; i < shader_module_count; ++i) {
-    iree_hal_wgsl_ShaderModuleDef_table_t shader_module_def =
-        iree_hal_wgsl_ShaderModuleDef_vec_at(shader_modules_vec, i);
-    if (flatbuffers_string_len(
-            iree_hal_wgsl_ShaderModuleDef_code_get(shader_module_def)) == 0) {
+    iree_hal_webgpu_ShaderModuleDef_table_t shader_module_def =
+        iree_hal_webgpu_ShaderModuleDef_vec_at(shader_modules_vec, i);
+    if (flatbuffers_string_len(iree_hal_webgpu_ShaderModuleDef_wgsl_source_get(
+            shader_module_def)) == 0) {
       return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
                               "shader module %zu WGSL code is missing/empty",
                               i);
@@ -73,7 +76,7 @@
   }
 
   flatbuffers_uint32_vec_t entry_points_vec =
-      iree_hal_wgsl_ExecutableDef_entry_points_get(executable_def);
+      iree_hal_webgpu_ExecutableDef_entry_points_get(executable_def);
   size_t entry_point_count = flatbuffers_uint32_vec_len(entry_points_vec);
   if (entry_point_count != expected_entry_point_count) {
     return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
@@ -96,14 +99,16 @@
 }
 
 static iree_status_t iree_hal_webgpu_create_wgsl_shader_module(
-    WGPUDevice device, iree_hal_wgsl_ShaderModuleDef_table_t shader_module_def,
+    WGPUDevice device,
+    iree_hal_webgpu_ShaderModuleDef_table_t shader_module_def,
     WGPUShaderModule* out_shader_module) {
   IREE_ASSERT_ARGUMENT(shader_module_def);
   IREE_ASSERT_ARGUMENT(out_shader_module);
   *out_shader_module = NULL;
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  const char* code = iree_hal_wgsl_ShaderModuleDef_code_get(shader_module_def);
+  const char* code =
+      iree_hal_webgpu_ShaderModuleDef_wgsl_source_get(shader_module_def);
 
   const WGPUShaderModuleWGSLDescriptor descriptor = {
       .chain =
@@ -229,17 +234,17 @@
       z0, iree_hal_webgpu_executable_flatbuffer_verify(
               executable_params->executable_data,
               executable_params->pipeline_layout_count));
-  iree_hal_wgsl_ExecutableDef_table_t executable_def =
-      iree_hal_wgsl_ExecutableDef_as_root(
+  iree_hal_webgpu_ExecutableDef_table_t executable_def =
+      iree_hal_webgpu_ExecutableDef_as_root(
           executable_params->executable_data.data);
 
   // Create shader modules. This will be cheap on some implementations like
   // Metal that need pipeline information in order to be JIT'ed from WGSL while
   // on others it can be more expensive.
-  iree_hal_wgsl_ShaderModuleDef_vec_t shader_modules_vec =
-      iree_hal_wgsl_ExecutableDef_shader_modules_get(executable_def);
+  iree_hal_webgpu_ShaderModuleDef_vec_t shader_modules_vec =
+      iree_hal_webgpu_ExecutableDef_shader_modules_get(executable_def);
   size_t shader_module_count =
-      iree_hal_wgsl_ShaderModuleDef_vec_len(shader_modules_vec);
+      iree_hal_webgpu_ShaderModuleDef_vec_len(shader_modules_vec);
   iree_inline_array(WGPUShaderModule, shader_modules, shader_module_count,
                     host_allocator);
   memset(iree_inline_array_data(shader_modules), 0,
@@ -247,7 +252,7 @@
   iree_status_t status = iree_ok_status();
   for (size_t i = 0; i < shader_module_count; ++i) {
     status = iree_hal_webgpu_create_wgsl_shader_module(
-        device, iree_hal_wgsl_ShaderModuleDef_vec_at(shader_modules_vec, i),
+        device, iree_hal_webgpu_ShaderModuleDef_vec_at(shader_modules_vec, i),
         iree_inline_array_at(shader_modules, i));
     if (!iree_status_is_ok(status)) break;
   }
@@ -268,9 +273,13 @@
     executable->host_allocator = host_allocator;
     executable->entry_point_count = executable_params->pipeline_layout_count;
 
+    // Publish any embedded source files to the tracing infrastructure.
+    iree_hal_debug_publish_source_files(
+        iree_hal_webgpu_ExecutableDef_source_files_get(executable_def));
+
     // Create one pipeline per entry point.
     flatbuffers_uint32_vec_t entry_points_vec =
-        iree_hal_wgsl_ExecutableDef_entry_points_get(executable_def);
+        iree_hal_webgpu_ExecutableDef_entry_points_get(executable_def);
     for (iree_host_size_t i = 0; i < executable->entry_point_count; i++) {
       uint32_t module_ordinal = flatbuffers_uint32_vec_at(entry_points_vec, i);
       status = iree_hal_webgpu_create_pipeline(
diff --git a/experimental/webgpu/pipeline_layout.c b/experimental/webgpu/pipeline_layout.c
index a5c940c..c55fcbe 100644
--- a/experimental/webgpu/pipeline_layout.c
+++ b/experimental/webgpu/pipeline_layout.c
@@ -159,7 +159,7 @@
   iree_hal_resource_t resource;
   iree_allocator_t host_allocator;
   WGPUPipelineLayout handle;
-  iree_host_size_t push_constant_count;
+  iree_host_size_t constant_count;
   iree_hal_webgpu_set_binding_info_t set_binding_info;
   iree_host_size_t set_layout_count;
   iree_hal_descriptor_set_layout_t* set_layouts[];
@@ -177,7 +177,7 @@
 iree_status_t iree_hal_webgpu_pipeline_layout_create(
     WGPUDevice device, iree_host_size_t set_layout_count,
     iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count,
+    iree_host_size_t constant_count,
     iree_hal_webgpu_staging_buffer_t* staging_buffer,
     iree_allocator_t host_allocator,
     iree_hal_pipeline_layout_t** out_pipeline_layout) {
@@ -198,8 +198,8 @@
 
   // Pad to IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX for push constant emulation.
   iree_host_size_t bind_group_layouts_count =
-      push_constant_count > 0 ? IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX + 1
-                              : set_layout_count;
+      constant_count > 0 ? IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX + 1
+                         : set_layout_count;
 
   // Populate a WGPUBindGroupLayout array with the provided set layouts, then
   // set the staging buffer's bind group layout at the right index, padding
@@ -215,7 +215,7 @@
     *iree_inline_array_at(bind_group_layouts, i) =
         staging_buffer->empty_bind_group_layout;
   }
-  if (push_constant_count > 0) {
+  if (constant_count > 0) {
     *iree_inline_array_at(bind_group_layouts,
                           IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX) =
         staging_buffer->bind_group_layout;
@@ -247,7 +247,7 @@
                                  &pipeline_layout->resource);
     pipeline_layout->host_allocator = host_allocator;
     pipeline_layout->handle = handle;
-    pipeline_layout->push_constant_count = push_constant_count;
+    pipeline_layout->constant_count = constant_count;
 
     pipeline_layout->set_layout_count = set_layout_count;
     pipeline_layout->set_binding_info.set_count = set_layout_count;
@@ -292,10 +292,10 @@
   return iree_hal_webgpu_pipeline_layout_cast(layout)->handle;
 }
 
-iree_host_size_t iree_hal_webgpu_pipeline_layout_push_constant_count(
+iree_host_size_t iree_hal_webgpu_pipeline_layout_constant_count(
     iree_hal_pipeline_layout_t* layout) {
   IREE_ASSERT_ARGUMENT(layout);
-  return iree_hal_webgpu_pipeline_layout_cast(layout)->push_constant_count;
+  return iree_hal_webgpu_pipeline_layout_cast(layout)->constant_count;
 }
 
 const iree_hal_webgpu_set_binding_info_t*
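WGSL has no push constants, so the layout above reserves a parameters bind group for emulation only when the layout actually declares constants. A sketch of that sizing rule, assuming the params slot is index 3 (the last of four bind groups, an assumption about `IREE_HAL_WEBGPU_PARAMS_BIND_GROUP_INDEX`):

```c
#include <assert.h>
#include <stddef.h>

// Assumed fixed slot for the emulated-constants bind group.
#define PARAMS_BIND_GROUP_INDEX 3

// Mirrors the sizing logic in iree_hal_webgpu_pipeline_layout_create: when
// any constants are present, bind group layouts must be padded (with empty
// groups) up to and including the params slot; otherwise only the real set
// layouts are needed.
static size_t bind_group_layout_count(size_t set_layout_count,
                                      size_t constant_count) {
  return constant_count > 0 ? (size_t)(PARAMS_BIND_GROUP_INDEX + 1)
                            : set_layout_count;
}
```

Pinning constants to the highest slot keeps the lower indices stable for real descriptor bindings regardless of whether emulation is active.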
diff --git a/experimental/webgpu/pipeline_layout.h b/experimental/webgpu/pipeline_layout.h
index e15b3bf..c620f9c 100644
--- a/experimental/webgpu/pipeline_layout.h
+++ b/experimental/webgpu/pipeline_layout.h
@@ -61,7 +61,7 @@
 iree_status_t iree_hal_webgpu_pipeline_layout_create(
     WGPUDevice device, iree_host_size_t set_layout_count,
     iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count,
+    iree_host_size_t constant_count,
     iree_hal_webgpu_staging_buffer_t* staging_buffer,
     iree_allocator_t host_allocator,
     iree_hal_pipeline_layout_t** out_pipeline_layout);
@@ -69,7 +69,7 @@
 WGPUPipelineLayout iree_hal_webgpu_pipeline_layout_handle(
     iree_hal_pipeline_layout_t* layout);
 
-iree_host_size_t iree_hal_webgpu_pipeline_layout_push_constant_count(
+iree_host_size_t iree_hal_webgpu_pipeline_layout_constant_count(
     iree_hal_pipeline_layout_t* layout);
 
 const iree_hal_webgpu_set_binding_info_t*
diff --git a/experimental/webgpu/shaders/BUILD.bazel b/experimental/webgpu/shaders/BUILD.bazel
index bc6077e..f420361 100644
--- a/experimental/webgpu/shaders/BUILD.bazel
+++ b/experimental/webgpu/shaders/BUILD.bazel
@@ -20,5 +20,5 @@
     c_file_output = "builtin_shaders.c",
     flatten = True,
     h_file_output = "builtin_shaders.h",
-    identifier = "iree_hal_wgsl_builtin_shaders",
+    identifier = "iree_hal_webgpu_builtin_shaders",
 )
diff --git a/experimental/webgpu/shaders/CMakeLists.txt b/experimental/webgpu/shaders/CMakeLists.txt
index 78cbc8c..04cc457 100644
--- a/experimental/webgpu/shaders/CMakeLists.txt
+++ b/experimental/webgpu/shaders/CMakeLists.txt
@@ -20,7 +20,7 @@
   H_FILE_OUTPUT
     "builtin_shaders.h"
   IDENTIFIER
-    "iree_hal_wgsl_builtin_shaders"
+    "iree_hal_webgpu_builtin_shaders"
   FLATTEN
   PUBLIC
 )
diff --git a/experimental/webgpu/webgpu_device.c b/experimental/webgpu/webgpu_device.c
index 165dee3..5498caf 100644
--- a/experimental/webgpu/webgpu_device.c
+++ b/experimental/webgpu/webgpu_device.c
@@ -295,13 +295,13 @@
 }
 
 static iree_status_t iree_hal_webgpu_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
+    iree_hal_device_t* base_device, iree_host_size_t constants,
     iree_host_size_t set_layout_count,
     iree_hal_descriptor_set_layout_t* const* set_layouts,
     iree_hal_pipeline_layout_t** out_pipeline_layout) {
   iree_hal_webgpu_device_t* device = iree_hal_webgpu_device_cast(base_device);
   return iree_hal_webgpu_pipeline_layout_create(
-      device->handle, set_layout_count, set_layouts, push_constants,
+      device->handle, set_layout_count, set_layouts, constants,
       &device->staging_buffer, device->host_allocator, out_pipeline_layout);
 }
 
diff --git a/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputBase.td b/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputBase.td
index 526cdd6..223dcdc 100644
--- a/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputBase.td
+++ b/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputBase.td
@@ -265,68 +265,49 @@
   let cppNamespace = "::mlir::iree_compiler::IREE::Input";
 }
 
-def IREEInput_DescriptorSetBindingAttr :
-    AttrDef<IREEInput_Dialect, "DescriptorSetBinding", []> {
-  let mnemonic = "descriptor_set.binding";
+def IREEInput_PipelineBindingAttr :
+    AttrDef<IREEInput_Dialect, "PipelineBinding", []> {
+  let mnemonic = "pipeline.binding";
   let summary = [{descriptor set binding specification}];
 
   let parameters = (ins
-    AttrParameter<"int64_t", "">:$ordinal,
     AttrParameter<"DescriptorType", "">:$type,
     OptionalParameter<"std::optional<DescriptorFlags>">:$flags
   );
 
   let assemblyFormat = [{
-    `<` $ordinal `,` $type (`,` $flags^)? `>`
+    `<` $type (`,` $flags^)? `>`
   }];
 }
 
-def IREEInput_DescriptorSetLayoutFlags_None :
+def IREEInput_PipelineLayoutFlags_None :
     I32BitEnumAttrCase<"None", 0x0000>;
-def IREEInput_DescriptorSetLayoutFlags_Indirect :
+def IREEInput_PipelineLayoutFlags_Indirect :
     I32BitEnumAttrCase<"Indirect", 0x0001>;
-def IREEInput_DescriptorSetLayoutFlagsAttr :
-    I32BitEnumAttr<"DescriptorSetLayoutFlags", "valid DescriptorSetLayout flags", [
-      IREEInput_DescriptorSetLayoutFlags_None,
-      IREEInput_DescriptorSetLayoutFlags_Indirect,
+def IREEInput_PipelineLayoutFlagsAttr :
+    I32BitEnumAttr<"PipelineLayoutFlags", "valid PipelineLayout flags", [
+      IREEInput_PipelineLayoutFlags_None,
+      IREEInput_PipelineLayoutFlags_Indirect,
     ]> {
   let cppNamespace = "::mlir::iree_compiler::IREE::Input";
 }
 
-def IREEInput_DescriptorSetLayoutAttr :
-    AttrDef<IREEInput_Dialect, "DescriptorSetLayout", []> {
-  let mnemonic = "descriptor_set.layout";
-  let summary = [{descriptor set layout specification}];
-
-  let parameters = (ins
-    AttrParameter<"int64_t", "">:$ordinal,
-    ArrayRefParameter<"DescriptorSetBindingAttr", "">:$bindings,
-    OptionalParameter<"std::optional<DescriptorSetLayoutFlags>">:$flags
-  );
-
-  let assemblyFormat = [{
-    `<`
-    $ordinal `,`
-    `bindings` `=` `[` $bindings `]`
-    (`,` `flags` `=` $flags^)?
-    `>`
-  }];
-}
-
 def IREEInput_PipelineLayoutAttr :
     AttrDef<IREEInput_Dialect, "PipelineLayout", []> {
   let mnemonic = "pipeline.layout";
   let summary = [{executable entry point layout specification}];
 
   let parameters = (ins
-    AttrParameter<"int64_t", "">:$pushConstants,
-    ArrayRefParameter<"DescriptorSetLayoutAttr", "">:$setLayouts
+    ArrayRefParameter<"PipelineBindingAttr", "">:$bindings,
+    OptionalParameter<"int64_t", "0">:$constants,
+    OptionalParameter<"std::optional<PipelineLayoutFlags>">:$flags
   );
 
   let assemblyFormat = [{
     `<`
-    `push_constants` `=` $pushConstants `,`
-    `sets` `=` `[` $setLayouts `]`
+    (`constants` `=` $constants^ `,`)?
+    `bindings` `=` `[` qualified($bindings) `]`
+    (`,` `flags` `=` $flags^)?
     `>`
   }];
 }
diff --git a/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputDialect.h b/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputDialect.h
index 41ecf9f..0ae1d30 100644
--- a/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputDialect.h
+++ b/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/Input/InputDialect.h
@@ -48,24 +48,24 @@
 
 template <>
 struct FieldParser<
-    std::optional<mlir::iree_compiler::IREE::Input::DescriptorSetLayoutFlags>> {
-  static FailureOr<mlir::iree_compiler::IREE::Input::DescriptorSetLayoutFlags>
+    std::optional<mlir::iree_compiler::IREE::Input::PipelineLayoutFlags>> {
+  static FailureOr<mlir::iree_compiler::IREE::Input::PipelineLayoutFlags>
   parse(AsmParser &parser) {
     std::string value;
     if (parser.parseKeywordOrString(&value))
       return failure();
     auto result = mlir::iree_compiler::IREE::Input::symbolizeEnum<
-        mlir::iree_compiler::IREE::Input::DescriptorSetLayoutFlags>(value);
+        mlir::iree_compiler::IREE::Input::PipelineLayoutFlags>(value);
     if (!result.has_value())
       return failure();
     return result.value();
   }
 };
 
-static inline AsmPrinter &operator<<(
-    AsmPrinter &printer,
-    std::optional<mlir::iree_compiler::IREE::Input::DescriptorSetLayoutFlags>
-        param) {
+static inline AsmPrinter &
+operator<<(AsmPrinter &printer,
+           std::optional<mlir::iree_compiler::IREE::Input::PipelineLayoutFlags>
+               param) {
   printer << (param.has_value()
                   ? mlir::iree_compiler::IREE::Input::stringifyEnum(
                         param.value())
diff --git a/runtime/iree.natvis b/runtime/iree.natvis
index bde544c..f792f73 100644
--- a/runtime/iree.natvis
+++ b/runtime/iree.natvis
@@ -590,13 +590,11 @@
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.buffer_view&quot;)==0">{(iree_hal_buffer_view_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.channel&quot;)==0">{(iree_hal_channel_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.command_buffer&quot;)==0">{(iree_hal_command_buffer_t*)ptr}</DisplayString>
-  <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.descriptor_set_layout&quot;)==0">{(iree_hal_descriptor_set_layout_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.device&quot;)==0">{(iree_hal_device_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.driver&quot;)==0">{(iree_hal_driver_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.event&quot;)==0">{(iree_hal_event_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.executable&quot;)==0">{(iree_hal_executable_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.executable_cache&quot;)==0">{(iree_hal_executable_cache_t*)ptr}</DisplayString>
-  <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.pipeline_layout&quot;)==0">{(iree_hal_pipeline_layout_t*)ptr}</DisplayString>
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.semaphore&quot;)==0">{(iree_hal_semaphore_t*)ptr}</DisplayString>
   <!-- vm -->
   <DisplayString Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;vm.buffer&quot;)==0">{(iree_vm_buffer_t*)ptr}</DisplayString>
@@ -612,13 +610,11 @@
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.buffer_view&quot;)==0">(iree_hal_buffer_view_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.channel&quot;)==0">(iree_hal_channel_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.command_buffer&quot;)==0">(iree_hal_command_buffer_t*)ptr</ExpandedItem>
-    <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.descriptor_set_layout&quot;)==0">(iree_hal_descriptor_set_layout_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.device&quot;)==0">(iree_hal_device_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.driver&quot;)==0">(iree_hal_driver_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.event&quot;)==0">(iree_hal_event_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.executable&quot;)==0">(iree_hal_executable_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.executable_cache&quot;)==0">(iree_hal_executable_cache_t*)ptr</ExpandedItem>
-    <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.pipeline_layout&quot;)==0">(iree_hal_pipeline_layout_t*)ptr</ExpandedItem>
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;hal.semaphore&quot;)==0">(iree_hal_semaphore_t*)ptr</ExpandedItem>
     <!-- vm -->
     <ExpandedItem Condition="ptr!=0 &amp;&amp; strcmp(iree_vm_ref_type_descriptors[type]->type_name.data, &quot;vm.buffer&quot;)==0">(iree_vm_buffer_t*)ptr</ExpandedItem>
diff --git a/runtime/src/iree/base/internal/threading_win32.c b/runtime/src/iree/base/internal/threading_win32.c
index 0091af1..66e6a07 100644
--- a/runtime/src/iree/base/internal/threading_win32.c
+++ b/runtime/src/iree/base/internal/threading_win32.c
@@ -278,8 +278,7 @@
   int affinity_desc_length = snprintf(
       affinity_desc, IREE_ARRAYSIZE(affinity_desc), "group=%d, id=%d, smt=%d",
       affinity.group, affinity.id, affinity.smt);
-  IREE_TRACE_ZONE_APPEND_TEXT_STRING_VIEW(z0, affinity_desc,
-                                          affinity_desc_length);
+  IREE_TRACE_ZONE_APPEND_TEXT(z0, affinity_desc, affinity_desc_length);
 #endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
 
   GROUP_AFFINITY group_affinity;
diff --git a/runtime/src/iree/hal/BUILD.bazel b/runtime/src/iree/hal/BUILD.bazel
index e08b751..2ca77dd 100644
--- a/runtime/src/iree/hal/BUILD.bazel
+++ b/runtime/src/iree/hal/BUILD.bazel
@@ -61,8 +61,6 @@
         "fence.h",
         "file.c",
         "file.h",
-        "pipeline_layout.c",
-        "pipeline_layout.h",
         "queue.h",
         "resource.h",
         "semaphore.c",
diff --git a/runtime/src/iree/hal/CMakeLists.txt b/runtime/src/iree/hal/CMakeLists.txt
index 359cf03..4aa7781 100644
--- a/runtime/src/iree/hal/CMakeLists.txt
+++ b/runtime/src/iree/hal/CMakeLists.txt
@@ -54,8 +54,6 @@
     "fence.h"
     "file.c"
     "file.h"
-    "pipeline_layout.c"
-    "pipeline_layout.h"
     "queue.h"
     "resource.h"
     "semaphore.c"
diff --git a/runtime/src/iree/hal/api.h b/runtime/src/iree/hal/api.h
index 3e58737..cac86a4 100644
--- a/runtime/src/iree/hal/api.h
+++ b/runtime/src/iree/hal/api.h
@@ -25,7 +25,6 @@
 #include "iree/hal/executable_cache.h"  // IWYU pragma: export
 #include "iree/hal/fence.h"             // IWYU pragma: export
 #include "iree/hal/file.h"              // IWYU pragma: export
-#include "iree/hal/pipeline_layout.h"   // IWYU pragma: export
 #include "iree/hal/queue.h"             // IWYU pragma: export
 #include "iree/hal/resource.h"          // IWYU pragma: export
 #include "iree/hal/semaphore.h"         // IWYU pragma: export
diff --git a/runtime/src/iree/hal/command_buffer.c b/runtime/src/iree/hal/command_buffer.c
index 802330f..cf77e4a 100644
--- a/runtime/src/iree/hal/command_buffer.c
+++ b/runtime/src/iree/hal/command_buffer.c
@@ -512,116 +512,9 @@
   return status;
 }
 
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_push_constants(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  IREE_ASSERT_ARGUMENT(command_buffer);
-  IREE_ASSERT_ARGUMENT(pipeline_layout);
-  IREE_ASSERT_ARGUMENT(values);
-  if (IREE_UNLIKELY(values_length == 0)) {
-    return iree_ok_status();
-  }
-  IREE_TRACE_ZONE_BEGIN(z0);
-  IF_VALIDATING(command_buffer, {
-    IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_push_constants_validation(
-                command_buffer, VALIDATION_STATE(command_buffer),
-                pipeline_layout, offset, values, values_length));
-  });
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, push_constants)(
-      command_buffer, pipeline_layout, offset, values, values_length);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  IREE_ASSERT_ARGUMENT(command_buffer);
-  IREE_ASSERT_ARGUMENT(pipeline_layout);
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_TRACE_ZONE_BEGIN(z0);
-  IF_VALIDATING(command_buffer, {
-    IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_push_descriptor_set_validation(
-                command_buffer, VALIDATION_STATE(command_buffer),
-                pipeline_layout, set, binding_count, bindings));
-  });
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, push_descriptor_set)(
-      command_buffer, pipeline_layout, set, binding_count, bindings);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
 IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  IREE_ASSERT_ARGUMENT(command_buffer);
-  IREE_ASSERT_ARGUMENT(executable);
-  if ((workgroup_x | workgroup_y | workgroup_z) == 0) {
-    // No-op dispatch. All implementations are expected to do this but we ensure
-    // it happens here to avoid the overhead of going all the way down into the
-    // device layer for something we know should have no (intentional)
-    // side-effects. Note that this does mean that validation is skipped and
-    // the executable/etc could be bogus but that's fine.
-    return iree_ok_status();
-  }
-  IREE_TRACE_ZONE_BEGIN(z0);
-  IF_VALIDATING(command_buffer, {
-    IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_dispatch_validation(
-                command_buffer, VALIDATION_STATE(command_buffer), executable,
-                entry_point, workgroup_x, workgroup_y, workgroup_z, flags));
-  });
-#if IREE_HAL_VERBOSE_TRACING_ENABLE
-  // TODO(benvanik): add a tracing.h helper that does the snprintf directly
-  // into a tracy_malloc buffer so that we can avoid the memcpy. Today this can
-  // take 4-5us which adds too much overhead when trying to get accurate timings
-  // with tracing enabled. Because benchmarks shouldn't be run with asserts
-  // enabled we only enable these when assertions are enabled. Ideally we'd
-  // slice off a much larger allocation and then suballocate from that ourselves
-  // so that we could avoid the tracy_malloc overheads per-dispatch.
-  IREE_TRACE({
-    char xyz_string[32];
-    int xyz_string_length =
-        snprintf(xyz_string, IREE_ARRAYSIZE(xyz_string), "%ux%ux%u",
-                 workgroup_x, workgroup_y, workgroup_z);
-    IREE_TRACE_ZONE_APPEND_TEXT_STRING_VIEW(z0, xyz_string, xyz_string_length);
-  });
-#endif  // IREE_HAL_VERBOSE_TRACING_ENABLE
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch)(
-      command_buffer, executable, entry_point, workgroup_x, workgroup_y,
-      workgroup_z, flags);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  IREE_ASSERT_ARGUMENT(command_buffer);
-  IREE_ASSERT_ARGUMENT(executable);
-  IREE_TRACE_ZONE_BEGIN(z0);
-  IF_VALIDATING(command_buffer, {
-    IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_dispatch_indirect_validation(
-                command_buffer, VALIDATION_STATE(command_buffer), executable,
-                entry_point, workgroups_ref, flags));
-  });
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch_indirect)(
-      command_buffer, executable, entry_point, workgroups_ref, flags);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   IREE_ASSERT_ARGUMENT(command_buffer);
@@ -650,18 +543,18 @@
     int xyz_string_length =
         snprintf(xyz_string, IREE_ARRAYSIZE(xyz_string), "%ux%ux%u",
                  workgroup_count[0], workgroup_count[1], workgroup_count[2]);
-    IREE_TRACE_ZONE_APPEND_TEXT_STRING_VIEW(z0, xyz_string, xyz_string_length);
+    IREE_TRACE_ZONE_APPEND_TEXT(z0, xyz_string, xyz_string_length);
   });
 #endif  // IREE_HAL_VERBOSE_TRACING_ENABLE
 
   IF_VALIDATING(command_buffer, {
     IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_dispatch2_validation(
+        z0, iree_hal_command_buffer_dispatch_validation(
                 command_buffer, VALIDATION_STATE(command_buffer), executable,
                 entry_point, workgroup_count, constants, bindings, flags));
   });
 
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch2)(
+  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch)(
       command_buffer, executable, entry_point, workgroup_count, constants,
       bindings, flags);
 
@@ -669,7 +562,7 @@
   return status;
 }
 
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch2_indirect(
+IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -679,11 +572,11 @@
   IREE_TRACE_ZONE_BEGIN(z0);
   IF_VALIDATING(command_buffer, {
     IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_command_buffer_dispatch2_indirect_validation(
+        z0, iree_hal_command_buffer_dispatch_indirect_validation(
                 command_buffer, VALIDATION_STATE(command_buffer), executable,
                 entry_point, workgroups_ref, constants, bindings, flags));
   });
-  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch2_indirect)(
+  iree_status_t status = _VTABLE_DISPATCH(command_buffer, dispatch_indirect)(
       command_buffer, executable, entry_point, workgroups_ref, constants,
       bindings, flags);
   IREE_TRACE_ZONE_END(z0);
diff --git a/runtime/src/iree/hal/command_buffer.h b/runtime/src/iree/hal/command_buffer.h
index 43a876f..fe9b3e8 100644
--- a/runtime/src/iree/hal/command_buffer.h
+++ b/runtime/src/iree/hal/command_buffer.h
@@ -16,7 +16,6 @@
 #include "iree/hal/channel.h"
 #include "iree/hal/event.h"
 #include "iree/hal/executable.h"
-#include "iree/hal/pipeline_layout.h"
 #include "iree/hal/queue.h"
 #include "iree/hal/resource.h"
 
@@ -32,6 +31,8 @@
 
 // A bitfield specifying the mode of operation for a command buffer.
 enum iree_hal_command_buffer_mode_bits_t {
+  IREE_HAL_COMMAND_BUFFER_MODE_DEFAULT = 0u,
+
   // Command buffer will be submitted once and never used again.
   // This may enable in-place patching of command buffers that reduce overhead
   // when it's known that command buffers will not be reused.
@@ -91,11 +92,8 @@
 //
 // Roughly maps to VkDescriptorSetLayoutBinding.
 typedef struct iree_hal_buffer_ref_t {
-  // TODO(#18154): change ordinal to `reserved` after binding simplification.
-  // The binding number of this entry and corresponds to a resource of the
-  // same binding number in the executable interface. Only used by certain
-  // calls.
-  uint32_t ordinal : 8;
+  // Reserved for future use; must be 0.
+  uint32_t reserved : 8;
   // Binding table slot the buffer will be sourced from if buffer is NULL.
   // Only valid on command buffers that support indirect execution.
   uint32_t buffer_slot : 24;
@@ -499,7 +497,7 @@
     // the binding table range and the range of the reference.
     const iree_hal_buffer_binding_t* binding =
         &binding_table.bindings[buffer_ref.buffer_slot];
-    out_resolved_ref->ordinal = buffer_ref.ordinal;
+    out_resolved_ref->reserved = buffer_ref.reserved;
     out_resolved_ref->buffer_slot = 0;
     out_resolved_ref->buffer = binding->buffer;
     return iree_hal_buffer_calculate_range(
@@ -721,75 +719,6 @@
     iree_hal_collective_op_t op, uint32_t param, iree_hal_buffer_ref_t send_ref,
     iree_hal_buffer_ref_t recv_ref, iree_device_size_t element_count);
 
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-//
-// Pushes an inline set of constants that can be accessed by subsequent
-// dispatches using a compatible pipeline layout.
-//
-// Push constants are treated as opaque bytes, meaning that they may be
-// bit-casted floats, bit-packed booleans, etc. |offset| and |values_length| are
-// in bytes.
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_push_constants(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-//
-// Pushes descriptor set bindings and associates them with |set|.
-// This uses an internal ringbuffer inside of the command buffer to avoid the
-// need for creating and binding descriptor sets and managing their lifetime.
-//
-// The |bindings| will remain bound and valid on the command buffer during
-// recording. Each binding must have its ordinal specified indicating which
-// descriptor set slots are being assigned.
-//
-// Provided bindings may have a buffer directly referenced that will be recorded
-// into the command buffer and kept live for the lifetime of the command buffer.
-// Alternatively bindings can reference slots in the binding table the capacity
-// of which was specified upon command buffer creation. Such indirect bindings
-// have their buffers specified upon submission and the buffers in the provided
-// binding table are kept live only until the submission referencing them
-// completes.
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-//
-// Dispatches an execution request.
-// The request may execute overlapped with any other transfer operation or
-// dispatch made within the same barrier-defined sequence.
-//
-// The executable specified must be registered for use with the device driver
-// owning this queue. It must not be unregistered until all requests that use
-// it have completed.
-//
-// Fails if the queue does not support dispatch operations (as indicated by
-// can_dispatch).
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-//
-// Dispatches an execution request with deferred workgroup counts.
-// This is the same as iree_hal_command_buffer_dispatch but the workgroup counts
-// are read from the given |workgroups_buffer| at offset |workgroups_offset| as
-// 3 uint32_t XYZ values before performing the dispatch. This allows prior
-// dispatches within the command sequence to populate the workgroup counts.
-//
-// The buffer must have been allocated with
-// IREE_HAL_BUFFER_USAGE_DISPATCH_INDIRECT_PARAMS and be of
-// IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE.
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags);
-
 // Dispatches an execution request.
 // The request may execute overlapped with any other transfer operation or
 // dispatch made within the same barrier-defined sequence. The executable
@@ -799,9 +728,9 @@
 // The provided constant data and binding list will be recorded into the command
 // buffer and need not remain live beyond the call.
 //
-// Fails if the queue does not support dispatch operations (as indicated by
-// can_dispatch).
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch2(
+// Fails if the queue does not support dispatch operations or the command
+// buffer was not created with IREE_HAL_COMMAND_CATEGORY_DISPATCH.
+IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
@@ -818,7 +747,7 @@
 // The buffer must have been allocated with
 // IREE_HAL_BUFFER_USAGE_DISPATCH_INDIRECT_PARAMS and be of
 // IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE.
-IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch2_indirect(
+IREE_API_EXPORT iree_status_t iree_hal_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -966,34 +895,13 @@
       iree_hal_buffer_ref_t send_ref, iree_hal_buffer_ref_t recv_ref,
       iree_device_size_t element_count);
 
-  iree_status_t(IREE_API_PTR* push_constants)(
-      iree_hal_command_buffer_t* command_buffer,
-      iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-      const void* values, iree_host_size_t values_length);
-
-  iree_status_t(IREE_API_PTR* push_descriptor_set)(
-      iree_hal_command_buffer_t* command_buffer,
-      iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-      iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings);
-
   iree_status_t(IREE_API_PTR* dispatch)(
       iree_hal_command_buffer_t* command_buffer,
       iree_hal_executable_t* executable, int32_t entry_point,
-      uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-      iree_hal_dispatch_flags_t flags);
-
-  iree_status_t(IREE_API_PTR* dispatch_indirect)(
-      iree_hal_command_buffer_t* command_buffer,
-      iree_hal_executable_t* executable, int32_t entry_point,
-      iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags);
-
-  iree_status_t(IREE_API_PTR* dispatch2)(
-      iree_hal_command_buffer_t* command_buffer,
-      iree_hal_executable_t* executable, int32_t entry_point,
       const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
       iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags);
 
-  iree_status_t(IREE_API_PTR* dispatch2_indirect)(
+  iree_status_t(IREE_API_PTR* dispatch_indirect)(
       iree_hal_command_buffer_t* command_buffer,
       iree_hal_executable_t* executable, int32_t entry_point,
       iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
diff --git a/runtime/src/iree/hal/command_buffer_validation.c b/runtime/src/iree/hal/command_buffer_validation.c
index 4535348..28b9360 100644
--- a/runtime/src/iree/hal/command_buffer_validation.c
+++ b/runtime/src/iree/hal/command_buffer_validation.c
@@ -16,7 +16,6 @@
 #include "iree/hal/detail.h"
 #include "iree/hal/event.h"
 #include "iree/hal/executable.h"
-#include "iree/hal/pipeline_layout.h"
 #include "iree/hal/resource.h"
 
 // Returns success iff the queue supports the given command categories.
@@ -556,108 +555,7 @@
   return iree_ok_status();
 }
 
-iree_status_t iree_hal_command_buffer_push_constants_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_categories(
-      command_buffer, validation_state, IREE_HAL_COMMAND_CATEGORY_DISPATCH));
-
-  if (IREE_UNLIKELY((values_length % 4) != 0)) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "invalid alignment %" PRIhsz ", must be 4-byte aligned", values_length);
-  }
-
-  // TODO(benvanik): validate offset and value count with layout.
-
-  return iree_ok_status();
-}
-
-iree_status_t iree_hal_command_buffer_push_descriptor_set_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_categories(
-      command_buffer, validation_state, IREE_HAL_COMMAND_CATEGORY_DISPATCH));
-
-  // TODO(benvanik): validate set index.
-
-  // TODO(benvanik): use pipeline layout to derive usage and access bits.
-  // For now we conservatively say _any_ access may be performed (read/write).
-  iree_hal_buffer_binding_requirements_t requirements = {
-      .required_compatibility = IREE_HAL_BUFFER_COMPATIBILITY_QUEUE_DISPATCH,
-      .usage = IREE_HAL_BUFFER_USAGE_DISPATCH_STORAGE,
-      .access = IREE_HAL_MEMORY_ACCESS_ANY,
-      .type = IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE,
-  };
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    // TODO(benvanik): validate binding ordinal against pipeline layout.
-    requirements.max_byte_offset = bindings[i].offset + bindings[i].length;
-    IREE_RETURN_IF_ERROR(
-        iree_hal_command_buffer_validate_buffer_requirements(
-            command_buffer, validation_state, bindings[i], requirements),
-        "set[%u] binding[%u] (arg[%" PRIhsz "])", set, bindings[i].ordinal, i);
-  }
-
-  return iree_ok_status();
-}
-
-iree_status_t iree_hal_command_buffer_dispatch_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_categories(
-      command_buffer, validation_state, IREE_HAL_COMMAND_CATEGORY_DISPATCH));
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_dispatch_bindings(
-      command_buffer, validation_state, executable, entry_point));
-  return iree_ok_status();
-}
-
-iree_status_t iree_hal_command_buffer_dispatch_indirect_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_categories(
-      command_buffer, validation_state, IREE_HAL_COMMAND_CATEGORY_DISPATCH));
-
-  if ((workgroups_ref.offset % sizeof(uint32_t)) != 0) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "workgroup count offset does not match the required natural alignment "
-        "of uint32_t (offset=%" PRIdsz ", min_byte_alignment=%" PRIhsz ")",
-        workgroups_ref.offset, sizeof(uint32_t));
-  } else if (workgroups_ref.length < 3 * sizeof(uint32_t)) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "workgroup count buffer does not have the capacity "
-                            "to store the required 3 uint32_t values "
-                            "(length=%" PRIdsz ", min_length=%" PRIhsz ")",
-                            workgroups_ref.length, 3 * sizeof(uint32_t));
-  }
-
-  const iree_hal_buffer_binding_requirements_t workgroups_reqs = {
-      .required_compatibility = IREE_HAL_BUFFER_COMPATIBILITY_QUEUE_DISPATCH,
-      .usage = IREE_HAL_BUFFER_USAGE_DISPATCH_INDIRECT_PARAMS,
-      .access = IREE_HAL_MEMORY_ACCESS_READ,
-      .type = IREE_HAL_MEMORY_TYPE_DEVICE_VISIBLE,
-      .max_byte_offset = workgroups_ref.offset + workgroups_ref.length,
-      .min_byte_alignment = sizeof(uint32_t),
-  };
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_buffer_requirements(
-      command_buffer, validation_state, workgroups_ref, workgroups_reqs));
-
-  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_dispatch_bindings(
-      command_buffer, validation_state, executable, entry_point));
-
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_command_buffer_dispatch2_validation_base(
+static iree_status_t iree_hal_command_buffer_dispatch_validation_base(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_command_buffer_validation_state_t* validation_state,
     iree_hal_executable_t* executable, int32_t entry_point,
@@ -686,24 +584,24 @@
     IREE_RETURN_IF_ERROR(
         iree_hal_command_buffer_validate_buffer_requirements(
             command_buffer, validation_state, bindings.values[i], requirements),
-        "binding[%u] (arg[%" PRIhsz "])", bindings.values[i].ordinal, i);
+        "binding[%" PRIhsz "]", i);
   }
 
   return iree_ok_status();
 }
 
-iree_status_t iree_hal_command_buffer_dispatch2_validation(
+iree_status_t iree_hal_command_buffer_dispatch_validation(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_command_buffer_validation_state_t* validation_state,
     iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
-  return iree_hal_command_buffer_dispatch2_validation_base(
+  return iree_hal_command_buffer_dispatch_validation_base(
       command_buffer, validation_state, executable, entry_point, constants,
       bindings, flags);
 }
 
-iree_status_t iree_hal_command_buffer_dispatch2_indirect_validation(
+iree_status_t iree_hal_command_buffer_dispatch_indirect_validation(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_command_buffer_validation_state_t* validation_state,
     iree_hal_executable_t* executable, int32_t entry_point,
@@ -734,7 +632,7 @@
   IREE_RETURN_IF_ERROR(iree_hal_command_buffer_validate_buffer_requirements(
       command_buffer, validation_state, workgroups_ref, workgroups_reqs));
 
-  return iree_hal_command_buffer_dispatch2_validation_base(
+  return iree_hal_command_buffer_dispatch_validation_base(
       command_buffer, validation_state, executable, entry_point, constants,
       bindings, flags);
 }
diff --git a/runtime/src/iree/hal/command_buffer_validation.h b/runtime/src/iree/hal/command_buffer_validation.h
index 505982f..2174c06 100644
--- a/runtime/src/iree/hal/command_buffer_validation.h
+++ b/runtime/src/iree/hal/command_buffer_validation.h
@@ -126,43 +126,14 @@
     iree_hal_buffer_ref_t send_ref, iree_hal_buffer_ref_t recv_ref,
     iree_device_size_t element_count);
 
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-iree_status_t iree_hal_command_buffer_push_constants_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-iree_status_t iree_hal_command_buffer_push_descriptor_set_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
 iree_status_t iree_hal_command_buffer_dispatch_validation(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_command_buffer_validation_state_t* validation_state,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags);
-
-// TODO(#18154): deprecated and will be replaced with simplified bindings.
-iree_status_t iree_hal_command_buffer_dispatch_indirect_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags);
-
-iree_status_t iree_hal_command_buffer_dispatch2_validation(
-    iree_hal_command_buffer_t* command_buffer,
-    iree_hal_command_buffer_validation_state_t* validation_state,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags);
 
-iree_status_t iree_hal_command_buffer_dispatch2_indirect_validation(
+iree_status_t iree_hal_command_buffer_dispatch_indirect_validation(
     iree_hal_command_buffer_t* command_buffer,
     iree_hal_command_buffer_validation_state_t* validation_state,
     iree_hal_executable_t* executable, int32_t entry_point,
diff --git a/runtime/src/iree/hal/cts/CMakeLists.txt b/runtime/src/iree/hal/cts/CMakeLists.txt
index 090a2a6..1e7ea8f 100644
--- a/runtime/src/iree/hal/cts/CMakeLists.txt
+++ b/runtime/src/iree/hal/cts/CMakeLists.txt
@@ -10,15 +10,13 @@
   "command_buffer"
   "command_buffer_copy_buffer"
   "command_buffer_dispatch"
+  "command_buffer_dispatch_constants"
   "command_buffer_fill_buffer"
-  "command_buffer_push_constants"
   "command_buffer_update_buffer"
-  "descriptor_set_layout"
   "driver"
   "event"
   "executable_cache"
   "file"
-  "pipeline_layout"
   "semaphore"
   "semaphore_submission"
   PARENT_SCOPE
@@ -29,7 +27,7 @@
 # connected to a functional compiler target, these tests can be skipped.
 set(IREE_EXECUTABLE_CTS_TESTS
   "command_buffer_dispatch"
-  "command_buffer_push_constants"
+  "command_buffer_dispatch_constants"
   "executable_cache"
   PARENT_SCOPE
 )
@@ -37,7 +35,7 @@
 # List of testdata/{name}.mlir source files.
 set(IREE_ALL_CTS_EXECUTABLE_SOURCES
   "command_buffer_dispatch_test"
-  "command_buffer_push_constants_test"
+  "command_buffer_dispatch_constants_test"
   "executable_cache_test"
   PARENT_SCOPE
 )
@@ -122,6 +120,19 @@
 
 iree_cc_library(
   NAME
+    command_buffer_dispatch_constants_test_library
+  HDRS
+    "command_buffer_dispatch_constants_test.h"
+  DEPS
+    ::cts_test_base
+    iree::base
+    iree::hal
+    iree::testing::gtest
+  TESTONLY
+)
+
+iree_cc_library(
+  NAME
     command_buffer_fill_buffer_test_library
   HDRS
     "command_buffer_fill_buffer_test.h"
@@ -135,19 +146,6 @@
 
 iree_cc_library(
   NAME
-    command_buffer_push_constants_test_library
-  HDRS
-    "command_buffer_push_constants_test.h"
-  DEPS
-    ::cts_test_base
-    iree::base
-    iree::hal
-    iree::testing::gtest
-  TESTONLY
-)
-
-iree_cc_library(
-  NAME
     command_buffer_update_buffer_test_library
   HDRS
     "command_buffer_update_buffer_test.h"
@@ -161,19 +159,6 @@
 
 iree_cc_library(
   NAME
-    descriptor_set_layout_test_library
-  HDRS
-    "descriptor_set_layout_test.h"
-  DEPS
-    ::cts_test_base
-    iree::base
-    iree::hal
-    iree::testing::gtest
-  TESTONLY
-)
-
-iree_cc_library(
-  NAME
     driver_test_library
   HDRS
     "driver_test.h"
@@ -226,19 +211,6 @@
 
 iree_cc_library(
   NAME
-    pipeline_layout_test_library
-  HDRS
-    "pipeline_layout_test.h"
-  DEPS
-    ::cts_test_base
-    iree::base
-    iree::hal
-    iree::testing::gtest
-  TESTONLY
-)
-
-iree_cc_library(
-  NAME
     semaphore_test_library
   HDRS
     "semaphore_test.h"
diff --git a/runtime/src/iree/hal/cts/command_buffer_push_constants_test.h b/runtime/src/iree/hal/cts/command_buffer_dispatch_constants_test.h
similarity index 66%
rename from runtime/src/iree/hal/cts/command_buffer_push_constants_test.h
rename to runtime/src/iree/hal/cts/command_buffer_dispatch_constants_test.h
index 06fa747..d53c7b6 100644
--- a/runtime/src/iree/hal/cts/command_buffer_push_constants_test.h
+++ b/runtime/src/iree/hal/cts/command_buffer_dispatch_constants_test.h
@@ -4,8 +4,8 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
-#ifndef IREE_HAL_CTS_COMMAND_BUFFER_PUSH_CONSTANTS_TEST_H_
-#define IREE_HAL_CTS_COMMAND_BUFFER_PUSH_CONSTANTS_TEST_H_
+#ifndef IREE_HAL_CTS_COMMAND_BUFFER_DISPATCH_CONSTANTS_TEST_H_
+#define IREE_HAL_CTS_COMMAND_BUFFER_DISPATCH_CONSTANTS_TEST_H_
 
 #include "iree/base/api.h"
 #include "iree/base/string_view.h"
@@ -18,29 +18,13 @@
 
 using ::testing::ContainerEq;
 
-class CommandBufferPushConstantsTest : public CTSTestBase<> {
+class CommandBufferDispatchConstantsTest : public CTSTestBase<> {
  protected:
   void PrepareExecutable() {
     IREE_ASSERT_OK(iree_hal_executable_cache_create(
         device_, iree_make_cstring_view("default"),
         iree_loop_inline(&loop_status_), &executable_cache_));
 
-    iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] =
-        {
-            {
-                0,
-                IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                IREE_HAL_DESCRIPTOR_FLAG_NONE,
-            },
-        };
-    IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-        device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-        IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-        descriptor_set_layout_bindings, &descriptor_set_layout_));
-    IREE_ASSERT_OK(iree_hal_pipeline_layout_create(
-        device_, /*push_constants=*/4, /*set_layout_count=*/1,
-        &descriptor_set_layout_, &pipeline_layout_));
-
     iree_hal_executable_params_t executable_params;
     iree_hal_executable_params_initialize(&executable_params);
     executable_params.caching_mode =
@@ -48,9 +32,7 @@
     executable_params.executable_format =
         iree_make_cstring_view(get_test_executable_format());
     executable_params.executable_data = get_test_executable_data(
-        iree_make_cstring_view("command_buffer_push_constants_test.bin"));
-    executable_params.pipeline_layout_count = 1;
-    executable_params.pipeline_layouts = &pipeline_layout_;
+        iree_make_cstring_view("command_buffer_dispatch_constants_test.bin"));
-    // No executable-level "specialization constants" (not to be confused with
-    // per-dispatch varying "push constants").
+    // No executable-level "specialization constants" (not to be confused with
+    // per-dispatch varying "dispatch constants").
     executable_params.constant_count = 0;
@@ -62,20 +44,16 @@
 
   void CleanupExecutable() {
     iree_hal_executable_release(executable_);
-    iree_hal_pipeline_layout_release(pipeline_layout_);
-    iree_hal_descriptor_set_layout_release(descriptor_set_layout_);
     iree_hal_executable_cache_release(executable_cache_);
     IREE_ASSERT_OK(loop_status_);
   }
 
   iree_status_t loop_status_ = iree_ok_status();
   iree_hal_executable_cache_t* executable_cache_ = NULL;
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout_ = NULL;
-  iree_hal_pipeline_layout_t* pipeline_layout_ = NULL;
   iree_hal_executable_t* executable_ = NULL;
 };
 
-TEST_F(CommandBufferPushConstantsTest, DispatchWithPushConstants) {
+TEST_F(CommandBufferDispatchConstantsTest, DispatchWithDispatchConstants) {
   ASSERT_NO_FATAL_FAILURE(PrepareExecutable());
 
   iree_hal_command_buffer_t* command_buffer = NULL;
@@ -99,7 +77,7 @@
   IREE_ASSERT_OK(iree_hal_allocator_allocate_buffer(
       device_allocator_, output_params, 4 * sizeof(uint32_t), &output_buffer));
 
-  iree_hal_buffer_ref_t descriptor_set_bindings[] = {
+  iree_hal_buffer_ref_t binding_refs[] = {
       {
           /*binding=*/0,
           /*buffer_slot=*/0,
@@ -108,20 +86,19 @@
           iree_hal_buffer_byte_length(output_buffer),
       },
   };
+  iree_hal_buffer_ref_list_t bindings = {
+      /*.count=*/IREE_ARRAYSIZE(binding_refs),
+      /*.values=*/binding_refs,
+  };
 
-  IREE_ASSERT_OK(iree_hal_command_buffer_push_descriptor_set(
-      command_buffer, pipeline_layout_, /*set=*/0,
-      IREE_ARRAYSIZE(descriptor_set_bindings), descriptor_set_bindings));
+  std::vector<uint32_t> constant_data{11, 22, 33, 44};
+  iree_const_byte_span_t constants = iree_make_const_byte_span(
+      constant_data.data(), constant_data.size() * sizeof(constant_data[0]));
 
-  std::vector<uint32_t> push_constants{11, 22, 33, 44};
-  IREE_ASSERT_OK(iree_hal_command_buffer_push_constants(
-      command_buffer, pipeline_layout_, /*offset=*/0, push_constants.data(),
-      push_constants.size() * sizeof(uint32_t)));
-
+  uint32_t workgroup_count[3] = {1, 1, 1};
   IREE_ASSERT_OK(iree_hal_command_buffer_dispatch(
-      command_buffer, executable_, /*entry_point=*/0,
-      /*workgroup_x=*/1, /*workgroup_y=*/1, /*workgroup_z=*/1,
-      IREE_HAL_DISPATCH_FLAG_NONE));
+      command_buffer, executable_, /*entry_point=*/0, workgroup_count,
+      constants, bindings, IREE_HAL_DISPATCH_FLAG_NONE));
   IREE_ASSERT_OK(iree_hal_command_buffer_execution_barrier(
       command_buffer,
       /*source_stage_mask=*/IREE_HAL_EXECUTION_STAGE_DISPATCH |
@@ -144,7 +121,7 @@
       /*data_length=*/output_data.size() * sizeof(uint32_t),
       IREE_HAL_TRANSFER_BUFFER_FLAG_DEFAULT, iree_infinite_timeout()));
 
-  EXPECT_THAT(output_data, ContainerEq(push_constants));
+  EXPECT_THAT(output_data, ContainerEq(constant_data));
 
   iree_hal_command_buffer_release(command_buffer);
   iree_hal_buffer_release(output_buffer);
@@ -153,4 +130,4 @@
 
 }  // namespace iree::hal::cts
 
-#endif  // IREE_HAL_CTS_COMMAND_BUFFER_PUSH_CONSTANTS_TEST_H_
+#endif  // IREE_HAL_CTS_COMMAND_BUFFER_DISPATCH_CONSTANTS_TEST_H_
diff --git a/runtime/src/iree/hal/cts/command_buffer_dispatch_test.h b/runtime/src/iree/hal/cts/command_buffer_dispatch_test.h
index 6d19793..83a264b 100644
--- a/runtime/src/iree/hal/cts/command_buffer_dispatch_test.h
+++ b/runtime/src/iree/hal/cts/command_buffer_dispatch_test.h
@@ -25,27 +25,6 @@
         device_, iree_make_cstring_view("default"),
         iree_loop_inline(&loop_status_), &executable_cache_));
 
-    iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] =
-        {
-            {
-                0,
-                IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                IREE_HAL_DESCRIPTOR_FLAG_NONE,
-            },
-            {
-                1,
-                IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-                IREE_HAL_DESCRIPTOR_FLAG_NONE,
-            },
-        };
-    IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-        device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-        IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-        descriptor_set_layout_bindings, &descriptor_set_layout_));
-    IREE_ASSERT_OK(iree_hal_pipeline_layout_create(
-        device_, /*push_constants=*/0, /*set_layout_count=*/1,
-        &descriptor_set_layout_, &pipeline_layout_));
-
     iree_hal_executable_params_t executable_params;
     iree_hal_executable_params_initialize(&executable_params);
     executable_params.caching_mode =
@@ -54,8 +33,6 @@
         iree_make_cstring_view(get_test_executable_format());
     executable_params.executable_data = get_test_executable_data(
         iree_make_cstring_view("command_buffer_dispatch_test.bin"));
-    executable_params.pipeline_layout_count = 1;
-    executable_params.pipeline_layouts = &pipeline_layout_;
 
     IREE_ASSERT_OK(iree_hal_executable_cache_prepare_executable(
         executable_cache_, &executable_params, &executable_));
@@ -63,16 +40,12 @@
 
   void CleanupExecutable() {
     iree_hal_executable_release(executable_);
-    iree_hal_pipeline_layout_release(pipeline_layout_);
-    iree_hal_descriptor_set_layout_release(descriptor_set_layout_);
     iree_hal_executable_cache_release(executable_cache_);
     IREE_ASSERT_OK(loop_status_);
   }
 
   iree_status_t loop_status_ = iree_ok_status();
   iree_hal_executable_cache_t* executable_cache_ = NULL;
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout_ = NULL;
-  iree_hal_pipeline_layout_t* pipeline_layout_ = NULL;
   iree_hal_executable_t* executable_ = NULL;
 };
 
@@ -90,20 +63,20 @@
   iree_hal_buffer_t* output_buffer = NULL;
   CreateFilledDeviceBuffer<float>(4 * sizeof(float), -9.0f, &output_buffer);
 
-  iree_hal_buffer_ref_t descriptor_set_bindings[2];
-  iree_hal_buffer_binding_t bindings[2];
+  iree_hal_buffer_ref_t binding_refs[2];
+  iree_hal_buffer_binding_t binding_table_values[2];
   iree_hal_buffer_binding_table_t binding_table =
       iree_hal_buffer_binding_table_empty();
   switch (GetParam()) {
     case RecordingType::kDirect:
-      descriptor_set_bindings[0] = {
+      binding_refs[0] = {
           /*binding=*/0,
           /*buffer_slot=*/0,
           /*buffer=*/input_buffer,
           /*offset=*/1 * sizeof(float),
           /*length=*/2 * sizeof(float),
       };
-      descriptor_set_bindings[1] = {
+      binding_refs[1] = {
           /*binding=*/1,
           /*buffer_slot=*/0,
           /*buffer=*/output_buffer,
@@ -112,26 +85,26 @@
       };
       break;
     case RecordingType::kIndirect:
-      binding_table.count = IREE_ARRAYSIZE(descriptor_set_bindings);
-      binding_table.bindings = bindings;
-      bindings[0] = {
+      binding_table.count = IREE_ARRAYSIZE(binding_refs);
+      binding_table.bindings = binding_table_values;
+      binding_table_values[0] = {
           /*buffer=*/input_buffer,
           /*offset=*/1 * sizeof(float),
           /*length=*/2 * sizeof(float),
       };
-      descriptor_set_bindings[0] = {
+      binding_refs[0] = {
           /*binding=*/0,
           /*buffer_slot=*/0,
           /*buffer=*/NULL,
           /*offset=*/0,
           /*length=*/2 * sizeof(float),
       };
-      bindings[1] = {
+      binding_table_values[1] = {
           /*buffer=*/output_buffer,
           /*offset=*/1 * sizeof(float),
           /*length=*/2 * sizeof(float),
       };
-      descriptor_set_bindings[1] = {
+      binding_refs[1] = {
           /*binding=*/1,
           /*buffer_slot=*/1,
           /*buffer=*/NULL,
@@ -140,6 +113,10 @@
       };
       break;
   }
+  iree_hal_buffer_ref_list_t bindings = {
+      /*.count=*/IREE_ARRAYSIZE(binding_refs),
+      /*.values=*/binding_refs,
+  };
 
   iree_hal_command_buffer_t* command_buffer = NULL;
   IREE_ASSERT_OK(iree_hal_command_buffer_create(
@@ -148,14 +125,11 @@
       binding_table.count, &command_buffer));
   IREE_ASSERT_OK(iree_hal_command_buffer_begin(command_buffer));
 
-  IREE_ASSERT_OK(iree_hal_command_buffer_push_descriptor_set(
-      command_buffer, pipeline_layout_, /*set=*/0,
-      IREE_ARRAYSIZE(descriptor_set_bindings), descriptor_set_bindings));
-
+  uint32_t workgroup_count[3] = {1, 1, 1};
   IREE_ASSERT_OK(iree_hal_command_buffer_dispatch(
-      command_buffer, executable_, /*entry_point=*/0,
-      /*workgroup_x=*/1, /*workgroup_y=*/1, /*workgroup_z=*/1,
-      IREE_HAL_DISPATCH_FLAG_NONE));
+      command_buffer, executable_, /*entry_point=*/0, workgroup_count,
+      iree_const_byte_span_empty(), bindings, IREE_HAL_DISPATCH_FLAG_NONE));
+
   IREE_ASSERT_OK(iree_hal_command_buffer_execution_barrier(
       command_buffer,
       /*source_stage_mask=*/IREE_HAL_EXECUTION_STAGE_DISPATCH |
diff --git a/runtime/src/iree/hal/cts/descriptor_set_layout_test.h b/runtime/src/iree/hal/cts/descriptor_set_layout_test.h
deleted file mode 100644
index ffeb03c..0000000
--- a/runtime/src/iree/hal/cts/descriptor_set_layout_test.h
+++ /dev/null
@@ -1,91 +0,0 @@
-// Copyright 2021 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_CTS_DESCRIPTOR_SET_LAYOUT_TEST_H_
-#define IREE_HAL_CTS_DESCRIPTOR_SET_LAYOUT_TEST_H_
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-#include "iree/hal/cts/cts_test_base.h"
-#include "iree/testing/gtest.h"
-#include "iree/testing/status_matchers.h"
-
-namespace iree::hal::cts {
-
-class DescriptorSetLayoutTest : public CTSTestBase<> {};
-
-// Note: bindingCount == 0 is valid in VkDescriptorSetLayoutCreateInfo:
-// https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkDescriptorSetLayoutCreateInfo.html
-TEST_F(DescriptorSetLayoutTest, CreateWithNoBindings) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      /*binding_count=*/0,
-      /*bindings=*/NULL, &descriptor_set_layout));
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-}
-
-TEST_F(DescriptorSetLayoutTest, CreateWithOneBinding) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-      descriptor_set_layout_bindings, &descriptor_set_layout));
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-}
-
-TEST_F(DescriptorSetLayoutTest, CreateWithTwoBindings) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/1,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-      descriptor_set_layout_bindings, &descriptor_set_layout));
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-}
-
-TEST_F(DescriptorSetLayoutTest, CreateWithPushDescriptorType) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/1,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-      descriptor_set_layout_bindings, &descriptor_set_layout));
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-}
-
-}  // namespace iree::hal::cts
-
-#endif  // IREE_HAL_CTS_DESCRIPTOR_SET_LAYOUT_TEST_H_
diff --git a/runtime/src/iree/hal/cts/executable_cache_test.h b/runtime/src/iree/hal/cts/executable_cache_test.h
index 8792fd9..5c72f6f 100644
--- a/runtime/src/iree/hal/cts/executable_cache_test.h
+++ b/runtime/src/iree/hal/cts/executable_cache_test.h
@@ -50,29 +50,6 @@
       device_, iree_make_cstring_view("default"),
       iree_loop_inline(&loop_status), &executable_cache));
 
-  // Note: this layout must match the testdata executable.
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] = {
-      {
-          0,
-          IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          1,
-          IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-      descriptor_set_layout_bindings, &descriptor_set_layout));
-  iree_hal_pipeline_layout_t* pipeline_layout;
-  IREE_ASSERT_OK(iree_hal_pipeline_layout_create(
-      device_, /*push_constants=*/0, /*set_layout_count=*/1,
-      &descriptor_set_layout, &pipeline_layout));
-
   iree_hal_executable_params_t executable_params;
   iree_hal_executable_params_initialize(&executable_params);
   executable_params.caching_mode =
@@ -81,16 +58,12 @@
       iree_make_cstring_view(get_test_executable_format());
   executable_params.executable_data = get_test_executable_data(
       iree_make_cstring_view("executable_cache_test.bin"));
-  executable_params.pipeline_layout_count = 1;
-  executable_params.pipeline_layouts = &pipeline_layout;
 
   iree_hal_executable_t* executable = NULL;
   IREE_ASSERT_OK(iree_hal_executable_cache_prepare_executable(
       executable_cache, &executable_params, &executable));
 
   iree_hal_executable_release(executable);
-  iree_hal_pipeline_layout_release(pipeline_layout);
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
   iree_hal_executable_cache_release(executable_cache);
   IREE_ASSERT_OK(loop_status);
 }
diff --git a/runtime/src/iree/hal/cts/pipeline_layout_test.h b/runtime/src/iree/hal/cts/pipeline_layout_test.h
deleted file mode 100644
index 4342c6a..0000000
--- a/runtime/src/iree/hal/cts/pipeline_layout_test.h
+++ /dev/null
@@ -1,121 +0,0 @@
-// Copyright 2021 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_CTS_PIPELINE_LAYOUT_TEST_H_
-#define IREE_HAL_CTS_PIPELINE_LAYOUT_TEST_H_
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-#include "iree/hal/cts/cts_test_base.h"
-#include "iree/testing/gtest.h"
-#include "iree/testing/status_matchers.h"
-
-namespace iree::hal::cts {
-
-class PipelineLayoutTest : public CTSTestBase<> {};
-
-TEST_F(PipelineLayoutTest, CreateWithNoLayouts) {
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_ASSERT_OK(iree_hal_pipeline_layout_create(device_, /*push_constants=*/0,
-                                                 /*set_layout_count=*/0, NULL,
-                                                 &pipeline_layout));
-
-  iree_hal_pipeline_layout_release(pipeline_layout);
-}
-
-TEST_F(PipelineLayoutTest, CreateWithPushConstants) {
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  // Note: The Vulkan maxPushConstantsSize limit must be at least 128 bytes:
-  // https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#limits-minmax
-  IREE_ASSERT_OK(iree_hal_pipeline_layout_create(device_, /*push_constants=*/5,
-                                                 /*set_layout_count=*/0, NULL,
-                                                 &pipeline_layout));
-
-  iree_hal_pipeline_layout_release(pipeline_layout);
-}
-
-TEST_F(PipelineLayoutTest, CreateWithOneLayout) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_hal_descriptor_set_layout_binding_t descriptor_set_layout_bindings[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/1,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(descriptor_set_layout_bindings),
-      descriptor_set_layout_bindings, &descriptor_set_layout));
-
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_ASSERT_OK(iree_hal_pipeline_layout_create(
-      device_, /*push_constants=*/0, /*set_layout_count=*/1,
-      &descriptor_set_layout, &pipeline_layout));
-
-  iree_hal_pipeline_layout_release(pipeline_layout);
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-}
-
-TEST_F(PipelineLayoutTest, CreateWithTwoLayouts) {
-  iree_hal_descriptor_set_layout_t* descriptor_set_layouts[2] = {NULL};
-  iree_hal_descriptor_set_layout_binding_t layout_bindings_0[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/1,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(layout_bindings_0), layout_bindings_0,
-      &descriptor_set_layouts[0]));
-
-  iree_hal_descriptor_set_layout_binding_t layout_bindings_1[] = {
-      {
-          /*binding=*/0,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/1,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-      {
-          /*binding=*/2,
-          /*type=*/IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-          /*flags=*/IREE_HAL_DESCRIPTOR_FLAG_NONE,
-      },
-  };
-  IREE_ASSERT_OK(iree_hal_descriptor_set_layout_create(
-      device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      IREE_ARRAYSIZE(layout_bindings_1), layout_bindings_1,
-      &descriptor_set_layouts[1]));
-
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_ASSERT_OK(iree_hal_pipeline_layout_create(
-      device_, /*push_constants=*/0, IREE_ARRAYSIZE(descriptor_set_layouts),
-      descriptor_set_layouts, &pipeline_layout));
-
-  iree_hal_pipeline_layout_release(pipeline_layout);
-  iree_hal_descriptor_set_layout_release(descriptor_set_layouts[0]);
-  iree_hal_descriptor_set_layout_release(descriptor_set_layouts[1]);
-}
-
-}  // namespace iree::hal::cts
-
-#endif  // IREE_HAL_CTS_PIPELINE_LAYOUT_TEST_H_
diff --git a/runtime/src/iree/hal/cts/testdata/command_buffer_push_constants_test.mlir b/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_constants_test.mlir
similarity index 72%
rename from runtime/src/iree/hal/cts/testdata/command_buffer_push_constants_test.mlir
rename to runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_constants_test.mlir
index df041bd..6369fc4 100644
--- a/runtime/src/iree/hal/cts/testdata/command_buffer_push_constants_test.mlir
+++ b/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_constants_test.mlir
@@ -1,25 +1,23 @@
-// This program writes push constant values into an output buffer.
+// This program writes dispatch constant values into an output buffer.
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 4, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 4, bindings = [
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable.source public @executable {
-  hal.executable.export public @write_push_constants ordinal(0) layout(#pipeline_layout) attributes {workgroup_size = [1 : index, 1 : index, 1 : index]} {
+  hal.executable.export public @write_constants ordinal(0) layout(#pipeline_layout) attributes {workgroup_size = [1 : index, 1 : index, 1 : index]} {
   ^bb0(%arg0: !hal.device):
     %c1 = arith.constant 1 : index
     hal.return %c1, %c1, %c1 : index, index, index
   }
   builtin.module {
-    func.func @write_push_constants() {
+    func.func @write_constants() {
       %input_0 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
       %input_1 = hal.interface.constant.load layout(#pipeline_layout) ordinal(1) : i32
       %input_2 = hal.interface.constant.load layout(#pipeline_layout) ordinal(2) : i32
       %input_3 = hal.interface.constant.load layout(#pipeline_layout) ordinal(3) : i32
 
-      %out = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) : memref<4xi32>
+      %out = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) : memref<4xi32>
 
       %c0 = arith.constant 0 : index
       %c1 = arith.constant 1 : index
diff --git a/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_test.mlir b/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_test.mlir
index 136879d..d816c15 100644
--- a/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_test.mlir
+++ b/runtime/src/iree/hal/cts/testdata/command_buffer_dispatch_test.mlir
@@ -5,11 +5,9 @@
 //   return %result : tensor<2xf32>
 // }
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable.source public @executable {
@@ -22,8 +20,8 @@
     func.func @abs() {
       %c0 = arith.constant 0 : index
 
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(4) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2xf32>>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(4) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(4) offset(%c0) : !flow.dispatch.tensor<readonly:tensor<2xf32>>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(4) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<2xf32>>
 
       %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [2], strides = [1] : !flow.dispatch.tensor<readonly:tensor<2xf32>> -> tensor<2xf32>
       %3 = tensor.empty() : tensor<2xf32>
diff --git a/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir b/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir
index 43e33be..6553a44 100644
--- a/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir
+++ b/runtime/src/iree/hal/cts/testdata/executable_cache_test.mlir
@@ -5,11 +5,9 @@
 //   return %result : tensor<f32>
 // }
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 hal.executable.source public @executable {
@@ -22,8 +20,8 @@
     func.func @abs() {
       %c0 = arith.constant 0 : index
 
-      %0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:f32>
-      %1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:f32>
+      %0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) offset(%c0) : !flow.dispatch.tensor<readonly:f32>
+      %1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%c0) : !flow.dispatch.tensor<writeonly:f32>
 
       %2 = flow.dispatch.tensor.load %0, offsets = [], sizes = [], strides = [] : !flow.dispatch.tensor<readonly:f32> -> tensor<f32>
       %3 = tensor.empty() : tensor<f32>
diff --git a/runtime/src/iree/hal/device.h b/runtime/src/iree/hal/device.h
index e94352d..82aac60 100644
--- a/runtime/src/iree/hal/device.h
+++ b/runtime/src/iree/hal/device.h
@@ -20,7 +20,6 @@
 #include "iree/hal/executable_cache.h"
 #include "iree/hal/fence.h"
 #include "iree/hal/file.h"
-#include "iree/hal/pipeline_layout.h"
 #include "iree/hal/queue.h"
 #include "iree/hal/resource.h"
 #include "iree/hal/semaphore.h"
@@ -525,12 +524,6 @@
       iree_host_size_t binding_capacity,
       iree_hal_command_buffer_t** out_command_buffer);
 
-  iree_status_t(IREE_API_PTR* create_descriptor_set_layout)(
-      iree_hal_device_t* device, iree_hal_descriptor_set_layout_flags_t flags,
-      iree_host_size_t binding_count,
-      const iree_hal_descriptor_set_layout_binding_t* bindings,
-      iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
   iree_status_t(IREE_API_PTR* create_event)(
       iree_hal_device_t* device, iree_hal_queue_affinity_t queue_affinity,
       iree_hal_event_flags_t flags, iree_hal_event_t** out_event);
@@ -544,12 +537,6 @@
       iree_hal_memory_access_t access, iree_io_file_handle_t* handle,
       iree_hal_external_file_flags_t flags, iree_hal_file_t** out_file);
 
-  iree_status_t(IREE_API_PTR* create_pipeline_layout)(
-      iree_hal_device_t* device, iree_host_size_t push_constants,
-      iree_host_size_t set_layout_count,
-      iree_hal_descriptor_set_layout_t* const* set_layouts,
-      iree_hal_pipeline_layout_t** out_pipeline_layout);
-
   iree_status_t(IREE_API_PTR* create_semaphore)(
       iree_hal_device_t* device, uint64_t initial_value,
       iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore);
diff --git a/runtime/src/iree/hal/drivers/cuda/BUILD.bazel b/runtime/src/iree/hal/drivers/cuda/BUILD.bazel
index c0e7069..f6551ef 100644
--- a/runtime/src/iree/hal/drivers/cuda/BUILD.bazel
+++ b/runtime/src/iree/hal/drivers/cuda/BUILD.bazel
@@ -37,8 +37,6 @@
         "nccl_channel.h",
         "nop_executable_cache.c",
         "nop_executable_cache.h",
-        "pipeline_layout.c",
-        "pipeline_layout.h",
         "stream_command_buffer.c",
         "stream_command_buffer.h",
         "timepoint_pool.c",
@@ -63,12 +61,14 @@
         "//runtime/src/iree/hal/utils:collective_batch",
         "//runtime/src/iree/hal/utils:deferred_command_buffer",
         "//runtime/src/iree/hal/utils:deferred_work_queue",
+        "//runtime/src/iree/hal/utils:executable_debug_info",
         "//runtime/src/iree/hal/utils:file_transfer",
         "//runtime/src/iree/hal/utils:memory_file",
         "//runtime/src/iree/hal/utils:resource_set",
         "//runtime/src/iree/hal/utils:semaphore_base",
         "//runtime/src/iree/hal/utils:stream_tracing",
         "//runtime/src/iree/schemas:cuda_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
     ],
 )
 
diff --git a/runtime/src/iree/hal/drivers/cuda/CMakeLists.txt b/runtime/src/iree/hal/drivers/cuda/CMakeLists.txt
index e5f4c67..a040b1d 100644
--- a/runtime/src/iree/hal/drivers/cuda/CMakeLists.txt
+++ b/runtime/src/iree/hal/drivers/cuda/CMakeLists.txt
@@ -38,8 +38,6 @@
     "nccl_channel.h"
     "nop_executable_cache.c"
     "nop_executable_cache.h"
-    "pipeline_layout.c"
-    "pipeline_layout.h"
     "stream_command_buffer.c"
     "stream_command_buffer.h"
     "timepoint_pool.c"
@@ -60,12 +58,14 @@
     iree::hal::utils::collective_batch
     iree::hal::utils::deferred_command_buffer
     iree::hal::utils::deferred_work_queue
+    iree::hal::utils::executable_debug_info
     iree::hal::utils::file_transfer
     iree::hal::utils::memory_file
     iree::hal::utils::resource_set
     iree::hal::utils::semaphore_base
     iree::hal::utils::stream_tracing
     iree::schemas::cuda_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
   PUBLIC
 )
 
diff --git a/runtime/src/iree/hal/drivers/cuda/cuda_device.c b/runtime/src/iree/hal/drivers/cuda/cuda_device.c
index 0018c10..a12043d 100644
--- a/runtime/src/iree/hal/drivers/cuda/cuda_device.c
+++ b/runtime/src/iree/hal/drivers/cuda/cuda_device.c
@@ -23,7 +23,6 @@
 #include "iree/hal/drivers/cuda/nccl_channel.h"
 #include "iree/hal/drivers/cuda/nccl_dynamic_symbols.h"
 #include "iree/hal/drivers/cuda/nop_executable_cache.h"
-#include "iree/hal/drivers/cuda/pipeline_layout.h"
 #include "iree/hal/drivers/cuda/stream_command_buffer.h"
 #include "iree/hal/drivers/cuda/timepoint_pool.h"
 #include "iree/hal/utils/deferred_command_buffer.h"
@@ -448,12 +447,14 @@
 
   // Enable tracing for the (currently only) stream - no-op if disabled.
   if (iree_status_is_ok(status) && device->params.stream_tracing) {
-    if (device->params.stream_tracing >= IREE_HAL_TRACING_VERBOSITY_MAX ||
-        device->params.stream_tracing < IREE_HAL_TRACING_VERBOSITY_OFF) {
+    if (device->params.stream_tracing >=
+            IREE_HAL_STREAM_TRACING_VERBOSITY_MAX ||
+        device->params.stream_tracing < IREE_HAL_STREAM_TRACING_VERBOSITY_OFF) {
       return iree_make_status(
           IREE_STATUS_INVALID_ARGUMENT,
           "invalid stream_tracing argument: expected to be between %d and %d",
-          IREE_HAL_TRACING_VERBOSITY_OFF, IREE_HAL_TRACING_VERBOSITY_MAX);
+          IREE_HAL_STREAM_TRACING_VERBOSITY_OFF,
+          IREE_HAL_STREAM_TRACING_VERBOSITY_MAX);
     }
 
     iree_hal_cuda_tracing_device_interface_t* tracing_device_interface = NULL;
@@ -876,18 +877,6 @@
   }
 }
 
-static iree_status_t iree_hal_cuda_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  iree_hal_cuda_device_t* device = iree_hal_cuda_device_cast(base_device);
-  return iree_hal_cuda_descriptor_set_layout_create(
-      flags, binding_count, bindings, device->host_allocator,
-      out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_cuda_device_create_event(
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
@@ -919,17 +908,6 @@
       iree_hal_device_host_allocator(base_device), out_file);
 }
 
-static iree_status_t iree_hal_cuda_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  iree_hal_cuda_device_t* device = iree_hal_cuda_device_cast(base_device);
-  return iree_hal_cuda_pipeline_layout_create(
-      set_layout_count, set_layouts, push_constants, device->host_allocator,
-      out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_cuda_device_create_semaphore(
     iree_hal_device_t* base_device, uint64_t initial_value,
     iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore) {
@@ -1143,12 +1121,9 @@
     .query_i64 = iree_hal_cuda_device_query_i64,
     .create_channel = iree_hal_cuda_device_create_channel,
     .create_command_buffer = iree_hal_cuda_device_create_command_buffer,
-    .create_descriptor_set_layout =
-        iree_hal_cuda_device_create_descriptor_set_layout,
     .create_event = iree_hal_cuda_device_create_event,
     .create_executable_cache = iree_hal_cuda_device_create_executable_cache,
     .import_file = iree_hal_cuda_device_import_file,
-    .create_pipeline_layout = iree_hal_cuda_device_create_pipeline_layout,
     .create_semaphore = iree_hal_cuda_device_create_semaphore,
     .query_semaphore_compatibility =
         iree_hal_cuda_device_query_semaphore_compatibility,
diff --git a/runtime/src/iree/hal/drivers/cuda/graph_command_buffer.c b/runtime/src/iree/hal/drivers/cuda/graph_command_buffer.c
index 458faec..3449f82 100644
--- a/runtime/src/iree/hal/drivers/cuda/graph_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/cuda/graph_command_buffer.c
@@ -14,7 +14,6 @@
 #include "iree/hal/drivers/cuda/cuda_dynamic_symbols.h"
 #include "iree/hal/drivers/cuda/cuda_status_util.h"
 #include "iree/hal/drivers/cuda/native_executable.h"
-#include "iree/hal/drivers/cuda/pipeline_layout.h"
 #include "iree/hal/utils/collective_batch.h"
 #include "iree/hal/utils/resource_set.h"
 #include "iree/hal/utils/stream_tracing.h"
@@ -58,12 +57,6 @@
 
   // Iteratively constructed batch of collective operations.
   iree_hal_collective_batch_t collective_batch;
-
-  // TODO(#18189): drop state used by legacy bindings mechanism.
-  int32_t push_constants[IREE_HAL_CUDA_MAX_PUSH_CONSTANT_COUNT];
-  struct {
-    CUdeviceptr bindings[IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT];
-  } descriptor_sets[IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_COUNT];
 } iree_hal_cuda_graph_command_buffer_t;
 
 static const iree_hal_command_buffer_vtable_t
@@ -346,7 +339,7 @@
       cuGraphCreate(&command_buffer->cu_graph, /*flags=*/0), "cuGraphCreate");
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 
   return iree_ok_status();
 }
@@ -361,7 +354,7 @@
       iree_hal_cuda_graph_command_buffer_flush_collectives(command_buffer));
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 
   // Reset state used during recording.
   command_buffer->cu_barrier_node = NULL;
@@ -396,7 +389,7 @@
 
   (void)command_buffer;
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE,
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
       location ? location->file.data : NULL, location ? location->file.size : 0,
       location ? location->line : 0,
       /*func_name=*/NULL, 0, label.data, label.size);
@@ -408,7 +401,7 @@
       iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
   (void)command_buffer;
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 }
 
 static iree_status_t
@@ -522,7 +515,7 @@
       iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_cuda_graph_command_buffer_flush_collectives(command_buffer));
@@ -562,7 +555,7 @@
       "cuGraphAddMemsetNode");
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -574,7 +567,7 @@
       iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_cuda_graph_command_buffer_flush_collectives(command_buffer));
@@ -626,7 +619,7 @@
       "cuGraphAddMemcpyNode");
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -638,7 +631,7 @@
       iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_cuda_graph_command_buffer_flush_collectives(command_buffer));
@@ -686,7 +679,7 @@
       "cuGraphAddMemcpyNode");
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -712,197 +705,9 @@
                                           recv_binding, element_count);
 }
 
-static iree_status_t iree_hal_cuda_graph_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_cuda_graph_command_buffer_t* command_buffer =
-      iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
-  iree_host_size_t constant_base_index = offset / sizeof(int32_t);
-  for (iree_host_size_t i = 0; i < values_length / sizeof(int32_t); i++) {
-    command_buffer->push_constants[i + constant_base_index] =
-        ((uint32_t*)values)[i];
-  }
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_cuda_graph_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  if (binding_count > IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-    return iree_make_status(
-        IREE_STATUS_RESOURCE_EXHAUSTED,
-        "exceeded available binding slots for push "
-        "descriptor set #%" PRIu32 "; requested %" PRIhsz " vs. maximal %d",
-        set, binding_count, IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-  }
-
-  iree_hal_cuda_graph_command_buffer_t* command_buffer =
-      iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  CUdeviceptr* current_bindings = command_buffer->descriptor_sets[set].bindings;
-  for (iree_host_size_t i = 0; i < binding_count; i++) {
-    const iree_hal_buffer_ref_t* binding = &bindings[i];
-    CUdeviceptr device_ptr = 0;
-    if (binding->buffer) {
-      IREE_RETURN_AND_END_ZONE_IF_ERROR(
-          z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                           &binding->buffer));
-
-      CUdeviceptr device_buffer = iree_hal_cuda_buffer_device_pointer(
-          iree_hal_buffer_allocated_buffer(binding->buffer));
-      iree_device_size_t offset = iree_hal_buffer_byte_offset(binding->buffer);
-      device_ptr = device_buffer + offset + binding->offset;
-    }
-    current_bindings[binding->ordinal] = device_ptr;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
 static iree_status_t iree_hal_cuda_graph_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_cuda_graph_command_buffer_t* command_buffer =
-      iree_hal_cuda_graph_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_cuda_graph_command_buffer_flush_collectives(command_buffer));
-
-  // Lookup kernel parameters used for side-channeling additional launch
-  // information from the compiler.
-  iree_hal_cuda_kernel_info_t kernel_info;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_cuda_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
-
-  IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE,
-      kernel_info.source_filename.data, kernel_info.source_filename.size,
-      kernel_info.source_line, kernel_info.function_name.data,
-      kernel_info.function_name.size,
-      /*name=*/NULL, 0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                       &executable));
-
-  // The total number of descriptors across all descriptor sets.
-  iree_host_size_t descriptor_count =
-      iree_hal_cuda_pipeline_layout_total_binding_count(kernel_info.layout);
-  // The total number of push constants.
-  iree_host_size_t push_constant_count =
-      iree_hal_cuda_pipeline_layout_push_constant_count(kernel_info.layout);
-  // We append push constants to the end of descriptors to form a linear chain
-  // of kernel arguments.
-  iree_host_size_t kernel_params_count = descriptor_count + push_constant_count;
-  iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
-
-  // Per CUDA API requirements, we need two levels of indirection for passing
-  // kernel arguments in.
-  //   "If the kernel has N parameters, then kernelParams needs to be an array
-  //   of N pointers. Each pointer, from kernelParams[0] to kernelParams[N-1],
-  //   points to the region of memory from which the actual parameter will be
-  //   copied."
-  //
-  // (From the cuGraphAddKernelNode API doc in
-  // https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GRAPH.html#group__CUDA__GRAPH_1g50d871e3bd06c1b835e52f2966ef366b)
-  //
-  // It means each kernel_params[i] is itself a pointer to the corresponding
-  // element at the *second* inline allocation at the end of the current
-  // segment.
-  iree_host_size_t total_size = kernel_params_length * 2;
-  uint8_t* storage_base = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_arena_allocate(&command_buffer->arena, total_size,
-                              (void**)&storage_base));
-  void** params_ptr = (void**)storage_base;
-
-  // Set up kernel arguments to point to the payload slots.
-  CUdeviceptr* payload_ptr =
-      (CUdeviceptr*)((uint8_t*)params_ptr + kernel_params_length);
-  for (size_t i = 0; i < kernel_params_count; i++) {
-    params_ptr[i] = &payload_ptr[i];
-  }
-
-  // Copy descriptors from all sets to the end of the current segment for later
-  // access.
-  iree_host_size_t set_count =
-      iree_hal_cuda_pipeline_layout_descriptor_set_count(kernel_info.layout);
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    // TODO: cache this information in the kernel info to avoid recomputation.
-    iree_host_size_t binding_count =
-        iree_hal_cuda_descriptor_set_layout_binding_count(
-            iree_hal_cuda_pipeline_layout_descriptor_set_layout(
-                kernel_info.layout, i));
-    iree_host_size_t index =
-        iree_hal_cuda_pipeline_layout_base_binding_index(kernel_info.layout, i);
-    memcpy(payload_ptr + index, command_buffer->descriptor_sets[i].bindings,
-           binding_count * sizeof(CUdeviceptr));
-  }
-
-  // Append the push constants to the kernel arguments.
-  iree_host_size_t base_index =
-      iree_hal_cuda_pipeline_layout_push_constant_index(kernel_info.layout);
-  // As commented in the above, what each kernel parameter points to is a
-  // CUdeviceptr, which as the size of a pointer on the target machine. we are
-  // just storing a 32-bit value for the push constant here instead. So we must
-  // process one element each type, for 64-bit machines.
-  for (iree_host_size_t i = 0; i < push_constant_count; i++) {
-    *((uint32_t*)params_ptr[base_index + i]) =
-        command_buffer->push_constants[i];
-  }
-
-  CUDA_KERNEL_NODE_PARAMS params = {
-      .func = kernel_info.function,
-      .blockDimX = kernel_info.block_size[0],
-      .blockDimY = kernel_info.block_size[1],
-      .blockDimZ = kernel_info.block_size[2],
-      .gridDimX = workgroup_x,
-      .gridDimY = workgroup_y,
-      .gridDimZ = workgroup_z,
-      .kernelParams = params_ptr,
-      .sharedMemBytes = kernel_info.shared_memory_size,
-  };
-
-  if (command_buffer->graph_node_count >=
-      IREE_HAL_CUDA_MAX_CONCURRENT_GRAPH_NODE_COUNT) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "exceeded max concurrent node limit");
-  }
-
-  size_t dependency_count = command_buffer->cu_barrier_node ? 1 : 0;
-  IREE_CUDA_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, command_buffer->symbols,
-      cuGraphAddKernelNode(
-          &command_buffer->cu_graph_nodes[command_buffer->graph_node_count++],
-          command_buffer->cu_graph, &command_buffer->cu_barrier_node,
-          dependency_count, &params),
-      "cuGraphAddKernelNode");
-
-  IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_cuda_graph_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "indirect dispatch not yet implemented");
-}
-
-static iree_status_t iree_hal_cuda_graph_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_cuda_graph_command_buffer_t* command_buffer =
@@ -914,16 +719,18 @@
 
   // Lookup kernel parameters used for side-channeling additional launch
   // information from the compiler.
-  iree_hal_cuda_kernel_info_t kernel_info;
+  const iree_hal_cuda_kernel_params_t* kernel_params = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_cuda_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
+      z0, iree_hal_cuda_native_executable_lookup_kernel_params(
+              executable, entry_point, &kernel_params));
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE,
-      kernel_info.source_filename.data, kernel_info.source_filename.size,
-      kernel_info.source_line, kernel_info.function_name.data,
-      kernel_info.function_name.size, /*name=*/NULL, 0);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE,
+      kernel_params->debug_info.source_filename.data,
+      kernel_params->debug_info.source_filename.size,
+      kernel_params->debug_info.source_line,
+      kernel_params->debug_info.function_name.data,
+      kernel_params->debug_info.function_name.size, /*name=*/NULL, 0);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
@@ -931,7 +738,7 @@
   // We append push constants to the end of descriptors to form a linear chain
   // of kernel arguments.
   iree_host_size_t kernel_params_count =
-      kernel_info.binding_count + kernel_info.constant_count;
+      kernel_params->binding_count + kernel_params->constant_count;
   iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
 
   // TODO: use packed parameters instead of the indirection mechanism - this
@@ -980,21 +787,21 @@
   // CUdeviceptr, which as the size of a pointer on the target machine. we are
   // just storing a 32-bit value for the push constant here instead. So we must
   // process one element each type, for 64-bit machines.
-  for (iree_host_size_t i = 0; i < kernel_info.constant_count; i++) {
-    *((uint32_t*)params_ptr[kernel_info.binding_count + i]) =
+  for (iree_host_size_t i = 0; i < kernel_params->constant_count; i++) {
+    *((uint32_t*)params_ptr[kernel_params->binding_count + i]) =
         ((const uint32_t*)constants.data)[i];
   }
 
   CUDA_KERNEL_NODE_PARAMS params = {
-      .func = kernel_info.function,
-      .blockDimX = kernel_info.block_size[0],
-      .blockDimY = kernel_info.block_size[1],
-      .blockDimZ = kernel_info.block_size[2],
+      .func = kernel_params->function,
+      .blockDimX = kernel_params->block_dims[0],
+      .blockDimY = kernel_params->block_dims[1],
+      .blockDimZ = kernel_params->block_dims[2],
       .gridDimX = workgroup_count[0],
       .gridDimY = workgroup_count[1],
       .gridDimZ = workgroup_count[2],
       .kernelParams = params_ptr,
-      .sharedMemBytes = kernel_info.shared_memory_size,
+      .sharedMemBytes = kernel_params->block_shared_memory_size,
   };
 
   if (command_buffer->graph_node_count >=
@@ -1013,12 +820,12 @@
       "cuGraphAddKernelNode");
 
   IREE_CUDA_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_cuda_graph_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_cuda_graph_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -1045,13 +852,7 @@
         .update_buffer = iree_hal_cuda_graph_command_buffer_update_buffer,
         .copy_buffer = iree_hal_cuda_graph_command_buffer_copy_buffer,
         .collective = iree_hal_cuda_graph_command_buffer_collective,
-        .push_constants = iree_hal_cuda_graph_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_cuda_graph_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_cuda_graph_command_buffer_dispatch,
         .dispatch_indirect =
             iree_hal_cuda_graph_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_cuda_graph_command_buffer_dispatch2,
-        .dispatch2_indirect =
-            iree_hal_cuda_graph_command_buffer_dispatch2_indirect,
 };
diff --git a/runtime/src/iree/hal/drivers/cuda/native_executable.c b/runtime/src/iree/hal/drivers/cuda/native_executable.c
index 06a7ffc..437b644 100644
--- a/runtime/src/iree/hal/drivers/cuda/native_executable.c
+++ b/runtime/src/iree/hal/drivers/cuda/native_executable.c
@@ -11,31 +11,31 @@
 #include "iree/base/api.h"
 #include "iree/hal/drivers/cuda/cuda_dynamic_symbols.h"
 #include "iree/hal/drivers/cuda/cuda_status_util.h"
-#include "iree/hal/drivers/cuda/pipeline_layout.h"
+#include "iree/hal/utils/executable_debug_info.h"
 
 // flatcc schemas:
 #include "iree/base/internal/flatcc/parsing.h"
 #include "iree/schemas/cuda_executable_def_reader.h"
 #include "iree/schemas/cuda_executable_def_verifier.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
 
 typedef struct iree_hal_cuda_native_executable_t {
   // Abstract resource used for injecting reference counting and vtable;
   // must be at offset 0.
   iree_hal_resource_t resource;
-
   iree_allocator_t host_allocator;
 
   const iree_hal_cuda_dynamic_symbols_t* symbols;
 
-  // The loaded CUDA module.
-  CUmodule cu_module;
+  // Loaded CUDA modules.
+  iree_host_size_t module_count;
+  CUmodule* modules;
 
-  iree_host_size_t entry_point_count;
-  // The list of entry point data pointers, pointing to trailing inline
-  // allocation after the end of this struct.
-  iree_hal_cuda_kernel_info_t entry_points[];
+  // Exported kernels referencing the loaded modules.
+  iree_host_size_t export_count;
+  iree_hal_cuda_kernel_params_t exports[];
 } iree_hal_cuda_native_executable_t;
-// + Additional inline allocation for holding entry point information.
 
 static const iree_hal_executable_vtable_t
     iree_hal_cuda_native_executable_vtable;
@@ -46,6 +46,41 @@
   return (iree_hal_cuda_native_executable_t*)base_value;
 }
 
+typedef struct iree_hal_cuda_limits_t {
+  uint32_t max_block_dims[3];
+  uint32_t max_block_shared_memory_size;
+} iree_hal_cuda_limits_t;
+static iree_status_t iree_hal_cuda_query_limits(
+    const iree_hal_cuda_dynamic_symbols_t* symbols, CUdevice device,
+    iree_hal_cuda_limits_t* out_limits) {
+  memset(out_limits, 0, sizeof(*out_limits));
+
+  IREE_CUDA_RETURN_IF_ERROR(
+      symbols,
+      cuDeviceGetAttribute(&out_limits->max_block_dims[0],
+                           CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X, device),
+      "cuDeviceGetAttribute");
+  IREE_CUDA_RETURN_IF_ERROR(
+      symbols,
+      cuDeviceGetAttribute(&out_limits->max_block_dims[1],
+                           CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y, device),
+      "cuDeviceGetAttribute");
+  IREE_CUDA_RETURN_IF_ERROR(
+      symbols,
+      cuDeviceGetAttribute(&out_limits->max_block_dims[2],
+                           CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z, device),
+      "cuDeviceGetAttribute");
+
+  IREE_CUDA_RETURN_IF_ERROR(
+      symbols,
+      cuDeviceGetAttribute(
+          &out_limits->max_block_shared_memory_size,
+          CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN, device),
+      "cuDeviceGetAttribute");
+
+  return iree_ok_status();
+}
+
 // Verifies the structure of the flatbuffer so that we can avoid doing so during
 // runtime.
 //
@@ -53,7 +88,8 @@
 // functions with internal linkage), however we shouldn't need to bounds check
 // anything within the flatbuffer after this succeeds.
 static iree_status_t iree_hal_cuda_native_executable_flatbuffer_verify(
-    iree_const_byte_span_t flatbuffer_data) {
+    iree_const_byte_span_t flatbuffer_data,
+    const iree_hal_cuda_limits_t* limits) {
   if (!flatbuffer_data.data) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
                             "flatbuffer data is not present");
@@ -73,37 +109,99 @@
   iree_hal_cuda_ExecutableDef_table_t executable_def =
       iree_hal_cuda_ExecutableDef_as_root(flatbuffer_data.data);
 
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_cuda_ExecutableDef_entry_points_get(executable_def);
-  size_t entry_point_count = flatbuffers_string_vec_len(entry_points_vec);
-  for (size_t i = 0; i < entry_point_count; ++i) {
-    if (flatbuffers_string_len(
-            flatbuffers_string_vec_at(entry_points_vec, i)) == 0) {
+  iree_hal_cuda_ModuleDef_vec_t modules_vec =
+      iree_hal_cuda_ExecutableDef_modules_get(executable_def);
+  iree_host_size_t module_count = iree_hal_cuda_ModuleDef_vec_len(modules_vec);
+  for (iree_host_size_t i = 0; i < module_count; ++i) {
+    iree_hal_cuda_ModuleDef_table_t module_def =
+        iree_hal_cuda_ModuleDef_vec_at(modules_vec, i);
+    if (!module_def) {
       return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "executable entry point %zu has no name", i);
+                              "modules[%" PRIhsz "] is NULL", i);
+    }
+    if (flatbuffers_string_len(
+            iree_hal_cuda_ModuleDef_ptx_image_get(module_def)) == 0) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "modules[%" PRIhsz "] contents are empty", i);
     }
   }
 
-  iree_hal_cuda_BlockSizeDef_vec_t block_sizes_vec =
-      iree_hal_cuda_ExecutableDef_block_sizes_get(executable_def);
-  size_t block_size_count = iree_hal_cuda_BlockSizeDef_vec_len(block_sizes_vec);
-  if (block_size_count == 0) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "no block sizes present");
-  }
+  iree_hal_cuda_ExportDef_vec_t exports_vec =
+      iree_hal_cuda_ExecutableDef_exports_get(executable_def);
+  for (iree_host_size_t i = 0; i < iree_hal_cuda_ExportDef_vec_len(exports_vec);
+       ++i) {
+    iree_hal_cuda_ExportDef_table_t export_def =
+        iree_hal_cuda_ExportDef_vec_at(exports_vec, i);
+    if (!export_def) continue;
 
-  if (entry_point_count != block_size_count) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "entry points (%zu) and block sizes (%zu) count mismatch",
-        entry_point_count, block_size_count);
-  }
+    uint32_t module_ordinal =
+        iree_hal_cuda_ExportDef_module_ordinal_get(export_def);
+    if (module_ordinal >= module_count) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz
+                              "] module_ordinal %u is out of bounds %" PRIhsz,
+                              i, module_ordinal, module_count);
+    }
 
-  flatbuffers_string_t ptx_image =
-      iree_hal_cuda_ExecutableDef_ptx_image_get(executable_def);
-  if (flatbuffers_string_len(ptx_image) == 0) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "no PTX image present");
+    if (flatbuffers_string_len(
+            iree_hal_cuda_ExportDef_kernel_name_get(export_def)) == 0) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz "] name is empty", i);
+    }
+
+    if (iree_hal_cuda_ExportDef_block_dims_is_present(export_def)) {
+      const iree_hal_cuda_BlockDims_t* block_dims =
+          iree_hal_cuda_ExportDef_block_dims_get(export_def);
+      if (block_dims->x > limits->max_block_dims[0] ||
+          block_dims->y > limits->max_block_dims[1] ||
+          block_dims->z > limits->max_block_dims[2]) {
+        return iree_make_status(
+            IREE_STATUS_INVALID_ARGUMENT,
+            "exports[%" PRIhsz
+            "] block dims %ux%ux%u exceed the device maximum %ux%ux%u",
+            i, block_dims->x, block_dims->y, block_dims->z,
+            limits->max_block_dims[0], limits->max_block_dims[1],
+            limits->max_block_dims[2]);
+      }
+    } else {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz "] block dims are missing",
+                              i);
+    }
+
+    uint32_t block_shared_memory_size =
+        iree_hal_cuda_ExportDef_block_shared_memory_size_get(export_def);
+    if (block_shared_memory_size > limits->max_block_shared_memory_size) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz
+                              "] requires %uB of shared memory and "
+                              "exceeds the device maximum of %uB per block",
+                              i, block_shared_memory_size,
+                              limits->max_block_shared_memory_size);
+    }
+
+    uint32_t constant_count =
+        iree_hal_cuda_ExportDef_constant_count_get(export_def);
+    if (constant_count > IREE_HAL_CUDA_MAX_DISPATCH_CONSTANT_COUNT) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "exports[%" PRIhsz "] constant_count %u exceeds maximum of %u", i,
+          constant_count, IREE_HAL_CUDA_MAX_DISPATCH_CONSTANT_COUNT);
+    }
+
+    iree_hal_cuda_BindingBits_vec_t binding_flags_vec =
+        iree_hal_cuda_ExportDef_binding_flags_get(export_def);
+    if (iree_hal_cuda_BindingBits_vec_len(binding_flags_vec) >
+        IREE_HAL_CUDA_MAX_DISPATCH_BINDING_COUNT) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "exports[%" PRIhsz "] binding_flags count %zu exceeds maximum of %u",
+          i, iree_hal_cuda_BindingBits_vec_len(binding_flags_vec),
+          IREE_HAL_CUDA_MAX_DISPATCH_BINDING_COUNT);
+    }
+
+    IREE_RETURN_IF_ERROR(iree_hal_debug_verify_export_def(
+        iree_hal_cuda_ExportDef_debug_info_get(export_def)));
   }
 
   return iree_ok_status();
@@ -118,162 +216,173 @@
   IREE_TRACE_ZONE_BEGIN(z0);
 
   *out_executable = NULL;
-  iree_hal_cuda_native_executable_t* executable = NULL;
+
+  // TODO: move to the executable cache to avoid repeated queries.
+  iree_hal_cuda_limits_t limits = {0};
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_cuda_query_limits(symbols, device, &limits));
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_cuda_native_executable_flatbuffer_verify(
-              executable_params->executable_data));
+              executable_params->executable_data, &limits));
 
   iree_hal_cuda_ExecutableDef_table_t executable_def =
       iree_hal_cuda_ExecutableDef_as_root(
           executable_params->executable_data.data);
 
-  flatbuffers_string_t ptx_image =
-      iree_hal_cuda_ExecutableDef_ptx_image_get(executable_def);
-  flatbuffers_uint32_vec_t shared_memory_sizes =
-      iree_hal_cuda_ExecutableDef_shared_memory_size_get(executable_def);
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_cuda_ExecutableDef_entry_points_get(executable_def);
-  iree_hal_cuda_BlockSizeDef_vec_t block_sizes_vec =
-      iree_hal_cuda_ExecutableDef_block_sizes_get(executable_def);
-  iree_host_size_t entry_point_count =
-      flatbuffers_string_vec_len(entry_points_vec);
+  iree_hal_cuda_ModuleDef_vec_t modules_vec =
+      iree_hal_cuda_ExecutableDef_modules_get(executable_def);
+  iree_host_size_t module_count = iree_hal_cuda_ModuleDef_vec_len(modules_vec);
+  iree_hal_cuda_ExportDef_vec_t exports_vec =
+      iree_hal_cuda_ExecutableDef_exports_get(executable_def);
+  iree_host_size_t export_count = iree_hal_cuda_ExportDef_vec_len(exports_vec);
 
-  // Calculate the total number of characters across all entry point names. This
-  // is only required when tracing so that we can store copies of the names as
-  // the flatbuffer storing the strings may be released while the executable is
-  // still live.
+  // Calculate the total size of the export debug info. This is only required
+  // when tracing so that we can store copies of the function names and source
+  // locations, as the flatbuffer storing them may be released while the
+  // executable is still live.
-  iree_host_size_t total_entry_point_name_chars = 0;
+  iree_host_size_t total_export_info_length = 0;
   IREE_TRACE({
-    for (iree_host_size_t i = 0; i < entry_point_count; i++) {
-      const char* entry_name = flatbuffers_string_vec_at(entry_points_vec, i);
-      total_entry_point_name_chars += flatbuffers_string_len(entry_name);
+    for (iree_host_size_t i = 0; i < export_count; ++i) {
+      iree_hal_cuda_ExportDef_table_t export_def =
+          iree_hal_cuda_ExportDef_vec_at(exports_vec, i);
+      total_export_info_length += iree_hal_debug_calculate_export_info_size(
+          iree_hal_cuda_ExportDef_debug_info_get(export_def));
     }
   });
 
-  // Allocate storage for the kernel module.
-  iree_host_size_t total_size =
-      sizeof(*executable) +
-      entry_point_count * sizeof(executable->entry_points[0]) +
-      total_entry_point_name_chars;
+  // Allocate storage for the executable and its associated data structures.
+  iree_hal_cuda_native_executable_t* executable = NULL;
+  const iree_host_size_t total_size =
+      sizeof(*executable) + module_count * sizeof(executable->modules[0]) +
+      export_count * sizeof(executable->exports[0]) + total_export_info_length;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0,
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable));
-  IREE_TRACE(
-      char* string_table_buffer =
-          (char*)((char*)executable + sizeof(*executable) +
-                  entry_point_count * sizeof(executable->entry_points[0])));
 
   iree_hal_resource_initialize(&iree_hal_cuda_native_executable_vtable,
                                &executable->resource);
+  executable->host_allocator = host_allocator;
+  executable->symbols = symbols;
+  executable->module_count = module_count;
+  executable->modules =
+      (CUmodule*)((uint8_t*)executable + sizeof(*executable) +
+                  export_count * sizeof(executable->exports[0]));
+  executable->export_count = export_count;
+  IREE_TRACE(
+      iree_hal_debug_export_info_t* export_infos =
+          (iree_hal_debug_export_info_t*)((uint8_t*)executable->modules +
+                                          module_count *
+                                              sizeof(executable->modules[0])));
 
-  // Load the PTX image - this will fail if the device cannot handle the
-  // contents. We could check this prior to creating
-  CUmodule module = NULL;
+  // Publish any embedded source files to the tracing infrastructure.
+  iree_hal_debug_publish_source_files(
+      iree_hal_cuda_ExecutableDef_source_files_get(executable_def));
 
-  iree_status_t status = IREE_CURESULT_TO_STATUS(
-      symbols, cuModuleLoadDataEx(&module, ptx_image, 0, NULL, NULL),
-      "cuModuleLoadDataEx");
+  // Load each module first so that exports can reference them.
+  iree_status_t status = iree_ok_status();
+  for (iree_host_size_t i = 0; i < module_count; ++i) {
+    iree_hal_cuda_ModuleDef_table_t module_def =
+        iree_hal_cuda_ModuleDef_vec_at(modules_vec, i);
 
-  // Query max optin shared memory per block - we'll use it to compare with
-  // kernel usages.
-  int32_t max_shared_memory = 0;
-  if (iree_status_is_ok(status)) {
+    // WARNING: CUDA doesn't take an expected length here so we can't bound it.
+    // It's likely that users could craft inputs that read beyond the extents of
+    // the embedded binary.
+    flatbuffers_string_t ptx_image =
+        iree_hal_cuda_ModuleDef_ptx_image_get(module_def);
+
+    // TODO: pass cuJitOption values to get log info and other info back.
+    // We pass the error buffer today but could use the info log to diagnose
+    // performance warnings.
+    char error_log[8192] = {0};
+    CUjit_option jit_options[] = {
+        CU_JIT_ERROR_LOG_BUFFER,
+        CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
+    };
+    void* jit_option_values[] = {
+        (void*)error_log,
+        (void*)(uint32_t)sizeof(error_log),
+    };
+    CUmodule module = NULL;
     status = IREE_CURESULT_TO_STATUS(
         symbols,
-        cuDeviceGetAttribute(
-            &max_shared_memory,
-            CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK_OPTIN, device),
-        "cuDeviceGetAttribute");
+        cuModuleLoadDataEx(&module, ptx_image, IREE_ARRAYSIZE(jit_options),
+                           jit_options, jit_option_values),
+        "cuModuleLoadDataEx");
+    if (!iree_status_is_ok(status)) {
+      status = iree_status_annotate(
+          status,
+          IREE_SV("mismatched target chip? missing/wrong bitcode directory?"));
+      if (strlen(error_log) > 0) {
+        status =
+            iree_status_annotate(status, iree_make_cstring_view(error_log));
+      }
+      break;
+    }
+
+    executable->modules[i] = module;
   }
 
   if (iree_status_is_ok(status)) {
-    executable->host_allocator = host_allocator;
-    executable->symbols = symbols;
-    executable->cu_module = module;
-    executable->entry_point_count = entry_point_count;
-    for (iree_host_size_t i = 0; i < entry_point_count; i++) {
-      // Lookup the function in the module; this should always succeed but we
-      // cannot trust that the input was generated by our compiler.
+    for (iree_host_size_t i = 0; i < export_count; ++i) {
+      iree_hal_cuda_ExportDef_table_t export_def =
+          iree_hal_cuda_ExportDef_vec_at(exports_vec, i);
+
+      // Look up the function in the module; this should always succeed but
+      // we cannot trust that the input was generated by our compiler.
+      uint32_t module_ordinal =
+          iree_hal_cuda_ExportDef_module_ordinal_get(export_def);
+      CUmodule module = executable->modules[module_ordinal];
+      flatbuffers_string_t kernel_name =
+          iree_hal_cuda_ExportDef_kernel_name_get(export_def);
       CUfunction function = NULL;
-      const char* entry_name = flatbuffers_string_vec_at(entry_points_vec, i);
       status = IREE_CURESULT_TO_STATUS(
-          symbols,
-          cuModuleGetFunction(&function, executable->cu_module, entry_name),
+          symbols, cuModuleGetFunction(&function, module, kernel_name),
           "cuModuleGetFunction");
       if (!iree_status_is_ok(status)) break;
       if (!function) {
         status = iree_make_status(IREE_STATUS_NOT_FOUND,
-                                  "exported module function '%s' not found",
-                                  entry_name);
+                                  "exports[%" PRIhsz
+                                  "] kernel `%s` not found in modules[%u]",
+                                  i, kernel_name, module_ordinal);
         break;
       }
 
-      if (shared_memory_sizes[i] > max_shared_memory) {
-        status = iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                                  "requested shared memory size of %d bytes "
-                                  "larger than allowed size of %d bytes",
-                                  shared_memory_sizes[i], max_shared_memory);
-      } else {
-        status = IREE_CURESULT_TO_STATUS(
-            symbols,
-            cuFuncSetAttribute(function,
-                               CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
-                               shared_memory_sizes[i]),
-            "cuFuncSetAttribute");
-      }
+      uint32_t block_shared_memory_size =
+          iree_hal_cuda_ExportDef_block_shared_memory_size_get(export_def);
+      status = IREE_CURESULT_TO_STATUS(
+          symbols,
+          cuFuncSetAttribute(function,
+                             CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
+                             block_shared_memory_size),
+          "cuFuncSetAttribute");
       if (!iree_status_is_ok(status)) break;
 
-      // TODO(#18189): embed all of this on a single flatbuffer table
-      // per-export.
-      //
       // Package required parameters for kernel launches for each entry point.
-      iree_hal_cuda_kernel_info_t* info = &executable->entry_points[i];
-      info->layout = executable_params->pipeline_layouts[i];
-      iree_hal_pipeline_layout_retain(info->layout);
-      info->function = function;
-      info->constant_count =
-          iree_hal_cuda_pipeline_layout_push_constant_count(info->layout);
-      info->binding_count =
-          iree_hal_cuda_pipeline_layout_total_binding_count(info->layout);
-      info->block_size[0] = block_sizes_vec[i].x;
-      info->block_size[1] = block_sizes_vec[i].y;
-      info->block_size[2] = block_sizes_vec[i].z;
-      info->shared_memory_size = shared_memory_sizes[i];
-
-      if (info->binding_count >
-          IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-        status = iree_make_status(
-            IREE_STATUS_RESOURCE_EXHAUSTED,
-            "exceeded available binding slots; requested %u of maximum %d",
-            info->binding_count,
-            IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-      }
-      if (!iree_status_is_ok(status)) break;
-
-      // Stash the entry point name in the string table for use when tracing.
-      IREE_TRACE({
-        iree_host_size_t entry_name_length = flatbuffers_string_len(entry_name);
-        memcpy(string_table_buffer, entry_name, entry_name_length);
-        info->function_name =
-            iree_make_string_view(string_table_buffer, entry_name_length);
-        string_table_buffer += entry_name_length;
-      });
+      iree_hal_cuda_kernel_params_t* kernel_info = &executable->exports[i];
+      kernel_info->function = function;
+      const iree_hal_cuda_BlockDims_t* block_dims =
+          iree_hal_cuda_ExportDef_block_dims_get(export_def);
+      kernel_info->block_dims[0] = block_dims->x;
+      kernel_info->block_dims[1] = block_dims->y;
+      kernel_info->block_dims[2] = block_dims->z;
+      kernel_info->block_shared_memory_size = block_shared_memory_size;
+      kernel_info->constant_count =
+          iree_hal_cuda_ExportDef_constant_count_get(export_def);
+      iree_hal_cuda_BindingBits_vec_t binding_flags_vec =
+          iree_hal_cuda_ExportDef_binding_flags_get(export_def);
+      kernel_info->binding_count =
+          iree_hal_cuda_BindingBits_vec_len(binding_flags_vec);
 
       IREE_TRACE({
-        if (iree_hal_cuda_ExecutableDef_source_locations_is_present(
-                executable_def)) {
-          iree_hal_cuda_FileLineLocDef_vec_t source_locs_vec =
-              iree_hal_cuda_ExecutableDef_source_locations_get(executable_def);
-          iree_hal_cuda_FileLineLocDef_table_t source_loc =
-              iree_hal_cuda_FileLineLocDef_vec_at(source_locs_vec, i);
-          flatbuffers_string_t filename =
-              iree_hal_cuda_FileLineLocDef_filename_get(source_loc);
-          uint32_t line = iree_hal_cuda_FileLineLocDef_line_get(source_loc);
-          info->source_filename =
-              iree_make_string_view(filename, flatbuffers_string_len(filename));
-          info->source_line = line;
-        }
+        iree_hal_debug_copy_export_info(
+            iree_hal_cuda_ExportDef_debug_info_get(export_def),
+            &export_infos[i]);
+        kernel_info->debug_info.function_name = export_infos[i].function_name;
+        kernel_info->debug_info.source_filename =
+            export_infos[i].source_filename;
+        kernel_info->debug_info.source_line = export_infos[i].source_line;
       });
     }
   }
@@ -295,30 +404,31 @@
   iree_allocator_t host_allocator = executable->host_allocator;
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  for (iree_host_size_t i = 0; i < executable->entry_point_count; ++i) {
-    iree_hal_pipeline_layout_release(executable->entry_points[i].layout);
+  for (iree_host_size_t i = 0; i < executable->module_count; ++i) {
+    if (executable->modules[i]) {
+      IREE_CUDA_IGNORE_ERROR(executable->symbols,
+                             cuModuleUnload(executable->modules[i]));
+    }
   }
-  if (executable->cu_module) {
-    IREE_CUDA_IGNORE_ERROR(executable->symbols,
-                           cuModuleUnload(executable->cu_module));
-  }
+
   iree_allocator_free(host_allocator, executable);
 
   IREE_TRACE_ZONE_END(z0);
 }
 
-iree_status_t iree_hal_cuda_native_executable_entry_point_kernel_info(
-    iree_hal_executable_t* base_executable, int32_t entry_point,
-    iree_hal_cuda_kernel_info_t* out_info) {
+iree_status_t iree_hal_cuda_native_executable_lookup_kernel_params(
+    iree_hal_executable_t* base_executable, int32_t ordinal,
+    const iree_hal_cuda_kernel_params_t** out_params) {
   iree_hal_cuda_native_executable_t* executable =
       iree_hal_cuda_native_executable_cast(base_executable);
-  if (entry_point >= executable->entry_point_count) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "entry point ordinal %d out of range; executable "
-                            "only contains %" PRIhsz " entry points",
-                            entry_point, executable->entry_point_count);
+  if (ordinal >= executable->export_count) {
+    return iree_make_status(
+        IREE_STATUS_OUT_OF_RANGE,
+        "export ordinal %d out of range; executable contains %" PRIhsz
+        " exports",
+        ordinal, executable->export_count);
   }
-  memcpy(out_info, &executable->entry_points[entry_point], sizeof(*out_info));
+  *out_params = &executable->exports[ordinal];
   return iree_ok_status();
 }
 
diff --git a/runtime/src/iree/hal/drivers/cuda/native_executable.h b/runtime/src/iree/hal/drivers/cuda/native_executable.h
index 226ceda..74cd458 100644
--- a/runtime/src/iree/hal/drivers/cuda/native_executable.h
+++ b/runtime/src/iree/hal/drivers/cuda/native_executable.h
@@ -19,20 +19,31 @@
 extern "C" {
 #endif  // __cplusplus
 
-typedef struct iree_hal_cuda_kernel_info_t {
-  // TODO(#18189): remove when using simplified bindings.
-  iree_hal_pipeline_layout_t* layout;
+// The max number of per-dispatch bindings allowed in the CUDA HAL
+// implementation.
+#define IREE_HAL_CUDA_MAX_DISPATCH_BINDING_COUNT 16
+
+// The max number of per-dispatch constants supported by the CUDA HAL
+// implementation.
+#define IREE_HAL_CUDA_MAX_DISPATCH_CONSTANT_COUNT 64
+
+typedef struct iree_hal_cuda_kernel_debug_info_t {
+  iree_string_view_t function_name;
+  iree_string_view_t source_filename;
+  uint32_t source_line;
+} iree_hal_cuda_kernel_debug_info_t;
+
+typedef struct iree_hal_cuda_kernel_params_t {
   CUfunction function;
+
   uint32_t constant_count;
   uint32_t binding_count;
-  // TODO(#18189): add bitfield indicating indirect bindings.
-  uint32_t block_size[3];
-  uint32_t shared_memory_size;
 
-  IREE_TRACE(iree_string_view_t function_name;)
-  IREE_TRACE(iree_string_view_t source_filename;)
-  IREE_TRACE(uint32_t source_line;)
-} iree_hal_cuda_kernel_info_t;
+  uint32_t block_dims[3];
+  uint32_t block_shared_memory_size;
+
+  IREE_TRACE(iree_hal_cuda_kernel_debug_info_t debug_info;)
+} iree_hal_cuda_kernel_params_t;
 
 // Creates an IREE executable from a CUDA PTX module. The module may contain
 // several kernels that can be extracted along with the associated block size.
@@ -43,9 +54,9 @@
 
 // Returns the kernel launch information for the given |entry_point| in the
 // |executable|.
-iree_status_t iree_hal_cuda_native_executable_entry_point_kernel_info(
+iree_status_t iree_hal_cuda_native_executable_lookup_kernel_params(
     iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_cuda_kernel_info_t* out_info);
+    const iree_hal_cuda_kernel_params_t** out_params);
 
 #ifdef __cplusplus
 }  // extern "C"
diff --git a/runtime/src/iree/hal/drivers/cuda/nccl_channel.c b/runtime/src/iree/hal/drivers/cuda/nccl_channel.c
index 5a001da..5e1e7ed 100644
--- a/runtime/src/iree/hal/drivers/cuda/nccl_channel.c
+++ b/runtime/src/iree/hal/drivers/cuda/nccl_channel.c
@@ -559,9 +559,10 @@
     iree_string_view_t collective_str =
         iree_hal_collective_op_format(&entry->op, &string_temp);
     IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
-        tracing_context, tracing_event_list, IREE_HAL_TRACING_VERBOSITY_FINE,
-        __FILE__, strlen(__FILE__), (uint32_t)__LINE__, __FUNCTION__,
-        strlen(__FUNCTION__), collective_str.data, collective_str.size);
+        tracing_context, tracing_event_list,
+        IREE_HAL_STREAM_TRACING_VERBOSITY_FINE, __FILE__, strlen(__FILE__),
+        (uint32_t)__LINE__, __FUNCTION__, strlen(__FUNCTION__),
+        collective_str.data, collective_str.size);
   }
 #endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
 
@@ -579,7 +580,7 @@
   // order doesn't matter so long as we end the right number of zones.
   for (iree_host_size_t i = 0; i < batch->count; ++i) {
     IREE_HAL_STREAM_TRACE_ZONE_END(tracing_context, tracing_event_list,
-                                   IREE_HAL_TRACING_VERBOSITY_FINE);
+                                   IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   }
 #endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
 
diff --git a/runtime/src/iree/hal/drivers/cuda/pipeline_layout.c b/runtime/src/iree/hal/drivers/cuda/pipeline_layout.c
deleted file mode 100644
index a14d312..0000000
--- a/runtime/src/iree/hal/drivers/cuda/pipeline_layout.c
+++ /dev/null
@@ -1,260 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/drivers/cuda/pipeline_layout.h"
-
-#include <stddef.h>
-
-#include "iree/base/api.h"
-#include "iree/base/tracing.h"
-
-//===----------------------------------------------------------------------===//
-// iree_hal_cuda_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cuda_descriptor_set_layout_t {
-  // Abstract resource used for injecting reference counting and vtable;
-  // must be at offset 0.
-  iree_hal_resource_t resource;
-
-  // The host allocator used for creating this descriptor set layout struct.
-  iree_allocator_t host_allocator;
-
-  // The total number of bindings in this descriptor set.
-  iree_host_size_t binding_count;
-} iree_hal_cuda_descriptor_set_layout_t;
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_cuda_descriptor_set_layout_vtable;
-
-static iree_hal_cuda_descriptor_set_layout_t*
-iree_hal_cuda_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_cuda_descriptor_set_layout_vtable);
-  return (iree_hal_cuda_descriptor_set_layout_t*)base_value;
-}
-
-static const iree_hal_cuda_descriptor_set_layout_t*
-iree_hal_cuda_descriptor_set_layout_const_cast(
-    const iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_cuda_descriptor_set_layout_vtable);
-  return (const iree_hal_cuda_descriptor_set_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_cuda_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  *out_descriptor_set_layout = NULL;
-
-  iree_hal_cuda_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_allocator_malloc(host_allocator, sizeof(*descriptor_set_layout),
-                                (void**)&descriptor_set_layout));
-
-  iree_hal_resource_initialize(&iree_hal_cuda_descriptor_set_layout_vtable,
-                               &descriptor_set_layout->resource);
-  descriptor_set_layout->host_allocator = host_allocator;
-  descriptor_set_layout->binding_count = binding_count;
-  *out_descriptor_set_layout =
-      (iree_hal_descriptor_set_layout_t*)descriptor_set_layout;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-iree_host_size_t iree_hal_cuda_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  const iree_hal_cuda_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_cuda_descriptor_set_layout_const_cast(
-          base_descriptor_set_layout);
-  return descriptor_set_layout->binding_count;
-}
-
-static void iree_hal_cuda_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  iree_hal_cuda_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_cuda_descriptor_set_layout_cast(base_descriptor_set_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_allocator_t host_allocator = descriptor_set_layout->host_allocator;
-
-  iree_allocator_free(host_allocator, descriptor_set_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_cuda_descriptor_set_layout_vtable = {
-        .destroy = iree_hal_cuda_descriptor_set_layout_destroy,
-};
-
-//===----------------------------------------------------------------------===//
-// iree_hal_cuda_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cuda_pipeline_layout_t {
-  // Abstract resource used for injecting reference counting and vtable;
-  // must be at offset 0.
-  iree_hal_resource_t resource;
-
-  // The host allocator used for creating this pipeline layout struct.
-  iree_allocator_t host_allocator;
-
-  // The kernel argument index for push constants.
-  // Note that push constants are placed after all normal descriptors.
-  iree_host_size_t push_constant_base_index;
-  iree_host_size_t push_constant_count;
-
-  iree_host_size_t set_layout_count;
-  // The list of descriptor set layout pointers, pointing to trailing inline
-  // allocation after the end of this struct.
-  struct {
-    iree_hal_descriptor_set_layout_t* set_layout;
-    // Base kernel argument index for this descriptor set.
-    iree_host_size_t base_index;
-  } set_layouts[];
-} iree_hal_cuda_pipeline_layout_t;
-// + Additional inline allocation for holding all descriptor sets.
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_cuda_pipeline_layout_vtable;
-
-static iree_hal_cuda_pipeline_layout_t* iree_hal_cuda_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_cuda_pipeline_layout_vtable);
-  return (iree_hal_cuda_pipeline_layout_t*)base_value;
-}
-
-static const iree_hal_cuda_pipeline_layout_t*
-iree_hal_cuda_pipeline_layout_const_cast(
-    const iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_cuda_pipeline_layout_vtable);
-  return (const iree_hal_cuda_pipeline_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_cuda_pipeline_layout_create(
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  *out_pipeline_layout = NULL;
-  if (push_constant_count > IREE_HAL_CUDA_MAX_PUSH_CONSTANT_COUNT) {
-    IREE_TRACE_ZONE_END(z0);
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "push constant count %" PRIhsz " over the limit of %d",
-        push_constant_count, IREE_HAL_CUDA_MAX_PUSH_CONSTANT_COUNT);
-  }
-
-  // Currently the pipeline layout doesn't do anything.
-  // TODO: Handle creating the argument layout at that time hadling both push
-  // constant and buffers.
-  iree_hal_cuda_pipeline_layout_t* pipeline_layout = NULL;
-  iree_host_size_t total_size =
-      sizeof(*pipeline_layout) +
-      set_layout_count * sizeof(*pipeline_layout->set_layouts);
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_allocator_malloc(host_allocator, total_size,
-                                (void**)&pipeline_layout));
-
-  iree_hal_resource_initialize(&iree_hal_cuda_pipeline_layout_vtable,
-                               &pipeline_layout->resource);
-  pipeline_layout->host_allocator = host_allocator;
-  pipeline_layout->set_layout_count = set_layout_count;
-  iree_host_size_t base_index = 0;
-  for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-    pipeline_layout->set_layouts[i].set_layout = set_layouts[i];
-    // Copy and retain all descriptor sets so we don't lose them.
-    iree_hal_descriptor_set_layout_retain(set_layouts[i]);
-    pipeline_layout->set_layouts[i].base_index = base_index;
-    base_index +=
-        iree_hal_cuda_descriptor_set_layout_binding_count(set_layouts[i]);
-  }
-  pipeline_layout->push_constant_base_index = base_index;
-  pipeline_layout->push_constant_count = push_constant_count;
-  *out_pipeline_layout = (iree_hal_pipeline_layout_t*)pipeline_layout;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static void iree_hal_cuda_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_cast(base_pipeline_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_allocator_t host_allocator = pipeline_layout->host_allocator;
-
-  for (iree_host_size_t i = 0; i < pipeline_layout->set_layout_count; ++i) {
-    iree_hal_descriptor_set_layout_release(
-        pipeline_layout->set_layouts[i].set_layout);
-  }
-  iree_allocator_free(host_allocator, pipeline_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-iree_host_size_t iree_hal_cuda_pipeline_layout_descriptor_set_count(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->set_layout_count;
-}
-
-const iree_hal_descriptor_set_layout_t*
-iree_hal_cuda_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout, uint32_t set) {
-  const iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_const_cast(base_pipeline_layout);
-  if (set < pipeline_layout->set_layout_count) {
-    return pipeline_layout->set_layouts[set].set_layout;
-  }
-  return NULL;
-}
-
-iree_host_size_t iree_hal_cuda_pipeline_layout_base_binding_index(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout, uint32_t set) {
-  const iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->set_layouts[set].base_index;
-}
-
-iree_host_size_t iree_hal_cuda_pipeline_layout_total_binding_count(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  return iree_hal_cuda_pipeline_layout_push_constant_index(
-      base_pipeline_layout);
-}
-
-iree_host_size_t iree_hal_cuda_pipeline_layout_push_constant_index(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->push_constant_base_index;
-}
-
-iree_host_size_t iree_hal_cuda_pipeline_layout_push_constant_count(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_cuda_pipeline_layout_t* pipeline_layout =
-      iree_hal_cuda_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->push_constant_count;
-}
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_cuda_pipeline_layout_vtable = {
-        .destroy = iree_hal_cuda_pipeline_layout_destroy,
-};
diff --git a/runtime/src/iree/hal/drivers/cuda/pipeline_layout.h b/runtime/src/iree/hal/drivers/cuda/pipeline_layout.h
deleted file mode 100644
index 49eaf54..0000000
--- a/runtime/src/iree/hal/drivers/cuda/pipeline_layout.h
+++ /dev/null
@@ -1,110 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_DRIVERS_CUDA_PIPELINE_LAYOUT_H_
-#define IREE_HAL_DRIVERS_CUDA_PIPELINE_LAYOUT_H_
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-// The max number of bindings per descriptor set allowed in the CUDA HAL
-// implementation.
-#define IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT 16
-
-// The max number of descriptor sets allowed in the CUDA HAL implementation.
-//
-// This depends on the general descriptor set planning in IREE and should adjust
-// with it.
-#define IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_COUNT 4
-
-// The max number of push constants supported by the CUDA HAL implementation.
-#define IREE_HAL_CUDA_MAX_PUSH_CONSTANT_COUNT 64
-
-// Note that IREE HAL uses a descriptor binding model for expressing resources
-// to the kernels--each descriptor specifies the resource information, together
-// with a (set, binding) number indicating which "slots" it's bound to.
-//
-// In CUDA, however, we don't have a direct correspondence for such a
-// mechanism. Resources are expressed as kernel arguments. Therefore, to
-// implement IREE HAL descriptor sets and pipeline layouts in CUDA, we order
-// and flatten all sets and bindings and map them to a linear array of kernel
-// arguments.
-//
-// For example, given a pipeline layout with two sets and two bindings each:
-//   (set #, binding #) | kernel argument #
-//   :----------------: | :---------------:
-//   (0, 0)             | 0
-//   (0, 4)             | 1
-//   (2, 1)             | 2
-//   (2, 3)             | 3
-
-//===----------------------------------------------------------------------===//
-// iree_hal_cuda_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a descriptor set layout with the given |bindings|.
-//
-// Bindings in a descriptor set map to a list of consecutive kernel arguments in
-// CUDA kernels.
-iree_status_t iree_hal_cuda_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-// Returns the binding count for the given descriptor set layout.
-iree_host_size_t iree_hal_cuda_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_cuda_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates the pipeline layout with the given |set_layouts| and
-// |push_constant_count|.
-//
-// Bindings in the pipeline map to kernel arguments in CUDA kernels, followed by
-// the kernel argument for the push constant data.
-iree_status_t iree_hal_cuda_pipeline_layout_create(
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-// Returns the total number of sets in the given |pipeline_layout|.
-iree_host_size_t iree_hal_cuda_pipeline_layout_descriptor_set_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the descriptor set layout of the given |set| in |pipeline_layout|.
-const iree_hal_descriptor_set_layout_t*
-iree_hal_cuda_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set);
-
-// Returns the base kernel argument index for the given set.
-iree_host_size_t iree_hal_cuda_pipeline_layout_base_binding_index(
-    const iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set);
-
-// Returns the total number of descriptor bindings across all sets.
-iree_host_size_t iree_hal_cuda_pipeline_layout_total_binding_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the kernel argument index for push constant data.
-iree_host_size_t iree_hal_cuda_pipeline_layout_push_constant_index(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the number of push constants in the pipeline layout.
-iree_host_size_t iree_hal_cuda_pipeline_layout_push_constant_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_DRIVERS_CUDA_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c b/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
index 927d72f..bc02895 100644
--- a/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/cuda/stream_command_buffer.c
@@ -10,7 +10,6 @@
 #include "iree/hal/drivers/cuda/cuda_status_util.h"
 #include "iree/hal/drivers/cuda/native_executable.h"
 #include "iree/hal/drivers/cuda/nccl_channel.h"
-#include "iree/hal/drivers/cuda/pipeline_layout.h"
 #include "iree/hal/utils/collective_batch.h"
 #include "iree/hal/utils/resource_set.h"
 
@@ -38,12 +37,6 @@
 
   // Iteratively constructed batch of collective operations.
   iree_hal_collective_batch_t collective_batch;
-
-  // TODO(#18189): drop state used by legacy bindings mechanism.
-  int32_t push_constants[IREE_HAL_CUDA_MAX_PUSH_CONSTANT_COUNT];
-  struct {
-    CUdeviceptr bindings[IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT];
-  } descriptor_sets[IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_COUNT];
 } iree_hal_cuda_stream_command_buffer_t;
 
 static const iree_hal_command_buffer_vtable_t
@@ -182,7 +175,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_COARSE,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
       /*file_name=*/NULL, 0, /*line=*/0, "iree_hal_cuda_stream_command_buffer",
       strlen("iree_hal_cuda_stream_command_buffer"), /*name=*/NULL, 0);
 
@@ -219,7 +212,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_COARSE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
@@ -235,8 +228,9 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_COARSE, location ? location->file.data : NULL,
-      location ? location->file.size : 0, location ? location->line : 0,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
+      location ? location->file.data : NULL, location ? location->file.size : 0,
+      location ? location->line : 0,
       /*func_name=*/NULL, 0, label.data, label.size);
 
   // TODO: pass along to CUPTI if available.
@@ -252,7 +246,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_COARSE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 }
 
 static iree_status_t iree_hal_cuda_stream_command_buffer_execution_barrier(
@@ -474,185 +468,9 @@
   return status;
 }
 
-static iree_status_t iree_hal_cuda_stream_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_cuda_stream_command_buffer_t* command_buffer =
-      iree_hal_cuda_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_host_size_t constant_base_index = offset / sizeof(int32_t);
-  for (iree_host_size_t i = 0; i < values_length / sizeof(int32_t); i++) {
-    command_buffer->push_constants[i + constant_base_index] =
-        ((uint32_t*)values)[i];
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_cuda_stream_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  if (binding_count > IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-    return iree_make_status(
-        IREE_STATUS_RESOURCE_EXHAUSTED,
-        "exceeded available binding slots for push "
-        "descriptor set #%" PRIu32 "; requested %" PRIhsz " vs. maximal %d",
-        set, binding_count, IREE_HAL_CUDA_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-  }
-
-  iree_hal_cuda_stream_command_buffer_t* command_buffer =
-      iree_hal_cuda_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  CUdeviceptr* current_bindings = command_buffer->descriptor_sets[set].bindings;
-  for (iree_host_size_t i = 0; i < binding_count; i++) {
-    const iree_hal_buffer_ref_t* binding = &bindings[i];
-    CUdeviceptr device_ptr = 0;
-    if (binding->buffer) {
-      IREE_RETURN_AND_END_ZONE_IF_ERROR(
-          z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                           &binding->buffer));
-      CUdeviceptr device_buffer = iree_hal_cuda_buffer_device_pointer(
-          iree_hal_buffer_allocated_buffer(binding->buffer));
-      iree_device_size_t offset = iree_hal_buffer_byte_offset(binding->buffer);
-      device_ptr = device_buffer + offset + binding->offset;
-    }
-    current_bindings[binding->ordinal] = device_ptr;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
 static iree_status_t iree_hal_cuda_stream_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_cuda_stream_command_buffer_t* command_buffer =
-      iree_hal_cuda_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0,
-      iree_hal_cuda_stream_command_buffer_flush_collectives(command_buffer));
-
-  // Look up kernel parameters used for side-channeling additional launch
-  // information from the compiler.
-  iree_hal_cuda_kernel_info_t kernel_info;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_cuda_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
-
-  IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_FINE, kernel_info.source_filename.data,
-      kernel_info.source_filename.size, kernel_info.source_line,
-      kernel_info.function_name.data, kernel_info.function_name.size,
-      /*name=*/NULL, 0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                       &executable));
-
-  // The total number of descriptors across all descriptor sets.
-  iree_host_size_t descriptor_count =
-      iree_hal_cuda_pipeline_layout_total_binding_count(kernel_info.layout);
-  // The total number of push constants.
-  iree_host_size_t push_constant_count =
-      iree_hal_cuda_pipeline_layout_push_constant_count(kernel_info.layout);
-  // We append push constants to the end of descriptors to form a linear chain
-  // of kernel arguments.
-  iree_host_size_t kernel_params_count = descriptor_count + push_constant_count;
-  iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
-
-  // Per CUDA API requirements, we need two levels of indirection for passing
-  // kernel arguments in.
-  //   "If the kernel has N parameters, then kernelParams needs to be an array
-  //   of N pointers. Each pointer, from kernelParams[0] to kernelParams[N-1],
-  //   points to the region of memory from which the actual parameter will be
-  //   copied."
-  //
-  // (From the cuGraphAddKernelNode API doc in
-  // https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__GRAPH.html#group__CUDA__GRAPH_1g50d871e3bd06c1b835e52f2966ef366b)
-  //
-  // It means each kernel_params[i] is itself a pointer to the corresponding
-  // element at the *second* inline allocation at the end of the current
-  // segment.
-  iree_host_size_t total_size = kernel_params_length * 2;
-  uint8_t* storage_base = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_arena_allocate(&command_buffer->arena, total_size,
-                              (void**)&storage_base));
-  void** params_ptr = (void**)storage_base;
-
-  // Set up kernel arguments to point to the payload slots.
-  CUdeviceptr* payload_ptr =
-      (CUdeviceptr*)((uint8_t*)params_ptr + kernel_params_length);
-  for (size_t i = 0; i < kernel_params_count; i++) {
-    params_ptr[i] = &payload_ptr[i];
-  }
-
-  // Copy descriptors from all sets to the end of the current segment for later
-  // access.
-  iree_host_size_t set_count =
-      iree_hal_cuda_pipeline_layout_descriptor_set_count(kernel_info.layout);
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    // TODO: cache this information in the kernel info to avoid recomputation.
-    iree_host_size_t binding_count =
-        iree_hal_cuda_descriptor_set_layout_binding_count(
-            iree_hal_cuda_pipeline_layout_descriptor_set_layout(
-                kernel_info.layout, i));
-    iree_host_size_t index =
-        iree_hal_cuda_pipeline_layout_base_binding_index(kernel_info.layout, i);
-    memcpy(payload_ptr + index, command_buffer->descriptor_sets[i].bindings,
-           binding_count * sizeof(CUdeviceptr));
-  }
-
-  // Append the push constants to the kernel arguments.
-  iree_host_size_t base_index =
-      iree_hal_cuda_pipeline_layout_push_constant_index(kernel_info.layout);
-  // As commented above, each kernel parameter points to a CUdeviceptr, which
-  // has the size of a pointer on the target machine. We are only storing a
-  // 32-bit value for each push constant here, so we must process one element
-  // at a time on 64-bit machines.
-  for (iree_host_size_t i = 0; i < push_constant_count; i++) {
-    *((uint32_t*)params_ptr[base_index + i]) =
-        command_buffer->push_constants[i];
-  }
-
-  IREE_CUDA_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, command_buffer->cuda_symbols,
-      cuLaunchKernel(kernel_info.function, workgroup_x, workgroup_y,
-                     workgroup_z, kernel_info.block_size[0],
-                     kernel_info.block_size[1], kernel_info.block_size[2],
-                     kernel_info.shared_memory_size, command_buffer->cu_stream,
-                     params_ptr, NULL),
-      "cuLaunchKernel");
-
-  IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
-                                 &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_FINE);
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_cuda_stream_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "need cuda implementation of dispatch indirect");
-}
-
-static iree_status_t iree_hal_cuda_stream_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_cuda_stream_command_buffer_t* command_buffer =
@@ -665,16 +483,19 @@
 
  // Look up kernel parameters used for side-channeling additional launch
  // information from the compiler.
-  iree_hal_cuda_kernel_info_t kernel_info;
+  const iree_hal_cuda_kernel_params_t* kernel_params = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_cuda_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
+      z0, iree_hal_cuda_native_executable_lookup_kernel_params(
+              executable, entry_point, &kernel_params));
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_FINE, kernel_info.source_filename.data,
-      kernel_info.source_filename.size, kernel_info.source_line,
-      kernel_info.function_name.data, kernel_info.function_name.size,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_FINE,
+      kernel_params->debug_info.source_filename.data,
+      kernel_params->debug_info.source_filename.size,
+      kernel_params->debug_info.source_line,
+      kernel_params->debug_info.function_name.data,
+      kernel_params->debug_info.function_name.size,
       /*name=*/NULL, 0);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
@@ -684,7 +505,7 @@
   // We append push constants to the end of descriptors to form a linear chain
   // of kernel arguments.
   iree_host_size_t kernel_params_count =
-      kernel_info.binding_count + kernel_info.constant_count;
+      kernel_params->binding_count + kernel_params->constant_count;
   iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
 
   // TODO: use packed parameters instead of the indirection mechanism - this
@@ -733,29 +554,30 @@
  // CUdeviceptr, which has the size of a pointer on the target machine. We are
  // only storing a 32-bit value for each push constant here, so we must
  // process one element at a time on 64-bit machines.
-  for (iree_host_size_t i = 0; i < kernel_info.constant_count; i++) {
-    *((uint32_t*)params_ptr[kernel_info.binding_count + i]) =
+  for (iree_host_size_t i = 0; i < kernel_params->constant_count; i++) {
+    *((uint32_t*)params_ptr[kernel_params->binding_count + i]) =
         ((const uint32_t*)constants.data)[i];
   }
 
   IREE_CUDA_RETURN_AND_END_ZONE_IF_ERROR(
       z0, command_buffer->cuda_symbols,
-      cuLaunchKernel(kernel_info.function, workgroup_count[0],
+      cuLaunchKernel(kernel_params->function, workgroup_count[0],
                      workgroup_count[1], workgroup_count[2],
-                     kernel_info.block_size[0], kernel_info.block_size[1],
-                     kernel_info.block_size[2], kernel_info.shared_memory_size,
+                     kernel_params->block_dims[0], kernel_params->block_dims[1],
+                     kernel_params->block_dims[2],
+                     kernel_params->block_shared_memory_size,
                      command_buffer->cu_stream, params_ptr, NULL),
       "cuLaunchKernel");
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_FINE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_cuda_stream_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_cuda_stream_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -782,13 +604,7 @@
         .update_buffer = iree_hal_cuda_stream_command_buffer_update_buffer,
         .copy_buffer = iree_hal_cuda_stream_command_buffer_copy_buffer,
         .collective = iree_hal_cuda_stream_command_buffer_collective,
-        .push_constants = iree_hal_cuda_stream_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_cuda_stream_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_cuda_stream_command_buffer_dispatch,
         .dispatch_indirect =
             iree_hal_cuda_stream_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_cuda_stream_command_buffer_dispatch2,
-        .dispatch2_indirect =
-            iree_hal_cuda_stream_command_buffer_dispatch2_indirect,
 };
diff --git a/runtime/src/iree/hal/drivers/hip/CMakeLists.txt b/runtime/src/iree/hal/drivers/hip/CMakeLists.txt
index cb9b29e..ef9576d 100644
--- a/runtime/src/iree/hal/drivers/hip/CMakeLists.txt
+++ b/runtime/src/iree/hal/drivers/hip/CMakeLists.txt
@@ -40,8 +40,6 @@
     "native_executable.h"
     "nop_executable_cache.c"
     "nop_executable_cache.h"
-    "pipeline_layout.c"
-    "pipeline_layout.h"
     "rccl_channel.c"
     "rccl_channel.h"
     "stream_command_buffer.c"
@@ -64,6 +62,7 @@
     iree::base::internal::flatcc::parsing
     iree::hal
     iree::hal::utils::collective_batch
+    iree::hal::utils::executable_debug_info
     iree::hal::utils::deferred_command_buffer
     iree::hal::utils::deferred_work_queue
     iree::hal::utils::file_transfer
@@ -71,7 +70,8 @@
     iree::hal::utils::resource_set
     iree::hal::utils::semaphore_base
     iree::hal::utils::stream_tracing
-    iree::schemas::rocm_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::hip_executable_def_c_fbs
   PUBLIC
 )
 
diff --git a/runtime/src/iree/hal/drivers/hip/graph_command_buffer.c b/runtime/src/iree/hal/drivers/hip/graph_command_buffer.c
index f0c86ab..d1ef975 100644
--- a/runtime/src/iree/hal/drivers/hip/graph_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/hip/graph_command_buffer.c
@@ -14,7 +14,6 @@
 #include "iree/hal/drivers/hip/dynamic_symbols.h"
 #include "iree/hal/drivers/hip/hip_buffer.h"
 #include "iree/hal/drivers/hip/native_executable.h"
-#include "iree/hal/drivers/hip/pipeline_layout.h"
 #include "iree/hal/drivers/hip/status_util.h"
 #include "iree/hal/utils/collective_batch.h"
 #include "iree/hal/utils/resource_set.h"
@@ -59,12 +58,6 @@
 
   // Iteratively constructed batch of collective operations.
   iree_hal_collective_batch_t collective_batch;
-
-  // TODO(#18189): drop state used by legacy bindings mechanism.
-  int32_t push_constants[IREE_HAL_HIP_MAX_PUSH_CONSTANT_COUNT];
-  struct {
-    hipDeviceptr_t bindings[IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT];
-  } descriptor_sets[IREE_HAL_HIP_MAX_DESCRIPTOR_SET_COUNT];
 } iree_hal_hip_graph_command_buffer_t;
 
 static const iree_hal_command_buffer_vtable_t
@@ -352,7 +345,7 @@
       "hipGraphCreate");
 
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE,
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
       /*file_name=*/NULL, 0, /*line=*/0, "iree_hal_hip_graph_command_buffer",
       strlen("iree_hal_hip_graph_command_buffer"),
       /*name=*/NULL, 0);
@@ -370,7 +363,7 @@
       iree_hal_hip_graph_command_buffer_flush_collectives(command_buffer));
 
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 
   // Reset state used during recording.
   command_buffer->hip_barrier_node = NULL;
@@ -405,7 +398,7 @@
 
   (void)command_buffer;
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE,
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
       location ? location->file.data : NULL, location ? location->file.size : 0,
       location ? location->line : 0,
       /*func_name=*/NULL, 0, label.data, label.size);
@@ -417,7 +410,7 @@
       iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
   (void)command_buffer;
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_COARSE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 }
 
 static iree_status_t
@@ -531,7 +524,7 @@
       iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_hip_graph_command_buffer_flush_collectives(command_buffer));
@@ -569,8 +562,8 @@
           dependency_count, &params),
       "hipGraphAddMemsetNode");
 
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(command_buffer,
-                                               IREE_HAL_TRACING_VERBOSITY_FINE);
+  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -587,7 +580,7 @@
   }
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_hip_graph_command_buffer_flush_collectives(command_buffer));
@@ -639,8 +632,8 @@
           dependency_count, &params, command_buffer->hip_context),
       "hipDrvGraphAddMemcpyNode");
 
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(command_buffer,
-                                               IREE_HAL_TRACING_VERBOSITY_FINE);
+  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -657,7 +650,7 @@
   }
   IREE_TRACE_ZONE_BEGIN(z0);
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_hip_graph_command_buffer_flush_collectives(command_buffer));
@@ -703,8 +696,8 @@
           dependency_count, &params, command_buffer->hip_context),
       "hipDrvGraphAddMemcpyNode");
 
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(command_buffer,
-                                               IREE_HAL_TRACING_VERBOSITY_FINE);
+  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
@@ -730,189 +723,9 @@
                                           recv_binding, element_count);
 }
 
-static iree_status_t iree_hal_hip_graph_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_hip_graph_command_buffer_t* command_buffer =
-      iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
-
-  if (IREE_UNLIKELY(offset + values_length >=
-                    sizeof(command_buffer->push_constants))) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "push constant range [%zu, %zu) out of range",
-                            offset, offset + values_length);
-  }
-
-  memcpy((uint8_t*)&command_buffer->push_constants + offset, values,
-         values_length);
-
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_hip_graph_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  if (binding_count > IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-    return iree_make_status(
-        IREE_STATUS_RESOURCE_EXHAUSTED,
-        "exceeded available binding slots for push "
-        "descriptor set #%" PRIu32 "; requested %" PRIhsz " vs. maximal %d",
-        set, binding_count, IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-  }
-
-  iree_hal_hip_graph_command_buffer_t* command_buffer =
-      iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-  hipDeviceptr_t* current_bindings =
-      command_buffer->descriptor_sets[set].bindings;
-  for (iree_host_size_t i = 0; i < binding_count; i++) {
-    const iree_hal_buffer_ref_t* binding = &bindings[i];
-    hipDeviceptr_t device_ptr = NULL;
-    if (binding->buffer) {
-      IREE_RETURN_AND_END_ZONE_IF_ERROR(
-          z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                           &binding->buffer));
-
-      hipDeviceptr_t device_buffer = iree_hal_hip_buffer_device_pointer(
-          iree_hal_buffer_allocated_buffer(binding->buffer));
-      iree_device_size_t offset = iree_hal_buffer_byte_offset(binding->buffer);
-      device_ptr = (uint8_t*)device_buffer + offset + binding->offset;
-    }
-
-    current_bindings[binding->ordinal] = device_ptr;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
 static iree_status_t iree_hal_hip_graph_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_hip_graph_command_buffer_t* command_buffer =
-      iree_hal_hip_graph_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_graph_command_buffer_flush_collectives(command_buffer));
-
-  // Lookup kernel parameters used for side-channeling additional launch
-  // information from the compiler.
-  iree_hal_hip_kernel_info_t kernel_info;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
-
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE,
-      kernel_info.source_filename.data, kernel_info.source_filename.size,
-      kernel_info.source_line, kernel_info.function_name.data,
-      kernel_info.function_name.size,
-      /*name=*/NULL, 0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                       &executable));
-  iree_hal_hip_dispatch_layout_t dispatch_params =
-      iree_hal_hip_pipeline_layout_dispatch_layout(kernel_info.layout);
-  // The total number of descriptors across all descriptor sets.
-  iree_host_size_t descriptor_count = dispatch_params.total_binding_count;
-  // The total number of push constants.
-  iree_host_size_t push_constant_count = dispatch_params.push_constant_count;
-  // We append push constants to the end of descriptors to form a linear chain
-  // of kernel arguments.
-  iree_host_size_t kernel_params_count = descriptor_count + push_constant_count;
-  iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
-
-  iree_host_size_t total_size = kernel_params_length * 2;
-  uint8_t* storage_base = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_arena_allocate(&command_buffer->arena, total_size,
-                              (void**)&storage_base));
-  void** params_ptr = (void**)storage_base;
-
-  // Set up kernel arguments to point to the payload slots.
-  hipDeviceptr_t* payload_ptr =
-      (hipDeviceptr_t*)((uint8_t*)params_ptr + kernel_params_length);
-  for (size_t i = 0; i < kernel_params_count; i++) {
-    params_ptr[i] = &payload_ptr[i];
-  }
-
-  // Copy descriptors from all sets to the end of the current segment for later
-  // access.
-  iree_host_size_t set_count = dispatch_params.set_layout_count;
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    // TODO: cache this information in the kernel info to avoid recomputation.
-    iree_host_size_t binding_count =
-        iree_hal_hip_descriptor_set_layout_binding_count(
-            iree_hal_hip_pipeline_layout_descriptor_set_layout(
-                kernel_info.layout, i));
-    iree_host_size_t index =
-        iree_hal_hip_pipeline_layout_base_binding_index(kernel_info.layout, i);
-    memcpy(payload_ptr + index, command_buffer->descriptor_sets[i].bindings,
-           binding_count * sizeof(hipDeviceptr_t));
-  }
-
-  // Append the push constants to the kernel arguments.
-  iree_host_size_t base_index = dispatch_params.push_constant_base_index;
-
-  // Each kernel parameter points to is a hipDeviceptr_t, which as the size of a
-  // pointer on the target machine. we are just storing a 32-bit value for the
-  // push constant here instead. So we must process one element each type, for
-  // 64-bit machines.
-  for (iree_host_size_t i = 0; i < push_constant_count; i++) {
-    *((uint32_t*)params_ptr[base_index + i]) =
-        command_buffer->push_constants[i];
-  }
-
-  hipKernelNodeParams params = {
-      .blockDim.x = kernel_info.block_size[0],
-      .blockDim.y = kernel_info.block_size[1],
-      .blockDim.z = kernel_info.block_size[2],
-      .gridDim.x = workgroup_x,
-      .gridDim.y = workgroup_y,
-      .gridDim.z = workgroup_z,
-      .func = kernel_info.function,
-      .kernelParams = params_ptr,
-      .sharedMemBytes = kernel_info.shared_memory_size,
-  };
-
-  if (command_buffer->graph_node_count >=
-      IREE_HAL_HIP_MAX_CONCURRENT_GRAPH_NODE_COUNT) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "exceeded max concurrent node limit");
-  }
-
-  size_t dependency_count = command_buffer->hip_barrier_node ? 1 : 0;
-  IREE_HIP_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, command_buffer->symbols,
-      hipGraphAddKernelNode(
-          &command_buffer->hip_graph_nodes[command_buffer->graph_node_count++],
-          command_buffer->hip_graph, &command_buffer->hip_barrier_node,
-          dependency_count, &params),
-      "hipGraphAddKernelNode");
-
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(command_buffer,
-                                               IREE_HAL_TRACING_VERBOSITY_FINE);
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_hip_graph_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "indirect dispatch not yet implemented");
-}
-
-static iree_status_t iree_hal_hip_graph_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_hip_graph_command_buffer_t* command_buffer =
@@ -924,16 +737,18 @@
 
   // Lookup kernel parameters used for side-channeling additional launch
   // information from the compiler.
-  iree_hal_hip_kernel_info_t kernel_info;
+  const iree_hal_hip_kernel_params_t* kernel_params = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
+      z0, iree_hal_hip_native_executable_lookup_kernel_params(
+              executable, entry_point, &kernel_params));
 
   IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer, IREE_HAL_TRACING_VERBOSITY_FINE,
-      kernel_info.source_filename.data, kernel_info.source_filename.size,
-      kernel_info.source_line, kernel_info.function_name.data,
-      kernel_info.function_name.size, /*name=*/NULL, 0);
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE,
+      kernel_params->debug_info.source_filename.data,
+      kernel_params->debug_info.source_filename.size,
+      kernel_params->debug_info.source_line,
+      kernel_params->debug_info.function_name.data,
+      kernel_params->debug_info.function_name.size, /*name=*/NULL, 0);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
@@ -942,7 +757,7 @@
   // We append push constants to the end of descriptors to form a linear chain
   // of kernel arguments.
   iree_host_size_t kernel_params_count =
-      kernel_info.binding_count + kernel_info.constant_count;
+      kernel_params->binding_count + kernel_params->constant_count;
   iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
 
   // TODO: use packed parameters instead of the indirection mechanism - this
@@ -981,21 +796,21 @@
   // pointer on the target machine. We are just storing a 32-bit value for the
   // push constant here instead, so we must process one element at a time on
   // 64-bit machines.
-  for (iree_host_size_t i = 0; i < kernel_info.constant_count; i++) {
-    *((uint32_t*)params_ptr[kernel_info.binding_count + i]) =
+  for (iree_host_size_t i = 0; i < kernel_params->constant_count; i++) {
+    *((uint32_t*)params_ptr[kernel_params->binding_count + i]) =
         ((const uint32_t*)constants.data)[i];
   }
 
   hipKernelNodeParams params = {
-      .blockDim.x = kernel_info.block_size[0],
-      .blockDim.y = kernel_info.block_size[1],
-      .blockDim.z = kernel_info.block_size[2],
+      .blockDim.x = kernel_params->block_dims[0],
+      .blockDim.y = kernel_params->block_dims[1],
+      .blockDim.z = kernel_params->block_dims[2],
       .gridDim.x = workgroup_count[0],
       .gridDim.y = workgroup_count[1],
       .gridDim.z = workgroup_count[2],
-      .func = kernel_info.function,
+      .func = kernel_params->function,
       .kernelParams = params_ptr,
-      .sharedMemBytes = kernel_info.shared_memory_size,
+      .sharedMemBytes = kernel_params->block_shared_memory_size,
   };
 
   if (command_buffer->graph_node_count >=
@@ -1013,13 +828,13 @@
           dependency_count, &params),
       "hipGraphAddKernelNode");
 
-  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(command_buffer,
-                                               IREE_HAL_TRACING_VERBOSITY_FINE);
+  IREE_HIP_GRAPH_COMMAND_BUFFER_TRACE_ZONE_END(
+      command_buffer, IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_hip_graph_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_hip_graph_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -1046,13 +861,7 @@
         .update_buffer = iree_hal_hip_graph_command_buffer_update_buffer,
         .copy_buffer = iree_hal_hip_graph_command_buffer_copy_buffer,
         .collective = iree_hal_hip_graph_command_buffer_collective,
-        .push_constants = iree_hal_hip_graph_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_hip_graph_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_hip_graph_command_buffer_dispatch,
         .dispatch_indirect =
             iree_hal_hip_graph_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_hip_graph_command_buffer_dispatch2,
-        .dispatch2_indirect =
-            iree_hal_hip_graph_command_buffer_dispatch2_indirect,
 };
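The new flat dispatch path above packs binding device pointers followed by 32-bit constants into a single kernel-argument array, with `params_ptr[i]` pointing at a payload slot carved out of the same arena allocation. A self-contained sketch of that packing scheme (plain C with hypothetical names; the real code uses the IREE arena allocator and `hipDeviceptr_t`):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Sketch of the kernelParams indirection used by the HIP dispatch path:
// params[i] points at payload[i]; binding pointers fill the leading slots and
// each 32-bit constant is written into the low bits of a trailing
// pointer-sized slot. Names here are illustrative, not the IREE API.
static void** pack_kernel_params(void* const* bindings, size_t binding_count,
                                 const uint32_t* constants,
                                 size_t constant_count) {
  size_t param_count = binding_count + constant_count;
  size_t params_length = param_count * sizeof(void*);
  // One allocation: the params array followed by the payload slots.
  uint8_t* storage = (uint8_t*)calloc(1, params_length * 2);
  if (!storage) return NULL;
  void** params = (void**)storage;
  void** payload = (void**)(storage + params_length);
  for (size_t i = 0; i < param_count; ++i) params[i] = &payload[i];
  // Bindings are stored as device pointers in the leading payload slots.
  memcpy(payload, bindings, binding_count * sizeof(void*));
  // Constants are 32-bit values stored through the pointer-sized slots.
  for (size_t i = 0; i < constant_count; ++i) {
    *(uint32_t*)params[binding_count + i] = constants[i];
  }
  return params;  // caller frees the single storage block via free(params)
}
```

Because constants travel with each dispatch call rather than through prior `push_constants` commands, recording needs no per-command-buffer constant state, which is what makes the recording path stateless.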
diff --git a/runtime/src/iree/hal/drivers/hip/hip_device.c b/runtime/src/iree/hal/drivers/hip/hip_device.c
index 1e8e978..d0e3c55 100644
--- a/runtime/src/iree/hal/drivers/hip/hip_device.c
+++ b/runtime/src/iree/hal/drivers/hip/hip_device.c
@@ -21,7 +21,6 @@
 #include "iree/hal/drivers/hip/hip_allocator.h"
 #include "iree/hal/drivers/hip/memory_pools.h"
 #include "iree/hal/drivers/hip/nop_executable_cache.h"
-#include "iree/hal/drivers/hip/pipeline_layout.h"
 #include "iree/hal/drivers/hip/rccl_channel.h"
 #include "iree/hal/drivers/hip/rccl_dynamic_symbols.h"
 #include "iree/hal/drivers/hip/status_util.h"
@@ -446,12 +445,14 @@
 
   // Enable tracing for the (currently only) stream - no-op if disabled.
   if (iree_status_is_ok(status) && device->params.stream_tracing) {
-    if (device->params.stream_tracing >= IREE_HAL_TRACING_VERBOSITY_MAX ||
-        device->params.stream_tracing < IREE_HAL_TRACING_VERBOSITY_OFF) {
+    if (device->params.stream_tracing >=
+            IREE_HAL_STREAM_TRACING_VERBOSITY_MAX ||
+        device->params.stream_tracing < IREE_HAL_STREAM_TRACING_VERBOSITY_OFF) {
       return iree_make_status(
           IREE_STATUS_INVALID_ARGUMENT,
           "invalid stream_tracing argument: expected to be between %d and %d",
-          IREE_HAL_TRACING_VERBOSITY_OFF, IREE_HAL_TRACING_VERBOSITY_MAX);
+          IREE_HAL_STREAM_TRACING_VERBOSITY_OFF,
+          IREE_HAL_STREAM_TRACING_VERBOSITY_MAX);
     }
 
     iree_hal_hip_tracing_device_interface_t* tracing_device_interface = NULL;
@@ -874,18 +875,6 @@
   }
 }
 
-static iree_status_t iree_hal_hip_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  iree_hal_hip_device_t* device = iree_hal_hip_device_cast(base_device);
-  return iree_hal_hip_descriptor_set_layout_create(
-      flags, binding_count, bindings, device->host_allocator,
-      out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_hip_device_create_event(
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
@@ -917,17 +906,6 @@
       device->host_allocator, out_executable_cache);
 }
 
-static iree_status_t iree_hal_hip_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  iree_hal_hip_device_t* device = iree_hal_hip_device_cast(base_device);
-  return iree_hal_hip_pipeline_layout_create(
-      set_layout_count, set_layouts, push_constants, device->host_allocator,
-      out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_hip_device_create_semaphore(
     iree_hal_device_t* base_device, uint64_t initial_value,
     iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore) {
@@ -1140,12 +1118,9 @@
     .query_i64 = iree_hal_hip_device_query_i64,
     .create_channel = iree_hal_hip_device_create_channel,
     .create_command_buffer = iree_hal_hip_device_create_command_buffer,
-    .create_descriptor_set_layout =
-        iree_hal_hip_device_create_descriptor_set_layout,
     .create_event = iree_hal_hip_device_create_event,
     .create_executable_cache = iree_hal_hip_device_create_executable_cache,
     .import_file = iree_hal_hip_device_import_file,
-    .create_pipeline_layout = iree_hal_hip_device_create_pipeline_layout,
     .create_semaphore = iree_hal_hip_device_create_semaphore,
     .query_semaphore_compatibility =
         iree_hal_hip_device_query_semaphore_compatibility,
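The renamed stream-tracing verbosity check above rejects values outside the half-open range `[OFF, MAX)`. A tiny standalone illustration of that validation pattern (hypothetical enum names; the real values live in `iree/hal/utils/stream_tracing.h`):

```c
#include <stdbool.h>

// Hypothetical mirror of the stream-tracing verbosity levels; MAX is one past
// the last valid level and is itself an invalid argument.
typedef enum {
  TRACING_VERBOSITY_OFF = 0,
  TRACING_VERBOSITY_COARSE,
  TRACING_VERBOSITY_FINE,
  TRACING_VERBOSITY_MAX,
} tracing_verbosity_t;

// Returns true if |value| names a valid verbosity level (OFF disables tracing
// but is still a legal parameter value).
static bool tracing_verbosity_is_valid(int value) {
  return value >= TRACING_VERBOSITY_OFF && value < TRACING_VERBOSITY_MAX;
}
```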
diff --git a/runtime/src/iree/hal/drivers/hip/native_executable.c b/runtime/src/iree/hal/drivers/hip/native_executable.c
index 19caae9..69a48ba 100644
--- a/runtime/src/iree/hal/drivers/hip/native_executable.c
+++ b/runtime/src/iree/hal/drivers/hip/native_executable.c
@@ -10,33 +10,32 @@
 
 #include "iree/base/api.h"
 #include "iree/hal/drivers/hip/dynamic_symbols.h"
-#include "iree/hal/drivers/hip/pipeline_layout.h"
 #include "iree/hal/drivers/hip/status_util.h"
+#include "iree/hal/utils/executable_debug_info.h"
 
 // flatcc schemas:
 #include "iree/base/internal/flatcc/parsing.h"
-// Using the existing ROCM schema fow now.
-#include "iree/schemas/rocm_executable_def_reader.h"
-#include "iree/schemas/rocm_executable_def_verifier.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
+#include "iree/schemas/hip_executable_def_reader.h"
+#include "iree/schemas/hip_executable_def_verifier.h"
 
 typedef struct iree_hal_hip_native_executable_t {
   // Abstract resource used for injecting reference counting and vtable;
   // must be at offset 0.
   iree_hal_resource_t resource;
-
   iree_allocator_t host_allocator;
 
   const iree_hal_hip_dynamic_symbols_t* symbols;
 
-  // The loaded HIP module.
-  hipModule_t hip_module;
+  // Loaded HIP modules.
+  iree_host_size_t module_count;
+  hipModule_t* modules;
 
-  iree_host_size_t entry_point_count;
-  // The list of entry point data pointers, pointing to trailing inline
-  // allocation after the end of this struct.
-  iree_hal_hip_kernel_info_t entry_points[];
+  // Exported kernels referencing the loaded modules.
+  iree_host_size_t export_count;
+  iree_hal_hip_kernel_params_t exports[];
 } iree_hal_hip_native_executable_t;
-// + Additional inline allocation for holding entry point information.
 
 static const iree_hal_executable_vtable_t iree_hal_hip_native_executable_vtable;
 
@@ -46,6 +45,40 @@
   return (iree_hal_hip_native_executable_t*)base_value;
 }
 
+typedef struct iree_hal_hip_limits_t {
+  uint32_t max_block_dims[3];
+  uint32_t max_block_shared_memory_size;
+} iree_hal_hip_limits_t;
+static iree_status_t iree_hal_hip_query_limits(
+    const iree_hal_hip_dynamic_symbols_t* symbols, hipDevice_t device,
+    iree_hal_hip_limits_t* out_limits) {
+  memset(out_limits, 0, sizeof(*out_limits));
+
+  IREE_HIP_RETURN_IF_ERROR(
+      symbols,
+      hipDeviceGetAttribute(&out_limits->max_block_dims[0],
+                            hipDeviceAttributeMaxBlockDimX, device),
+      "hipDeviceGetAttribute");
+  IREE_HIP_RETURN_IF_ERROR(
+      symbols,
+      hipDeviceGetAttribute(&out_limits->max_block_dims[1],
+                            hipDeviceAttributeMaxBlockDimY, device),
+      "hipDeviceGetAttribute");
+  IREE_HIP_RETURN_IF_ERROR(
+      symbols,
+      hipDeviceGetAttribute(&out_limits->max_block_dims[2],
+                            hipDeviceAttributeMaxBlockDimZ, device),
+      "hipDeviceGetAttribute");
+
+  IREE_HIP_RETURN_IF_ERROR(
+      symbols,
+      hipDeviceGetAttribute(&out_limits->max_block_shared_memory_size,
+                            hipDeviceAttributeMaxSharedMemoryPerBlock, device),
+      "hipDeviceGetAttribute");
+
+  return iree_ok_status();
+}
+
 // Verifies the structure of the flatbuffer so that we can avoid doing so during
 // runtime.
 //
@@ -53,7 +86,8 @@
 // functions with internal linkage), however we shouldn't need to bounds check
 // anything within the flatbuffer after this succeeds.
 static iree_status_t iree_hal_hip_native_executable_flatbuffer_verify(
-    iree_const_byte_span_t flatbuffer_data) {
+    iree_const_byte_span_t flatbuffer_data,
+    const iree_hal_hip_limits_t* limits) {
   if (!flatbuffer_data.data) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
                             "flatbuffer data is not present");
@@ -62,7 +96,7 @@
   // Run flatcc generated verification. This ensures all pointers are in-bounds
   // and that we can safely walk the file, but not that the actual contents of
   // the flatbuffer meet our expectations.
-  int verify_ret = iree_hal_rocm_ExecutableDef_verify_as_root(
+  int verify_ret = iree_hal_hip_ExecutableDef_verify_as_root(
       flatbuffer_data.data, flatbuffer_data.data_length);
   if (verify_ret != flatcc_verify_ok) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
@@ -70,50 +104,102 @@
                             flatcc_verify_error_string(verify_ret));
   }
 
-  iree_hal_rocm_ExecutableDef_table_t executable_def =
-      iree_hal_rocm_ExecutableDef_as_root(flatbuffer_data.data);
+  iree_hal_hip_ExecutableDef_table_t executable_def =
+      iree_hal_hip_ExecutableDef_as_root(flatbuffer_data.data);
 
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_rocm_ExecutableDef_entry_points_get(executable_def);
-  size_t entry_point_count = flatbuffers_string_vec_len(entry_points_vec);
-  if (entry_point_count == 0) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "no entry points present");
-  }
-  for (size_t i = 0; i < entry_point_count; ++i) {
-    if (flatbuffers_string_len(
-            flatbuffers_string_vec_at(entry_points_vec, i)) == 0) {
+  iree_hal_hip_ModuleDef_vec_t modules_vec =
+      iree_hal_hip_ExecutableDef_modules_get(executable_def);
+  iree_host_size_t module_count = iree_hal_hip_ModuleDef_vec_len(modules_vec);
+  for (iree_host_size_t i = 0; i < module_count; ++i) {
+    iree_hal_hip_ModuleDef_table_t module_def =
+        iree_hal_hip_ModuleDef_vec_at(modules_vec, i);
+    if (!module_def) {
       return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "executable entry point %zu has no name", i);
+                              "modules[%" PRIhsz "] is NULL", i);
+    }
+    if (flatbuffers_string_len(
+            iree_hal_hip_ModuleDef_hsaco_image_get(module_def)) == 0) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "modules[%" PRIhsz "] contents are empty", i);
     }
   }
 
-  iree_hal_rocm_BlockSizeDef_vec_t block_sizes_vec =
-      iree_hal_rocm_ExecutableDef_block_sizes_get(executable_def);
-  size_t block_size_count = iree_hal_rocm_BlockSizeDef_vec_len(block_sizes_vec);
-  if (entry_point_count != block_size_count) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "entry points (%zu) and block sizes (%zu) count mismatch",
-        entry_point_count, block_size_count);
-  }
+  iree_hal_hip_ExportDef_vec_t exports_vec =
+      iree_hal_hip_ExecutableDef_exports_get(executable_def);
+  for (iree_host_size_t i = 0; i < iree_hal_hip_ExportDef_vec_len(exports_vec);
+       ++i) {
+    iree_hal_hip_ExportDef_table_t export_def =
+        iree_hal_hip_ExportDef_vec_at(exports_vec, i);
+    if (!export_def) continue;
 
-  flatbuffers_uint32_vec_t shared_memory_sizes_vec =
-      iree_hal_rocm_ExecutableDef_shared_memory_sizes_get(executable_def);
-  size_t shared_memory_sizes_count =
-      flatbuffers_string_vec_len(shared_memory_sizes_vec);
-  if (entry_point_count != shared_memory_sizes_count) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "entry points (%zu) and shared memory sizes (%zu) count mismatch",
-        entry_point_count, shared_memory_sizes_count);
-  }
+    uint32_t module_ordinal =
+        iree_hal_hip_ExportDef_module_ordinal_get(export_def);
+    if (module_ordinal >= module_count) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz
+                              "] module_ordinal %u is out of bounds %" PRIhsz,
+                              i, module_ordinal, module_count);
+    }
 
-  flatbuffers_string_t hsaco_image =
-      iree_hal_rocm_ExecutableDef_hsaco_image_get(executable_def);
-  if (flatbuffers_string_len(hsaco_image) == 0) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "no HSACO image present");
+    if (flatbuffers_string_len(
+            iree_hal_hip_ExportDef_kernel_name_get(export_def)) == 0) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz "] name is empty", i);
+    }
+
+    if (iree_hal_hip_ExportDef_block_dims_is_present(export_def)) {
+      const iree_hal_hip_BlockDims_t* block_dims =
+          iree_hal_hip_ExportDef_block_dims_get(export_def);
+      if (block_dims->x > limits->max_block_dims[0] ||
+          block_dims->y > limits->max_block_dims[1] ||
+          block_dims->z > limits->max_block_dims[2]) {
+        return iree_make_status(
+            IREE_STATUS_INVALID_ARGUMENT,
+            "exports[%" PRIhsz
+            "] block dims %ux%ux%u exceeds device maximum %ux%ux%u",
+            i, block_dims->x, block_dims->y, block_dims->z,
+            limits->max_block_dims[0], limits->max_block_dims[1],
+            limits->max_block_dims[2]);
+      }
+    } else {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz "] block dims are missing",
+                              i);
+    }
+
+    uint32_t block_shared_memory_size =
+        iree_hal_hip_ExportDef_block_shared_memory_size_get(export_def);
+    if (block_shared_memory_size > limits->max_block_shared_memory_size) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "exports[%" PRIhsz
+                              "] requires %uB of shared memory and "
+                              "exceeds the device maximum of %uB per block",
+                              i, block_shared_memory_size,
+                              limits->max_block_shared_memory_size);
+    }
+
+    uint32_t constant_count =
+        iree_hal_hip_ExportDef_constant_count_get(export_def);
+    if (constant_count > IREE_HAL_HIP_MAX_DISPATCH_CONSTANT_COUNT) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "exports[%" PRIhsz "] constant_count %u exceeds maximum of %u", i,
+          constant_count, IREE_HAL_HIP_MAX_DISPATCH_CONSTANT_COUNT);
+    }
+
+    iree_hal_hip_BindingBits_vec_t binding_flags_vec =
+        iree_hal_hip_ExportDef_binding_flags_get(export_def);
+    if (iree_hal_hip_BindingBits_vec_len(binding_flags_vec) >
+        IREE_HAL_HIP_MAX_DISPATCH_BINDING_COUNT) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "exports[%" PRIhsz "] binding_flags count %zu exceeds maximum of %u",
+          i, iree_hal_hip_BindingBits_vec_len(binding_flags_vec),
+          IREE_HAL_HIP_MAX_DISPATCH_BINDING_COUNT);
+    }
+
+    IREE_RETURN_IF_ERROR(iree_hal_debug_verify_export_def(
+        iree_hal_hip_ExportDef_debug_info_get(export_def)));
   }
 
   return iree_ok_status();
@@ -129,170 +215,174 @@
   IREE_TRACE_ZONE_BEGIN(z0);
 
   *out_executable = NULL;
-  iree_hal_hip_native_executable_t* executable = NULL;
+
+  // TODO: move to the executable cache to avoid repeated queries.
+  iree_hal_hip_limits_t limits = {0};
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_hip_query_limits(symbols, device, &limits));
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_hip_native_executable_flatbuffer_verify(
-              executable_params->executable_data));
+              executable_params->executable_data, &limits));
 
-  iree_hal_rocm_ExecutableDef_table_t executable_def =
-      iree_hal_rocm_ExecutableDef_as_root(
+  iree_hal_hip_ExecutableDef_table_t executable_def =
+      iree_hal_hip_ExecutableDef_as_root(
           executable_params->executable_data.data);
 
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_rocm_ExecutableDef_entry_points_get(executable_def);
-  iree_hal_rocm_BlockSizeDef_vec_t block_sizes_vec =
-      iree_hal_rocm_ExecutableDef_block_sizes_get(executable_def);
-  flatbuffers_uint32_vec_t shared_memory_sizes_vec =
-      iree_hal_rocm_ExecutableDef_shared_memory_sizes_get(executable_def);
-  flatbuffers_string_t hsaco_image =
-      iree_hal_rocm_ExecutableDef_hsaco_image_get(executable_def);
-  iree_host_size_t entry_point_count =
-      flatbuffers_string_vec_len(entry_points_vec);
+  iree_hal_hip_ModuleDef_vec_t modules_vec =
+      iree_hal_hip_ExecutableDef_modules_get(executable_def);
+  iree_host_size_t module_count = iree_hal_hip_ModuleDef_vec_len(modules_vec);
+  iree_hal_hip_ExportDef_vec_t exports_vec =
+      iree_hal_hip_ExecutableDef_exports_get(executable_def);
+  iree_host_size_t export_count = iree_hal_hip_ExportDef_vec_len(exports_vec);
 
   // Calculate the total size of the export debug info. This is only required
   // when tracing so that we can store copies of the debug info as the
   // flatbuffer storing the strings may be released while the executable is
   // still live.
-  iree_host_size_t total_entry_point_name_chars = 0;
+  iree_host_size_t total_export_info_length = 0;
   IREE_TRACE({
-    for (iree_host_size_t i = 0; i < entry_point_count; i++) {
-      const char* entry_name = flatbuffers_string_vec_at(entry_points_vec, i);
-      total_entry_point_name_chars += flatbuffers_string_len(entry_name);
+    for (iree_host_size_t i = 0; i < export_count; ++i) {
+      iree_hal_hip_ExportDef_table_t export_def =
+          iree_hal_hip_ExportDef_vec_at(exports_vec, i);
+      total_export_info_length += iree_hal_debug_calculate_export_info_size(
+          iree_hal_hip_ExportDef_debug_info_get(export_def));
     }
   });
 
-  // Allocate storage for the kernel module.
-  iree_host_size_t total_size =
-      sizeof(*executable) +
-      entry_point_count * sizeof(executable->entry_points[0]) +
-      total_entry_point_name_chars;
+  // Allocate storage for the executable and its associated data structures.
+  iree_hal_hip_native_executable_t* executable = NULL;
+  const iree_host_size_t total_size =
+      sizeof(*executable) + module_count * sizeof(executable->modules[0]) +
+      export_count * sizeof(executable->exports[0]) + total_export_info_length;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0,
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable));
-  IREE_TRACE(
-      char* string_table_buffer =
-          (char*)((char*)executable + sizeof(*executable) +
-                  entry_point_count * sizeof(executable->entry_points[0])));
-
   iree_hal_resource_initialize(&iree_hal_hip_native_executable_vtable,
                                &executable->resource);
+  executable->host_allocator = host_allocator;
+  executable->symbols = symbols;
+  executable->module_count = module_count;
+  executable->modules =
+      (hipModule_t*)((uint8_t*)executable + sizeof(*executable) +
+                     export_count * sizeof(executable->exports[0]));
+  executable->export_count = export_count;
+  IREE_TRACE(
+      iree_hal_debug_export_info_t* export_infos =
+          (iree_hal_debug_export_info_t*)((uint8_t*)executable->modules +
+                                          module_count *
+                                              sizeof(executable->modules[0])));
 
-  // Load the HSACO image - this will fail if the device cannot handle the
-  // contents. We could check this prior to creating
-  hipModule_t module = NULL;
+  // Publish any embedded source files to the tracing infrastructure.
+  iree_hal_debug_publish_source_files(
+      iree_hal_hip_ExecutableDef_source_files_get(executable_def));
 
-  iree_status_t status = IREE_HIP_RESULT_TO_STATUS(
-      symbols, hipModuleLoadDataEx(&module, hsaco_image, 0, NULL, NULL),
-      "hipModuleLoadDataEx");
-  if (!iree_status_is_ok(status)) {
-    status = iree_status_annotate(
-        status,
-        IREE_SV("mismatched target chip? missing/wrong bitcode directory?"));
-  }
+  // Load each module first so that exports can reference them.
+  iree_status_t status = iree_ok_status();
+  for (iree_host_size_t i = 0; i < module_count; ++i) {
+    iree_hal_hip_ModuleDef_table_t module_def =
+        iree_hal_hip_ModuleDef_vec_at(modules_vec, i);
 
-  // Query max optin shared memory per block - we'll use it to compare with
-  // kernel usages.
-  uint32_t max_shared_memory = 0;
-  if (iree_status_is_ok(status)) {
+    // WARNING: HIP doesn't take an expected length here so we can't bound it.
+    // It's likely that users could craft inputs that read beyond the extents of
+    // the embedded binary.
+    flatbuffers_string_t hsaco_image =
+        iree_hal_hip_ModuleDef_hsaco_image_get(module_def);
+
+    // TODO: pass hipJitOption values to get log info and other info back.
+    // We pass the error buffer today but could use the info log to diagnose
+    // performance warnings.
+    char error_log[8192] = {0};
+    hipJitOption jit_options[] = {
+        hipJitOptionErrorLogBuffer,
+        hipJitOptionErrorLogBufferSizeBytes,
+    };
+    void* jit_option_values[] = {
+        (void*)error_log,
+        (void*)(uint32_t)sizeof(error_log),
+    };
+    hipModule_t module = NULL;
     status = IREE_HIP_RESULT_TO_STATUS(
         symbols,
-        hipDeviceGetAttribute(&max_shared_memory,
-                              hipDeviceAttributeMaxSharedMemoryPerBlock,
-                              device),
-        "hipDeviceGetAttribute");
+        hipModuleLoadDataEx(&module, hsaco_image, IREE_ARRAYSIZE(jit_options),
+                            jit_options, jit_option_values),
+        "hipModuleLoadDataEx");
+    if (!iree_status_is_ok(status)) {
+      status = iree_status_annotate(
+          status,
+          IREE_SV("mismatched target chip? missing/wrong bitcode directory?"));
+      if (strlen(error_log) > 0) {
+        status =
+            iree_status_annotate(status, iree_make_cstring_view(error_log));
+      }
+      break;
+    }
+
+    executable->modules[i] = module;
   }
 
   if (iree_status_is_ok(status)) {
-    executable->host_allocator = host_allocator;
-    executable->symbols = symbols;
-    executable->hip_module = module;
-    executable->entry_point_count = entry_point_count;
-    for (iree_host_size_t i = 0; i < entry_point_count; i++) {
-      // Lookup the function in the module; this should always succeed but we
-      // cannot trust that the input was generated by our compiler.
+    for (iree_host_size_t i = 0; i < export_count; ++i) {
+      iree_hal_hip_ExportDef_table_t export_def =
+          iree_hal_hip_ExportDef_vec_at(exports_vec, i);
+
+      // Lookup the function in the module; this should always succeed but
+      // we cannot trust that the input was generated by our compiler.
+      uint32_t module_ordinal =
+          iree_hal_hip_ExportDef_module_ordinal_get(export_def);
+      hipModule_t module = executable->modules[module_ordinal];
+      flatbuffers_string_t kernel_name =
+          iree_hal_hip_ExportDef_kernel_name_get(export_def);
       hipFunction_t function = NULL;
-      const char* entry_name = flatbuffers_string_vec_at(entry_points_vec, i);
       status = IREE_HIP_RESULT_TO_STATUS(
-          symbols,
-          hipModuleGetFunction(&function, executable->hip_module, entry_name),
+          symbols, hipModuleGetFunction(&function, module, kernel_name),
           "hipModuleGetFunction");
       if (!iree_status_is_ok(status)) break;
       if (!function) {
         status = iree_make_status(IREE_STATUS_NOT_FOUND,
-                                  "exported module function '%s' not found",
-                                  entry_name);
+                                  "exports[%" PRIhsz
+                                  "] kernel '%s' not found in modules[%u]",
+                                  i, kernel_name, module_ordinal);
         break;
       }
 
-      if (shared_memory_sizes_vec[i] > max_shared_memory) {
-        status = iree_make_status(
-            IREE_STATUS_INVALID_ARGUMENT,
-            "function '%s' requested shared memory size of %u bytes larger "
-            "than allowed size of %u bytes",
-            entry_name, shared_memory_sizes_vec[i], max_shared_memory);
-      } else {
-        status = IREE_HIP_RESULT_TO_STATUS(
-            symbols,
-            hipFuncSetAttribute(
-                function,
-                (hipFuncAttribute)
-                    HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
-                shared_memory_sizes_vec[i]),
-            "hipFuncSetAttribute");
-      }
+      uint32_t block_shared_memory_size =
+          iree_hal_hip_ExportDef_block_shared_memory_size_get(export_def);
+      status = IREE_HIP_RESULT_TO_STATUS(
+          symbols,
+          hipFuncSetAttribute(
+              function,
+              (hipFuncAttribute)
+                  HIP_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES,
+              block_shared_memory_size),
+          "hipFuncSetAttribute");
       if (!iree_status_is_ok(status)) break;
 
-      // TODO(#18189): embed all of this on a single flatbuffer table
-      // per-export.
-      //
       // Package required parameters for kernel launches for each entry point.
-      iree_hal_hip_kernel_info_t* kernel_info = &executable->entry_points[i];
-      kernel_info->layout = executable_params->pipeline_layouts[i];
-      iree_hal_pipeline_layout_retain(kernel_info->layout);
+      iree_hal_hip_kernel_params_t* kernel_info = &executable->exports[i];
       kernel_info->function = function;
-      iree_hal_hip_dispatch_layout_t dispatch_params =
-          iree_hal_hip_pipeline_layout_dispatch_layout(kernel_info->layout);
-      kernel_info->constant_count = dispatch_params.push_constant_count;
-      kernel_info->binding_count = dispatch_params.total_binding_count;
-      kernel_info->block_size[0] = block_sizes_vec[i].x;
-      kernel_info->block_size[1] = block_sizes_vec[i].y;
-      kernel_info->block_size[2] = block_sizes_vec[i].z;
-      kernel_info->shared_memory_size = shared_memory_sizes_vec[i];
-
-      if (kernel_info->binding_count >
-          IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-        status = iree_make_status(
-            IREE_STATUS_RESOURCE_EXHAUSTED,
-            "exceeded available binding slots; requested %u of maximum %d",
-            kernel_info->binding_count,
-            IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-      }
-      if (!iree_status_is_ok(status)) break;
-
-      // Stash the entry point name in the string table for use when tracing.
-      IREE_TRACE({
-        iree_host_size_t entry_name_length = flatbuffers_string_len(entry_name);
-        memcpy(string_table_buffer, entry_name, entry_name_length);
-        kernel_info->function_name =
-            iree_make_string_view(string_table_buffer, entry_name_length);
-        string_table_buffer += entry_name_length;
-      });
+      const iree_hal_hip_BlockDims_t* block_dims =
+          iree_hal_hip_ExportDef_block_dims_get(export_def);
+      kernel_info->block_dims[0] = block_dims->x;
+      kernel_info->block_dims[1] = block_dims->y;
+      kernel_info->block_dims[2] = block_dims->z;
+      kernel_info->block_shared_memory_size = block_shared_memory_size;
+      kernel_info->constant_count =
+          iree_hal_hip_ExportDef_constant_count_get(export_def);
+      iree_hal_hip_BindingBits_vec_t binding_flags_vec =
+          iree_hal_hip_ExportDef_binding_flags_get(export_def);
+      kernel_info->binding_count =
+          iree_hal_hip_BindingBits_vec_len(binding_flags_vec);
 
       IREE_TRACE({
-        if (iree_hal_rocm_ExecutableDef_source_locations_is_present(
-                executable_def)) {
-          iree_hal_rocm_FileLineLocDef_vec_t source_locs_vec =
-              iree_hal_rocm_ExecutableDef_source_locations_get(executable_def);
-          iree_hal_rocm_FileLineLocDef_table_t source_loc =
-              iree_hal_rocm_FileLineLocDef_vec_at(source_locs_vec, i);
-          flatbuffers_string_t filename =
-              iree_hal_rocm_FileLineLocDef_filename_get(source_loc);
-          uint32_t line = iree_hal_rocm_FileLineLocDef_line_get(source_loc);
-          kernel_info->source_filename =
-              iree_make_string_view(filename, flatbuffers_string_len(filename));
-          kernel_info->source_line = line;
-        }
+        iree_hal_debug_copy_export_info(
+            iree_hal_hip_ExportDef_debug_info_get(export_def),
+            &export_infos[i]);
+        kernel_info->debug_info.function_name = export_infos[i].function_name;
+        kernel_info->debug_info.source_filename =
+            export_infos[i].source_filename;
+        kernel_info->debug_info.source_line = export_infos[i].source_line;
       });
     }
   }
@@ -314,30 +404,31 @@
   iree_allocator_t host_allocator = executable->host_allocator;
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  for (iree_host_size_t i = 0; i < executable->entry_point_count; ++i) {
-    iree_hal_pipeline_layout_release(executable->entry_points[i].layout);
+  for (iree_host_size_t i = 0; i < executable->module_count; ++i) {
+    if (executable->modules[i]) {
+      IREE_HIP_IGNORE_ERROR(executable->symbols,
+                            hipModuleUnload(executable->modules[i]));
+    }
   }
-  if (executable->hip_module) {
-    IREE_HIP_IGNORE_ERROR(executable->symbols,
-                          hipModuleUnload(executable->hip_module));
-  }
+
   iree_allocator_free(host_allocator, executable);
 
   IREE_TRACE_ZONE_END(z0);
 }
 
-iree_status_t iree_hal_hip_native_executable_entry_point_kernel_info(
-    iree_hal_executable_t* base_executable, int32_t entry_point,
-    iree_hal_hip_kernel_info_t* out_info) {
+iree_status_t iree_hal_hip_native_executable_lookup_kernel_params(
+    iree_hal_executable_t* base_executable, int32_t ordinal,
+    const iree_hal_hip_kernel_params_t** out_params) {
   iree_hal_hip_native_executable_t* executable =
       iree_hal_hip_native_executable_cast(base_executable);
-  if (entry_point >= executable->entry_point_count) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "entry point ordinal %d out of range; executable "
-                            "only contains %" PRIhsz " entry points",
-                            entry_point, executable->entry_point_count);
+  if (ordinal >= executable->export_count) {
+    return iree_make_status(
+        IREE_STATUS_OUT_OF_RANGE,
+        "export ordinal %d out of range; executable contains %" PRIhsz
+        " exports",
+        ordinal, executable->export_count);
   }
-  memcpy(out_info, &executable->entry_points[entry_point], sizeof(*out_info));
+  *out_params = &executable->exports[ordinal];
   return iree_ok_status();
 }
 
diff --git a/runtime/src/iree/hal/drivers/hip/native_executable.h b/runtime/src/iree/hal/drivers/hip/native_executable.h
index d2b1a31..beb1e7c 100644
--- a/runtime/src/iree/hal/drivers/hip/native_executable.h
+++ b/runtime/src/iree/hal/drivers/hip/native_executable.h
@@ -19,20 +19,31 @@
 extern "C" {
 #endif  // __cplusplus
 
-typedef struct iree_hal_hip_kernel_info_t {
-  // TODO(#18189): remove when using simplified bindings.
-  iree_hal_pipeline_layout_t* layout;
+// The max number of per-dispatch bindings allowed in the HIP HAL
+// implementation.
+#define IREE_HAL_HIP_MAX_DISPATCH_BINDING_COUNT 16
+
+// The max number of per-dispatch constants supported by the HIP HAL
+// implementation.
+#define IREE_HAL_HIP_MAX_DISPATCH_CONSTANT_COUNT 64
+
+typedef struct iree_hal_hip_kernel_debug_info_t {
+  iree_string_view_t function_name;
+  iree_string_view_t source_filename;
+  uint32_t source_line;
+} iree_hal_hip_kernel_debug_info_t;
+
+typedef struct iree_hal_hip_kernel_params_t {
   hipFunction_t function;
+
   uint32_t constant_count;
   uint32_t binding_count;
-  // TODO(#18189): add bitfield indicating indirect bindings.
-  uint32_t block_size[3];
-  uint32_t shared_memory_size;
 
-  IREE_TRACE(iree_string_view_t function_name;)
-  IREE_TRACE(iree_string_view_t source_filename;)
-  IREE_TRACE(uint32_t source_line;)
-} iree_hal_hip_kernel_info_t;
+  uint32_t block_dims[3];
+  uint32_t block_shared_memory_size;
+
+  IREE_TRACE(iree_hal_hip_kernel_debug_info_t debug_info;)
+} iree_hal_hip_kernel_params_t;
 
 // Creates an IREE executable from a HSACO module. The module may contain
 // several kernels that can be extracted along with the associated block size.
@@ -43,9 +54,9 @@
 
 // Returns the kernel launch parameters for the given |entry_point| in the
 // |executable|.
-iree_status_t iree_hal_hip_native_executable_entry_point_kernel_info(
+iree_status_t iree_hal_hip_native_executable_lookup_kernel_params(
     iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_hip_kernel_info_t* out_info);
+    const iree_hal_hip_kernel_params_t** out_params);
 
 #ifdef __cplusplus
 }  // extern "C"
diff --git a/runtime/src/iree/hal/drivers/hip/pipeline_layout.c b/runtime/src/iree/hal/drivers/hip/pipeline_layout.c
deleted file mode 100644
index a471011..0000000
--- a/runtime/src/iree/hal/drivers/hip/pipeline_layout.c
+++ /dev/null
@@ -1,248 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/drivers/hip/pipeline_layout.h"
-
-#include <stddef.h>
-
-#include "iree/base/api.h"
-#include "iree/base/tracing.h"
-
-//===----------------------------------------------------------------------===//
-// iree_hal_hip_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_hip_descriptor_set_layout_t {
-  // Abstract resource used for injecting reference counting and vtable;
-  // must be at offset 0.
-  iree_hal_resource_t resource;
-
-  // The host allocator used for creating this descriptor set layout struct.
-  iree_allocator_t host_allocator;
-
-  // The total number of bindings in this descriptor set.
-  iree_host_size_t binding_count;
-} iree_hal_hip_descriptor_set_layout_t;
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_hip_descriptor_set_layout_vtable;
-
-static iree_hal_hip_descriptor_set_layout_t*
-iree_hal_hip_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_hip_descriptor_set_layout_vtable);
-  return (iree_hal_hip_descriptor_set_layout_t*)base_value;
-}
-
-static const iree_hal_hip_descriptor_set_layout_t*
-iree_hal_hip_descriptor_set_layout_const_cast(
-    const iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_hip_descriptor_set_layout_vtable);
-  return (const iree_hal_hip_descriptor_set_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_hip_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  *out_descriptor_set_layout = NULL;
-
-  iree_hal_hip_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_allocator_malloc(host_allocator, sizeof(*descriptor_set_layout),
-                                (void**)&descriptor_set_layout));
-
-  iree_hal_resource_initialize(&iree_hal_hip_descriptor_set_layout_vtable,
-                               &descriptor_set_layout->resource);
-  descriptor_set_layout->host_allocator = host_allocator;
-  descriptor_set_layout->binding_count = binding_count;
-  *out_descriptor_set_layout =
-      (iree_hal_descriptor_set_layout_t*)descriptor_set_layout;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-iree_host_size_t iree_hal_hip_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  const iree_hal_hip_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_hip_descriptor_set_layout_const_cast(base_descriptor_set_layout);
-  return descriptor_set_layout->binding_count;
-}
-
-static void iree_hal_hip_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  iree_hal_hip_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_hip_descriptor_set_layout_cast(base_descriptor_set_layout);
-  iree_allocator_t host_allocator = descriptor_set_layout->host_allocator;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_allocator_free(host_allocator, descriptor_set_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_hip_descriptor_set_layout_vtable = {
-        .destroy = iree_hal_hip_descriptor_set_layout_destroy,
-};
-
-//===----------------------------------------------------------------------===//
-// iree_hal_hip_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_hip_pipeline_layout_t {
-  // Abstract resource used for injecting reference counting and vtable;
-  // must be at offset 0.
-  iree_hal_resource_t resource;
-
-  // The host allocator used for creating this pipeline layout struct.
-  iree_allocator_t host_allocator;
-
-  // The kernel argument index for push constants.
-  // Note that push constants are placed after all normal descriptors.
-  iree_host_size_t push_constant_base_index;
-  iree_host_size_t push_constant_count;
-
-  iree_host_size_t set_layout_count;
-  // The list of descriptor set layout pointers, pointing to trailing inline
-  // allocation after the end of this struct.
-  struct {
-    iree_hal_descriptor_set_layout_t* set_layout;
-    // Base kernel argument index for this descriptor set.
-    iree_host_size_t base_index;
-  } set_layouts[];
-} iree_hal_hip_pipeline_layout_t;
-// + Additional inline allocation for holding all descriptor sets.
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_hip_pipeline_layout_vtable;
-
-static iree_hal_hip_pipeline_layout_t* iree_hal_hip_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_hip_pipeline_layout_vtable);
-  return (iree_hal_hip_pipeline_layout_t*)base_value;
-}
-
-static const iree_hal_hip_pipeline_layout_t*
-iree_hal_hip_pipeline_layout_const_cast(
-    const iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_hip_pipeline_layout_vtable);
-  return (const iree_hal_hip_pipeline_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_hip_pipeline_layout_create(
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  *out_pipeline_layout = NULL;
-  if (push_constant_count > IREE_HAL_HIP_MAX_PUSH_CONSTANT_COUNT) {
-    IREE_TRACE_ZONE_END(z0);
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "push constant count %" PRIhsz " over the limit of %d",
-        push_constant_count, IREE_HAL_HIP_MAX_PUSH_CONSTANT_COUNT);
-  }
-
-  // Currently the pipeline layout doesn't do anything.
-  // TODO: Handle creating the argument layout at that time hadling both push
-  // constant and buffers.
-  iree_hal_hip_pipeline_layout_t* pipeline_layout = NULL;
-  iree_host_size_t total_size =
-      sizeof(*pipeline_layout) +
-      set_layout_count * sizeof(*pipeline_layout->set_layouts);
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_allocator_malloc(host_allocator, total_size,
-                                (void**)&pipeline_layout));
-
-  iree_hal_resource_initialize(&iree_hal_hip_pipeline_layout_vtable,
-                               &pipeline_layout->resource);
-  pipeline_layout->host_allocator = host_allocator;
-  pipeline_layout->set_layout_count = set_layout_count;
-  iree_host_size_t base_index = 0;
-  for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-    pipeline_layout->set_layouts[i].set_layout = set_layouts[i];
-    // Copy and retain all descriptor sets so we don't lose them.
-    iree_hal_descriptor_set_layout_retain(set_layouts[i]);
-    pipeline_layout->set_layouts[i].base_index = base_index;
-    base_index +=
-        iree_hal_hip_descriptor_set_layout_binding_count(set_layouts[i]);
-  }
-  pipeline_layout->push_constant_base_index = base_index;
-  pipeline_layout->push_constant_count = push_constant_count;
-  *out_pipeline_layout = (iree_hal_pipeline_layout_t*)pipeline_layout;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static void iree_hal_hip_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_hip_pipeline_layout_t* pipeline_layout =
-      iree_hal_hip_pipeline_layout_cast(base_pipeline_layout);
-  iree_allocator_t host_allocator = pipeline_layout->host_allocator;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  for (iree_host_size_t i = 0; i < pipeline_layout->set_layout_count; ++i) {
-    iree_hal_descriptor_set_layout_release(
-        pipeline_layout->set_layouts[i].set_layout);
-  }
-  iree_allocator_free(host_allocator, pipeline_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-const iree_hal_descriptor_set_layout_t*
-iree_hal_hip_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout, uint32_t set) {
-  const iree_hal_hip_pipeline_layout_t* pipeline_layout =
-      iree_hal_hip_pipeline_layout_const_cast(base_pipeline_layout);
-  if (set < pipeline_layout->set_layout_count) {
-    return pipeline_layout->set_layouts[set].set_layout;
-  }
-  return NULL;
-}
-
-iree_host_size_t iree_hal_hip_pipeline_layout_base_binding_index(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout, uint32_t set) {
-  const iree_hal_hip_pipeline_layout_t* pipeline_layout =
-      iree_hal_hip_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->set_layouts[set].base_index;
-}
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_hip_pipeline_layout_vtable = {
-        .destroy = iree_hal_hip_pipeline_layout_destroy,
-};
-
-//===----------------------------------------------------------------------===//
-// iree_hal_hip_dispatch_layout_t
-//===----------------------------------------------------------------------===//
-
-iree_hal_hip_dispatch_layout_t iree_hal_hip_pipeline_layout_dispatch_layout(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_hip_pipeline_layout_t* pipeline_layout =
-      iree_hal_hip_pipeline_layout_const_cast(base_pipeline_layout);
-  iree_hal_hip_dispatch_layout_t dispatch_params = {
-      .push_constant_base_index = pipeline_layout->push_constant_base_index,
-      .push_constant_count = pipeline_layout->push_constant_count,
-      .total_binding_count = pipeline_layout->push_constant_base_index,
-      .set_layout_count = pipeline_layout->set_layout_count,
-  };
-
-  return dispatch_params;
-}
diff --git a/runtime/src/iree/hal/drivers/hip/pipeline_layout.h b/runtime/src/iree/hal/drivers/hip/pipeline_layout.h
deleted file mode 100644
index 364e4c6..0000000
--- a/runtime/src/iree/hal/drivers/hip/pipeline_layout.h
+++ /dev/null
@@ -1,92 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_DRIVERS_HIP_PIPELINE_LAYOUT_H_
-#define IREE_HAL_DRIVERS_HIP_PIPELINE_LAYOUT_H_
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-// The max number of bindings per descriptor set allowed in the HIP HAL
-// implementation.
-#define IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT 16
-
-// The max number of descriptor sets allowed in the HIP HAL implementation.
-//
-// This depends on the general descriptor set planning in IREE and should adjust
-// with it.
-#define IREE_HAL_HIP_MAX_DESCRIPTOR_SET_COUNT 4
-
-// The max number of push constants supported by the HIP HAL implementation.
-#define IREE_HAL_HIP_MAX_PUSH_CONSTANT_COUNT 64
-
-//===----------------------------------------------------------------------===//
-// iree_hal_hip_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a descriptor set layout with the given |bindings|.
-//
-// Bindings in a descriptor set map to a list of consecutive kernel arguments in
-// HIP kernels.
-iree_status_t iree_hal_hip_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-// Returns the binding count for the given descriptor set layout.
-iree_host_size_t iree_hal_hip_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_hip_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates the pipeline layout with the given |set_layouts| and
-// |push_constant_count|.
-//
-// Bindings in the pipeline map to kernel arguments in HIP kernels, followed by
-// the kernel argument for the push constant data.
-iree_status_t iree_hal_hip_pipeline_layout_create(
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-// Returns the total number of sets in the given |pipeline_layout|.
-iree_host_size_t iree_hal_hip_pipeline_layout_descriptor_set_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the descriptor set layout of the given |set| in |pipeline_layout|.
-const iree_hal_descriptor_set_layout_t*
-iree_hal_hip_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set);
-
-// Returns the base kernel argument index for the given set.
-iree_host_size_t iree_hal_hip_pipeline_layout_base_binding_index(
-    const iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set);
-
-typedef struct iree_hal_hip_dispatch_layout_t {
-  iree_host_size_t push_constant_base_index;
-  iree_host_size_t push_constant_count;
-  iree_host_size_t set_layout_count;
-  iree_host_size_t total_binding_count;
-} iree_hal_hip_dispatch_layout_t;
-
-// Returns dispatch layout parameters in a struct form for pipeline layout.
-iree_hal_hip_dispatch_layout_t iree_hal_hip_pipeline_layout_dispatch_layout(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_DRIVERS_HIP_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/drivers/hip/rccl_channel.c b/runtime/src/iree/hal/drivers/hip/rccl_channel.c
index 80791ff..1c5c6d2 100644
--- a/runtime/src/iree/hal/drivers/hip/rccl_channel.c
+++ b/runtime/src/iree/hal/drivers/hip/rccl_channel.c
@@ -593,9 +593,10 @@
     iree_string_view_t collective_str =
         iree_hal_collective_op_format(&entry->op, &string_temp);
     IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
-        tracing_context, tracing_event_list, IREE_HAL_TRACING_VERBOSITY_FINE,
-        __FILE__, strlen(__FILE__), (uint32_t)__LINE__, __FUNCTION__,
-        strlen(__FUNCTION__), collective_str.data, collective_str.size);
+        tracing_context, tracing_event_list,
+        IREE_HAL_STREAM_TRACING_VERBOSITY_FINE, __FILE__, strlen(__FILE__),
+        (uint32_t)__LINE__, __FUNCTION__, strlen(__FUNCTION__),
+        collective_str.data, collective_str.size);
   }
 #endif  // IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
 
@@ -613,7 +614,7 @@
   IREE_TRACE({
     for (iree_host_size_t i = 0; i < batch->count; ++i) {
       IREE_HAL_STREAM_TRACE_ZONE_END(tracing_context, tracing_event_list,
-                                     IREE_HAL_TRACING_VERBOSITY_FINE);
+                                     IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
     }
   });
 
diff --git a/runtime/src/iree/hal/drivers/hip/stream_command_buffer.c b/runtime/src/iree/hal/drivers/hip/stream_command_buffer.c
index d536612..ff3f95f 100644
--- a/runtime/src/iree/hal/drivers/hip/stream_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/hip/stream_command_buffer.c
@@ -9,7 +9,6 @@
 
 #include "iree/hal/drivers/hip/hip_buffer.h"
 #include "iree/hal/drivers/hip/native_executable.h"
-#include "iree/hal/drivers/hip/pipeline_layout.h"
 #include "iree/hal/drivers/hip/rccl_channel.h"
 #include "iree/hal/drivers/hip/status_util.h"
 #include "iree/hal/utils/collective_batch.h"
@@ -40,12 +39,6 @@
 
   // Iteratively constructed batch of collective operations.
   iree_hal_collective_batch_t collective_batch;
-
-  // TODO(#18189): drop state used by legacy bindings mechanism.
-  int32_t push_constants[IREE_HAL_HIP_MAX_PUSH_CONSTANT_COUNT];
-  struct {
-    hipDeviceptr_t bindings[IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT];
-  } descriptor_sets[IREE_HAL_HIP_MAX_DESCRIPTOR_SET_COUNT];
 } iree_hal_hip_stream_command_buffer_t;
 
 static const iree_hal_command_buffer_vtable_t
@@ -183,7 +176,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_COARSE,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
       /*file_name=*/NULL, 0, /*line=*/0, "iree_hal_hip_stream_command_buffer",
       strlen("iree_hal_hip_stream_command_buffer"),
       /*name=*/NULL, 0);
@@ -214,7 +207,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_COARSE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
@@ -230,8 +223,9 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_COARSE, location ? location->file.data : NULL,
-      location ? location->file.size : 0, location ? location->line : 0,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
+      location ? location->file.data : NULL, location ? location->file.size : 0,
+      location ? location->line : 0,
       /*func_name=*/NULL, 0, label.data, label.size);
 }
 
@@ -243,7 +237,7 @@
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_COARSE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE);
 }
 
 static iree_status_t iree_hal_hip_stream_command_buffer_execution_barrier(
@@ -465,175 +459,9 @@
   return status;
 }
 
-static iree_status_t iree_hal_hip_stream_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_hip_stream_command_buffer_t* command_buffer =
-      iree_hal_hip_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_host_size_t constant_base_index = offset / sizeof(int32_t);
-  for (iree_host_size_t i = 0; i < values_length / sizeof(int32_t); i++) {
-    command_buffer->push_constants[i + constant_base_index] =
-        ((uint32_t*)values)[i];
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_hip_stream_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  if (binding_count > IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-    return iree_make_status(
-        IREE_STATUS_RESOURCE_EXHAUSTED,
-        "exceeded available binding slots for push "
-        "descriptor set #%" PRIu32 "; requested %" PRIhsz " vs. maximal %d",
-        set, binding_count, IREE_HAL_HIP_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-  }
-
-  iree_hal_hip_stream_command_buffer_t* command_buffer =
-      iree_hal_hip_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  hipDeviceptr_t* current_bindings =
-      command_buffer->descriptor_sets[set].bindings;
-  for (iree_host_size_t i = 0; i < binding_count; i++) {
-    const iree_hal_buffer_ref_t* binding = &bindings[i];
-    hipDeviceptr_t device_ptr = NULL;
-    if (binding->buffer) {
-      IREE_RETURN_AND_END_ZONE_IF_ERROR(
-          z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                           &binding->buffer));
-
-      hipDeviceptr_t device_buffer = iree_hal_hip_buffer_device_pointer(
-          iree_hal_buffer_allocated_buffer(binding->buffer));
-      iree_device_size_t offset = iree_hal_buffer_byte_offset(binding->buffer);
-      device_ptr = (uint8_t*)device_buffer + offset + binding->offset;
-    }
-    current_bindings[binding->ordinal] = device_ptr;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
 static iree_status_t iree_hal_hip_stream_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_hip_stream_command_buffer_t* command_buffer =
-      iree_hal_hip_stream_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_stream_command_buffer_flush_collectives(command_buffer));
-
-  // Lookup kernel parameters used for side-channeling additional launch
-  // information from the compiler.
-  iree_hal_hip_kernel_info_t kernel_info;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
-
-  IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
-      command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_FINE, kernel_info.source_filename.data,
-      kernel_info.source_filename.size, kernel_info.source_line,
-      kernel_info.function_name.data, kernel_info.function_name.size,
-      /*name=*/NULL, 0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1,
-                                       &executable));
-
-  iree_hal_hip_dispatch_layout_t dispatch_layout =
-      iree_hal_hip_pipeline_layout_dispatch_layout(kernel_info.layout);
-
-  // The total number of descriptors across all descriptor sets.
-  iree_host_size_t descriptor_count = dispatch_layout.total_binding_count;
-  // The total number of push constants.
-  iree_host_size_t push_constant_count = dispatch_layout.push_constant_count;
-  // We append push constants to the end of descriptors to form a linear chain
-  // of kernel arguments.
-  iree_host_size_t kernel_params_count = descriptor_count + push_constant_count;
-  iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
-
-  // Each kernel_params[i] is itself a pointer to the corresponding
-  // element at the *second* inline allocation at the end of the current
-  // segment.
-  iree_host_size_t total_size = kernel_params_length * 2;
-  uint8_t* storage_base = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_arena_allocate(&command_buffer->arena, total_size,
-                              (void**)&storage_base));
-  void** params_ptr = (void**)storage_base;
-
-  // Set up kernel arguments to point to the payload slots.
-  hipDeviceptr_t* payload_ptr =
-      (hipDeviceptr_t*)((uint8_t*)params_ptr + kernel_params_length);
-  for (size_t i = 0; i < kernel_params_count; i++) {
-    params_ptr[i] = &payload_ptr[i];
-  }
-
-  // Copy descriptors from all sets to the end of the current segment for later
-  // access.
-  iree_host_size_t set_count = dispatch_layout.set_layout_count;
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    // TODO: cache this information in the kernel info to avoid recomputation.
-    iree_host_size_t binding_count =
-        iree_hal_hip_descriptor_set_layout_binding_count(
-            iree_hal_hip_pipeline_layout_descriptor_set_layout(
-                kernel_info.layout, i));
-    iree_host_size_t index =
-        iree_hal_hip_pipeline_layout_base_binding_index(kernel_info.layout, i);
-    memcpy(payload_ptr + index, command_buffer->descriptor_sets[i].bindings,
-           binding_count * sizeof(hipDeviceptr_t));
-  }
-
-  // Append the push constants to the kernel arguments.
-  iree_host_size_t base_index = dispatch_layout.push_constant_base_index;
-  // As commented above, each kernel parameter points to a hipDeviceptr_t,
-  // which has the size of a pointer on the target machine. We are just
-  // storing a 32-bit value for each push constant here instead, so we must
-  // process one element at a time on 64-bit machines.
-  for (iree_host_size_t i = 0; i < push_constant_count; i++) {
-    *((uint32_t*)params_ptr[base_index + i]) =
-        command_buffer->push_constants[i];
-  }
-
-  iree_status_t status = IREE_HIP_RESULT_TO_STATUS(
-      command_buffer->hip_symbols,
-      hipModuleLaunchKernel(
-          kernel_info.function, workgroup_x, workgroup_y, workgroup_z,
-          kernel_info.block_size[0], kernel_info.block_size[1],
-          kernel_info.block_size[2], kernel_info.shared_memory_size,
-          command_buffer->hip_stream, params_ptr, NULL),
-      "hipModuleLaunchKernel");
-
-  IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
-                                 &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_FINE);
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static iree_status_t iree_hal_hip_stream_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                          "need hip implementation of dispatch indirect");
-}
-
-static iree_status_t iree_hal_hip_stream_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_hip_stream_command_buffer_t* command_buffer =
@@ -645,16 +473,19 @@
 
   // Lookup kernel parameters used for side-channeling additional launch
   // information from the compiler.
-  iree_hal_hip_kernel_info_t kernel_info;
+  const iree_hal_hip_kernel_params_t* kernel_params = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_hip_native_executable_entry_point_kernel_info(
-              executable, entry_point, &kernel_info));
+      z0, iree_hal_hip_native_executable_lookup_kernel_params(
+              executable, entry_point, &kernel_params));
 
   IREE_HAL_STREAM_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, &command_buffer->tracing_event_list,
-      IREE_HAL_TRACING_VERBOSITY_FINE, kernel_info.source_filename.data,
-      kernel_info.source_filename.size, kernel_info.source_line,
-      kernel_info.function_name.data, kernel_info.function_name.size,
+      IREE_HAL_STREAM_TRACING_VERBOSITY_FINE,
+      kernel_params->debug_info.source_filename.data,
+      kernel_params->debug_info.source_filename.size,
+      kernel_params->debug_info.source_line,
+      kernel_params->debug_info.function_name.data,
+      kernel_params->debug_info.function_name.size,
       /*name=*/NULL, 0);
 
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
@@ -664,7 +495,7 @@
   // We append push constants to the end of descriptors to form a linear chain
   // of kernel arguments.
   iree_host_size_t kernel_params_count =
-      kernel_info.binding_count + kernel_info.constant_count;
+      kernel_params->binding_count + kernel_params->constant_count;
   iree_host_size_t kernel_params_length = kernel_params_count * sizeof(void*);
 
   // TODO: use packed parameters instead of the indirection mechanism - this
@@ -703,30 +534,30 @@
  // hipDeviceptr_t, which has the size of a pointer on the target machine. We
  // are just storing a 32-bit value for each push constant here instead, so we
  // must process one element at a time on 64-bit machines.
-  for (iree_host_size_t i = 0; i < kernel_info.constant_count; i++) {
-    *((uint32_t*)params_ptr[kernel_info.binding_count + i]) =
+  for (iree_host_size_t i = 0; i < kernel_params->constant_count; i++) {
+    *((uint32_t*)params_ptr[kernel_params->binding_count + i]) =
         ((const uint32_t*)constants.data)[i];
   }
 
   iree_status_t status = IREE_HIP_RESULT_TO_STATUS(
       command_buffer->hip_symbols,
       hipModuleLaunchKernel(
-          kernel_info.function, workgroup_count[0], workgroup_count[1],
-          workgroup_count[2], kernel_info.block_size[0],
-          kernel_info.block_size[1], kernel_info.block_size[2],
-          kernel_info.shared_memory_size, command_buffer->hip_stream,
+          kernel_params->function, workgroup_count[0], workgroup_count[1],
+          workgroup_count[2], kernel_params->block_dims[0],
+          kernel_params->block_dims[1], kernel_params->block_dims[2],
+          kernel_params->block_shared_memory_size, command_buffer->hip_stream,
           params_ptr, NULL),
       "hipModuleLaunchKernel");
 
   IREE_HAL_STREAM_TRACE_ZONE_END(command_buffer->tracing_context,
                                  &command_buffer->tracing_event_list,
-                                 IREE_HAL_TRACING_VERBOSITY_FINE);
+                                 IREE_HAL_STREAM_TRACING_VERBOSITY_FINE);
 
   IREE_TRACE_ZONE_END(z0);
   return status;
 }
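The dispatch path above packs bindings followed by 32-bit constants into a single kernel-arguments array using a double indirection: each `params[i]` points at a pointer-sized payload slot in a second inline allocation. A minimal standalone sketch of that packing (the helper name is hypothetical and independent of the HIP API):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Packs |binding_count| device pointers followed by |constant_count| 32-bit
// constants into the params/payload double-indirection that
// hipModuleLaunchKernel expects: params[i] points at a pointer-sized payload
// slot. |storage| must hold 2 * (binding_count + constant_count) pointers.
static void** pack_kernel_params(const void* const* bindings,
                                 size_t binding_count,
                                 const uint32_t* constants,
                                 size_t constant_count, uint8_t* storage) {
  size_t total = binding_count + constant_count;
  void** params = (void**)storage;
  void** payload = (void**)(storage + total * sizeof(void*));
  // Each kernel argument slot points at its payload slot.
  for (size_t i = 0; i < total; ++i) params[i] = &payload[i];
  // Bindings occupy the leading payload slots verbatim.
  if (binding_count) memcpy(payload, bindings, binding_count * sizeof(void*));
  // Each 32-bit constant lands in the low bits of its pointer-sized slot, so
  // constants are written one element at a time rather than bulk-copied.
  for (size_t i = 0; i < constant_count; ++i) {
    *(uint32_t*)params[binding_count + i] = constants[i];
  }
  return params;
}
```

This also illustrates why the constant loop in the real code cannot be a `memcpy` on 64-bit targets: each `uint32_t` constant occupies only the low half of an 8-byte slot.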
 
-static iree_status_t iree_hal_hip_stream_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_hip_stream_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -753,13 +584,7 @@
         .update_buffer = iree_hal_hip_stream_command_buffer_update_buffer,
         .copy_buffer = iree_hal_hip_stream_command_buffer_copy_buffer,
         .collective = iree_hal_hip_stream_command_buffer_collective,
-        .push_constants = iree_hal_hip_stream_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_hip_stream_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_hip_stream_command_buffer_dispatch,
         .dispatch_indirect =
             iree_hal_hip_stream_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_hip_stream_command_buffer_dispatch2,
-        .dispatch2_indirect =
-            iree_hal_hip_stream_command_buffer_dispatch2_indirect,
 };
diff --git a/runtime/src/iree/hal/drivers/local_sync/sync_device.c b/runtime/src/iree/hal/drivers/local_sync/sync_device.c
index b2ce7d7..8445241 100644
--- a/runtime/src/iree/hal/drivers/local_sync/sync_device.c
+++ b/runtime/src/iree/hal/drivers/local_sync/sync_device.c
@@ -17,7 +17,6 @@
 #include "iree/hal/local/executable_environment.h"
 #include "iree/hal/local/inline_command_buffer.h"
 #include "iree/hal/local/local_executable_cache.h"
-#include "iree/hal/local/local_pipeline_layout.h"
 #include "iree/hal/utils/deferred_command_buffer.h"
 #include "iree/hal/utils/file_transfer.h"
 #include "iree/hal/utils/memory_file.h"
@@ -247,17 +246,6 @@
   }
 }
 
-static iree_status_t iree_hal_sync_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  return iree_hal_local_descriptor_set_layout_create(
-      flags, binding_count, bindings,
-      iree_hal_device_host_allocator(base_device), out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_sync_device_create_event(
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
@@ -290,16 +278,6 @@
       iree_hal_device_host_allocator(base_device), out_file);
 }
 
-static iree_status_t iree_hal_sync_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  return iree_hal_local_pipeline_layout_create(
-      push_constants, set_layout_count, set_layouts,
-      iree_hal_device_host_allocator(base_device), out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_sync_device_create_semaphore(
     iree_hal_device_t* base_device, uint64_t initial_value,
     iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore) {
@@ -540,12 +518,9 @@
     .query_i64 = iree_hal_sync_device_query_i64,
     .create_channel = iree_hal_sync_device_create_channel,
     .create_command_buffer = iree_hal_sync_device_create_command_buffer,
-    .create_descriptor_set_layout =
-        iree_hal_sync_device_create_descriptor_set_layout,
     .create_event = iree_hal_sync_device_create_event,
     .create_executable_cache = iree_hal_sync_device_create_executable_cache,
     .import_file = iree_hal_sync_device_import_file,
-    .create_pipeline_layout = iree_hal_sync_device_create_pipeline_layout,
     .create_semaphore = iree_hal_sync_device_create_semaphore,
     .query_semaphore_compatibility =
         iree_hal_sync_device_query_semaphore_compatibility,
diff --git a/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c b/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c
index 3b0a9ba..0e60669 100644
--- a/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c
+++ b/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c
@@ -15,7 +15,6 @@
 #include "iree/hal/local/executable_environment.h"
 #include "iree/hal/local/executable_library.h"
 #include "iree/hal/local/local_executable.h"
-#include "iree/hal/local/local_pipeline_layout.h"
 #include "iree/hal/utils/resource_set.h"
 #include "iree/task/affinity_set.h"
 #include "iree/task/list.h"
@@ -77,24 +76,6 @@
 
     // All execution tasks emitted that must execute after |open_barrier|.
     iree_task_list_t open_tasks;
-
-    // TODO(#18189): remove legacy binding state.
-    // A flattened list of all available descriptor set bindings.
-    // As descriptor sets are pushed/bound the bindings will be updated to
-    // represent the fully-translated binding data pointer.
-    // TODO(benvanik): support proper mapping semantics and track the
-    // iree_hal_buffer_mapping_t and map/unmap where appropriate.
-    void* bindings[IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT *
-                   IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT];
-    iree_device_size_t
-        binding_lengths[IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT *
-                        IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT];
-
-    // TODO(#18189): remove legacy push constant state.
-    // All available push constants updated each time push_constants is called.
-    // Reset only with the command buffer and otherwise will maintain its values
-    // during recording to allow for partial push_constants updates.
-    uint32_t push_constants[IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT];
   } state;
 } iree_hal_task_command_buffer_t;
 
@@ -723,84 +704,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_push_constants
-//===----------------------------------------------------------------------===//
-// NOTE: command buffer state change only; enqueues no tasks.
-
-static iree_status_t iree_hal_task_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_task_command_buffer_t* command_buffer =
-      iree_hal_task_command_buffer_cast(base_command_buffer);
-
-  if (IREE_UNLIKELY(offset + values_length >=
-                    sizeof(command_buffer->state.push_constants))) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "push constant range %" PRIhsz " (length=%" PRIhsz
-                            ") out of range",
-                            offset, values_length);
-  }
-
-  memcpy((uint8_t*)&command_buffer->state.push_constants + offset, values,
-         values_length);
-
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_push_descriptor_set
-//===----------------------------------------------------------------------===//
-// NOTE: command buffer state change only; enqueues no tasks.
-
-static iree_status_t iree_hal_task_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  iree_hal_task_command_buffer_t* command_buffer =
-      iree_hal_task_command_buffer_cast(base_command_buffer);
-
-  if (IREE_UNLIKELY(set >= IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT)) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "set %u out of bounds", set);
-  }
-
-  iree_host_size_t binding_base =
-      set * IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT;
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    if (IREE_UNLIKELY(bindings[i].ordinal >=
-                      IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT)) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "buffer binding index out of bounds");
-    }
-    iree_host_size_t binding_ordinal = binding_base + bindings[i].ordinal;
-
-    // TODO(benvanik): batch insert by getting the resources in their own list.
-    IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-        command_buffer->resource_set, 1, &bindings[i].buffer));
-
-    // TODO(benvanik): track mapping so we can properly map/unmap/flush/etc.
-    iree_hal_buffer_mapping_t buffer_mapping = {{0}};
-    if (bindings[i].buffer) {
-      IREE_RETURN_IF_ERROR(iree_hal_buffer_map_range(
-          bindings[i].buffer, IREE_HAL_MAPPING_MODE_PERSISTENT,
-          IREE_HAL_MEMORY_ACCESS_ANY, bindings[i].offset, bindings[i].length,
-          &buffer_mapping));
-      command_buffer->state.bindings[binding_ordinal] =
-          buffer_mapping.contents.data;
-      command_buffer->state.binding_lengths[binding_ordinal] =
-          buffer_mapping.contents.data_length;
-    } else {
-      // TODO(#10144): stash indirect binding reference in the state table.
-      return iree_make_status(IREE_STATUS_UNIMPLEMENTED,
-                              "binding table lookup not yet supported");
-    }
-  }
-
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
 // iree_hal_command_buffer_dispatch
 //===----------------------------------------------------------------------===//
 
@@ -809,8 +712,8 @@
   iree_hal_local_executable_t* executable;
   int32_t ordinal;
 
-  // Total number of available 4 byte push constant values in |push_constants|.
-  uint16_t push_constant_count;
+  // Total number of available 4 byte push constant values in |constants|.
+  uint16_t constant_count;
 
   // Total number of binding base pointers in |binding_ptrs| and
   // |binding_lengths|. The set is packed densely based on which bindings are
@@ -818,7 +721,7 @@
   uint16_t binding_count;
 
   // Following this structure in memory there are 3 tables:
-  // - const uint32_t push_constants[push_constant_count];
+  // - const uint32_t constants[constant_count];
   // - void* binding_ptrs[binding_count];
   // - const size_t binding_lengths[binding_count];
 } iree_hal_task_cmd_dispatch_t;
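The struct above stores its three variable-length tables inline after the fixed header, carved out of a single arena allocation. A simplified sketch of that layout and pointer carving (stand-in struct and helper names, not the real types, and like the real code it assumes the allocator keeps the pointer tables sufficiently aligned):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Simplified stand-in for iree_hal_task_cmd_dispatch_t: a fixed header
// followed inline by constants[], binding_ptrs[], and binding_lengths[].
typedef struct {
  uint16_t constant_count;
  uint16_t binding_count;
} cmd_header_t;

typedef struct {
  cmd_header_t* cmd;
  uint32_t* constants;
  void** binding_ptrs;
  size_t* binding_lengths;
} cmd_tables_t;

// Allocates one block sized for the header plus all three tables and carves
// out the table pointers, mirroring the command buffer's inline layout.
static cmd_tables_t allocate_dispatch_cmd(uint16_t constant_count,
                                          uint16_t binding_count) {
  size_t total_size = sizeof(cmd_header_t) +
                      constant_count * sizeof(uint32_t) +
                      binding_count * sizeof(void*) +
                      binding_count * sizeof(size_t);
  uint8_t* storage = (uint8_t*)malloc(total_size);
  cmd_tables_t t;
  t.cmd = (cmd_header_t*)storage;
  t.cmd->constant_count = constant_count;
  t.cmd->binding_count = binding_count;
  uint8_t* p = storage + sizeof(cmd_header_t);
  t.constants = (uint32_t*)p;
  p += constant_count * sizeof(uint32_t);
  t.binding_ptrs = (void**)p;
  p += binding_count * sizeof(void*);
  t.binding_lengths = (size_t*)p;
  return t;
}
```

Keeping one allocation per dispatch command avoids a second arena trip and keeps the tables on the same cache lines as the header that the tile callback reads.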
@@ -839,7 +742,7 @@
       .workgroup_size_x = tile_context->workgroup_size[0],
       .workgroup_size_y = tile_context->workgroup_size[1],
       .workgroup_size_z = tile_context->workgroup_size[2],
-      .push_constant_count = cmd->push_constant_count,
+      .constant_count = cmd->constant_count,
       .workgroup_count_x = tile_context->workgroup_count[0],
       .workgroup_count_y = tile_context->workgroup_count[1],
       .workgroup_count_z = tile_context->workgroup_count[2],
@@ -848,8 +751,8 @@
       .binding_count = cmd->binding_count,
   };
   uint8_t* cmd_ptr = (uint8_t*)cmd + sizeof(*cmd);
-  dispatch_state.push_constants = (uint32_t*)cmd_ptr;
-  cmd_ptr += cmd->push_constant_count * sizeof(*dispatch_state.push_constants);
+  dispatch_state.constants = (uint32_t*)cmd_ptr;
+  cmd_ptr += cmd->constant_count * sizeof(*dispatch_state.constants);
   dispatch_state.binding_ptrs = (void**)cmd_ptr;
   cmd_ptr += cmd->binding_count * sizeof(*dispatch_state.binding_ptrs);
   dispatch_state.binding_lengths = (size_t*)cmd_ptr;
@@ -876,223 +779,9 @@
 static iree_status_t iree_hal_task_command_buffer_build_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_task_cmd_dispatch_t** out_cmd) {
-  iree_hal_task_command_buffer_t* command_buffer =
-      iree_hal_task_command_buffer_cast(base_command_buffer);
-
-  iree_hal_local_executable_t* local_executable =
-      iree_hal_local_executable_cast(executable);
-  if (IREE_UNLIKELY(!local_executable->pipeline_layouts)) {
-    return iree_make_status(
-        IREE_STATUS_FAILED_PRECONDITION,
-        "layouts not provided during executable creation; cannot dispatch");
-  }
-
-  iree_hal_local_pipeline_layout_t* local_layout =
-      (iree_hal_local_pipeline_layout_t*)
-          local_executable->pipeline_layouts[entry_point];
-  iree_host_size_t push_constant_count = local_layout->push_constants;
-  iree_hal_local_binding_mask_t used_binding_mask = local_layout->used_bindings;
-  iree_host_size_t used_binding_count =
-      iree_math_count_ones_u64(used_binding_mask);
-
-  // To save a few command buffer bytes we narrow these:
-  if (IREE_UNLIKELY(push_constant_count >= UINT16_MAX) ||
-      IREE_UNLIKELY(used_binding_count >= UINT16_MAX)) {
-    return iree_make_status(IREE_STATUS_RESOURCE_EXHAUSTED,
-                            "too many bindings/push constants");
-  }
-
-  iree_hal_task_cmd_dispatch_t* cmd = NULL;
-  iree_host_size_t total_cmd_size =
-      sizeof(*cmd) + push_constant_count * sizeof(uint32_t) +
-      used_binding_count * sizeof(void*) +
-      used_binding_count * sizeof(iree_device_size_t);
-  IREE_RETURN_IF_ERROR(iree_arena_allocate(&command_buffer->arena,
-                                           total_cmd_size, (void**)&cmd));
-
-  cmd->executable = local_executable;
-  cmd->ordinal = entry_point;
-  cmd->push_constant_count = push_constant_count;
-  cmd->binding_count = used_binding_count;
-
-  const uint32_t workgroup_count[3] = {workgroup_x, workgroup_y, workgroup_z};
-  // TODO(benvanik): expose on API or keep fixed on executable.
-  const uint32_t workgroup_size[3] = {1, 1, 1};
-  iree_task_dispatch_initialize(
-      command_buffer->scope,
-      iree_task_make_dispatch_closure(iree_hal_task_cmd_dispatch_tile,
-                                      (void*)cmd),
-      workgroup_size, workgroup_count, &cmd->task);
-
-  // Tell the task system how much workgroup local memory is required for the
-  // dispatch; each invocation of the entry point will have at least as much
-  // scratch memory available during execution.
-  cmd->task.local_memory_size =
-      local_executable->dispatch_attrs
-          ? local_executable->dispatch_attrs[entry_point].local_memory_pages *
-                IREE_HAL_EXECUTABLE_WORKGROUP_LOCAL_MEMORY_PAGE_SIZE
-          : 0;
-
-  // Copy only the push constant range used by the executable.
-  uint8_t* cmd_ptr = (uint8_t*)cmd + sizeof(*cmd);
-  uint32_t* push_constants = (uint32_t*)cmd_ptr;
-  memcpy(push_constants, command_buffer->state.push_constants,
-         push_constant_count * sizeof(*push_constants));
-  cmd_ptr += push_constant_count * sizeof(*push_constants);
-
-  // Produce the dense binding list based on the declared bindings used.
-  // This allows us to change the descriptor sets and bindings counts supported
-  // in the HAL independent of any executable as each executable just gets the
-  // flat dense list and doesn't care about our descriptor set stuff.
-  //
-  // Note that we are just directly setting the binding data pointers here with
-  // no ownership/retaining/etc - it's part of the HAL contract that buffers are
-  // kept valid for the duration they may be in use.
-  void** binding_ptrs = (void**)cmd_ptr;
-  cmd_ptr += used_binding_count * sizeof(*binding_ptrs);
-  size_t* binding_lengths = (size_t*)cmd_ptr;
-  cmd_ptr += used_binding_count * sizeof(*binding_lengths);
-  iree_host_size_t binding_base = 0;
-  for (iree_host_size_t i = 0; i < used_binding_count; ++i) {
-    int mask_offset = iree_math_count_trailing_zeros_u64(used_binding_mask);
-    int binding_ordinal = binding_base + mask_offset;
-    binding_base += mask_offset + 1;
-    used_binding_mask = iree_shr(used_binding_mask, mask_offset + 1);
-    binding_ptrs[i] = command_buffer->state.bindings[binding_ordinal];
-    binding_lengths[i] = command_buffer->state.binding_lengths[binding_ordinal];
-    if (!binding_ptrs[i]) {
-      return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                              "(flat) binding %d is NULL", binding_ordinal);
-    }
-  }
-
-  *out_cmd = cmd;
-  return iree_hal_task_command_buffer_emit_execution_task(command_buffer,
-                                                          &cmd->task.header);
-}
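The legacy build path removed above derived the dense binding list by repeatedly taking the count-trailing-zeros of `used_binding_mask`. That bitmask walk in isolation (hypothetical helper name, with the GCC/Clang `__builtin_ctzll` standing in for `iree_math_count_trailing_zeros_u64` and an explicit guard where `iree_shr` handled the full-width shift):

```c
#include <stdint.h>

// Expands a used-binding bitmask into the dense list of binding ordinals,
// matching the legacy loop: take the trailing-zero count, emit the ordinal,
// then shift the consumed bits away and advance the base.
static int dense_binding_ordinals(uint64_t used_binding_mask,
                                  int* out_ordinals, int capacity) {
  int count = 0;
  int binding_base = 0;
  while (used_binding_mask && count < capacity) {
    int mask_offset = __builtin_ctzll(used_binding_mask);
    out_ordinals[count++] = binding_base + mask_offset;
    binding_base += mask_offset + 1;
    // Shifting a uint64_t by 64 is undefined in C, so clear explicitly when
    // the highest bit was just consumed.
    used_binding_mask =
        (mask_offset == 63) ? 0 : used_binding_mask >> (mask_offset + 1);
  }
  return count;
}
```

With the flat per-dispatch binding list this translation step disappears entirely: the compiler emits bindings already densely packed.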
-
-static iree_status_t iree_hal_task_command_buffer_dispatch(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_task_command_buffer_t* command_buffer =
-      iree_hal_task_command_buffer_cast(base_command_buffer);
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &executable));
-  iree_hal_task_cmd_dispatch_t* cmd = NULL;
-  return iree_hal_task_command_buffer_build_dispatch(
-      base_command_buffer, executable, entry_point, workgroup_x, workgroup_y,
-      workgroup_z, &cmd);
-}
-
-static iree_status_t iree_hal_task_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  iree_hal_task_command_buffer_t* command_buffer =
-      iree_hal_task_command_buffer_cast(base_command_buffer);
-
-  const void* resources[2] = {executable, workgroups_ref.buffer};
-  IREE_RETURN_IF_ERROR(
-      iree_hal_resource_set_insert(command_buffer->resource_set, 2, resources));
-
-  // TODO(benvanik): track mapping so we can properly map/unmap/flush/etc.
-  iree_hal_buffer_mapping_t buffer_mapping = {{0}};
-  IREE_RETURN_IF_ERROR(iree_hal_buffer_map_range(
-      workgroups_ref.buffer, IREE_HAL_MAPPING_MODE_PERSISTENT,
-      IREE_HAL_MEMORY_ACCESS_READ, workgroups_ref.offset, 3 * sizeof(uint32_t),
-      &buffer_mapping));
-
-  iree_hal_task_cmd_dispatch_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_task_command_buffer_build_dispatch(
-      base_command_buffer, executable, entry_point, 0, 0, 0, &cmd));
-  cmd->task.workgroup_count.ptr = (const uint32_t*)buffer_mapping.contents.data;
-  cmd->task.header.flags |= IREE_TASK_FLAG_DISPATCH_INDIRECT;
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_dispatch2
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_task_cmd_dispatch2_t {
-  iree_task_dispatch_t task;
-  iree_hal_local_executable_t* executable;
-  int32_t ordinal;
-
-  // Total number of available 4 byte push constant values in |push_constants|.
-  uint16_t push_constant_count;
-
-  // Total number of binding base pointers in |binding_ptrs| and
-  // |binding_lengths|. The set is packed densely based on which bindings are
-  // used (known at compile-time).
-  uint16_t binding_count;
-
-  // Following this structure in memory there are 3 tables:
-  // - const uint32_t push_constants[push_constant_count];
-  // - void* binding_ptrs[binding_count];
-  // - const size_t binding_lengths[binding_count];
-} iree_hal_task_cmd_dispatch2_t;
-
-static iree_status_t iree_hal_task_cmd_dispatch2_tile(
-    void* user_context, const iree_task_tile_context_t* tile_context,
-    iree_task_submission_t* pending_submission) {
-  const iree_hal_task_cmd_dispatch2_t* cmd =
-      (const iree_hal_task_cmd_dispatch2_t*)user_context;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  // We could share this across all workgroups in a dispatch and reduce cache
-  // pressure as all cores would be hitting the same hot read-only cache line.
-  // It'd grow the size of iree_hal_task_cmd_dispatch_t by a few dozen bytes,
-  // though, and so we'd need some profiling to see if it's worth it (fixed
-  // command buffer cost vs potential for saving a cache miss or two).
-  iree_alignas(64) iree_hal_executable_dispatch_state_v0_t dispatch_state = {
-      .workgroup_size_x = tile_context->workgroup_size[0],
-      .workgroup_size_y = tile_context->workgroup_size[1],
-      .workgroup_size_z = tile_context->workgroup_size[2],
-      .push_constant_count = cmd->push_constant_count,
-      .workgroup_count_x = tile_context->workgroup_count[0],
-      .workgroup_count_y = tile_context->workgroup_count[1],
-      .workgroup_count_z = tile_context->workgroup_count[2],
-      .max_concurrency =
-          iree_task_affinity_set_count_ones(cmd->task.header.affinity_set),
-      .binding_count = cmd->binding_count,
-  };
-  uint8_t* cmd_ptr = (uint8_t*)cmd + sizeof(*cmd);
-  dispatch_state.push_constants = (uint32_t*)cmd_ptr;
-  cmd_ptr += cmd->push_constant_count * sizeof(*dispatch_state.push_constants);
-  dispatch_state.binding_ptrs = (void**)cmd_ptr;
-  cmd_ptr += cmd->binding_count * sizeof(*dispatch_state.binding_ptrs);
-  dispatch_state.binding_lengths = (size_t*)cmd_ptr;
-  cmd_ptr += cmd->binding_count * sizeof(*dispatch_state.binding_lengths);
-
-  const iree_alignas(64)
-      iree_hal_executable_workgroup_state_v0_t workgroup_state = {
-          .workgroup_id_x = tile_context->workgroup_xyz[0],
-          .workgroup_id_y = tile_context->workgroup_xyz[1],
-          .workgroup_id_z = tile_context->workgroup_xyz[2],
-          .reserved = 0,
-          .processor_id = tile_context->processor_id,
-          .local_memory = tile_context->local_memory.data,
-          .local_memory_size = (size_t)tile_context->local_memory.data_length,
-      };
-  iree_status_t status = iree_hal_local_executable_issue_call(
-      cmd->executable, cmd->ordinal, &dispatch_state, &workgroup_state,
-      tile_context->worker_id);
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static iree_status_t iree_hal_task_command_buffer_build_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings,
-    iree_hal_task_cmd_dispatch2_t** out_cmd) {
+    iree_hal_task_cmd_dispatch_t** out_cmd) {
   iree_hal_task_command_buffer_t* command_buffer =
       iree_hal_task_command_buffer_cast(base_command_buffer);
 
@@ -1103,7 +792,7 @@
     dispatch_attrs = local_executable->dispatch_attrs[entry_point];
   }
 
-  iree_hal_task_cmd_dispatch2_t* cmd = NULL;
+  iree_hal_task_cmd_dispatch_t* cmd = NULL;
   iree_host_size_t total_cmd_size =
       sizeof(*cmd) + dispatch_attrs.constant_count * sizeof(uint32_t) +
       dispatch_attrs.binding_count * sizeof(void*) +
@@ -1113,7 +802,7 @@
 
   cmd->executable = local_executable;
   cmd->ordinal = entry_point;
-  cmd->push_constant_count = dispatch_attrs.constant_count;
+  cmd->constant_count = dispatch_attrs.constant_count;
   cmd->binding_count = dispatch_attrs.binding_count;
 
   // TODO(benvanik): expose on API or keep fixed on executable.
@@ -1146,10 +835,10 @@
         constants.data_length / sizeof(uint32_t));
   }
   uint8_t* cmd_ptr = (uint8_t*)cmd + sizeof(*cmd);
-  uint32_t* push_constants = (uint32_t*)cmd_ptr;
-  memcpy(push_constants, constants.data,
-         dispatch_attrs.constant_count * sizeof(*push_constants));
-  cmd_ptr += dispatch_attrs.constant_count * sizeof(*push_constants);
+  uint32_t* constants_ptr = (uint32_t*)cmd_ptr;
+  memcpy(constants_ptr, constants.data,
+         dispatch_attrs.constant_count * sizeof(*constants_ptr));
+  cmd_ptr += dispatch_attrs.constant_count * sizeof(*constants_ptr);
 
   // Produce the dense binding list based on the declared bindings used.
   //
@@ -1196,7 +885,7 @@
                                                           &cmd->task.header);
 }
 
-static iree_status_t iree_hal_task_command_buffer_dispatch2(
+static iree_status_t iree_hal_task_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
@@ -1207,13 +896,13 @@
   IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
       command_buffer->resource_set, 1, &executable));
 
-  iree_hal_task_cmd_dispatch2_t* cmd = NULL;
-  return iree_hal_task_command_buffer_build_dispatch2(
+  iree_hal_task_cmd_dispatch_t* cmd = NULL;
+  return iree_hal_task_command_buffer_build_dispatch(
       base_command_buffer, executable, entry_point, workgroup_count, constants,
       bindings, &cmd);
 }
 
-static iree_status_t iree_hal_task_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_task_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -1233,8 +922,8 @@
       &buffer_mapping));
 
   uint32_t workgroup_count[3] = {0};  // unused with the indirect flag
-  iree_hal_task_cmd_dispatch2_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_task_command_buffer_build_dispatch2(
+  iree_hal_task_cmd_dispatch_t* cmd = NULL;
+  IREE_RETURN_IF_ERROR(iree_hal_task_command_buffer_build_dispatch(
       base_command_buffer, executable, entry_point, workgroup_count, constants,
       bindings, &cmd));
   cmd->task.workgroup_count.ptr = (const uint32_t*)buffer_mapping.contents.data;
@@ -1262,10 +951,6 @@
         .update_buffer = iree_hal_task_command_buffer_update_buffer,
         .copy_buffer = iree_hal_task_command_buffer_copy_buffer,
         .collective = iree_hal_task_command_buffer_collective,
-        .push_constants = iree_hal_task_command_buffer_push_constants,
-        .push_descriptor_set = iree_hal_task_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_task_command_buffer_dispatch,
         .dispatch_indirect = iree_hal_task_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_task_command_buffer_dispatch2,
-        .dispatch2_indirect = iree_hal_task_command_buffer_dispatch2_indirect,
 };
diff --git a/runtime/src/iree/hal/drivers/local_task/task_device.c b/runtime/src/iree/hal/drivers/local_task/task_device.c
index ae39734..df4bb93 100644
--- a/runtime/src/iree/hal/drivers/local_task/task_device.c
+++ b/runtime/src/iree/hal/drivers/local_task/task_device.c
@@ -18,7 +18,6 @@
 #include "iree/hal/drivers/local_task/task_semaphore.h"
 #include "iree/hal/local/executable_environment.h"
 #include "iree/hal/local/local_executable_cache.h"
-#include "iree/hal/local/local_pipeline_layout.h"
 #include "iree/hal/utils/deferred_command_buffer.h"
 #include "iree/hal/utils/file_transfer.h"
 #include "iree/hal/utils/memory_file.h"
@@ -317,17 +316,6 @@
   }
 }
 
-static iree_status_t iree_hal_task_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  return iree_hal_local_descriptor_set_layout_create(
-      flags, binding_count, bindings,
-      iree_hal_device_host_allocator(base_device), out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_task_device_create_event(
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
@@ -369,16 +357,6 @@
       iree_hal_device_host_allocator(base_device), out_file);
 }
 
-static iree_status_t iree_hal_task_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  return iree_hal_local_pipeline_layout_create(
-      push_constants, set_layout_count, set_layouts,
-      iree_hal_device_host_allocator(base_device), out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_task_device_create_semaphore(
     iree_hal_device_t* base_device, uint64_t initial_value,
     iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore) {
@@ -554,12 +532,9 @@
     .query_i64 = iree_hal_task_device_query_i64,
     .create_channel = iree_hal_task_device_create_channel,
     .create_command_buffer = iree_hal_task_device_create_command_buffer,
-    .create_descriptor_set_layout =
-        iree_hal_task_device_create_descriptor_set_layout,
     .create_event = iree_hal_task_device_create_event,
     .create_executable_cache = iree_hal_task_device_create_executable_cache,
     .import_file = iree_hal_task_device_import_file,
-    .create_pipeline_layout = iree_hal_task_device_create_pipeline_layout,
     .create_semaphore = iree_hal_task_device_create_semaphore,
     .query_semaphore_compatibility =
         iree_hal_task_device_query_semaphore_compatibility,
diff --git a/runtime/src/iree/hal/drivers/metal/CMakeLists.txt b/runtime/src/iree/hal/drivers/metal/CMakeLists.txt
index c3186a1..d43efc0 100644
--- a/runtime/src/iree/hal/drivers/metal/CMakeLists.txt
+++ b/runtime/src/iree/hal/drivers/metal/CMakeLists.txt
@@ -19,16 +19,14 @@
     "direct_allocator.m"
     "direct_command_buffer.h"
     "direct_command_buffer.m"
-    "kernel_library.h"
-    "kernel_library.m"
+    "executable.h"
+    "executable.m"
     "metal_buffer.h"
     "metal_buffer.m"
     "metal_device.m"
     "metal_driver.m"
     "nop_executable_cache.h"
     "nop_executable_cache.m"
-    "pipeline_layout.h"
-    "pipeline_layout.m"
     "shared_event.h"
     "shared_event.m"
     "staging_buffer.h"
@@ -42,9 +40,11 @@
     iree::hal
     iree::hal::drivers::metal::builtin
     iree::hal::utils::deferred_command_buffer
+    iree::hal::utils::executable_debug_info
     iree::hal::utils::file_transfer
     iree::hal::utils::memory_file
     iree::hal::utils::resource_set
+    iree::schemas::executable_debug_info_c_fbs
     iree::schemas::metal_executable_def_c_fbs
     "-framework Foundation"
     "-framework Metal"
diff --git a/runtime/src/iree/hal/drivers/metal/builtin_executables.h b/runtime/src/iree/hal/drivers/metal/builtin_executables.h
index 08fc065..dae5d66 100644
--- a/runtime/src/iree/hal/drivers/metal/builtin_executables.h
+++ b/runtime/src/iree/hal/drivers/metal/builtin_executables.h
@@ -11,19 +11,28 @@
 
 #include "iree/base/api.h"
 #include "iree/hal/api.h"
-#include "iree/hal/drivers/metal/kernel_library.h"
+#include "iree/hal/drivers/metal/executable.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif  // __cplusplus
 
+// Object and launch parameters for a compute function.
+typedef struct iree_hal_metal_builtin_pipeline_t {
+  id<MTLComputePipelineState> pipeline_state;
+  IREE_TRACE(iree_hal_metal_source_location_t source_location;)
+} iree_hal_metal_builtin_pipeline_t;
+
 typedef struct iree_hal_metal_builtin_executable_t {
   iree_allocator_t host_allocator;
 
-  // The number of entry points in this builtin executable.
-  iree_host_size_t entry_point_count;
-  // THe list of entry points, pointing to the end of the struct allocation.
-  iree_hal_metal_kernel_params_t entry_points[];
+  // Compiled MTLLibrary instances containing the builtin kernels.
+  NSArray<id<MTLLibrary>>* libraries;
+
+  // The number of pipelines in this builtin executable.
+  iree_host_size_t pipeline_count;
+  // The list of pipelines, pointing to the end of the struct allocation.
+  iree_hal_metal_builtin_pipeline_t pipelines[];
 } iree_hal_metal_builtin_executable_t;
// + Additional inline allocation for holding all builtin pipelines.
 
@@ -33,8 +42,7 @@
     id<MTLDevice> device, iree_allocator_t host_allocator,
     iree_hal_metal_builtin_executable_t** out_executable);
 
-void iree_hal_metal_builtin_executable_destroy(
-    iree_hal_metal_builtin_executable_t* executable);
+void iree_hal_metal_builtin_executable_destroy(iree_hal_metal_builtin_executable_t* executable);
 
 // Fills the |target_buffer| at the given |target_offset| of |length| with
 // |pattern| using builtin executables dispatched via |encoder|.
@@ -42,9 +50,8 @@
 // Under the hood, this will record all necessary commands to bind kernel
// objects and buffer resources, and then perform the dispatch.
 iree_status_t iree_hal_metal_builtin_executable_fill_buffer(
-    const iree_hal_metal_builtin_executable_t* executable,
-    id<MTLComputeCommandEncoder> encoder, id<MTLBuffer> target_buffer,
-    iree_device_size_t target_offset, iree_device_size_t length,
+    const iree_hal_metal_builtin_executable_t* executable, id<MTLComputeCommandEncoder> encoder,
+    id<MTLBuffer> target_buffer, iree_device_size_t target_offset, iree_device_size_t length,
     uint32_t pattern);
 
 // Copies the |source_buffer| at |source_offset| to the |target_buffer| at
@@ -54,9 +61,8 @@
 // Under the hood, this will record all necessary commands to bind kernel
// objects and buffer resources, and then perform the dispatch.
 iree_status_t iree_hal_metal_builtin_executable_copy_buffer(
-    const iree_hal_metal_builtin_executable_t* executable,
-    id<MTLComputeCommandEncoder> encoder, id<MTLBuffer> source_buffer,
-    iree_device_size_t source_offset, id<MTLBuffer> target_buffer,
+    const iree_hal_metal_builtin_executable_t* executable, id<MTLComputeCommandEncoder> encoder,
+    id<MTLBuffer> source_buffer, iree_device_size_t source_offset, id<MTLBuffer> target_buffer,
     iree_device_size_t target_offset, iree_device_size_t length);
 
 #ifdef __cplusplus
diff --git a/runtime/src/iree/hal/drivers/metal/builtin_executables.m b/runtime/src/iree/hal/drivers/metal/builtin_executables.m
index 77912ab..3982048 100644
--- a/runtime/src/iree/hal/drivers/metal/builtin_executables.m
+++ b/runtime/src/iree/hal/drivers/metal/builtin_executables.m
@@ -6,41 +6,152 @@
 
 #include "iree/hal/drivers/metal/builtin_executables.h"
 
+#include <Foundation/Foundation.h>
 #include <string.h>
 
 #include "iree/base/api.h"
+#include "iree/base/status.h"
 #include "iree/base/tracing.h"
 #include "iree/hal/api.h"
 #include "iree/hal/drivers/metal/builtin/metal_buffer_kernels.h"
 
-typedef struct iree_hal_metal_builtin_executable_data_t {
-  const char* entry_point;
+typedef struct iree_hal_metal_builtin_pipeline_info_t {
+  iree_string_view_t entry_point;
   uint32_t file_index;
-} iree_hal_metal_builtin_executable_data_t;
+} iree_hal_metal_builtin_pipeline_info_t;
 
// The list of builtin executable entry points and their source file index in builtin executable
-// embedded data. This MUST be consistent with kernel function names in MSL source code and the file
-// order in embedded data.
-static iree_hal_metal_builtin_executable_data_t iree_hal_metal_builtin_executable_entry_points[] = {
-    {"fill_buffer_16byte", 1},  // Buffer fills; 16-byte aligned offset/length
-    {"fill_buffer_4byte", 1},   // Buffer fills; 4-byte aligned offset/length
-    {"fill_buffer_1byte", 1},   // Buffer fills; 1-byte aligned offset/length
-    {"copy_buffer_1byte", 0},   // Buffer copies; 1-byte aligned offset/length
+// embedded data.
+//
+// NOTE: must be consistent with the kernel function names in MSL source code and the file order
+// in embedded data.
+// NOTE: the exact order here is assumed below and must not change (that should be fixed...).
+static iree_hal_metal_builtin_pipeline_info_t iree_hal_metal_builtin_pipeline_info[] = {
+    {IREE_SVL("fill_buffer_16byte"), 1},  // Buffer fills; 16-byte aligned offset/length
+    {IREE_SVL("fill_buffer_4byte"), 1},   // Buffer fills; 4-byte aligned offset/length
+    {IREE_SVL("fill_buffer_1byte"), 1},   // Buffer fills; 1-byte aligned offset/length
+    {IREE_SVL("copy_buffer_1byte"), 0},   // Buffer copies; 1-byte aligned offset/length
 };
 
-// The buffer fill specificiation. This MUST be consistent with the same struct in MSL source code.
+// NOTE: must be consistent with the same struct in MSL source code.
 typedef struct iree_hal_metal_buffer_fill_spec_t {
   uint64_t buffer_offset;  // Buffer offset to fill (in bytes)
   uint64_t buffer_length;  // Buffer length to fill (in bytes)
   uint32_t pattern;        // 32-bit fill pattern
 } iree_hal_metal_buffer_fill_spec_t;
 
+// NOTE: must be consistent with the same struct in MSL source code.
 typedef struct iree_hal_metal_buffer_copy_spec_t {
   uint64_t src_buffer_offset;  // Source buffer offset (in bytes)
   uint64_t dst_buffer_offset;  // Destination buffer offset (in bytes)
  uint64_t length;             // Buffer length to copy (in bytes)
 } iree_hal_metal_buffer_copy_spec_t;
 
+// Compiles |source_file| as MSL source into a MTLLibrary for the given |device|.
+//
+// TODO: we should be precompiling this and shipping a binary metallib instead: compiling from
+// source at runtime is _extremely_ inefficient.
+static iree_status_t iree_hal_metal_compile_embedded_msl(id<MTLDevice> device,
+                                                         iree_file_toc_t source_file,
+                                                         id<MTLLibrary>* out_library) {
+  *out_library = nil;
+  IREE_TRACE_ZONE_BEGIN(z0);
+  IREE_TRACE_ZONE_APPEND_TEXT(z0, source_file.name);
+
+  iree_status_t status = iree_ok_status();
+  id<MTLLibrary> library = nil;
+  @autoreleasepool {
+    MTLCompileOptions* compile_options = [[MTLCompileOptions new] autorelease];
+    compile_options.languageVersion = MTLLanguageVersion3_0;
+
+    NSString* shader_source =
+        [[[NSString alloc] initWithBytes:source_file.data
+                                  length:source_file.size
+                                encoding:[NSString defaultCStringEncoding]] autorelease];
+
+    NSError* error = nil;
+    library = [device newLibraryWithSource:shader_source
+                                   options:compile_options
+                                     error:&error];  // +1
+    if (IREE_UNLIKELY(library == nil)) {
+      const char* ns_c_error = [error.localizedDescription
+          cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
+      status = iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                                "failed to create MTLLibrary from shader source in %s: %s",
+                                source_file.name, ns_c_error);
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_library = library;
+  } else {
+    [library release];
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+// Loads all MTLLibrary instances required by the builtin pipelines.
+static iree_status_t iree_hal_metal_load_builtin_libraries(
+    id<MTLDevice> device, NSArray<id<MTLLibrary>>** out_libraries) {
+  *out_libraries = nil;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  NSMutableArray<id<MTLLibrary>>* libraries = [[NSMutableArray alloc] init];  // +1
+
+  // TODO: don't compile sources and instead embed the libraries in binary form.
+  // Embedding source files is an anti-pattern.
+  iree_status_t status = iree_ok_status();
+  const iree_file_toc_t* embedded_files = metal_buffer_kernels_create();
+  for (iree_host_size_t i = 0; i < metal_buffer_kernels_size(); ++i) {
+    iree_file_toc_t source_file = embedded_files[i];
+    id<MTLLibrary> library = nil;
+    status = iree_hal_metal_compile_embedded_msl(device, source_file, &library);
+    if (!iree_status_is_ok(status)) break;
+    [libraries addObject:library];
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_libraries = libraries;
+  } else {
+    [libraries release];  // -1
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+// Creates a MTLComputePipelineState object for the given |entry_point| in |library| and writes it
+// to |out_pipeline|. The caller must release the pipeline state when done.
+static iree_status_t iree_hal_metal_create_builtin_pipeline(
+    id<MTLDevice> device, id<MTLLibrary> library, iree_string_view_t entry_point,
+    iree_hal_metal_builtin_pipeline_t* out_pipeline) {
+  @autoreleasepool {
+    NSString* function_name =
+        [[[NSString alloc] initWithBytes:entry_point.data
+                                  length:entry_point.size
+                                encoding:[NSString defaultCStringEncoding]] autorelease];
+    id<MTLFunction> function = [[library newFunctionWithName:function_name] autorelease];
+    if (IREE_UNLIKELY(function == nil)) {
+      return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
+                              "function %.*s not found in the provided MTLLibrary",
+                              (int)entry_point.size, entry_point.data);
+    }
+
+    NSError* error = nil;
+    id<MTLComputePipelineState> pipeline_state =
+        [device newComputePipelineStateWithFunction:function error:&error];  // +1
+    if (IREE_UNLIKELY(pipeline_state == nil)) {
+      const char* ns_c_error = [error.localizedDescription
+          cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
+      return iree_make_status(IREE_STATUS_INTERNAL, "invalid shader source for builtin %.*s: %s",
+                              (int)entry_point.size, entry_point.data, ns_c_error);
+    }
+
+    out_pipeline->pipeline_state = pipeline_state;
+  }
+  return iree_ok_status();
+}
+
 iree_status_t iree_hal_metal_builtin_executable_create(
     id<MTLDevice> device, iree_allocator_t host_allocator,
     iree_hal_metal_builtin_executable_t** out_executable) {
@@ -49,56 +160,42 @@
   IREE_TRACE_ZONE_BEGIN(z0);
 
   iree_hal_metal_builtin_executable_t* executable = NULL;
-  iree_host_size_t entry_point_count =
-      IREE_ARRAYSIZE(iree_hal_metal_builtin_executable_entry_points);
+  iree_host_size_t pipeline_count = IREE_ARRAYSIZE(iree_hal_metal_builtin_pipeline_info);
   iree_host_size_t total_size =
-      sizeof(*executable) + entry_point_count * sizeof(executable->entry_points[0]);
-  iree_status_t status = iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
+      sizeof(*executable) + pipeline_count * sizeof(executable->pipelines[0]);
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(host_allocator, total_size, (void**)&executable));
+  executable->host_allocator = host_allocator;
+  executable->pipeline_count = pipeline_count;
 
+  // Load all MTLLibrary instances used by the pipelines.
+  iree_status_t status = iree_hal_metal_load_builtin_libraries(device, &executable->libraries);
   if (iree_status_is_ok(status)) {
-    executable->host_allocator = host_allocator;
-    executable->entry_point_count = entry_point_count;
+    // Create pipelines using the loaded libraries.
+    for (iree_host_size_t i = 0; i < IREE_ARRAYSIZE(iree_hal_metal_builtin_pipeline_info); ++i) {
+      const iree_hal_metal_builtin_pipeline_info_t* pipeline_info =
+          &iree_hal_metal_builtin_pipeline_info[i];
+      IREE_TRACE_ZONE_BEGIN(z_pipeline);
+      IREE_TRACE_ZONE_APPEND_TEXT(z_pipeline, pipeline_info->entry_point.data,
+                                  pipeline_info->entry_point.size);
 
-    // Compile each MSL source string into a MTLLibrary and get the MTLFunction for the entry point
-    // to build the pipeline state object.
-    // TODO(antiagainst): We are performing synchronous compilation at runtime here. This is good
-    // for debugging purposes but bad for performance. Enable offline compilation and make that as
-    // the default.
+      iree_hal_metal_builtin_pipeline_t* pipeline = &executable->pipelines[i];
+      IREE_TRACE({
+        const iree_file_toc_t* embedded_files = metal_buffer_kernels_create();
+        iree_file_toc_t source_file = embedded_files[pipeline_info->file_index];
+        pipeline->source_location.func_name = pipeline_info->entry_point;
+        pipeline->source_location.file_name = IREE_SV(source_file.name);
+        pipeline->source_location.line = 0;
+      });
 
-    MTLCompileOptions* compile_options = [MTLCompileOptions new];  // +1
-    compile_options.languageVersion = MTLLanguageVersion3_0;
+      id<MTLLibrary> library =
+          [executable->libraries objectAtIndex:pipeline_info->file_index];  // unretained
+      status = iree_hal_metal_create_builtin_pipeline(device, library, pipeline_info->entry_point,
+                                                      pipeline);
 
-    for (iree_host_size_t i = 0; i < IREE_ARRAYSIZE(iree_hal_metal_builtin_executable_entry_points);
-         ++i) {
-      const char* entry_point = iree_hal_metal_builtin_executable_entry_points[i].entry_point;
-      uint32_t file_index = iree_hal_metal_builtin_executable_entry_points[i].file_index;
-      iree_file_toc_t source_code = metal_buffer_kernels_create()[file_index];
-
-      id<MTLLibrary> library = nil;
-      id<MTLFunction> function = nil;
-      id<MTLComputePipelineState> pso = nil;
-      status = iree_hal_metal_compile_msl_and_create_pipeline_object(
-          iree_make_string_view(source_code.data, source_code.size), IREE_SV(entry_point), device,
-          compile_options, &library, &function, &pso);
+      IREE_TRACE_ZONE_END(z_pipeline);
       if (!iree_status_is_ok(status)) break;
-
-      // Package required parameters for kernel launches for each entry point.
-      iree_hal_metal_kernel_params_t* params = &executable->entry_points[i];
-      params->library = library;
-      params->function = function;
-      params->pso = pso;
-      // Thread group size for builtin executables are determined at runtime dispatch time.
-      params->threadgroup_size[0] = 0;
-      params->threadgroup_size[1] = 0;
-      params->threadgroup_size[2] = 0;
-      // We don't need the layout parameter for builtin executables too.
-      params->layout = NULL;
-
-      // Stash the entry point name in the string table for use when tracing.
-      IREE_TRACE({ params->function_name = IREE_SV(entry_point); });
     }
-
-    [compile_options release];  // -1
   }
 
   if (iree_status_is_ok(status)) {
@@ -114,40 +211,34 @@
 void iree_hal_metal_builtin_executable_destroy(iree_hal_metal_builtin_executable_t* executable) {
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  for (iree_host_size_t i = 0; i < executable->entry_point_count; ++i) {
-    iree_hal_metal_kernel_params_t* entry_point = &executable->entry_points[i];
-    [entry_point->pso release];
-    [entry_point->function release];
-    [entry_point->library release];
-    IREE_ASSERT_EQ(entry_point->layout, NULL);
+  for (iree_host_size_t i = 0; i < executable->pipeline_count; ++i) {
+    iree_hal_metal_builtin_pipeline_t* pipeline = &executable->pipelines[i];
+    [pipeline->pipeline_state release];
   }
+
+  [executable->libraries release];
+
   iree_allocator_free(executable->host_allocator, executable);
 
   IREE_TRACE_ZONE_END(z0);
 }
 
-static inline iree_device_size_t iree_hal_metal_ceil_div(iree_device_size_t a,
-                                                         iree_device_size_t b) {
-  return (a + b - 1) / b;
-}
-
 iree_status_t iree_hal_metal_builtin_executable_fill_buffer(
     const iree_hal_metal_builtin_executable_t* executable, id<MTLComputeCommandEncoder> encoder,
     id<MTLBuffer> target_buffer, iree_device_size_t target_offset, iree_device_size_t length,
     uint32_t pattern) {
-  id<MTLComputePipelineState> pso = nil;
+  id<MTLComputePipelineState> pipeline_state = nil;
   MTLResourceUsage usage = MTLResourceUsageWrite;
   const iree_device_size_t workgroup_size = 32;
   iree_device_size_t workgroup_count = 0;
-
   if (target_offset % 16 == 0 && length % 16 == 0) {  // 16-byte aligned case
-    pso = executable->entry_points[0].pso;
-    workgroup_count = iree_hal_metal_ceil_div(length, workgroup_size * 16);
+    pipeline_state = executable->pipelines[0].pipeline_state;
+    workgroup_count = iree_device_size_ceil_div(length, workgroup_size * 16);
   } else if (target_offset % 4 == 0 && length % 4 == 0) {  // 4-byte aligned case
-    pso = executable->entry_points[1].pso;
-    workgroup_count = iree_hal_metal_ceil_div(length, workgroup_size * 4);
+    pipeline_state = executable->pipelines[1].pipeline_state;
+    workgroup_count = iree_device_size_ceil_div(length, workgroup_size * 4);
   } else {  // 1-byte aligned case
-    pso = executable->entry_points[2].pso;
+    pipeline_state = executable->pipelines[2].pipeline_state;
     // We may potentially need to read some 32-bit scalars at unaligned addresses.
     usage |= MTLResourceUsageRead;
     // Calculate unaligned partial prefix/suffix byte count, and then get the middle aligned byte
@@ -159,28 +250,28 @@
     // prefix and suffix partial bytes are the same (< 0). We'd need one thread to handle the
     // partial bytes at least.
     if (middle_byte_count <= 0) middle_byte_count = 1;
-    workgroup_count = iree_hal_metal_ceil_div(middle_byte_count, workgroup_size * 4);
+    workgroup_count = iree_device_size_ceil_div(middle_byte_count, workgroup_size * 4);
   }
-
-  iree_hal_metal_buffer_fill_spec_t spec = {
-      .buffer_offset = target_offset,
-      .buffer_length = length,
-      .pattern = pattern,
-  };
-
-  [encoder setComputePipelineState:pso];
+  [encoder setComputePipelineState:pipeline_state];
 
   // The following MUST exactly match the pipeline layout from MSL source code.
   // buffer(0) is the target buffer to fill. Note that we MUST set 0 as offset here--the offset
   // is to be handled directly in the kernels!
   [encoder setBuffer:target_buffer offset:0 atIndex:0];
   [encoder useResource:target_buffer usage:usage];
+
   // buffer(1) is the buffer fill spec.
+  iree_hal_metal_buffer_fill_spec_t spec = {
+      .buffer_offset = target_offset,
+      .buffer_length = length,
+      .pattern = pattern,
+  };
   [encoder setBytes:&spec length:sizeof(spec) atIndex:1];
 
   // Encode the dispatch.
   [encoder dispatchThreadgroups:MTLSizeMake(workgroup_count, 1, 1)
           threadsPerThreadgroup:MTLSizeMake(workgroup_size, 1, 1)];
+
   return iree_ok_status();
 }
 
@@ -188,32 +279,33 @@
     const iree_hal_metal_builtin_executable_t* executable, id<MTLComputeCommandEncoder> encoder,
     id<MTLBuffer> source_buffer, iree_device_size_t source_offset, id<MTLBuffer> target_buffer,
     iree_device_size_t target_offset, iree_device_size_t length) {
-  id<MTLComputePipelineState> pso = executable->entry_points[3].pso;
-  const iree_device_size_t workgroup_size = 32;
-  iree_device_size_t workgroup_count = iree_hal_metal_ceil_div(length, workgroup_size * 4);
-
-  iree_hal_metal_buffer_copy_spec_t spec = {
-      .src_buffer_offset = source_offset,
-      .dst_buffer_offset = target_offset,
-      .length = length,
-  };
-
-  [encoder setComputePipelineState:pso];
+  id<MTLComputePipelineState> pipeline_state = executable->pipelines[3].pipeline_state;
+  [encoder setComputePipelineState:pipeline_state];
 
   // The following MUST exactly match the pipeline layout from MSL source code.
   // buffer(0) is the source buffer. Note that we MUST set 0 as offset here--the offset is to be
   // handled directly in the kernels!
   [encoder setBuffer:source_buffer offset:0 atIndex:0];
   [encoder useResource:source_buffer usage:MTLResourceUsageRead];
+
  // buffer(1) is the target buffer. Note that we MUST set 0 as offset here--the offset is to be
   // handled directly in the kernels!
   [encoder setBuffer:target_buffer offset:0 atIndex:1];
   [encoder useResource:target_buffer usage:MTLResourceUsageWrite];
+
  // buffer(2) is the buffer copy spec.
+  iree_hal_metal_buffer_copy_spec_t spec = {
+      .src_buffer_offset = source_offset,
+      .dst_buffer_offset = target_offset,
+      .length = length,
+  };
   [encoder setBytes:&spec length:sizeof(spec) atIndex:2];
 
   // Encode the dispatch.
+  const iree_device_size_t workgroup_size = 32;
+  iree_device_size_t workgroup_count = iree_device_size_ceil_div(length, workgroup_size * 4);
   [encoder dispatchThreadgroups:MTLSizeMake(workgroup_count, 1, 1)
           threadsPerThreadgroup:MTLSizeMake(workgroup_size, 1, 1)];
+
   return iree_ok_status();
 }
diff --git a/runtime/src/iree/hal/drivers/metal/direct_allocator.m b/runtime/src/iree/hal/drivers/metal/direct_allocator.m
index 06109d2..d97886e 100644
--- a/runtime/src/iree/hal/drivers/metal/direct_allocator.m
+++ b/runtime/src/iree/hal/drivers/metal/direct_allocator.m
@@ -305,9 +305,11 @@
 
   IREE_TRACE_FREE_NAMED(IREE_HAL_METAL_ALLOCATOR_ID,
                         (void*)iree_hal_metal_buffer_handle(base_buffer));
-  IREE_STATISTICS(iree_hal_allocator_statistics_record_free(
-      &allocator->statistics, iree_hal_buffer_memory_type(base_buffer),
-      iree_hal_buffer_allocation_size(base_buffer)));
+  if (!iree_hal_metal_buffer_is_external(base_buffer)) {
+    IREE_STATISTICS(iree_hal_allocator_statistics_record_free(
+        &allocator->statistics, iree_hal_buffer_memory_type(base_buffer),
+        iree_hal_buffer_allocation_size(base_buffer)));
+  }
 
   iree_hal_buffer_destroy(base_buffer);  // -1
 }
diff --git a/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m b/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
index fbf6374..16bd8c5 100644
--- a/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
+++ b/runtime/src/iree/hal/drivers/metal/direct_command_buffer.m
@@ -13,10 +13,9 @@
 #include "iree/base/tracing.h"
 #include "iree/hal/api.h"
 #include "iree/hal/drivers/metal/builtin_executables.h"
-#include "iree/hal/drivers/metal/kernel_library.h"
+#include "iree/hal/drivers/metal/executable.h"
 #include "iree/hal/drivers/metal/metal_buffer.h"
 #include "iree/hal/drivers/metal/metal_device.h"
-#include "iree/hal/drivers/metal/pipeline_layout.h"
 #include "iree/hal/drivers/metal/staging_buffer.h"
 #include "iree/hal/utils/resource_set.h"
 
@@ -48,7 +47,6 @@
 // Command action kind of a command segment.
 typedef enum iree_hal_metal_command_segment_action_e {
   IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_BARRIER,      // Execution/memory barrier command
-  IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH,     // Dispatch command
   IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH2,    // Dispatch command
   IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_FILL_BUFFER,  // Fill buffer command
   IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_COPY_BUFFER,  // Copy buffer command
@@ -64,7 +62,6 @@
 // + Additional inline allocation for holding all buffer barriers.
 
 typedef struct iree_hal_metal_descriptor_t {
-  uint32_t set;
   uint32_t binding;
   iree_hal_buffer_t* buffer;
   iree_device_size_t offset;
@@ -73,14 +70,14 @@
 
 // API data for dispatch command segments.
 typedef struct iree_hal_metal_dispatch_segment_t {
-  // Compute kernel information--kernel object, pipeline layout, threadgroup size, etc.
-  iree_hal_metal_kernel_params_t kernel_params;
+  // Compute function pipeline with information required to dispatch it.
+  const iree_hal_metal_pipeline_t* pipeline;
 
   // Workgroup count information--if |workgroups_buffer| is not nil, then indirect dispatch;
   // otherwise uses |workgroup_count| for direct dispatch.
   id<MTLBuffer> workgroups_buffer;
   iree_device_size_t workgroups_offset;
-  uint32_t workgroup_count[3];
+  MTLSize workgroup_count;
 
   // The number of descriptors bound for this dispatch.
   iree_host_size_t descriptor_count;
@@ -88,37 +85,13 @@
   iree_hal_metal_descriptor_t* descriptors;
 
   // The number of push constant values.
-  iree_host_size_t push_constant_count;
+  iree_host_size_t constant_count;
   // The list of push constants, pointing to the end of the segment allocation.
-  int32_t* push_constants;
+  int32_t* constants;
 } iree_hal_metal_dispatch_segment_t;
 // + Additional inline allocation for holding all bound descriptors.
 // + Additional inline allocation for holding all push constants.
 
-// API data for dispatch command segments.
-typedef struct iree_hal_metal_dispatch2_segment_t {
-  // Compute kernel information--kernel object, pipeline layout, threadgroup size, etc.
-  iree_hal_metal_kernel_params_t kernel_params;
-
-  // Workgroup count information--if |workgroups_buffer| is not nil, then indirect dispatch;
-  // otherwise uses |workgroup_count| for direct dispatch.
-  id<MTLBuffer> workgroups_buffer;
-  iree_device_size_t workgroups_offset;
-  uint32_t workgroup_count[3];
-
-  // The number of descriptors bound for this dispatch.
-  iree_host_size_t descriptor_count;
-  // The list of bound descriptors, pointing to the end of the segment allocation.
-  iree_hal_metal_descriptor_t* descriptors;
-
-  // The number of push constant values.
-  iree_host_size_t push_constant_count;
-  // The list of push constants, pointing to the end of the segment allocation.
-  int32_t* push_constants;
-} iree_hal_metal_dispatch2_segment_t;
-// + Additional inline allocation for holding all bound descriptors.
-// + Additional inline allocation for holding all push constants.
-
 // API data for fill buffer command segments.
 typedef struct iree_hal_metal_fill_buffer_segment_t {
   id<MTLBuffer> target_buffer;
@@ -146,7 +119,6 @@
   union {
     iree_hal_metal_barrier_segment_t barrier;
     iree_hal_metal_dispatch_segment_t dispatch;
-    iree_hal_metal_dispatch2_segment_t dispatch2;
     iree_hal_metal_fill_buffer_segment_t fill_buffer;
     iree_hal_metal_copy_buffer_segment_t copy_buffer;
   };
@@ -229,19 +201,6 @@
     id<MTLEvent> encoder_event;
     // The next available encoder event value to signal/wait to/on.
     uint64_t next_encoder_event_value;
-
-    // Metal APIs mandate we create argument bufffers (for descriptor sets) from compiled kernel
-    // function. That means we need to bind the compute kernel first before setting descriptors and
-    // binding buffers. However in IREE HAL API we see push descriptors before the dispatch command.
-    // So we need to cache the descriptor information by ourselves and record them at dispatch time.
-    struct {
-      iree_hal_metal_descriptor_t bindings[IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT];
-    } descriptor_sets[IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX];
-
-    // All available push constants updated each time push_constants is called. Reset only with the
-    // command buffer and otherwise will maintain its values during recording to allow for partial
-    // push_constants updates.
-    int32_t push_constants[IREE_HAL_METAL_MAX_PUSH_CONSTANT_COUNT];
   } state;
 } iree_hal_metal_command_buffer_t;
 
@@ -873,269 +832,11 @@
   return iree_make_status(IREE_STATUS_UNIMPLEMENTED, "collectives not yet supported");
 }
 
-static iree_status_t iree_hal_metal_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer, iree_hal_pipeline_layout_t* pipeline_layout,
-    iree_host_size_t offset, const void* values, iree_host_size_t values_length) {
-  iree_hal_metal_command_buffer_t* command_buffer =
-      iree_hal_metal_command_buffer_cast(base_command_buffer);
-
-  // "Binding a pipeline with a layout that is not compatible with the push constant layout does not
-  // disturb the push constant values." So we don't need to check whether the pipeline layout
-  // compatibility and invalidate existing values.
-
-  if (IREE_UNLIKELY(offset + values_length >= sizeof(command_buffer->state.push_constants))) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "push constant range [%zu, %zu) out of range", offset,
-                            offset + values_length);
-  }
-
-  memcpy((uint8_t*)&command_buffer->state.push_constants + offset, values, values_length);
-
-  return iree_ok_status();
-}
-
-static inline MTLResourceUsage iree_hal_metal_get_metal_resource_usage(
-    const iree_hal_descriptor_set_layout_binding_t* binding) {
-  MTLResourceUsage usage = MTLResourceUsageRead;
-  if (binding->flags != IREE_HAL_DESCRIPTOR_FLAG_READ_ONLY) usage |= MTLResourceUsageWrite;
-  return usage;
-}
-
-static iree_status_t iree_hal_metal_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer, iree_hal_pipeline_layout_t* pipeline_layout,
-    uint32_t set, iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  iree_hal_metal_command_buffer_t* command_buffer =
-      iree_hal_metal_command_buffer_cast(base_command_buffer);
-
-  if (binding_count > IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
-    return iree_make_status(IREE_STATUS_RESOURCE_EXHAUSTED,
-                            "exceeded available binding slots for push descriptor set #%u; "
-                            "requested %lu vs. maximal %d",
-                            set, binding_count, IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT);
-  }
-
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_ASSERT(set < IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX);
-  const iree_hal_descriptor_set_layout_t* set_layout =
-      iree_hal_metal_pipeline_layout_descriptor_set_layout(pipeline_layout, set);
-  iree_hal_metal_descriptor_t* descriptors = command_buffer->state.descriptor_sets[set].bindings;
-
-  // Update descriptors in the current set.
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    iree_hal_metal_descriptor_t* descriptor = &descriptors[i];
-
-    descriptor->set = set;
-    descriptor->binding = bindings[i].ordinal;
-    descriptor->buffer = bindings[i].buffer;
-    descriptor->offset = bindings[i].offset;
-
-    const iree_hal_descriptor_set_layout_binding_t* binding_params =
-        iree_hal_metal_descriptor_set_layout_binding(set_layout, descriptor->binding);
-    descriptor->usage = iree_hal_metal_get_metal_resource_usage(binding_params);
-  }
-
-  // Retain all buffers bound in this descriptor set.
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    if (bindings[i].buffer) {
-      IREE_RETURN_AND_END_ZONE_IF_ERROR(
-          z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1, &bindings[i].buffer));
-    }
-  }
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1, &pipeline_layout));
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
 // Prepares kernels and argument buffers needed for kernel dispatches.
 static iree_status_t iree_hal_metal_command_segment_create_dispatch(
     iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
-    int32_t entry_point, iree_hal_metal_dispatch_segment_t** out_segment) {
-  iree_hal_metal_command_buffer_t* command_buffer =
-      iree_hal_metal_command_buffer_cast(base_command_buffer);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1, &executable));
-
-  iree_hal_metal_kernel_params_t kernel_params;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(z0, iree_hal_metal_kernel_library_entry_point_kernel_params(
-                                            executable, entry_point, &kernel_params));
-
-  // Allocate the command segment and keep track of all necessary API data.
-  uint8_t* storage_base = NULL;
-  iree_hal_metal_command_segment_t* segment = NULL;
-  const iree_host_size_t set_count =
-      iree_hal_metal_pipeline_layout_descriptor_set_count(kernel_params.layout);
-  iree_host_size_t descriptor_count = 0;
-  // Calculate the total number of bindings across all descriptor sets.
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    const iree_hal_descriptor_set_layout_t* set_layout =
-        iree_hal_metal_pipeline_layout_descriptor_set_layout(kernel_params.layout, i);
-    descriptor_count += iree_hal_metal_descriptor_set_layout_binding_count(set_layout);
-  }
-  iree_host_size_t descriptor_length = descriptor_count * sizeof(iree_hal_metal_descriptor_t);
-  iree_host_size_t push_constant_count =
-      iree_hal_metal_pipeline_layout_push_constant_count(kernel_params.layout);
-  iree_host_size_t push_constant_length = push_constant_count * sizeof(int32_t);
-  iree_host_size_t total_size = sizeof(*segment) + descriptor_length + push_constant_length;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_arena_allocate(&command_buffer->arena, total_size, (void**)&storage_base));
-
-  // Compose and push the dispatch segment.
-  segment = (iree_hal_metal_command_segment_t*)storage_base;
-  memset(segment, 0, sizeof(*segment));
-  segment->action = IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH;
-  iree_hal_metal_command_segment_list_push_back(&command_buffer->segments, segment);
-
-  segment->dispatch.kernel_params = kernel_params;
-
-  // Copy descriptors from all sets to the end of the current segment for later access.
-  segment->dispatch.descriptor_count = descriptor_count;
-  uint8_t* descriptor_ptr = storage_base + sizeof(*segment);
-  segment->dispatch.descriptors = (iree_hal_metal_descriptor_t*)descriptor_ptr;
-  for (iree_host_size_t i = 0; i < set_count; ++i) {
-    const iree_hal_descriptor_set_layout_t* set_layout =
-        iree_hal_metal_pipeline_layout_descriptor_set_layout(kernel_params.layout, i);
-    iree_host_size_t binding_count = iree_hal_metal_descriptor_set_layout_binding_count(set_layout);
-    iree_host_size_t current_size = binding_count * sizeof(iree_hal_metal_descriptor_t);
-    memcpy(descriptor_ptr, command_buffer->state.descriptor_sets[i].bindings, current_size);
-    descriptor_ptr += current_size;
-  }
-
-  // Copy push constants to the end of the current segment for later access.
-  segment->dispatch.push_constant_count = push_constant_count;
-  uint8_t* push_constant_ptr = storage_base + sizeof(*segment) + descriptor_length;
-  segment->dispatch.push_constants = (int32_t*)push_constant_ptr;
-  memcpy(push_constant_ptr, (const uint8_t*)command_buffer->state.push_constants,
-         push_constant_length);
-
-  *out_segment = &segment->dispatch;
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_metal_command_segment_record_dispatch(
-    iree_hal_metal_command_buffer_t* command_buffer, iree_hal_metal_dispatch_segment_t* segment) {
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  // Set the compute kernel to dispatch.
-  id<MTLComputeCommandEncoder> compute_encoder =
-      iree_hal_metal_get_or_begin_compute_encoder(command_buffer);
-  [compute_encoder setComputePipelineState:segment->kernel_params.pso];
-
-  // Record push constants.
-  if (segment->push_constant_count != 0) {
-    [compute_encoder setBytes:(void*)segment->push_constants
-                       length:segment->push_constant_count * sizeof(int32_t)
-                      atIndex:IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX];
-  }
-
-  // Record argument buffers for all descriptors and record buffer usages.
-  iree_hal_metal_descriptor_t* descriptors = segment->descriptors;
-  for (iree_host_size_t i = 0; i < segment->descriptor_count;) {
-    uint32_t current_set = descriptors[i].set;
-
-    // Build argument encoder and argument buffer for the current descriptor set.
-    // TODO(antiagainst): Use a cache layer to cache and reuse argument buffers with the same
-    // content, to avoid duplicating overhead.
-    id<MTLBuffer> argument_buffer = command_buffer->staging_buffer->metal_buffer;
-    id<MTLArgumentEncoder> argument_encoder =
-        [segment->kernel_params.function newArgumentEncoderWithBufferIndex:current_set];  // +1
-    IREE_ASSERT(argument_encoder != nil);
-
-    // Reserve space for the argument buffer from shared staging buffer.
-    iree_byte_span_t reservation;
-    uint32_t argument_buffer_offset;
-    IREE_RETURN_AND_END_ZONE_IF_ERROR(
-        z0, iree_hal_metal_staging_buffer_reserve(
-                command_buffer->staging_buffer, argument_encoder.encodedLength,
-                argument_encoder.alignment, &reservation, &argument_buffer_offset));
-    [argument_encoder setArgumentBuffer:argument_buffer offset:argument_buffer_offset];
-
-    // Now record all bound buffers belonging to the current set into the argument buffer.
-    for (; i < segment->descriptor_count && descriptors[i].set == current_set; ++i) {
-      uint32_t current_binding = descriptors[i].binding;
-      id<MTLBuffer> current_buffer =
-          iree_hal_metal_buffer_handle(iree_hal_buffer_allocated_buffer(descriptors[i].buffer));
-      iree_host_size_t offset =
-          iree_hal_buffer_byte_offset(descriptors[i].buffer) + descriptors[i].offset;
-      [argument_encoder setBuffer:current_buffer offset:offset atIndex:current_binding];
-
-      // Also record buffer usages.
-      [compute_encoder useResource:current_buffer usage:descriptors[i].usage];
-    }
-    // Record the argument buffer.
-    [compute_encoder setBuffer:argument_buffer offset:argument_buffer_offset atIndex:current_set];
-
-    [argument_encoder release];  // -1
-  }
-
-  // Record the dispatch, either direct or indirect.
-  uint32_t* workgroup_size = segment->kernel_params.threadgroup_size;
-  if (segment->workgroups_buffer == nil) {
-    // Direct dispatch of a fixed workgroup count.
-    uint32_t* workgroup_count = segment->workgroup_count;
-    [compute_encoder
-         dispatchThreadgroups:MTLSizeMake(workgroup_count[0], workgroup_count[1],
-                                          workgroup_count[2])
-        threadsPerThreadgroup:MTLSizeMake(workgroup_size[0], workgroup_size[1], workgroup_size[2])];
-  } else {
-    // Indirect dispatch using a workgroup count from buffers.
-    [compute_encoder
-        dispatchThreadgroupsWithIndirectBuffer:segment->workgroups_buffer
-                          indirectBufferOffset:segment->workgroups_offset
-                         threadsPerThreadgroup:MTLSizeMake(workgroup_size[0], workgroup_size[1],
-                                                           workgroup_size[2])];
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch(
-    iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
-    int32_t entry_point, uint32_t workgroup_count_x, uint32_t workgroup_count_y,
-    uint32_t workgroup_count_z, iree_hal_dispatch_flags_t flags) {
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_metal_dispatch_segment_t* segment = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_metal_command_segment_create_dispatch(base_command_buffer, executable,
-                                                         entry_point, &segment));
-  segment->workgroup_count[0] = workgroup_count_x;
-  segment->workgroup_count[1] = workgroup_count_y;
-  segment->workgroup_count[2] = workgroup_count_z;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
-    int32_t entry_point, iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_metal_dispatch_segment_t* segment = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_metal_command_segment_create_dispatch(base_command_buffer, executable,
-                                                         entry_point, &segment));
-  segment->workgroups_buffer =
-      iree_hal_metal_buffer_handle(iree_hal_buffer_allocated_buffer(workgroups_ref.buffer));
-  segment->workgroups_offset = workgroups_ref.offset;
-
-  IREE_TRACE_ZONE_END(z0);
-  return iree_ok_status();
-}
-
-// Prepares kernels and argument buffers needed for kernel dispatches.
-static iree_status_t iree_hal_metal_command_segment_create_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
     int32_t entry_point, iree_const_byte_span_t constants, iree_hal_buffer_ref_list_t bindings,
-    iree_hal_dispatch_flags_t flags, iree_hal_metal_dispatch2_segment_t** out_segment) {
+    iree_hal_dispatch_flags_t flags, iree_hal_metal_dispatch_segment_t** out_segment) {
   iree_hal_metal_command_buffer_t* command_buffer =
       iree_hal_metal_command_buffer_cast(base_command_buffer);
   IREE_TRACE_ZONE_BEGIN(z0);
@@ -1143,9 +844,9 @@
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
       z0, iree_hal_resource_set_insert(command_buffer->resource_set, 1, &executable));
 
-  iree_hal_metal_kernel_params_t kernel_params;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(z0, iree_hal_metal_kernel_library_entry_point_kernel_params(
-                                            executable, entry_point, &kernel_params));
+  const iree_hal_metal_pipeline_t* pipeline = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_metal_executable_lookup_pipeline(executable, entry_point, &pipeline));
 
   // Allocate the command segment and keep track of all necessary API data.
   uint8_t* storage_base = NULL;
@@ -1161,24 +862,24 @@
   segment->action = IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH2;
   iree_hal_metal_command_segment_list_push_back(&command_buffer->segments, segment);
 
-  segment->dispatch.kernel_params = kernel_params;
+  segment->dispatch.pipeline = pipeline;
 
-  // Copy descriptors from all sets to the end of the current segment for later access.
+  // Copy descriptors to the end of the current segment for later access.
-  const iree_hal_descriptor_set_layout_t* set_layout =
-      iree_hal_metal_pipeline_layout_descriptor_set_layout(kernel_params.layout, 0);
   segment->dispatch.descriptor_count = bindings.count;
   segment->dispatch.descriptors = (iree_hal_metal_descriptor_t*)(storage_base + sizeof(*segment));
   for (iree_host_size_t i = 0; i < bindings.count; ++i) {
     iree_hal_metal_descriptor_t* descriptor = &segment->dispatch.descriptors[i];
 
-    descriptor->set = 0;
     descriptor->binding = i;
     descriptor->buffer = bindings.values[i].buffer;
     descriptor->offset = bindings.values[i].offset;
 
-    const iree_hal_descriptor_set_layout_binding_t* binding_params =
-        iree_hal_metal_descriptor_set_layout_binding(set_layout, descriptor->binding);
-    descriptor->usage = iree_hal_metal_get_metal_resource_usage(binding_params);
+    MTLResourceUsage usage = MTLResourceUsageRead;
+    uint64_t binding_bit = 1ull << i;
+    if (!iree_any_bit_set(pipeline->binding_read_only_bits, binding_bit)) {
+      usage |= MTLResourceUsageWrite;
+    }
+    descriptor->usage = usage;
 
     if (descriptor->buffer) {
       IREE_RETURN_AND_END_ZONE_IF_ERROR(
@@ -1187,29 +888,29 @@
   }
 
   // Copy push constants to the end of the current segment for later access.
-  segment->dispatch.push_constant_count = constants.data_length / sizeof(uint32_t);
-  uint8_t* push_constant_ptr = storage_base + sizeof(*segment) + descriptor_length;
-  segment->dispatch.push_constants = (int32_t*)push_constant_ptr;
-  memcpy(push_constant_ptr, constants.data, constants.data_length);
+  segment->dispatch.constant_count = constants.data_length / sizeof(uint32_t);
+  uint8_t* constant_ptr = storage_base + sizeof(*segment) + descriptor_length;
+  segment->dispatch.constants = (int32_t*)constant_ptr;
+  memcpy(constant_ptr, constants.data, constants.data_length);
 
-  *out_segment = &segment->dispatch2;
+  *out_segment = &segment->dispatch;
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_metal_command_segment_record_dispatch2(
-    iree_hal_metal_command_buffer_t* command_buffer, iree_hal_metal_dispatch2_segment_t* segment) {
+static iree_status_t iree_hal_metal_command_segment_record_dispatch(
+    iree_hal_metal_command_buffer_t* command_buffer, iree_hal_metal_dispatch_segment_t* segment) {
   IREE_TRACE_ZONE_BEGIN(z0);
 
   // Set the compute kernel to dispatch.
   id<MTLComputeCommandEncoder> compute_encoder =
       iree_hal_metal_get_or_begin_compute_encoder(command_buffer);
-  [compute_encoder setComputePipelineState:segment->kernel_params.pso];
+  [compute_encoder setComputePipelineState:segment->pipeline->pipeline_state];
 
   // Record push constants.
-  if (segment->push_constant_count != 0) {
-    [compute_encoder setBytes:(void*)segment->push_constants
-                       length:segment->push_constant_count * sizeof(int32_t)
+  if (segment->constant_count != 0) {
+    [compute_encoder setBytes:(void*)segment->constants
+                       length:segment->constant_count * sizeof(int32_t)
                       atIndex:IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX];
   }
 
@@ -1221,7 +922,7 @@
   // content, to avoid duplicating overhead.
   id<MTLBuffer> argument_buffer = command_buffer->staging_buffer->metal_buffer;
   id<MTLArgumentEncoder> argument_encoder =
-      [segment->kernel_params.function newArgumentEncoderWithBufferIndex:0];  // +1
+      [segment->pipeline->function newArgumentEncoderWithBufferIndex:0];  // +1
   IREE_ASSERT(argument_encoder != nil);
 
   // Reserve space for the argument buffer from shared staging buffer.
@@ -1251,54 +952,47 @@
   [argument_encoder release];  // -1
 
   // Record the dispatch, either direct or indirect.
-  uint32_t* workgroup_size = segment->kernel_params.threadgroup_size;
   if (segment->workgroups_buffer == nil) {
     // Direct dispatch of a fixed workgroup count.
-    uint32_t* workgroup_count = segment->workgroup_count;
-    [compute_encoder
-         dispatchThreadgroups:MTLSizeMake(workgroup_count[0], workgroup_count[1],
-                                          workgroup_count[2])
-        threadsPerThreadgroup:MTLSizeMake(workgroup_size[0], workgroup_size[1], workgroup_size[2])];
+    [compute_encoder dispatchThreadgroups:segment->workgroup_count
+                    threadsPerThreadgroup:segment->pipeline->threadgroup_size];
   } else {
     // Indirect dispatch using a workgroup count from buffers.
-    [compute_encoder
-        dispatchThreadgroupsWithIndirectBuffer:segment->workgroups_buffer
-                          indirectBufferOffset:segment->workgroups_offset
-                         threadsPerThreadgroup:MTLSizeMake(workgroup_size[0], workgroup_size[1],
-                                                           workgroup_size[2])];
+    [compute_encoder dispatchThreadgroupsWithIndirectBuffer:segment->workgroups_buffer
+                                       indirectBufferOffset:segment->workgroups_offset
+                                      threadsPerThreadgroup:segment->pipeline->threadgroup_size];
   }
 
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch2(
+static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch(
     iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
     int32_t entry_point, const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  iree_hal_metal_dispatch2_segment_t* segment = NULL;
+  iree_hal_metal_dispatch_segment_t* segment = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_metal_command_segment_create_dispatch2(
+      z0, iree_hal_metal_command_segment_create_dispatch(
               base_command_buffer, executable, entry_point, constants, bindings, flags, &segment));
-  segment->workgroup_count[0] = workgroup_count[0];
-  segment->workgroup_count[1] = workgroup_count[1];
-  segment->workgroup_count[2] = workgroup_count[2];
+  segment->workgroup_count =
+      MTLSizeMake(workgroup_count[0], workgroup_count[1], workgroup_count[2]);
 
   IREE_TRACE_ZONE_END(z0);
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch2_indirect(
+static iree_status_t iree_hal_metal_command_buffer_prepare_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer, iree_hal_executable_t* executable,
     int32_t entry_point, iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  iree_hal_metal_dispatch2_segment_t* segment = NULL;
+  iree_hal_metal_dispatch_segment_t* segment = NULL;
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_metal_command_segment_create_dispatch2(
+      z0, iree_hal_metal_command_segment_create_dispatch(
               base_command_buffer, executable, entry_point, constants, bindings, flags, &segment));
   segment->workgroups_buffer =
       iree_hal_metal_buffer_handle(iree_hal_buffer_allocated_buffer(workgroups_ref.buffer));
@@ -1320,14 +1014,10 @@
         IREE_RETURN_AND_END_ZONE_IF_ERROR(
             z0, iree_hal_metal_command_segment_record_barrier(command_buffer, &segment->barrier));
       } break;
-      case IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH: {
+      case IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH2: {
         IREE_RETURN_AND_END_ZONE_IF_ERROR(
             z0, iree_hal_metal_command_segment_record_dispatch(command_buffer, &segment->dispatch));
       } break;
-      case IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_DISPATCH2: {
-        IREE_RETURN_AND_END_ZONE_IF_ERROR(z0, iree_hal_metal_command_segment_record_dispatch2(
-                                                  command_buffer, &segment->dispatch2));
-      } break;
       case IREE_HAL_METAL_COMMAND_SEGMENT_ACTION_FILL_BUFFER: {
         IREE_RETURN_AND_END_ZONE_IF_ERROR(z0, iree_hal_metal_command_segment_record_fill_buffer(
                                                   command_buffer, &segment->fill_buffer));
@@ -1383,10 +1073,6 @@
     .update_buffer = iree_hal_metal_command_buffer_prepare_update_buffer,
     .copy_buffer = iree_hal_metal_command_buffer_prepare_copy_buffer,
     .collective = iree_hal_metal_command_buffer_collective,
-    .push_constants = iree_hal_metal_command_buffer_push_constants,
-    .push_descriptor_set = iree_hal_metal_command_buffer_push_descriptor_set,
     .dispatch = iree_hal_metal_command_buffer_prepare_dispatch,
     .dispatch_indirect = iree_hal_metal_command_buffer_prepare_dispatch_indirect,
-    .dispatch2 = iree_hal_metal_command_buffer_prepare_dispatch2,
-    .dispatch2_indirect = iree_hal_metal_command_buffer_prepare_dispatch2_indirect,
 };
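Reviewer note on the new usage derivation: each dispatch now computes `MTLResourceUsage` directly from the pipeline's `binding_read_only_bits` bitfield instead of consulting a descriptor set layout. A minimal C sketch of that bit test follows; `USAGE_READ`/`USAGE_WRITE` are illustrative stand-ins for the real `MTLResourceUsage` constants from `<Metal/Metal.h>`, not the actual values:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for MTLResourceUsageRead/MTLResourceUsageWrite. */
enum { USAGE_READ = 1u << 0, USAGE_WRITE = 1u << 1 };

/* Derives the usage for binding |i|: every binding is readable, and only
 * bindings whose read-only bit is NOT set also gain write access. */
static uint32_t binding_usage(uint64_t binding_read_only_bits, int i) {
  uint32_t usage = USAGE_READ;
  if (!(binding_read_only_bits & (1ull << i))) usage |= USAGE_WRITE;
  return usage;
}
```

With bit 0 set (binding 0 read-only), binding 0 stays read-only while binding 1 is read/write, matching the one-bit-per-binding encoding documented in the new `iree_hal_metal_pipeline_t`.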
diff --git a/runtime/src/iree/hal/drivers/metal/executable.h b/runtime/src/iree/hal/drivers/metal/executable.h
new file mode 100644
index 0000000..ea70eb6
--- /dev/null
+++ b/runtime/src/iree/hal/drivers/metal/executable.h
@@ -0,0 +1,107 @@
+// Copyright 2023 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#ifndef IREE_HAL_DRIVERS_METAL_EXECUTABLE_H_
+#define IREE_HAL_DRIVERS_METAL_EXECUTABLE_H_
+
+#import <Metal/Metal.h>
+#include <stdint.h>
+
+#include "iree/base/api.h"
+#include "iree/hal/api.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+//===----------------------------------------------------------------------===//
+// Limitations
+//===----------------------------------------------------------------------===//
+
+// The max number of bindings per descriptor set allowed in the Metal HAL
+// implementation.
+//
+// Note that Metal itself is more permissive:
+// - Argument buffer tier 1 binding limits:
+//   - iOS: 31 buffers (on A11 and later, 96 buffers)
+//   - macOS: 64 buffers
+// - Argument buffer tier 2 binding limits:
+//   - 500,000 buffers or textures
+#define IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT 16
+
+// The max number of descriptor sets allowed in the Metal HAL implementation.
+//
+// This depends on the general descriptor set planning in IREE and should adjust
+// with it.
+#define IREE_HAL_METAL_MAX_DESCRIPTOR_SET_COUNT 4
+
+// The [[buffer(N)]] index for push constants.
+//
+// This depends on the general descriptor set planning in IREE and should adjust
+// with it. Note that it also needs to be consistent with the compiler side when
+// setting up resource location attributes during cross compiling SPIR-V to MSL.
+#define IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX \
+  (IREE_HAL_METAL_MAX_DESCRIPTOR_SET_COUNT - 1)
+
+// The max number of push constants supported by the Metal HAL implementation.
+#define IREE_HAL_METAL_MAX_PUSH_CONSTANT_COUNT 64
+
+//===----------------------------------------------------------------------===//
+// iree_hal_metal_executable_t
+//===----------------------------------------------------------------------===//
+
+typedef struct iree_hal_metal_source_location_t {
+  iree_string_view_t file_name;
+  int line;
+  iree_string_view_t func_name;
+} iree_hal_metal_source_location_t;
+
+// Object and launch parameters for a compute function.
+typedef struct iree_hal_metal_pipeline_t {
+  id<MTLFunction> function;
+
+  // Cached pipeline used to dispatch the function.
+  id<MTLComputePipelineState> pipeline_state;
+
+  // Threadgroup size required during dispatch.
+  MTLSize threadgroup_size;
+
+  // Total number of 32-bit constants.
+  uint32_t constant_count;
+  // Total number of bindings.
+  uint32_t binding_count;
+  // One bit per binding indicating whether it is read-only.
+  uint64_t binding_read_only_bits;
+
+  IREE_TRACE(iree_hal_metal_source_location_t source_location;)
+} iree_hal_metal_pipeline_t;
+
+// Creates a Metal kernel library as an IREE executable. The Metal library may
+// contain several kernel functions that can be extracted along with their
+// associated threadgroup sizes.
+//
+// Metal represents compute kernels as MTLFunctions and an MTLLibrary is simply
+// a collection of MTLFunctions. An MTLComputePipelineState is compiled from an
+// MTLFunction and is what gets bound on a compute encoder at dispatch time.
+// This executable bundles all the Metal objects needed to retrieve the
+// pipeline state object for each exported compute kernel.
+//
+// |out_executable| must be released by the caller (see
+// iree_hal_executable_release).
+iree_status_t iree_hal_metal_executable_create(
+    id<MTLDevice> device, const iree_hal_executable_params_t* executable_params,
+    iree_allocator_t host_allocator, iree_hal_executable_t** out_executable);
+
+// Returns the pipeline and launch parameters for the given |entry_point|.
+iree_status_t iree_hal_metal_executable_lookup_pipeline(
+    const iree_hal_executable_t* executable, uint32_t entry_point,
+    const iree_hal_metal_pipeline_t** out_pipeline);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+
+#endif  // IREE_HAL_DRIVERS_METAL_EXECUTABLE_H_
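Reviewer note on the segment storage scheme carried over into the merged dispatch path: each dispatch segment is one arena allocation holding the header followed by the descriptor array and the constants, with interior pointers fixed up after allocation. A simplified C sketch of that layout (using `malloc` instead of the IREE arena; `descriptor_t`/`segment_t` are illustrative stand-ins, not the real IREE structs):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins mirroring the single-allocation layout used by
 * iree_hal_metal_command_segment_create_dispatch. */
typedef struct { uint32_t binding; } descriptor_t;
typedef struct {
  size_t descriptor_count;
  descriptor_t* descriptors;  // points into the same allocation
  size_t constant_count;
  int32_t* constants;  // points past the descriptor storage
} segment_t;

static segment_t* allocate_segment(size_t binding_count, size_t constant_count) {
  size_t descriptor_length = binding_count * sizeof(descriptor_t);
  size_t constant_length = constant_count * sizeof(int32_t);
  uint8_t* storage =
      (uint8_t*)calloc(1, sizeof(segment_t) + descriptor_length + constant_length);
  if (!storage) return NULL;
  segment_t* segment = (segment_t*)storage;
  segment->descriptor_count = binding_count;
  segment->descriptors = (descriptor_t*)(storage + sizeof(segment_t));
  segment->constant_count = constant_count;
  segment->constants = (int32_t*)(storage + sizeof(segment_t) + descriptor_length);
  return segment;
}
```

One allocation per dispatch keeps recording stateless: once the constants and bindings are copied into the segment there is nothing to track across commands, which is what allows the push_constants/push_descriptor_set entry points to be deleted above.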
diff --git a/runtime/src/iree/hal/drivers/metal/executable.m b/runtime/src/iree/hal/drivers/metal/executable.m
new file mode 100644
index 0000000..e4b8883
--- /dev/null
+++ b/runtime/src/iree/hal/drivers/metal/executable.m
@@ -0,0 +1,496 @@
+// Copyright 2023 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "iree/hal/drivers/metal/executable.h"
+
+#include <Metal/Metal.h>
+#include <stddef.h>
+
+#include "iree/base/api.h"
+#include "iree/hal/utils/executable_debug_info.h"
+
+// flatcc schemas:
+#include "iree/base/internal/flatcc/parsing.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
+#include "iree/schemas/metal_executable_def_reader.h"
+#include "iree/schemas/metal_executable_def_verifier.h"
+
+typedef struct iree_hal_metal_executable_t {
+  // Abstract resource used for injecting reference counting and vtable; must be at offset 0.
+  iree_hal_resource_t resource;
+
+  iree_allocator_t host_allocator;
+
+  NSArray<id<MTLLibrary>>* libraries;
+
+  iree_host_size_t pipeline_count;
+  iree_hal_metal_pipeline_t pipelines[];
+} iree_hal_metal_executable_t;
+
+static const iree_hal_executable_vtable_t iree_hal_metal_executable_vtable;
+
+static iree_hal_metal_executable_t* iree_hal_metal_executable_cast(
+    iree_hal_executable_t* base_value) {
+  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_executable_vtable);
+  return (iree_hal_metal_executable_t*)base_value;
+}
+
+static const iree_hal_metal_executable_t* iree_hal_metal_executable_const_cast(
+    const iree_hal_executable_t* base_value) {
+  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_executable_vtable);
+  return (const iree_hal_metal_executable_t*)base_value;
+}
+
+// Verifies the structure of the flatbuffer so that we can avoid doing so during runtime.
+//
+// There are still some conditions we must be aware of (such as omitted names on functions with
+// internal linkage), however we shouldn't need to bounds check anything within the flatbuffer
+// after this succeeds.
+static iree_status_t iree_hal_metal_executable_flatbuffer_verify(
+    iree_const_byte_span_t flatbuffer_data) {
+  if (!flatbuffer_data.data || flatbuffer_data.data_length < 16) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "flatbuffer data is not present or less than 16 bytes (%zu total)",
+                            flatbuffer_data.data_length);
+  }
+
+  // Run flatcc generated verification. This ensures all pointers are in-bounds and that we can
+  // safely walk the file, but not that the actual contents of the flatbuffer meet our expectations.
+  int verify_ret = iree_hal_metal_ExecutableDef_verify_as_root(flatbuffer_data.data,
+                                                               flatbuffer_data.data_length);
+  if (verify_ret != flatcc_verify_ok) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "flatbuffer verification failed: %s",
+                            flatcc_verify_error_string(verify_ret));
+  }
+
+  iree_hal_metal_ExecutableDef_table_t executable_def =
+      iree_hal_metal_ExecutableDef_as_root(flatbuffer_data.data);
+
+  iree_hal_metal_LibraryDef_vec_t libraries_vec =
+      iree_hal_metal_ExecutableDef_libraries_get(executable_def);
+  iree_host_size_t library_count = iree_hal_metal_LibraryDef_vec_len(libraries_vec);
+  for (iree_host_size_t i = 0; i < library_count; ++i) {
+    iree_hal_metal_LibraryDef_table_t library_def =
+        iree_hal_metal_LibraryDef_vec_at(libraries_vec, i);
+    if (!library_def) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "libraries[%" PRIhsz "] is NULL", i);
+    }
+
+    // NOTE: the source is optional but if present must be valid.
+    iree_hal_metal_MSLSourceDef_table_t source_def =
+        iree_hal_metal_LibraryDef_source_get(library_def);
+    if (source_def) {
+      // NOTE: the version check just ensures that we don't pass garbage to Metal; the current
+      // platform may not support the version even if the enum is valid and we won't know until we
+      // try compiling it.
+      uint32_t version = iree_hal_metal_MSLSourceDef_version_get(source_def);
+      if (version > MTLLanguageVersion3_0) {
+        return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                                "libraries[%" PRIhsz
+                                "] MSL language version %u is unsupported by the compiled runtime",
+                                i, version);
+      }
+      flatbuffers_string_t code = iree_hal_metal_MSLSourceDef_code_get(source_def);
+      if (flatbuffers_string_len(code) == 0) {
+        return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                                "libraries[%" PRIhsz "] MSL source is empty", i);
+      }
+    }
+
+    // Require that source is provided if no metallib is.
+    flatbuffers_string_t metallib = iree_hal_metal_LibraryDef_metallib_get(library_def);
+    if (flatbuffers_string_len(metallib) == 0 && !source_def) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "libraries[%" PRIhsz "] has neither source nor binary data", i);
+    }
+  }
+
+  iree_hal_metal_PipelineDef_vec_t pipelines_vec =
+      iree_hal_metal_ExecutableDef_pipelines_get(executable_def);
+  for (iree_host_size_t i = 0; i < iree_hal_metal_PipelineDef_vec_len(pipelines_vec); ++i) {
+    iree_hal_metal_PipelineDef_table_t pipeline_def =
+        iree_hal_metal_PipelineDef_vec_at(pipelines_vec, i);
+    if (!pipeline_def) continue;
+
+    uint32_t library_ordinal = iree_hal_metal_PipelineDef_library_ordinal_get(pipeline_def);
+    if (library_ordinal >= library_count) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipelines[%" PRIhsz "] library_ordinal %u is out of bounds (%" PRIhsz " libraries)",
+                              i, library_ordinal, library_count);
+    }
+
+    if (flatbuffers_string_len(iree_hal_metal_PipelineDef_entry_point_get(pipeline_def)) == 0) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipelines[%" PRIhsz "] entry point name is empty", i);
+    }
+
+    if (!iree_hal_metal_PipelineDef_threadgroup_size_is_present(pipeline_def)) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipelines[%" PRIhsz "] threadgroup size is missing", i);
+    }
+
+    uint32_t constant_count = iree_hal_metal_PipelineDef_constant_count_get(pipeline_def);
+    if (constant_count > IREE_HAL_METAL_MAX_PUSH_CONSTANT_COUNT) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipelines[%" PRIhsz "] constant_count %u exceeds maximum of %u", i,
+                              constant_count, IREE_HAL_METAL_MAX_PUSH_CONSTANT_COUNT);
+    }
+
+    iree_hal_metal_BindingBits_vec_t binding_flags_vec =
+        iree_hal_metal_PipelineDef_binding_flags_get(pipeline_def);
+    if (iree_hal_metal_BindingBits_vec_len(binding_flags_vec) >
+        IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipelines[%" PRIhsz
+                              "] binding_flags count %zu exceeds maximum of %u",
+                              i, iree_hal_metal_BindingBits_vec_len(binding_flags_vec),
+                              IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT);
+    }
+
+    IREE_RETURN_IF_ERROR(
+        iree_hal_debug_verify_export_def(iree_hal_metal_PipelineDef_debug_info_get(pipeline_def)));
+  }
+
+  return iree_ok_status();
+}
+
+static iree_status_t iree_hal_metal_compile_source(id<MTLDevice> device,
+                                                   iree_hal_metal_MSLSourceDef_table_t source_def,
+                                                   id<MTLLibrary>* out_library) {
+  *out_library = nil;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_status_t status = iree_ok_status();
+  id<MTLLibrary> library = nil;
+  @autoreleasepool {
+    MTLCompileOptions* compile_options = [[MTLCompileOptions new] autorelease];
+    compile_options.languageVersion = MTLLanguageVersion3_0;
+    if (iree_hal_metal_MSLSourceDef_version_is_present(source_def)) {
+      compile_options.languageVersion =
+          (MTLLanguageVersion)iree_hal_metal_MSLSourceDef_version_get(source_def);
+    }
+
+    flatbuffers_string_t code = iree_hal_metal_MSLSourceDef_code_get(source_def);
+    NSString* code_str =
+        [[[NSString alloc] initWithBytes:code
+                                  length:flatbuffers_string_len(code)
+                                encoding:[NSString defaultCStringEncoding]] autorelease];
+
+    NSError* error = nil;
+    library = [device newLibraryWithSource:code_str options:compile_options error:&error];  // +1
+    if (IREE_UNLIKELY(library == nil)) {
+      const char* ns_c_error = [error.localizedDescription
+          cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
+      status = iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "failed to create MTLLibrary: %s",
+                                ns_c_error);
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_library = library;
+  } else {
+    [library release];
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+static iree_status_t iree_hal_metal_load_library(id<MTLDevice> device,
+                                                 flatbuffers_string_t metallib,
+                                                 flatbuffers_string_t metallibsym,
+                                                 id<MTLLibrary>* out_library) {
+  *out_library = nil;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_status_t status = iree_ok_status();
+  id<MTLLibrary> library = nil;
+  (void)metallibsym;  // Companion symbol data is not yet consumed here.
+  @autoreleasepool {
+    dispatch_data_t data = dispatch_data_create(metallib, flatbuffers_string_len(metallib),
+                                                /*queue=*/NULL, DISPATCH_DATA_DESTRUCTOR_DEFAULT);
+    NSError* error = nil;
+    library = [device newLibraryWithData:data error:&error];  // +1
+    if (IREE_UNLIKELY(library == nil)) {
+      const char* ns_c_error = [error.localizedDescription
+          cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
+      status = iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "failed to create MTLLibrary: %s",
+                                ns_c_error);
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_library = library;
+  } else {
+    [library release];
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+// Loads all MTLLibrary instances in the executable and returns an array with matching order.
+static iree_status_t iree_hal_metal_load_libraries(id<MTLDevice> device,
+                                                   iree_hal_metal_LibraryDef_vec_t libraries_vec,
+                                                   NSArray<id<MTLLibrary>>** out_libraries) {
+  *out_libraries = nil;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  NSMutableArray<id<MTLLibrary>>* libraries = [[NSMutableArray alloc] init];  // +1
+
+  iree_status_t status = iree_ok_status();
+  for (iree_host_size_t i = 0; i < iree_hal_metal_LibraryDef_vec_len(libraries_vec); ++i) {
+    iree_hal_metal_LibraryDef_table_t library_def =
+        iree_hal_metal_LibraryDef_vec_at(libraries_vec, i);
+    id<MTLLibrary> library = nil;
+    if (iree_hal_metal_LibraryDef_metallib_is_present(library_def)) {
+      // Load binary MTLLibrary (with optional symbols).
+      flatbuffers_string_t metallib = iree_hal_metal_LibraryDef_metallib_get(library_def);
+      flatbuffers_string_t metallibsym = iree_hal_metal_LibraryDef_metallibsym_get(library_def);
+      status = iree_hal_metal_load_library(device, metallib, metallibsym, &library);
+    } else {
+      // Compile MSL source code into a MTLLibrary.
+      iree_hal_metal_MSLSourceDef_table_t source_def =
+          iree_hal_metal_LibraryDef_source_get(library_def);
+      status = iree_hal_metal_compile_source(device, source_def, &library);
+    }
+    if (!iree_status_is_ok(status)) break;
+    [libraries addObject:library];
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_libraries = libraries;
+  } else {
+    [libraries release];  // -1
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+#if 0
+// Creates MTL compute pipeline objects for the given |entry_point| in |library| and writes to
+// |out_function| and |out_pipeline_state|. The caller should release |out_function| and
+// |out_pipeline_state| after done.
+static iree_status_t iree_hal_metal_create_pipeline_state(
+    id<MTLLibrary> library, iree_string_view_t entry_point, const char* source_code,
+    id<MTLDevice> device, id<MTLFunction>* out_function,
+    id<MTLComputePipelineState>* out_pipeline_state) {
+  @autoreleasepool {
+    NSError* error = nil;
+
+    // TODO(#14047): Enable async pipeline creation at runtime.
+    *out_pipeline_state = [device newComputePipelineStateWithFunction:*out_function
+                                                                error:&error];  // +1
+    if (IREE_UNLIKELY(*out_pipeline_state == nil)) {
+      [*out_function release];
+      return iree_hal_metal_get_invalid_kernel_status(
+          "invalid shader source", "when creating MTLComputePipelineState with NSError: %s", error,
+          entry_point, source_code);
+    }
+  }
+  return iree_ok_status();
+}
+#endif  // 0
+
+static iree_status_t iree_hal_metal_create_pipeline(id<MTLDevice> device, id<MTLLibrary> library,
+                                                    iree_hal_metal_PipelineDef_table_t pipeline_def,
+                                                    iree_hal_metal_pipeline_t* out_pipeline) {
+  IREE_TRACE_ZONE_BEGIN(z0);
+  flatbuffers_string_t entry_point = iree_hal_metal_PipelineDef_entry_point_get(pipeline_def);
+  IREE_TRACE_ZONE_APPEND_TEXT(z0, entry_point);
+
+  iree_status_t status = iree_ok_status();
+  @autoreleasepool {
+    NSString* function_name =
+        [[[NSString alloc] initWithBytes:entry_point
+                                  length:flatbuffers_string_len(entry_point)
+                                encoding:[NSString defaultCStringEncoding]] autorelease];
+    out_pipeline->function = [library newFunctionWithName:function_name];  // +1
+    if (IREE_UNLIKELY(out_pipeline->function == nil)) {
+      status =
+          iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "function `%.*s` not found in MTLLibrary",
+                           (int)flatbuffers_string_len(entry_point), entry_point);
+    }
+
+    if (iree_status_is_ok(status)) {
+      MTLComputePipelineDescriptor* descriptor =
+          [[[MTLComputePipelineDescriptor alloc] init] autorelease];
+      [descriptor setComputeFunction:out_pipeline->function];
+      [descriptor setLabel:function_name];
+      if (iree_hal_metal_PipelineDef_max_threads_per_threadgroup_is_present(pipeline_def)) {
+        [descriptor setMaxTotalThreadsPerThreadgroup:
+                        iree_hal_metal_PipelineDef_max_threads_per_threadgroup_get(pipeline_def)];
+      }
+      if (iree_hal_metal_PipelineDef_threadgroup_size_aligned_is_present(pipeline_def)) {
+        [descriptor setThreadGroupSizeIsMultipleOfThreadExecutionWidth:
+                        iree_hal_metal_PipelineDef_threadgroup_size_aligned_get(pipeline_def)];
+      }
+      [[[descriptor buffers] objectAtIndexedSubscript:0] setMutability:MTLMutabilityImmutable];
+      [[[descriptor buffers] objectAtIndexedSubscript:IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX]
+          setMutability:MTLMutabilityImmutable];
+
+      NSError* error = nil;
+      out_pipeline->pipeline_state =
+          [device newComputePipelineStateWithDescriptor:descriptor
+                                                options:MTLPipelineOptionNone
+                                             reflection:nil
+                                                  error:&error];
+      if (IREE_UNLIKELY(out_pipeline->pipeline_state == nil)) {
+        const char* ns_c_error = [error.localizedDescription
+            cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
+        status = iree_make_status(
+            IREE_STATUS_INVALID_ARGUMENT, "failed to create pipeline with function `%.*s`: %s",
+            (int)flatbuffers_string_len(entry_point), entry_point, ns_c_error);
+      }
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    const iree_hal_metal_ThreadgroupSize_t* threadgroup_size =
+        iree_hal_metal_PipelineDef_threadgroup_size_get(pipeline_def);
+    out_pipeline->threadgroup_size =
+        MTLSizeMake(threadgroup_size->x, threadgroup_size->y, threadgroup_size->z);
+
+    out_pipeline->constant_count = iree_hal_metal_PipelineDef_constant_count_get(pipeline_def);
+    iree_hal_metal_BindingBits_vec_t binding_flags_vec =
+        iree_hal_metal_PipelineDef_binding_flags_get(pipeline_def);
+    out_pipeline->binding_count = iree_hal_metal_BindingBits_vec_len(binding_flags_vec);
+
+    out_pipeline->binding_read_only_bits = 0;
+    for (iree_host_size_t i = 0; i < out_pipeline->binding_count; ++i) {
+      iree_hal_metal_BindingBits_enum_t binding_flags =
+          iree_hal_metal_BindingBits_vec_at(binding_flags_vec, i);
+      if (iree_all_bits_set(binding_flags, iree_hal_metal_BindingBits_IMMUTABLE)) {
+        out_pipeline->binding_read_only_bits |= 1ull << i;
+      }
+    }
+  }
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+iree_status_t iree_hal_metal_executable_create(
+    id<MTLDevice> device, const iree_hal_executable_params_t* executable_params,
+    iree_allocator_t host_allocator, iree_hal_executable_t** out_executable) {
+  IREE_ASSERT_ARGUMENT(device);
+  IREE_ASSERT_ARGUMENT(executable_params);
+  IREE_ASSERT_ARGUMENT(out_executable);
+  *out_executable = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_metal_executable_flatbuffer_verify(executable_params->executable_data));
+
+  iree_hal_metal_ExecutableDef_table_t executable_def =
+      iree_hal_metal_ExecutableDef_as_root(executable_params->executable_data.data);
+
+  iree_hal_metal_PipelineDef_vec_t pipelines_vec =
+      iree_hal_metal_ExecutableDef_pipelines_get(executable_def);
+  iree_host_size_t pipeline_count = iree_hal_metal_PipelineDef_vec_len(pipelines_vec);
+
+  // Calculate the total size of debug info storage across all pipelines. This
+  // is only required when tracing so that we can store copies of the export
+  // names as the flatbuffer storing the strings may be released while the
+  // executable is still live.
+  iree_host_size_t total_debug_info_length = 0;
+  IREE_TRACE({
+    for (iree_host_size_t i = 0; i < pipeline_count; ++i) {
+      iree_hal_metal_PipelineDef_table_t pipeline_def =
+          iree_hal_metal_PipelineDef_vec_at(pipelines_vec, i);
+      total_debug_info_length += iree_hal_debug_calculate_export_info_size(
+          iree_hal_metal_PipelineDef_debug_info_get(pipeline_def));
+    }
+  });
+
+  // Allocate storage for the executable and its associated data structures.
+  iree_hal_metal_executable_t* executable = NULL;
+  iree_host_size_t total_size = sizeof(*executable) +
+                                pipeline_count * sizeof(executable->pipelines[0]) +
+                                total_debug_info_length;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(host_allocator, total_size, (void**)&executable));
+  iree_hal_resource_initialize(&iree_hal_metal_executable_vtable, &executable->resource);
+  executable->host_allocator = host_allocator;
+  executable->pipeline_count = pipeline_count;
+  IREE_TRACE(
+      iree_hal_debug_export_info_t* export_infos =
+          (iree_hal_debug_export_info_t*)((uint8_t*)executable->pipelines +
+                                          pipeline_count * sizeof(executable->pipelines[0])));
+
+  // Publish any embedded source files to the tracing infrastructure.
+  iree_hal_debug_publish_source_files(
+      iree_hal_metal_ExecutableDef_source_files_get(executable_def));
+
+  // Load all libraries that may be referenced by the pipelines.
+  iree_hal_metal_LibraryDef_vec_t libraries_vec =
+      iree_hal_metal_ExecutableDef_libraries_get(executable_def);
+  iree_status_t status =
+      iree_hal_metal_load_libraries(device, libraries_vec, &executable->libraries);
+
+  if (iree_status_is_ok(status)) {
+    for (iree_host_size_t i = 0; i < pipeline_count; ++i) {
+      iree_hal_metal_PipelineDef_table_t pipeline_def =
+          iree_hal_metal_PipelineDef_vec_at(pipelines_vec, i);
+
+      uint32_t library_ordinal = iree_hal_metal_PipelineDef_library_ordinal_get(pipeline_def);
+      id<MTLLibrary> library = [executable->libraries objectAtIndex:library_ordinal];  // unretained
+
+      iree_hal_metal_pipeline_t* pipeline = &executable->pipelines[i];
+      status = iree_hal_metal_create_pipeline(device, library, pipeline_def, pipeline);
+      if (!iree_status_is_ok(status)) break;
+
+      IREE_TRACE({
+        iree_hal_debug_copy_export_info(iree_hal_metal_PipelineDef_debug_info_get(pipeline_def),
+                                        &export_infos[i]);
+        pipeline->source_location.func_name = export_infos[i].function_name;
+        pipeline->source_location.file_name = export_infos[i].source_filename;
+        pipeline->source_location.line = export_infos[i].source_line;
+      });
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_executable = (iree_hal_executable_t*)executable;
+  } else {
+    iree_hal_executable_destroy((iree_hal_executable_t*)executable);
+  }
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+static void iree_hal_metal_executable_destroy(iree_hal_executable_t* base_executable) {
+  iree_hal_metal_executable_t* executable = iree_hal_metal_executable_cast(base_executable);
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  for (iree_host_size_t i = 0; i < executable->pipeline_count; ++i) {
+    iree_hal_metal_pipeline_t* entry_point = &executable->pipelines[i];
+    [entry_point->pipeline_state release];  // -1
+    [entry_point->function release];        // -1
+  }
+
+  [executable->libraries release];  // -1
+
+  iree_allocator_free(executable->host_allocator, executable);
+
+  IREE_TRACE_ZONE_END(z0);
+}
+
+iree_status_t iree_hal_metal_executable_lookup_pipeline(
+    const iree_hal_executable_t* base_executable, uint32_t entry_point,
+    const iree_hal_metal_pipeline_t** out_pipeline) {
+  const iree_hal_metal_executable_t* executable =
+      iree_hal_metal_executable_const_cast(base_executable);
+  if (entry_point >= executable->pipeline_count) {
+    return iree_make_status(IREE_STATUS_OUT_OF_RANGE, "invalid entry point ordinal %u",
+                            entry_point);
+  }
+  *out_pipeline = &executable->pipelines[entry_point];
+  return iree_ok_status();
+}
+
+static const iree_hal_executable_vtable_t iree_hal_metal_executable_vtable = {
+    .destroy = iree_hal_metal_executable_destroy,
+};
diff --git a/runtime/src/iree/hal/drivers/metal/kernel_library.h b/runtime/src/iree/hal/drivers/metal/kernel_library.h
deleted file mode 100644
index aa7c957..0000000
--- a/runtime/src/iree/hal/drivers/metal/kernel_library.h
+++ /dev/null
@@ -1,64 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_DRIVERS_METAL_KERNEL_LIBRARY_H_
-#define IREE_HAL_DRIVERS_METAL_KERNEL_LIBRARY_H_
-
-#import <Metal/Metal.h>
-#include <stdint.h>
-
-#include "iree/base/api.h"
-#include "iree/base/tracing.h"
-#include "iree/hal/api.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-// Object and launch parameters for a compute kernel.
-typedef struct iree_hal_metal_kernel_params_t {
-  id<MTLLibrary> library;
-  id<MTLFunction> function;
-  id<MTLComputePipelineState> pso;
-  uint32_t threadgroup_size[3];
-  iree_hal_pipeline_layout_t* layout;
-  IREE_TRACE(iree_string_view_t function_name;)
-} iree_hal_metal_kernel_params_t;
-
-// Creates a Metal kernel library as an IREE executable. The Metal library may
-// contain several kernel functions that can be extracted along with the
-// associated block size.
-//
-// Metal represents compute kernels as MTLFunctions. MTLLibrary is just an
-// allocation of MTLFunctions. One creates a MTLComputePipelineState from a
-// MTLFunction and uses the pipeline state for creating compute pipelines.
-// This class bundles all the necessary Metal objects for getting pipeline state
-// objects for a compute kernel.
-//
-// |out_executable| must be released by the caller (see
-// iree_hal_executable_release).
-iree_status_t iree_hal_metal_kernel_library_create(
-    id<MTLDevice> device, const iree_hal_executable_params_t* executable_params,
-    iree_allocator_t host_allocator, iree_hal_executable_t** out_executable);
-
-// Returns the kernel launch parameters for the given |entry_point|.
-iree_status_t iree_hal_metal_kernel_library_entry_point_kernel_params(
-    const iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_metal_kernel_params_t* out_params);
-
-// Compiles the given |entry_point| in Metal |source_code| and writes the
-// |out_library|, |out_function|, and compute pipeline |out_pso| accordingly.
-iree_status_t iree_hal_metal_compile_msl_and_create_pipeline_object(
-    iree_string_view_t source_code, iree_string_view_t entry_point,
-    id<MTLDevice> device, MTLCompileOptions* compile_options,
-    id<MTLLibrary>* out_library, id<MTLFunction>* out_function,
-    id<MTLComputePipelineState>* out_pso);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_DRIVERS_METAL_KERNEL_LIBRARY_H_
diff --git a/runtime/src/iree/hal/drivers/metal/kernel_library.m b/runtime/src/iree/hal/drivers/metal/kernel_library.m
deleted file mode 100644
index f759f5e..0000000
--- a/runtime/src/iree/hal/drivers/metal/kernel_library.m
+++ /dev/null
@@ -1,384 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/drivers/metal/kernel_library.h"
-
-#include <stddef.h>
-
-#include "iree/base/api.h"
-
-// flatcc schemas:
-#include "iree/base/internal/flatcc/parsing.h"
-#include "iree/schemas/metal_executable_def_reader.h"
-#include "iree/schemas/metal_executable_def_verifier.h"
-
-typedef struct iree_hal_metal_kernel_library_t {
-  // Abstract resource used for injecting reference counting and vtable; must be at offset 0.
-  iree_hal_resource_t resource;
-
-  iree_allocator_t host_allocator;
-
-  iree_host_size_t entry_point_count;
-  iree_hal_metal_kernel_params_t entry_points[];
-} iree_hal_metal_kernel_library_t;
-
-static const iree_hal_executable_vtable_t iree_hal_metal_kernel_library_vtable;
-
-static iree_hal_metal_kernel_library_t* iree_hal_metal_kernel_library_cast(
-    iree_hal_executable_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_kernel_library_vtable);
-  return (iree_hal_metal_kernel_library_t*)base_value;
-}
-
-static const iree_hal_metal_kernel_library_t* iree_hal_metal_kernel_library_const_cast(
-    const iree_hal_executable_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_kernel_library_vtable);
-  return (const iree_hal_metal_kernel_library_t*)base_value;
-}
-
-// Verifies the structure of the flatbuffer so that we can avoid doing so during runtime.
-//
-// There are still some conditions we must be aware of (such as omitted names on functions with
-// internal linkage), however we shouldn't need to bounds check anything within the flatbuffer
-// after this succeeds.
-static iree_status_t iree_hal_metal_kernel_library_flatbuffer_verify(
-    iree_const_byte_span_t flatbuffer_data) {
-  if (!flatbuffer_data.data || flatbuffer_data.data_length < 16) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "flatbuffer data is not present or less than 16 bytes (%zu total)",
-                            flatbuffer_data.data_length);
-  }
-
-  // Run flatcc generated verification. This ensures all pointers are in-bounds and that we can
-  // safely walk the file, but not that the actual contents of the flatbuffer meet our expectations.
-  int verify_ret = iree_hal_metal_ExecutableDef_verify_as_root(flatbuffer_data.data,
-                                                               flatbuffer_data.data_length);
-  if (verify_ret != flatcc_verify_ok) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "flatbuffer verification failed: %s",
-                            flatcc_verify_error_string(verify_ret));
-  }
-
-  iree_hal_metal_ExecutableDef_table_t executable_def =
-      iree_hal_metal_ExecutableDef_as_root(flatbuffer_data.data);
-
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_metal_ExecutableDef_entry_points_get(executable_def);
-  size_t entry_point_count = flatbuffers_string_vec_len(entry_points_vec);
-  for (size_t i = 0; i < entry_point_count; ++i) {
-    if (!flatbuffers_string_len(flatbuffers_string_vec_at(entry_points_vec, i))) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "executable entry point %zu has no name", i);
-    }
-  }
-
-  iree_hal_metal_ThreadgroupSize_vec_t threadgroup_sizes_vec =
-      iree_hal_metal_ExecutableDef_threadgroup_sizes(executable_def);
-  size_t threadgroup_size_count = iree_hal_metal_ThreadgroupSize_vec_len(threadgroup_sizes_vec);
-  if (!threadgroup_size_count) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "no threadgroup sizes present");
-  }
-
-  if (entry_point_count != threadgroup_size_count) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "entry points (%zu) and thread group sizes (%zu) count mismatch",
-                            entry_point_count, threadgroup_size_count);
-  }
-
-  flatbuffers_string_vec_t shader_libraries_vec =
-      iree_hal_metal_ExecutableDef_shader_libraries_get(executable_def);
-  size_t shader_library_count = flatbuffers_string_vec_len(shader_libraries_vec);
-  for (size_t i = 0; i < shader_library_count; ++i) {
-    if (!flatbuffers_string_len(flatbuffers_string_vec_at(shader_libraries_vec, i))) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "executable shader library %zu is empty", i);
-    }
-  }
-  if (shader_library_count != 0 && entry_point_count != shader_library_count) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "entry points (%zu) and source libraries (%zu) count mismatch",
-                            entry_point_count, shader_library_count);
-  }
-
-  flatbuffers_string_vec_t shader_sources_vec =
-      iree_hal_metal_ExecutableDef_shader_sources_get(executable_def);
-  size_t shader_source_count = flatbuffers_string_vec_len(shader_sources_vec);
-  for (size_t i = 0; i < shader_source_count; ++i) {
-    if (!flatbuffers_string_len(flatbuffers_string_vec_at(shader_sources_vec, i))) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT, "executable shader source %zu is empty",
-                              i);
-    }
-  }
-
-  if (shader_source_count != 0 && entry_point_count != shader_source_count) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "entry points (%zu) and source strings (%zu) count mismatch",
-                            entry_point_count, shader_source_count);
-  }
-
-  if (!shader_library_count && !shader_source_count) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "missing shader library or source strings");
-  }
-
-  return iree_ok_status();
-}
-
-// Returns an invalid argument status with proper Metal NSError annotations during compute pipeline
-// creation.
-static iree_status_t iree_hal_metal_get_invalid_kernel_status(const char* iree_error_template,
-                                                              const char* metal_error_template,
-                                                              NSError* ns_error,
-                                                              iree_string_view_t entry_point,
-                                                              const char* shader_source) {
-  iree_status_t status = iree_make_status(IREE_STATUS_INVALID_ARGUMENT, iree_error_template);
-  const char* ns_c_error = [ns_error.localizedDescription
-      cStringUsingEncoding:[NSString defaultCStringEncoding]];  // autoreleased
-  status = iree_status_annotate_f(status, metal_error_template, ns_c_error);
-  if (shader_source) {
-    return iree_status_annotate_f(status, "for entry point '%.*s' in MSL source:\n%s\n",
-                                  (int)entry_point.size, entry_point.data, shader_source);
-  }
-  return iree_status_annotate_f(status, "for entry point '%.*s' in MTLLibrary\n",
-                                (int)entry_point.size, entry_point.data);
-}
-
-// Compiles the given |entry_point| in the MSL |source_code| into MTLLibrary and writes to
-// |out_library|. The caller should release |out_library| after done.
-iree_status_t iree_hal_metal_compile_msl(iree_string_view_t source_code,
-                                         iree_string_view_t entry_point, id<MTLDevice> device,
-                                         MTLCompileOptions* compile_options,
-                                         id<MTLLibrary>* out_library) {
-  @autoreleasepool {
-    NSError* error = nil;
-    NSString* shader_source =
-        [[[NSString alloc] initWithBytes:source_code.data
-                                  length:source_code.size
-                                encoding:[NSString defaultCStringEncoding]] autorelease];
-    *out_library = [device newLibraryWithSource:shader_source
-                                        options:compile_options
-                                          error:&error];  // +1
-    if (IREE_UNLIKELY(*out_library == nil)) {
-      return iree_hal_metal_get_invalid_kernel_status(
-          "failed to create MTLLibrary from shader source",
-          "when creating MTLLibrary with NSError: %.*s", error, entry_point, source_code.data);
-    }
-  }
-
-  return iree_ok_status();
-}
-
-// Compiles the given |entry_point| in the MSL library |source_data| into MTLLibrary and writes to
-// |out_library|. The caller should release |out_library| after done.
-static iree_status_t iree_hal_metal_load_mtllib(iree_const_byte_span_t source_data,
-                                                iree_string_view_t entry_point,
-                                                id<MTLDevice> device, id<MTLLibrary>* out_library) {
-  @autoreleasepool {
-    NSError* error = nil;
-    dispatch_data_t data = dispatch_data_create(source_data.data, source_data.data_length,
-                                                /*queue=*/NULL, DISPATCH_DATA_DESTRUCTOR_DEFAULT);
-    *out_library = [device newLibraryWithData:data error:&error];  // +1
-    if (IREE_UNLIKELY(*out_library == nil)) {
-      return iree_hal_metal_get_invalid_kernel_status(
-          "failed to create MTLLibrary from shader source",
-          "when creating MTLLibrary with NSError: %s", error, entry_point, NULL);
-    }
-  }
-
-  return iree_ok_status();
-}
-
-// Creates MTL compute pipeline objects for the given |entry_point| in |library| and writes to
-// |out_function| and |out_pso|. The caller should release |out_function| and |out_pso| after done.
-static iree_status_t iree_hal_metal_create_pipline_object(
-    id<MTLLibrary> library, iree_string_view_t entry_point, const char* source_code,
-    id<MTLDevice> device, id<MTLFunction>* out_function, id<MTLComputePipelineState>* out_pso) {
-  @autoreleasepool {
-    NSError* error = nil;
-    NSString* function_name =
-        [[[NSString alloc] initWithBytes:entry_point.data
-                                  length:entry_point.size
-                                encoding:[NSString defaultCStringEncoding]] autorelease];
-    *out_function = [library newFunctionWithName:function_name];  // +1
-    if (IREE_UNLIKELY(*out_function == nil)) {
-      return iree_hal_metal_get_invalid_kernel_status("cannot find entry point in shader source",
-                                                      "when creating MTLFunction with NSError: %s",
-                                                      error, entry_point, source_code);
-    }
-
-    // TODO(#14047): Enable async pipeline creation at runtime.
-    *out_pso = [device newComputePipelineStateWithFunction:*out_function error:&error];  // +1
-    if (IREE_UNLIKELY(*out_pso == nil)) {
-      [*out_function release];
-      return iree_hal_metal_get_invalid_kernel_status(
-          "invalid shader source", "when creating MTLComputePipelineState with NSError: %s", error,
-          entry_point, source_code);
-    }
-  }
-  return iree_ok_status();
-}
-
-iree_status_t iree_hal_metal_compile_msl_and_create_pipeline_object(
-    iree_string_view_t source_code, iree_string_view_t entry_point, id<MTLDevice> device,
-    MTLCompileOptions* compile_options, id<MTLLibrary>* out_library, id<MTLFunction>* out_function,
-    id<MTLComputePipelineState>* out_pso) {
-  IREE_RETURN_IF_ERROR(
-      iree_hal_metal_compile_msl(source_code, entry_point, device, compile_options, out_library));
-  return iree_hal_metal_create_pipline_object(*out_library, entry_point, source_code.data, device,
-                                              out_function, out_pso);
-}
-
-iree_status_t iree_hal_metal_kernel_library_create(
-    id<MTLDevice> device, const iree_hal_executable_params_t* executable_params,
-    iree_allocator_t host_allocator, iree_hal_executable_t** out_executable) {
-  IREE_ASSERT_ARGUMENT(executable_params);
-  IREE_ASSERT_ARGUMENT(out_executable);
-  *out_executable = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_metal_kernel_library_t* executable = NULL;
-
-  IREE_RETURN_IF_ERROR(
-      iree_hal_metal_kernel_library_flatbuffer_verify(executable_params->executable_data));
-
-  iree_hal_metal_ExecutableDef_table_t executable_def =
-      iree_hal_metal_ExecutableDef_as_root(executable_params->executable_data.data);
-
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_metal_ExecutableDef_entry_points_get(executable_def);
-  iree_hal_metal_ThreadgroupSize_vec_t threadgroup_sizes_vec =
-      iree_hal_metal_ExecutableDef_threadgroup_sizes(executable_def);
-  flatbuffers_string_vec_t shader_libraries_vec =
-      iree_hal_metal_ExecutableDef_shader_libraries_get(executable_def);
-  flatbuffers_string_vec_t shader_sources_vec =
-      iree_hal_metal_ExecutableDef_shader_sources_get(executable_def);
-  iree_host_size_t entry_point_count = flatbuffers_string_vec_len(entry_points_vec);
-
-  // Calculate the total number of characters across all entry point names. This is only required
-  // when tracing so that we can store copies of the names as the flatbuffer storing the strings
-  // may be released while the executable is still live.
-  iree_host_size_t total_entry_point_name_chars = 0;
-  IREE_TRACE({
-    for (iree_host_size_t i = 0; i < entry_point_count; i++) {
-      const char* entry_name = flatbuffers_string_vec_at(entry_points_vec, i);
-      total_entry_point_name_chars += flatbuffers_string_len(entry_name);
-    }
-  });
-
-  // Create the kernel library.
-  iree_host_size_t total_size = sizeof(*executable) +
-                                entry_point_count * sizeof(executable->entry_points[0]) +
-                                total_entry_point_name_chars;
-  iree_status_t status = iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
-  IREE_TRACE(char* string_table_buffer =
-                 (char*)((char*)executable + sizeof(*executable) +
-                         entry_point_count * sizeof(executable->entry_points[0])));
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_metal_kernel_library_vtable, &executable->resource);
-    executable->host_allocator = host_allocator;
-    executable->entry_point_count = entry_point_count;
-
-    size_t shader_library_count = flatbuffers_string_vec_len(shader_libraries_vec);
-    size_t shader_source_count = flatbuffers_string_vec_len(shader_sources_vec);
-
-    // Try to load as Metal library first. Otherwise, compile each MSL source string into a
-    // MTLLibrary and get the MTLFunction for the entry point to build the pipeline state object.
-    // TODO(#14047): Enable async MSL compilation at runtime.
-
-    MTLCompileOptions* compile_options = [MTLCompileOptions new];  // +1
-    compile_options.languageVersion = MTLLanguageVersion3_0;
-
-    for (size_t i = 0, e = iree_max(shader_library_count, shader_source_count); i < e; ++i) {
-      id<MTLLibrary> library = nil;
-      id<MTLFunction> function = nil;
-      id<MTLComputePipelineState> pso = nil;
-
-      flatbuffers_string_t source_code = NULL;
-      flatbuffers_string_t entry_point = flatbuffers_string_vec_at(entry_points_vec, i);
-      iree_string_view_t entry_point_view =
-          iree_make_string_view(entry_point, flatbuffers_string_len(entry_point));
-
-      if (shader_library_count != 0) {
-        flatbuffers_string_t source_library = flatbuffers_string_vec_at(shader_libraries_vec, i);
-        status = iree_hal_metal_load_mtllib(
-            iree_make_const_byte_span(source_library, flatbuffers_string_len(source_library)),
-            entry_point_view, device, &library);
-      } else {
-        source_code = flatbuffers_string_vec_at(shader_sources_vec, i);
-        status = iree_hal_metal_compile_msl(
-            iree_make_string_view(source_code, flatbuffers_string_len(source_code)),
-            entry_point_view, device, compile_options, &library);
-      }
-      if (!iree_status_is_ok(status)) break;
-
-      status = iree_hal_metal_create_pipline_object(library, entry_point_view, source_code, device,
-                                                    &function, &pso);
-      if (!iree_status_is_ok(status)) break;
-
-      // Package required parameters for kernel launches for each entry point.
-      iree_hal_metal_kernel_params_t* params = &executable->entry_points[i];
-      params->library = library;
-      params->function = function;
-      params->pso = pso;
-      params->threadgroup_size[0] = threadgroup_sizes_vec[i].x;
-      params->threadgroup_size[1] = threadgroup_sizes_vec[i].y;
-      params->threadgroup_size[2] = threadgroup_sizes_vec[i].z;
-      params->layout = executable_params->pipeline_layouts[i];
-      iree_hal_pipeline_layout_retain(params->layout);
-
-      // Stash the entry point name in the string table for use when tracing.
-      IREE_TRACE({
-        iree_host_size_t entry_name_length = flatbuffers_string_len(entry_point);
-        memcpy(string_table_buffer, entry_point, entry_name_length);
-        params->function_name = iree_make_string_view(string_table_buffer, entry_name_length);
-        string_table_buffer += entry_name_length;
-      });
-    }
-
-    [compile_options release];  // -1
-  }
-
-  if (iree_status_is_ok(status)) {
-    *out_executable = (iree_hal_executable_t*)executable;
-  } else {
-    iree_hal_executable_destroy((iree_hal_executable_t*)executable);
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_metal_kernel_library_destroy(iree_hal_executable_t* base_executable) {
-  iree_hal_metal_kernel_library_t* executable = iree_hal_metal_kernel_library_cast(base_executable);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  for (iree_host_size_t i = 0; i < executable->entry_point_count; ++i) {
-    iree_hal_metal_kernel_params_t* entry_point = &executable->entry_points[i];
-    [entry_point->pso release];       // -1
-    [entry_point->function release];  // -1
-    [entry_point->library release];   // -1
-    iree_hal_pipeline_layout_release(entry_point->layout);
-  }
-  iree_allocator_free(executable->host_allocator, executable);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-iree_status_t iree_hal_metal_kernel_library_entry_point_kernel_params(
-    const iree_hal_executable_t* base_executable, int32_t entry_point,
-    iree_hal_metal_kernel_params_t* out_params) {
-  const iree_hal_metal_kernel_library_t* executable =
-      iree_hal_metal_kernel_library_const_cast(base_executable);
-  if (entry_point >= executable->entry_point_count) {
-    return iree_make_status(IREE_STATUS_OUT_OF_RANGE, "invalid entry point ordinal %d",
-                            entry_point);
-  }
-  memcpy(out_params, &executable->entry_points[entry_point], sizeof(*out_params));
-  return iree_ok_status();
-}
-
-static const iree_hal_executable_vtable_t iree_hal_metal_kernel_library_vtable = {
-    .destroy = iree_hal_metal_kernel_library_destroy,
-};
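The removed `iree_hal_metal_kernel_library_create` above packs the executable header, the per-entry-point parameter array, and (when tracing) a string table of entry-point names into one allocation. A minimal C sketch of that layout — `executable_t`, `entry_params_t`, and `executable_create` are illustrative names, not IREE API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical mirror of the single-allocation layout: a header struct,
// a trailing flexible array of per-entry-point params, then a string table
// holding copies of the entry-point names.
typedef struct {
  const char* function_name;  // points into the trailing string table
} entry_params_t;

typedef struct {
  size_t entry_point_count;
  entry_params_t entry_points[];  // flexible array member (C99)
} executable_t;

static executable_t* executable_create(const char** names, size_t count) {
  // Total the name characters (plus NUL terminators) up front, as the real
  // code does with total_entry_point_name_chars.
  size_t name_bytes = 0;
  for (size_t i = 0; i < count; ++i) name_bytes += strlen(names[i]) + 1;
  size_t total = sizeof(executable_t) + count * sizeof(entry_params_t) + name_bytes;
  executable_t* e = (executable_t*)calloc(1, total);
  if (!e) return NULL;
  e->entry_point_count = count;
  // The string table begins right after the entry-point array.
  char* strings = (char*)e + sizeof(executable_t) + count * sizeof(entry_params_t);
  for (size_t i = 0; i < count; ++i) {
    size_t len = strlen(names[i]);
    memcpy(strings, names[i], len + 1);  // copy including NUL
    e->entry_points[i].function_name = strings;
    strings += len + 1;
  }
  return e;
}
```

Keeping the traced names in the same allocation is what lets the flatbuffer that originally held the strings be released while the executable stays live, per the comment in the removed code.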
diff --git a/runtime/src/iree/hal/drivers/metal/metal_buffer.h b/runtime/src/iree/hal/drivers/metal/metal_buffer.h
index 2325b74..1a30fd3 100644
--- a/runtime/src/iree/hal/drivers/metal/metal_buffer.h
+++ b/runtime/src/iree/hal/drivers/metal/metal_buffer.h
@@ -30,6 +30,10 @@
     iree_hal_buffer_release_callback_t release_callback,
     iree_hal_buffer_t** out_buffer);
 
+// Returns true if the buffer was wrapped from an external handle instead of
+// allocated by the HAL allocator.
+bool iree_hal_metal_buffer_is_external(const iree_hal_buffer_t* buffer);
+
 // Returns the underlying Metal buffer handle for the given |buffer|.
 id<MTLBuffer> iree_hal_metal_buffer_handle(const iree_hal_buffer_t* buffer);
 
diff --git a/runtime/src/iree/hal/drivers/metal/metal_buffer.m b/runtime/src/iree/hal/drivers/metal/metal_buffer.m
index 80df57c..2f22c84 100644
--- a/runtime/src/iree/hal/drivers/metal/metal_buffer.m
+++ b/runtime/src/iree/hal/drivers/metal/metal_buffer.m
@@ -85,6 +85,11 @@
   IREE_TRACE_ZONE_END(z0);
 }
 
+bool iree_hal_metal_buffer_is_external(const iree_hal_buffer_t* base_buffer) {
+  const iree_hal_metal_buffer_t* buffer = iree_hal_metal_buffer_const_cast(base_buffer);
+  return buffer->release_callback.fn != NULL;
+}
+
 id<MTLBuffer> iree_hal_metal_buffer_handle(const iree_hal_buffer_t* base_buffer) {
   const iree_hal_metal_buffer_t* buffer = iree_hal_metal_buffer_const_cast(base_buffer);
   return buffer->buffer;
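The new `iree_hal_metal_buffer_is_external` relies on a simple convention: a buffer wrapped from an external handle carries a caller-provided release callback, while a HAL-allocated one leaves it NULL. A sketch of that convention with illustrative (non-IREE) types:

```c
#include <assert.h>
#include <stddef.h>

// Illustrative stand-ins for the release-callback convention; these are not
// the real IREE types, just the shape the predicate depends on.
typedef struct {
  void (*fn)(void* user_data);
  void* user_data;
} release_callback_t;

typedef struct {
  release_callback_t release_callback;
} buffer_t;

// Mirrors the logic of the added function: "external" means a non-NULL
// release callback was supplied at wrap time.
static int buffer_is_external(const buffer_t* buffer) {
  return buffer->release_callback.fn != NULL;
}

static void noop_release(void* user_data) { (void)user_data; }
```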
diff --git a/runtime/src/iree/hal/drivers/metal/metal_device.m b/runtime/src/iree/hal/drivers/metal/metal_device.m
index 45775d7..72e09d8 100644
--- a/runtime/src/iree/hal/drivers/metal/metal_device.m
+++ b/runtime/src/iree/hal/drivers/metal/metal_device.m
@@ -14,7 +14,6 @@
 #include "iree/hal/drivers/metal/direct_allocator.h"
 #include "iree/hal/drivers/metal/direct_command_buffer.h"
 #include "iree/hal/drivers/metal/nop_executable_cache.h"
-#include "iree/hal/drivers/metal/pipeline_layout.h"
 #include "iree/hal/drivers/metal/shared_event.h"
 #include "iree/hal/drivers/metal/staging_buffer.h"
 #include "iree/hal/utils/deferred_command_buffer.h"
@@ -268,15 +267,6 @@
       out_command_buffer);
 }
 
-static iree_status_t iree_hal_metal_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device, iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count, const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  iree_hal_metal_device_t* device = iree_hal_metal_device_cast(base_device);
-  return iree_hal_metal_descriptor_set_layout_create(
-      flags, binding_count, bindings, device->host_allocator, out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_metal_device_create_event(iree_hal_device_t* base_device,
                                                         iree_hal_queue_affinity_t queue_affinity,
                                                         iree_hal_event_flags_t flags,
@@ -307,15 +297,6 @@
                                    iree_hal_device_host_allocator(base_device), out_file);
 }
 
-static iree_status_t iree_hal_metal_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count, iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  iree_hal_metal_device_t* device = iree_hal_metal_device_cast(base_device);
-  return iree_hal_metal_pipeline_layout_create(set_layout_count, set_layouts, push_constants,
-                                               device->host_allocator, out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_metal_device_create_semaphore(iree_hal_device_t* base_device,
                                                             uint64_t initial_value,
                                                             iree_hal_semaphore_flags_t flags,
@@ -475,11 +456,13 @@
         }
       } else {
         // Retain the command buffer until the submission has completed.
+        iree_hal_command_buffer_retain(command_buffer);
         direct_command_buffer = command_buffer;
       }
       if (!iree_status_is_ok(status)) break;
       status = iree_hal_resource_set_insert(resource_set, 1, &direct_command_buffer);
       if (!iree_status_is_ok(status)) break;
+      iree_hal_command_buffer_release(direct_command_buffer);  // retained in resource set
       direct_command_buffers[i] = direct_command_buffer;
     }
   }
@@ -637,11 +620,9 @@
     .query_i64 = iree_hal_metal_device_query_i64,
     .create_channel = iree_hal_metal_device_create_channel,
     .create_command_buffer = iree_hal_metal_device_create_command_buffer,
-    .create_descriptor_set_layout = iree_hal_metal_device_create_descriptor_set_layout,
     .create_event = iree_hal_metal_device_create_event,
     .create_executable_cache = iree_hal_metal_device_create_executable_cache,
     .import_file = iree_hal_metal_device_import_file,
-    .create_pipeline_layout = iree_hal_metal_device_create_pipeline_layout,
     .create_semaphore = iree_hal_metal_device_create_semaphore,
     .query_semaphore_compatibility = iree_hal_metal_device_query_semaphore_compatibility,
     .queue_alloca = iree_hal_metal_device_queue_alloca,
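The retain/release pair added in the submission path above closes a lifetime gap: the device takes a temporary reference on a directly-submitted command buffer, the resource set takes its own reference on insert, and the temporary reference is then dropped. A minimal refcounting sketch of the pattern — names are illustrative, not the real IREE resource-set API:

```c
#include <assert.h>

// Toy refcounted resource and resource set, sketching the ownership dance.
typedef struct { int ref_count; } resource_t;
static void retain(resource_t* r) { ++r->ref_count; }
static void release(resource_t* r) { --r->ref_count; }

typedef struct { resource_t* slots[8]; int count; } resource_set_t;

// The set retains on insert and holds that reference until the submission
// retires (modeled here by resource_set_free).
static void resource_set_insert(resource_set_t* set, resource_t* r) {
  retain(r);
  set->slots[set->count++] = r;
}
static void resource_set_free(resource_set_t* set) {
  for (int i = 0; i < set->count; ++i) release(set->slots[i]);
  set->count = 0;
}
```

Without the temporary retain, a failure between choosing the command buffer and inserting it into the set could leave the submission holding a reference it never owned; with it, ownership transfers cleanly to the set.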
diff --git a/runtime/src/iree/hal/drivers/metal/nop_executable_cache.m b/runtime/src/iree/hal/drivers/metal/nop_executable_cache.m
index 347ce7d..f57be28 100644
--- a/runtime/src/iree/hal/drivers/metal/nop_executable_cache.m
+++ b/runtime/src/iree/hal/drivers/metal/nop_executable_cache.m
@@ -11,7 +11,7 @@
 
 #include "iree/base/api.h"
 #include "iree/base/tracing.h"
-#include "iree/hal/drivers/metal/kernel_library.h"
+#include "iree/hal/drivers/metal/executable.h"
 
 typedef struct iree_hal_metal_nop_executable_cache_t {
   // Abstract resource used for injecting reference counting and vtable; must be at offset 0.
@@ -75,8 +75,8 @@
     const iree_hal_executable_params_t* executable_params, iree_hal_executable_t** out_executable) {
   iree_hal_metal_nop_executable_cache_t* executable_cache =
       iree_hal_metal_nop_executable_cache_cast(base_executable_cache);
-  return iree_hal_metal_kernel_library_create(executable_cache->device, executable_params,
-                                              executable_cache->host_allocator, out_executable);
+  return iree_hal_metal_executable_create(executable_cache->device, executable_params,
+                                          executable_cache->host_allocator, out_executable);
 }
 
 static const iree_hal_executable_cache_vtable_t iree_hal_metal_nop_executable_cache_vtable = {
diff --git a/runtime/src/iree/hal/drivers/metal/pipeline_layout.h b/runtime/src/iree/hal/drivers/metal/pipeline_layout.h
deleted file mode 100644
index b97225b..0000000
--- a/runtime/src/iree/hal/drivers/metal/pipeline_layout.h
+++ /dev/null
@@ -1,106 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_DRIVERS_METAL_PIPELINE_LAYOUT_H_
-#define IREE_HAL_DRIVERS_METAL_PIPELINE_LAYOUT_H_
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-//===----------------------------------------------------------------------===//
-// Limitations
-//===----------------------------------------------------------------------===//
-
-// The max number of bindings per descriptor set allowed in the Metal HAL
-// implementation.
-//
-// Note that Metal itself is more permissive:
-// - Argument buffer tier 1 binding limits:
-//   - iOS: 31 buffers (on A11 and later, 96 buffers)
-//   - macOS: 64 buffers
-// - Argument buffer tier 2 binding limits:
-//   - 500,000 buffers or textures
-#define IREE_HAL_METAL_MAX_DESCRIPTOR_SET_BINDING_COUNT 16
-
-// The max number of descriptor sets allowed in the Metal HAL implementation.
-//
-// This depends on the general descriptor set planning in IREE and should adjust
-// with it.
-#define IREE_HAL_METAL_MAX_DESCRIPTOR_SET_COUNT 4
-
-// The [[buffer(N)]] index for push constants.
-//
-// This depends on the general descriptor set planning in IREE and should adjust
-// with it. Note that it also needs to be consistent with the compiler side when
-// setting up resource location attributes during cross compiling SPIR-V to MSL.
-#define IREE_HAL_METAL_PUSH_CONSTANT_BUFFER_INDEX \
-  (IREE_HAL_METAL_MAX_DESCRIPTOR_SET_COUNT - 1)
-
-// The max number of push constants supported by the Metal HAL implementation.
-#define IREE_HAL_METAL_MAX_PUSH_CONSTANT_COUNT 64
-
-//===----------------------------------------------------------------------===//
-// iree_hal_metal_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a descriptor set layout for the given |bindings|.
-//
-// |out_descriptor_set_layout| must be released by the caller (see
-// iree_hal_descriptor_set_layout_release).
-iree_status_t iree_hal_metal_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-// Returns the total number of bindings in the given descriptor set.
-iree_host_size_t iree_hal_metal_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-// Returns the information about a given |binding| in |descriptor_set_layout|.
-const iree_hal_descriptor_set_layout_binding_t*
-iree_hal_metal_descriptor_set_layout_binding(
-    const iree_hal_descriptor_set_layout_t* descriptor_set_layout,
-    uint32_t binding);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_metal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a pipeline layout with the given |set_layouts| and
-// |push_constant_count|.
-//
-// |out_pipeline_layout| must be released by the caller (see
-// iree_hal_pipeline_layout_release).
-iree_status_t iree_hal_metal_pipeline_layout_create(
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-// Returns the descriptor set layout of the given |set| in |pipeline_layout|.
-const iree_hal_descriptor_set_layout_t*
-iree_hal_metal_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set);
-
-// Returns the descriptor set count in the given |pipeline_layout|.
-iree_host_size_t iree_hal_metal_pipeline_layout_descriptor_set_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the push constant count in the given |pipeline_layout|.
-iree_host_size_t iree_hal_metal_pipeline_layout_push_constant_count(
-    const iree_hal_pipeline_layout_t* pipeline_layout);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_DRIVERS_METAL_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/drivers/metal/pipeline_layout.m b/runtime/src/iree/hal/drivers/metal/pipeline_layout.m
deleted file mode 100644
index b7899bb..0000000
--- a/runtime/src/iree/hal/drivers/metal/pipeline_layout.m
+++ /dev/null
@@ -1,201 +0,0 @@
-// Copyright 2023 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/drivers/metal/pipeline_layout.h"
-
-#include <stddef.h>
-
-#include "iree/base/api.h"
-#include "iree/base/tracing.h"
-
-//===------------------------------------------------------------------------------------------===//
-// iree_hal_metal_descriptor_set_layout_t
-//===------------------------------------------------------------------------------------------===//
-
-typedef struct iree_hal_metal_descriptor_set_layout_t {
-  // Abstract resource used for injecting reference counting and vtable; must be at offset 0.
-  iree_hal_resource_t resource;
-
-  iree_allocator_t host_allocator;
-
-  iree_host_size_t binding_count;
-  iree_hal_descriptor_set_layout_binding_t bindings[];
-} iree_hal_metal_descriptor_set_layout_t;
-
-static const iree_hal_descriptor_set_layout_vtable_t iree_hal_metal_descriptor_set_layout_vtable;
-
-static iree_hal_metal_descriptor_set_layout_t* iree_hal_metal_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_descriptor_set_layout_vtable);
-  return (iree_hal_metal_descriptor_set_layout_t*)base_value;
-}
-
-static const iree_hal_metal_descriptor_set_layout_t*
-iree_hal_metal_descriptor_set_layout_const_cast(
-    const iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_descriptor_set_layout_vtable);
-  return (const iree_hal_metal_descriptor_set_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_metal_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags, iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings, iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  *out_descriptor_set_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_metal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_host_size_t bindings_size = binding_count * sizeof(descriptor_set_layout->bindings[0]);
-  iree_status_t status =
-      iree_allocator_malloc(host_allocator, sizeof(*descriptor_set_layout) + bindings_size,
-                            (void**)&descriptor_set_layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_metal_descriptor_set_layout_vtable,
-                                 &descriptor_set_layout->resource);
-    descriptor_set_layout->host_allocator = host_allocator;
-    descriptor_set_layout->binding_count = binding_count;
-    memcpy(descriptor_set_layout->bindings, bindings, bindings_size);
-    *out_descriptor_set_layout = (iree_hal_descriptor_set_layout_t*)descriptor_set_layout;
-  }
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_metal_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  iree_hal_metal_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_metal_descriptor_set_layout_cast(base_descriptor_set_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_allocator_free(descriptor_set_layout->host_allocator, descriptor_set_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-const iree_hal_descriptor_set_layout_binding_t* iree_hal_metal_descriptor_set_layout_binding(
-    const iree_hal_descriptor_set_layout_t* base_descriptor_set_layout, uint32_t binding) {
-  const iree_hal_metal_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_metal_descriptor_set_layout_const_cast(base_descriptor_set_layout);
-  for (iree_host_size_t i = 0; i < descriptor_set_layout->binding_count; ++i) {
-    if (descriptor_set_layout->bindings[i].binding == binding) {
-      return &descriptor_set_layout->bindings[i];
-    }
-  }
-  return NULL;
-}
-
-iree_host_size_t iree_hal_metal_descriptor_set_layout_binding_count(
-    const iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  const iree_hal_metal_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_metal_descriptor_set_layout_const_cast(base_descriptor_set_layout);
-  return descriptor_set_layout->binding_count;
-}
-
-static const iree_hal_descriptor_set_layout_vtable_t iree_hal_metal_descriptor_set_layout_vtable = {
-    .destroy = iree_hal_metal_descriptor_set_layout_destroy,
-};
-
-//===------------------------------------------------------------------------------------------===//
-// iree_hal_metal_pipeline_layout_t
-//===------------------------------------------------------------------------------------------===//
-
-typedef struct iree_hal_metal_pipeline_layout_t {
-  // Abstract resource used for injecting reference counting and vtable; must be at offset 0.
-  iree_hal_resource_t resource;
-
-  iree_allocator_t host_allocator;
-
-  iree_host_size_t push_constant_count;
-
-  iree_host_size_t set_layout_count;
-  iree_hal_descriptor_set_layout_t* set_layouts[];
-} iree_hal_metal_pipeline_layout_t;
-
-static const iree_hal_pipeline_layout_vtable_t iree_hal_metal_pipeline_layout_vtable;
-
-static iree_hal_metal_pipeline_layout_t* iree_hal_metal_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_pipeline_layout_vtable);
-  return (iree_hal_metal_pipeline_layout_t*)base_value;
-}
-
-static const iree_hal_metal_pipeline_layout_t* iree_hal_metal_pipeline_layout_const_cast(
-    const iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_metal_pipeline_layout_vtable);
-  return (const iree_hal_metal_pipeline_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_metal_pipeline_layout_create(
-    iree_host_size_t set_layout_count, iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_host_size_t push_constant_count, iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  *out_pipeline_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_metal_pipeline_layout_t* pipeline_layout = NULL;
-  iree_host_size_t total_size =
-      sizeof(*pipeline_layout) + set_layout_count * sizeof(pipeline_layout->set_layouts[0]);
-  iree_status_t status =
-      iree_allocator_malloc(host_allocator, total_size, (void**)&pipeline_layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_metal_pipeline_layout_vtable,
-                                 &pipeline_layout->resource);
-    pipeline_layout->host_allocator = host_allocator;
-    pipeline_layout->push_constant_count = push_constant_count;
-    pipeline_layout->set_layout_count = set_layout_count;
-    for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-      pipeline_layout->set_layouts[i] = set_layouts[i];
-      iree_hal_descriptor_set_layout_retain(set_layouts[i]);
-    }
-    *out_pipeline_layout = (iree_hal_pipeline_layout_t*)pipeline_layout;
-  }
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_metal_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_metal_pipeline_layout_t* pipeline_layout =
-      iree_hal_metal_pipeline_layout_cast(base_pipeline_layout);
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  for (iree_host_size_t i = 0; i < pipeline_layout->set_layout_count; ++i) {
-    iree_hal_descriptor_set_layout_release(pipeline_layout->set_layouts[i]);
-  }
-  iree_allocator_free(pipeline_layout->host_allocator, pipeline_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-const iree_hal_descriptor_set_layout_t* iree_hal_metal_pipeline_layout_descriptor_set_layout(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout, uint32_t set) {
-  const iree_hal_metal_pipeline_layout_t* pipeline_layout =
-      iree_hal_metal_pipeline_layout_const_cast(base_pipeline_layout);
-  if (set < pipeline_layout->set_layout_count) return pipeline_layout->set_layouts[set];
-  return NULL;
-}
-
-iree_host_size_t iree_hal_metal_pipeline_layout_descriptor_set_count(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_metal_pipeline_layout_t* pipeline_layout =
-      iree_hal_metal_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->set_layout_count;
-}
-
-iree_host_size_t iree_hal_metal_pipeline_layout_push_constant_count(
-    const iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  const iree_hal_metal_pipeline_layout_t* pipeline_layout =
-      iree_hal_metal_pipeline_layout_const_cast(base_pipeline_layout);
-  return pipeline_layout->push_constant_count;
-}
-
-static const iree_hal_pipeline_layout_vtable_t iree_hal_metal_pipeline_layout_vtable = {
-    .destroy = iree_hal_metal_pipeline_layout_destroy,
-};
diff --git a/runtime/src/iree/hal/drivers/vulkan/BUILD.bazel b/runtime/src/iree/hal/drivers/vulkan/BUILD.bazel
index ce5b68b..ff0e089 100644
--- a/runtime/src/iree/hal/drivers/vulkan/BUILD.bazel
+++ b/runtime/src/iree/hal/drivers/vulkan/BUILD.bazel
@@ -44,12 +44,12 @@
         "native_event.h",
         "native_executable.cc",
         "native_executable.h",
-        "native_pipeline_layout.cc",
-        "native_pipeline_layout.h",
         "native_semaphore.cc",
         "native_semaphore.h",
         "nop_executable_cache.cc",
         "nop_executable_cache.h",
+        "pipeline_layout.cc",
+        "pipeline_layout.h",
         "sparse_buffer.cc",
         "sparse_buffer.h",
         "status_util.c",
@@ -79,11 +79,13 @@
         "//runtime/src/iree/hal/drivers/vulkan/util:intrusive_list",
         "//runtime/src/iree/hal/drivers/vulkan/util:ref_ptr",
         "//runtime/src/iree/hal/utils:deferred_command_buffer",
+        "//runtime/src/iree/hal/utils:executable_debug_info",
         "//runtime/src/iree/hal/utils:file_transfer",
         "//runtime/src/iree/hal/utils:memory_file",
         "//runtime/src/iree/hal/utils:resource_set",
         "//runtime/src/iree/hal/utils:semaphore_base",
-        "//runtime/src/iree/schemas:spirv_executable_def_c_fbs",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+        "//runtime/src/iree/schemas:vulkan_executable_def_c_fbs",
         "@vulkan_headers",
     ],
 )
diff --git a/runtime/src/iree/hal/drivers/vulkan/CMakeLists.txt b/runtime/src/iree/hal/drivers/vulkan/CMakeLists.txt
index 76e376f..0084d00 100644
--- a/runtime/src/iree/hal/drivers/vulkan/CMakeLists.txt
+++ b/runtime/src/iree/hal/drivers/vulkan/CMakeLists.txt
@@ -45,12 +45,12 @@
     "native_event.h"
     "native_executable.cc"
     "native_executable.h"
-    "native_pipeline_layout.cc"
-    "native_pipeline_layout.h"
     "native_semaphore.cc"
     "native_semaphore.h"
     "nop_executable_cache.cc"
     "nop_executable_cache.h"
+    "pipeline_layout.cc"
+    "pipeline_layout.h"
     "sparse_buffer.cc"
     "sparse_buffer.h"
     "status_util.c"
@@ -74,11 +74,13 @@
     iree::hal::drivers::vulkan::util::intrusive_list
     iree::hal::drivers::vulkan::util::ref_ptr
     iree::hal::utils::deferred_command_buffer
+    iree::hal::utils::executable_debug_info
     iree::hal::utils::file_transfer
     iree::hal::utils::memory_file
     iree::hal::utils::resource_set
     iree::hal::utils::semaphore_base
-    iree::schemas::spirv_executable_def_c_fbs
+    iree::schemas::executable_debug_info_c_fbs
+    iree::schemas::vulkan_executable_def_c_fbs
   PUBLIC
 )
 
diff --git a/runtime/src/iree/hal/drivers/vulkan/builtin_executables.cc b/runtime/src/iree/hal/drivers/vulkan/builtin_executables.cc
index 2291731..72b2694 100644
--- a/runtime/src/iree/hal/drivers/vulkan/builtin_executables.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/builtin_executables.cc
@@ -9,7 +9,7 @@
 #include <cstddef>
 
 #include "iree/hal/drivers/vulkan/builtin/builtin_shaders_spv.h"
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/status_util.h"
 
 namespace iree {
@@ -26,7 +26,7 @@
 } iree_hal_vulkan_builtin_fill_unaligned_constants_t;
 
 static_assert(sizeof(iree_hal_vulkan_builtin_fill_unaligned_constants_t) ==
-                  IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANT_COUNT,
+                  IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANTS_SIZE,
               "push constant count must match struct size");
 
 }  // namespace
@@ -41,11 +41,11 @@
   }
 
   if (pipeline_layout_) {
-    iree_hal_pipeline_layout_destroy(pipeline_layout_);
+    iree_hal_vulkan_pipeline_layout_release(pipeline_layout_);
   }
 
   for (size_t i = 0; i < IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT; ++i) {
-    iree_hal_descriptor_set_layout_release(descriptor_set_layouts_[i]);
+    iree_hal_vulkan_descriptor_set_layout_release(descriptor_set_layouts_[i]);
   }
 }
 
@@ -56,18 +56,20 @@
   // Even though we're just using one set, we still need to create dummy set
   // layout (without any bindings) for those preceding this set.
   for (size_t i = 0; i < IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT; ++i) {
-    iree_hal_descriptor_set_layout_t* layout = NULL;
+    iree_hal_vulkan_descriptor_set_layout_t* layout = NULL;
     if (i == IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET) {
-      iree_hal_descriptor_set_layout_binding_t layout_binding;
+      VkDescriptorSetLayoutBinding layout_binding;
       layout_binding.binding = 0;
-      layout_binding.type = IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER;
-      layout_binding.flags = IREE_HAL_DESCRIPTOR_FLAG_NONE;
-      IREE_RETURN_IF_ERROR(iree_hal_vulkan_native_descriptor_set_layout_create(
-          logical_device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
+      layout_binding.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
+      layout_binding.descriptorCount = 1;
+      layout_binding.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
+      layout_binding.pImmutableSamplers = NULL;
+      IREE_RETURN_IF_ERROR(iree_hal_vulkan_descriptor_set_layout_create(
+          logical_device_, /*flags=*/0,
           /*binding_count=*/1, &layout_binding, &layout));
     } else {
-      IREE_RETURN_IF_ERROR(iree_hal_vulkan_native_descriptor_set_layout_create(
-          logical_device_, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
+      IREE_RETURN_IF_ERROR(iree_hal_vulkan_descriptor_set_layout_create(
+          logical_device_, /*flags=*/0,
           /*binding_count=*/0, /*bindings=*/nullptr, &layout));
     }
     descriptor_set_layouts_[i] = layout;
@@ -92,10 +94,14 @@
 
   // Create pipeline layout.
   if (iree_status_is_ok(status)) {
-    status = iree_hal_vulkan_native_pipeline_layout_create(
-        logical_device_, IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANT_COUNT / 4,
-        IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT, descriptor_set_layouts_,
-        &pipeline_layout_);
+    VkPushConstantRange push_constant_ranges[1];
+    push_constant_ranges[0].offset = 0;
+    push_constant_ranges[0].size = IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANTS_SIZE;
+    push_constant_ranges[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
+    status = iree_hal_vulkan_pipeline_layout_create(
+        logical_device_, IREE_ARRAYSIZE(push_constant_ranges),
+        push_constant_ranges, IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT,
+        descriptor_set_layouts_, &pipeline_layout_);
   }
 
   // Create pipeline.
@@ -105,7 +111,7 @@
     pipeline_create_info.pNext = NULL;
     pipeline_create_info.flags = VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT;
     pipeline_create_info.layout =
-        iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout_);
+        iree_hal_vulkan_pipeline_layout_handle(pipeline_layout_);
     pipeline_create_info.basePipelineHandle = VK_NULL_HANDLE;
     pipeline_create_info.basePipelineIndex = 0;
     VkPipelineShaderStageCreateInfo* stage_create_info =
@@ -138,7 +144,7 @@
     VkCommandBuffer command_buffer, DescriptorSetArena* descriptor_set_arena,
     iree_hal_buffer_t* target_buffer, iree_device_size_t target_offset,
     iree_device_size_t length, const void* pattern,
-    iree_host_size_t pattern_length, const void* push_constants_to_restore) {
+    iree_host_size_t pattern_length) {
   IREE_TRACE_SCOPE();
 
   iree_hal_vulkan_builtin_fill_unaligned_constants_t constants;
@@ -160,7 +166,7 @@
   }
 
   iree_hal_buffer_ref_t binding;
-  binding.ordinal = 0;
+  binding.reserved = 0;
   binding.buffer = target_buffer;
   binding.offset = 0;
   binding.length = IREE_WHOLE_BUFFER;
@@ -175,8 +181,7 @@
   constants.fill_offset_bytes = target_offset;
   constants.fill_length_bytes = length;
   logical_device_->syms()->vkCmdPushConstants(
-      command_buffer,
-      iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout_),
+      command_buffer, iree_hal_vulkan_pipeline_layout_handle(pipeline_layout_),
       VK_SHADER_STAGE_COMPUTE_BIT, /*offset=*/0,
       sizeof(iree_hal_vulkan_builtin_fill_unaligned_constants_t), &constants);
 
@@ -186,14 +191,6 @@
 
   logical_device_->syms()->vkCmdDispatch(command_buffer, 1, 1, 1);
 
-  // Restore push constants.
-  logical_device_->syms()->vkCmdPushConstants(
-      command_buffer,
-      iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout_),
-      VK_SHADER_STAGE_COMPUTE_BIT, /*offset=*/0,
-      sizeof(iree_hal_vulkan_builtin_fill_unaligned_constants_t),
-      push_constants_to_restore);
-
   return iree_ok_status();
 }
 
diff --git a/runtime/src/iree/hal/drivers/vulkan/builtin_executables.h b/runtime/src/iree/hal/drivers/vulkan/builtin_executables.h
index 2850dbf..1dfae3a 100644
--- a/runtime/src/iree/hal/drivers/vulkan/builtin_executables.h
+++ b/runtime/src/iree/hal/drivers/vulkan/builtin_executables.h
@@ -26,7 +26,7 @@
 #define IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT 4
 #define IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET 3
 
-#define IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANT_COUNT 16
+#define IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANTS_SIZE 16
 
 class BuiltinExecutables {
  public:
@@ -43,22 +43,21 @@
   //
   // This only implements the unaligned edges of fills, vkCmdFillBuffer should
   // be used for the aligned interior (if any).
-  //
-  // |push_constants_to_restore| will be pushed using vkCmdPushConstants over
-  // the bytes used by this call.
-  iree_status_t FillBufferUnaligned(
-      VkCommandBuffer command_buffer, DescriptorSetArena* descriptor_set_arena,
-      iree_hal_buffer_t* target_buffer, iree_device_size_t target_offset,
-      iree_device_size_t length, const void* pattern,
-      iree_host_size_t pattern_length, const void* push_constants_to_restore);
+  iree_status_t FillBufferUnaligned(VkCommandBuffer command_buffer,
+                                    DescriptorSetArena* descriptor_set_arena,
+                                    iree_hal_buffer_t* target_buffer,
+                                    iree_device_size_t target_offset,
+                                    iree_device_size_t length,
+                                    const void* pattern,
+                                    iree_host_size_t pattern_length);
 
  private:
   VkDeviceHandle* logical_device_ = NULL;
 
-  iree_hal_descriptor_set_layout_t*
+  iree_hal_vulkan_descriptor_set_layout_t*
       descriptor_set_layouts_[IREE_HAL_VULKAN_BUILTIN_DESCRIPTOR_SET_COUNT] = {
           NULL};
-  iree_hal_pipeline_layout_t* pipeline_layout_ = NULL;
+  iree_hal_vulkan_pipeline_layout_t* pipeline_layout_ = NULL;
   VkPipeline pipeline_ = VK_NULL_HANDLE;
 };
 
diff --git a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
index 44abc79..7f988ac 100644
--- a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.cc
@@ -13,7 +13,7 @@
 #include "iree/base/internal/math.h"
 #include "iree/hal/drivers/vulkan/base_buffer.h"
 #include "iree/hal/drivers/vulkan/extensibility_util.h"
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/status_util.h"
 
 namespace iree {
@@ -71,7 +71,7 @@
     write_info.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
     write_info.pNext = nullptr;
     write_info.dstSet = dst_set;
-    write_info.dstBinding = binding.ordinal;
+    write_info.dstBinding = (uint32_t)i;
     write_info.dstArrayElement = 0;
     write_info.descriptorCount = 1;
     write_info.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
@@ -100,9 +100,9 @@
 }
 
 iree_status_t DescriptorSetArena::BindDescriptorSet(
-    VkCommandBuffer command_buffer, iree_hal_pipeline_layout_t* pipeline_layout,
-    uint32_t set, iree_host_size_t binding_count,
-    const iree_hal_buffer_ref_t* bindings) {
+    VkCommandBuffer command_buffer,
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout, uint32_t set,
+    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
   // Always prefer using push descriptors when available as we can avoid the
   // additional API overhead of updating/resetting pools.
   if (logical_device_->enabled_extensions().push_descriptors) {
@@ -113,8 +113,7 @@
 
   IREE_TRACE_SCOPE_NAMED("DescriptorSetArena::BindDescriptorSet");
 
-  auto* set_layout =
-      iree_hal_vulkan_native_pipeline_layout_set(pipeline_layout, set);
+  auto* set_layout = iree_hal_vulkan_pipeline_layout_set(pipeline_layout, set);
 
   // Pick a bucket based on the number of descriptors required.
   // NOTE: right now we are 1:1 with bindings.
@@ -143,7 +142,7 @@
   allocate_info.pNext = nullptr;
   allocate_info.descriptorPool = descriptor_pool.handle;
   VkDescriptorSetLayout set_layout_handle =
-      iree_hal_vulkan_native_descriptor_set_layout_handle(set_layout);
+      iree_hal_vulkan_descriptor_set_layout_handle(set_layout);
   allocate_info.descriptorSetCount = 1;
   allocate_info.pSetLayouts = &set_layout_handle;
 
@@ -191,19 +190,19 @@
   // Bind the descriptor set.
   syms().vkCmdBindDescriptorSets(
       command_buffer, VK_PIPELINE_BIND_POINT_COMPUTE,
-      iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout), set, 1,
+      iree_hal_vulkan_pipeline_layout_handle(pipeline_layout), set, 1,
       &descriptor_set, 0, nullptr);
 
   return iree_ok_status();
 }
 
 void DescriptorSetArena::PushDescriptorSet(
-    VkCommandBuffer command_buffer, iree_hal_pipeline_layout_t* pipeline_layout,
-    uint32_t set, iree_host_size_t binding_count,
-    const iree_hal_buffer_ref_t* bindings) {
+    VkCommandBuffer command_buffer,
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout, uint32_t set,
+    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
   IREE_TRACE_SCOPE_NAMED("DescriptorSetArena::PushDescriptorSet");
   VkPipelineLayout device_pipeline_layout =
-      iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout);
+      iree_hal_vulkan_pipeline_layout_handle(pipeline_layout);
 
   // Get a list of VkWriteDescriptorSet structs with all bound buffers.
   iree_host_size_t write_info_count = 0;
diff --git a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.h b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.h
index 6ae5807..786e893 100644
--- a/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.h
+++ b/runtime/src/iree/hal/drivers/vulkan/descriptor_set_arena.h
@@ -18,6 +18,7 @@
 #include "iree/hal/drivers/vulkan/dynamic_symbols.h"
 #include "iree/hal/drivers/vulkan/handle_util.h"
 #include "iree/hal/drivers/vulkan/native_executable.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/util/arena.h"
 #include "iree/hal/drivers/vulkan/util/ref_ptr.h"
 
@@ -34,10 +35,10 @@
   // Allocates and binds a descriptor set from the arena.
   // The command buffer will have the descriptor set containing |bindings| bound
   // to it.
-  iree_status_t BindDescriptorSet(VkCommandBuffer command_buffer,
-                                  iree_hal_pipeline_layout_t* pipeline_layout,
-                                  uint32_t set, iree_host_size_t binding_count,
-                                  const iree_hal_buffer_ref_t* bindings);
+  iree_status_t BindDescriptorSet(
+      VkCommandBuffer command_buffer,
+      iree_hal_vulkan_pipeline_layout_t* pipeline_layout, uint32_t set,
+      iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings);
 
   // Flushes all pending writes to descriptor sets allocated from the arena and
   // returns a group that - when dropped - will release the descriptor sets
@@ -49,7 +50,7 @@
 
   // Pushes the descriptor set to the command buffer, if supported.
   void PushDescriptorSet(VkCommandBuffer command_buffer,
-                         iree_hal_pipeline_layout_t* pipeline_layout,
+                         iree_hal_vulkan_pipeline_layout_t* pipeline_layout,
                          uint32_t set, iree_host_size_t binding_count,
                          const iree_hal_buffer_ref_t* bindings);
 
diff --git a/runtime/src/iree/hal/drivers/vulkan/direct_command_buffer.cc b/runtime/src/iree/hal/drivers/vulkan/direct_command_buffer.cc
index 03dac80..20782fd 100644
--- a/runtime/src/iree/hal/drivers/vulkan/direct_command_buffer.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/direct_command_buffer.cc
@@ -17,7 +17,7 @@
 #include "iree/hal/drivers/vulkan/dynamic_symbols.h"
 #include "iree/hal/drivers/vulkan/native_event.h"
 #include "iree/hal/drivers/vulkan/native_executable.h"
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/status_util.h"
 #include "iree/hal/drivers/vulkan/util/ref_ptr.h"
 #include "iree/hal/utils/resource_set.h"
@@ -51,13 +51,6 @@
   DescriptorSetGroup descriptor_set_group;
 
   BuiltinExecutables* builtin_executables;
-
-  // Shadow copy of push constants used during normal operation, for restoring
-  // after builtin_executables uses vkCmdPushConstants. Size must be greater
-  // than or equal to the push constant memory used by builtin_executables.
-  // TODO(scotttodd): use [maxPushConstantsSize - 16, maxPushConstantsSize]
-  //                  instead of [0, 16] to reduce frequency of updates
-  uint8_t push_constants_storage[IREE_HAL_VULKAN_BUILTIN_PUSH_CONSTANT_COUNT];
 } iree_hal_vulkan_direct_command_buffer_t;
 
 namespace {
@@ -559,8 +552,7 @@
     IREE_RETURN_IF_ERROR(
         command_buffer->builtin_executables->FillBufferUnaligned(
             command_buffer->handle, &(command_buffer->descriptor_set_arena),
-            target_ref.buffer, target_offset, length, pattern, pattern_length,
-            command_buffer->push_constants_storage));
+            target_ref.buffer, target_offset, length, pattern, pattern_length));
 
     // Continue using vkCmdFillBuffer below, but only for the inner aligned
     // portion of the fill operation.
@@ -678,149 +670,43 @@
                           "collectives not yet implemented on Vulkan");
 }
 
-static iree_status_t iree_hal_vulkan_direct_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_vulkan_direct_command_buffer_t* command_buffer =
-      iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
-
-  iree_host_size_t storage_size =
-      IREE_ARRAYSIZE(command_buffer->push_constants_storage);
-  if (offset < storage_size) {
-    memcpy(command_buffer->push_constants_storage + offset, values,
-           std::min(values_length, storage_size) - offset);
-  }
-
-  command_buffer->syms->vkCmdPushConstants(
-      command_buffer->handle,
-      iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout),
-      VK_SHADER_STAGE_COMPUTE_BIT, (uint32_t)offset, (uint32_t)values_length,
-      values);
-
+static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch_bind(
+    iree_hal_vulkan_direct_command_buffer_t* command_buffer,
+    const iree_hal_vulkan_pipeline_t* pipeline,
+    iree_const_byte_span_t constants, iree_hal_buffer_ref_list_t bindings,
+    iree_hal_dispatch_flags_t flags) {
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_vulkan_direct_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  iree_hal_vulkan_direct_command_buffer_t* command_buffer =
-      iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
-
-  // TODO(benvanik): batch insert by getting the resources in their own list.
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    if (bindings[i].buffer) {
-      IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-          command_buffer->resource_set, 1, &bindings[i].buffer));
-    }
-  }
-
-  // Either allocate, update, and bind a descriptor set or use push descriptor
-  // sets to use the command buffer pool when supported.
-  return command_buffer->descriptor_set_arena.BindDescriptorSet(
-      command_buffer->handle, pipeline_layout, set, binding_count, bindings);
-}
-
 static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
+    const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
+    iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_vulkan_direct_command_buffer_t* command_buffer =
       iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
 
-  IREE_TRACE({
-    iree_hal_vulkan_source_location_t source_location;
-    iree_hal_vulkan_native_executable_entry_point_source_location(
-        executable, entry_point, &source_location);
-    IREE_VULKAN_TRACE_ZONE_BEGIN_EXTERNAL(
-        command_buffer->tracing_context, command_buffer->handle,
-        source_location.file_name.data, source_location.file_name.size,
-        source_location.line, source_location.func_name.data,
-        source_location.func_name.size, /*name=*/NULL, 0);
-  });
+  const iree_hal_vulkan_pipeline_t* pipeline = NULL;
+  IREE_RETURN_IF_ERROR(iree_hal_vulkan_native_executable_lookup_pipeline(
+      executable, entry_point, &pipeline));
 
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &executable));
-
-  // Get the compiled and linked pipeline for the specified entry point and
-  // bind it to the command buffer.
-  VkPipeline pipeline_handle = VK_NULL_HANDLE;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_vulkan_native_executable_pipeline_for_entry_point(
-          executable, entry_point, &pipeline_handle, NULL));
-  command_buffer->syms->vkCmdBindPipeline(
-      command_buffer->handle, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline_handle);
-
-  command_buffer->syms->vkCmdDispatch(command_buffer->handle, workgroup_x,
-                                      workgroup_y, workgroup_z);
-
-  IREE_VULKAN_TRACE_ZONE_END(command_buffer->tracing_context,
-                             command_buffer->handle);
-
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  iree_hal_vulkan_direct_command_buffer_t* command_buffer =
-      iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
-
-  const void* resources[2] = {executable, workgroups_ref.buffer};
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, IREE_ARRAYSIZE(resources), resources));
-
-  iree_hal_vulkan_source_location_t source_location;
-  iree_hal_vulkan_native_executable_entry_point_source_location(
-      executable, entry_point, &source_location);
+#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
+  iree_hal_vulkan_source_location_t source_location = pipeline->source_location;
   IREE_VULKAN_TRACE_ZONE_BEGIN_EXTERNAL(
       command_buffer->tracing_context, command_buffer->handle,
       source_location.file_name.data, source_location.file_name.size,
       source_location.line, source_location.func_name.data,
       source_location.func_name.size, /*name=*/NULL, 0);
+#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
 
-  // Get the compiled and linked pipeline for the specified entry point and
-  // bind it to the command buffer.
-  VkPipeline pipeline_handle = VK_NULL_HANDLE;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_vulkan_native_executable_pipeline_for_entry_point(
-          executable, entry_point, &pipeline_handle, NULL));
-  command_buffer->syms->vkCmdBindPipeline(
-      command_buffer->handle, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline_handle);
-
-  VkBuffer workgroups_device_buffer =
-      iree_hal_vulkan_buffer_handle(workgroups_ref.buffer);
-  iree_device_size_t workgroups_offset =
-      iree_hal_buffer_byte_offset(workgroups_ref.buffer) +
-      workgroups_ref.offset;
-  command_buffer->syms->vkCmdDispatchIndirect(
-      command_buffer->handle, workgroups_device_buffer, workgroups_offset);
-
-  IREE_VULKAN_TRACE_ZONE_END(command_buffer->tracing_context,
-                             command_buffer->handle);
-
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch2_bind(
-    iree_hal_vulkan_direct_command_buffer_t* command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_const_byte_span_t constants, iree_hal_buffer_ref_list_t bindings,
-    iree_hal_dispatch_flags_t flags) {
-  // Get the compiled and linked pipeline for the specified entry point.
-  VkPipeline pipeline_handle = VK_NULL_HANDLE;
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_vulkan_native_executable_pipeline_for_entry_point(
-          executable, entry_point, &pipeline_handle, &pipeline_layout));
+  // Retain executable.
+  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
+      command_buffer->resource_set, 1, &executable));
 
   // Update push constants.
   if (!iree_const_byte_span_is_empty(constants)) {
     VkPipelineLayout pipeline_layout_handle =
-        iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout);
+        iree_hal_vulkan_pipeline_layout_handle(pipeline->layout);
     command_buffer->syms->vkCmdPushConstants(
         command_buffer->handle, pipeline_layout_handle,
         VK_SHADER_STAGE_COMPUTE_BIT, (uint32_t)0,
@@ -835,40 +721,12 @@
   // Either allocate, update, and bind a descriptor set or use push descriptor
   // sets to use the command buffer pool when supported.
   IREE_RETURN_IF_ERROR(command_buffer->descriptor_set_arena.BindDescriptorSet(
-      command_buffer->handle, pipeline_layout, 0, bindings.count,
+      command_buffer->handle, pipeline->layout, 0, bindings.count,
       bindings.values));
 
+  // Bind and dispatch the pipeline.
   command_buffer->syms->vkCmdBindPipeline(
-      command_buffer->handle, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline_handle);
-
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
-    iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
-  iree_hal_vulkan_direct_command_buffer_t* command_buffer =
-      iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
-
-  IREE_TRACE({
-    iree_hal_vulkan_source_location_t source_location;
-    iree_hal_vulkan_native_executable_entry_point_source_location(
-        executable, entry_point, &source_location);
-    IREE_VULKAN_TRACE_ZONE_BEGIN_EXTERNAL(
-        command_buffer->tracing_context, command_buffer->handle,
-        source_location.file_name.data, source_location.file_name.size,
-        source_location.line, source_location.func_name.data,
-        source_location.func_name.size, /*name=*/NULL, 0);
-  });
-
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &executable));
-
-  IREE_RETURN_IF_ERROR(iree_hal_vulkan_direct_command_buffer_dispatch2_bind(
-      command_buffer, executable, entry_point, constants, bindings, flags));
-
+      command_buffer->handle, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline->handle);
   command_buffer->syms->vkCmdDispatch(command_buffer->handle,
                                       workgroup_count[0], workgroup_count[1],
                                       workgroup_count[2]);
@@ -879,7 +737,7 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_vulkan_direct_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -887,24 +745,48 @@
   iree_hal_vulkan_direct_command_buffer_t* command_buffer =
       iree_hal_vulkan_direct_command_buffer_cast(base_command_buffer);
 
-  IREE_TRACE({
-    iree_hal_vulkan_source_location_t source_location;
-    iree_hal_vulkan_native_executable_entry_point_source_location(
-        executable, entry_point, &source_location);
-    IREE_VULKAN_TRACE_ZONE_BEGIN_EXTERNAL(
-        command_buffer->tracing_context, command_buffer->handle,
-        source_location.file_name.data, source_location.file_name.size,
-        source_location.line, source_location.func_name.data,
-        source_location.func_name.size, /*name=*/NULL, 0);
-  });
+  const iree_hal_vulkan_pipeline_t* pipeline = NULL;
+  IREE_RETURN_IF_ERROR(iree_hal_vulkan_native_executable_lookup_pipeline(
+      executable, entry_point, &pipeline));
 
+#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
+  iree_hal_vulkan_source_location_t source_location = pipeline->source_location;
+  IREE_VULKAN_TRACE_ZONE_BEGIN_EXTERNAL(
+      command_buffer->tracing_context, command_buffer->handle,
+      source_location.file_name.data, source_location.file_name.size,
+      source_location.line, source_location.func_name.data,
+      source_location.func_name.size, /*name=*/NULL, 0);
+#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION_DEVICE
+
+  // Retain executable and workgroup count buffer.
   const void* resources[2] = {executable, workgroups_ref.buffer};
   IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
       command_buffer->resource_set, IREE_ARRAYSIZE(resources), resources));
 
-  IREE_RETURN_IF_ERROR(iree_hal_vulkan_direct_command_buffer_dispatch2_bind(
-      command_buffer, executable, entry_point, constants, bindings, flags));
+  // Update push constants.
+  if (!iree_const_byte_span_is_empty(constants)) {
+    VkPipelineLayout pipeline_layout_handle =
+        iree_hal_vulkan_pipeline_layout_handle(pipeline->layout);
+    command_buffer->syms->vkCmdPushConstants(
+        command_buffer->handle, pipeline_layout_handle,
+        VK_SHADER_STAGE_COMPUTE_BIT, (uint32_t)0,
+        (uint32_t)constants.data_length, constants.data);
+  }
 
+  // Retain bound buffers until the command buffer is reset.
+  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert_strided(
+      command_buffer->resource_set, bindings.count, bindings.values,
+      offsetof(iree_hal_buffer_ref_t, buffer), sizeof(iree_hal_buffer_ref_t)));
+
+  // Either allocate, update, and bind a descriptor set or use push descriptor
+  // sets to use the command buffer pool when supported.
+  IREE_RETURN_IF_ERROR(command_buffer->descriptor_set_arena.BindDescriptorSet(
+      command_buffer->handle, pipeline->layout, 0, bindings.count,
+      bindings.values));
+
+  // Bind and dispatch the pipeline.
+  command_buffer->syms->vkCmdBindPipeline(
+      command_buffer->handle, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline->handle);
   VkBuffer workgroups_device_buffer =
       iree_hal_vulkan_buffer_handle(workgroups_ref.buffer);
   iree_device_size_t workgroups_offset =
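The strided retain above visits only the `buffer` member of each `iree_hal_buffer_ref_t` in the bindings list. A minimal sketch of that access pattern, using hypothetical stand-in types (`buffer_t`, `buffer_ref_t`, and `gather_strided` are illustrative names, not IREE APIs):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-ins for the HAL types; illustrative only.
struct buffer_t { int id; };
struct buffer_ref_t {
  uint64_t offset;
  uint64_t length;
  buffer_t* buffer;  // the member the strided walk extracts
};

// Reads the pointer stored `offset` bytes into each of `count` elements of
// size `stride`, mirroring how iree_hal_resource_set_insert_strided visits
// the buffer field of each binding in order to retain it.
static std::vector<buffer_t*> gather_strided(const void* values, size_t count,
                                             size_t offset, size_t stride) {
  std::vector<buffer_t*> out;
  const uint8_t* base = static_cast<const uint8_t*>(values);
  for (size_t i = 0; i < count; ++i) {
    buffer_t* ptr = nullptr;
    std::memcpy(&ptr, base + i * stride + offset, sizeof(ptr));
    out.push_back(ptr);
  }
  return out;
}
```

Calling `gather_strided(refs, count, offsetof(buffer_ref_t, buffer), sizeof(buffer_ref_t))` yields each bound buffer pointer in declaration order, which is why the real call site passes `offsetof`/`sizeof` of the ref struct rather than a separate pointer array.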
@@ -943,15 +825,8 @@
         /*.copy_buffer=*/iree_hal_vulkan_direct_command_buffer_copy_buffer,
         /*.collective=*/
         iree_hal_vulkan_direct_command_buffer_collective,
-        /*.push_constants=*/
-        iree_hal_vulkan_direct_command_buffer_push_constants,
-        /*.push_descriptor_set=*/
-        iree_hal_vulkan_direct_command_buffer_push_descriptor_set,
         /*.dispatch=*/iree_hal_vulkan_direct_command_buffer_dispatch,
         /*.dispatch_indirect=*/
         iree_hal_vulkan_direct_command_buffer_dispatch_indirect,
-        /*.dispatch2=*/iree_hal_vulkan_direct_command_buffer_dispatch2,
-        /*.dispatch2_indirect=*/
-        iree_hal_vulkan_direct_command_buffer_dispatch2_indirect,
 };
 }  // namespace
diff --git a/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h b/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
index db828e6..611d90d 100644
--- a/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
+++ b/runtime/src/iree/hal/drivers/vulkan/extensibility_util.h
@@ -106,6 +106,28 @@
 iree_hal_vulkan_infer_enabled_device_extensions(
     const iree::hal::vulkan::DynamicSymbols* device_syms);
 
+// A subset of relevant device limits.
+// These come from VkPhysicalDeviceLimits and other extension structures and are
+// condensed here to avoid the need for handling extension/versioning
+// compatibility in all places that may be interested in the limits.
+typedef struct iree_hal_vulkan_device_limits_t {
+  // maxPerStageDescriptorUniformBuffers is the maximum number of uniform
+  // buffers that can be accessible to a single shader stage in a pipeline
+  // layout. Descriptors with a type of VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER or
+  // VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC count against this limit.
+  uint32_t max_per_stage_descriptor_uniform_buffers;
+  // maxPerStageDescriptorStorageBuffers is the maximum number of storage
+  // buffers that can be accessible to a single shader stage in a pipeline
+  // layout. Descriptors with a type of VK_DESCRIPTOR_TYPE_STORAGE_BUFFER or
+  // VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC count against this limit.
+  uint32_t max_per_stage_descriptor_storage_buffers;
+  // maxPushConstantsSize is the maximum size, in bytes, of the pool of push
+  // constant memory. For each of the push constant ranges indicated by the
+  // pPushConstantRanges member of the VkPipelineLayoutCreateInfo structure,
+  // (offset + size) must be less than or equal to this limit.
+  uint32_t max_push_constants_size;
+} iree_hal_vulkan_device_limits_t;
+
 // Struct for supported device properties.
 //
 // Note that the fields used here should match the ones used in KernelFeatures
@@ -144,6 +166,9 @@
   // ("address.<mode>")
   // * 0b01: address.physical64
   uint32_t address : 8;
+
+  // Device limits.
+  iree_hal_vulkan_device_limits_t limits;
 } iree_hal_vulkan_device_properties_t;
 
 #endif  // IREE_HAL_DRIVERS_VULKAN_EXTENSIBILITY_UTIL_H_
diff --git a/runtime/src/iree/hal/drivers/vulkan/native_executable.cc b/runtime/src/iree/hal/drivers/vulkan/native_executable.cc
index ebfd006..e143d56 100644
--- a/runtime/src/iree/hal/drivers/vulkan/native_executable.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/native_executable.cc
@@ -13,198 +13,191 @@
 #include "iree/base/api.h"
 #include "iree/hal/drivers/vulkan/dynamic_symbols.h"
 #include "iree/hal/drivers/vulkan/handle_util.h"
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/status_util.h"
 #include "iree/hal/drivers/vulkan/util/ref_ptr.h"
+#include "iree/hal/utils/executable_debug_info.h"
 
 // flatcc schemas:
 #include "iree/base/internal/flatcc/parsing.h"
-#include "iree/schemas/spirv_executable_def_reader.h"
-#include "iree/schemas/spirv_executable_def_verifier.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
+#include "iree/schemas/vulkan_executable_def_reader.h"
+#include "iree/schemas/vulkan_executable_def_verifier.h"
 
 using namespace iree::hal::vulkan;
 
-typedef struct iree_hal_vulkan_entry_point_t {
-  VkPipeline pipeline;
-  iree_hal_pipeline_layout_t* layout;
-  iree_string_view_t name;
+//===----------------------------------------------------------------------===//
+// FlatBuffer Verification
+//===----------------------------------------------------------------------===//
 
-  // Optional debug information.
-  IREE_TRACE(iree_hal_spirv_FileLineLocDef_table_t source_location;)
-  IREE_TRACE(iree_hal_spirv_StageLocationDef_vec_t stage_locations;)
-} iree_hal_vulkan_entry_point_t;
-
-static iree_status_t iree_hal_vulkan_create_shader_module(
-    VkDeviceHandle* logical_device, iree_const_byte_span_t code,
-    VkShaderModule* out_shader_module) {
-  IREE_TRACE_SCOPE();
-
-  VkShaderModuleCreateInfo create_info;
-  create_info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
-  create_info.pNext = NULL;
-  create_info.flags = 0;
-  create_info.codeSize = code.data_length;
-  create_info.pCode = (const uint32_t*)code.data;
-  VK_RETURN_IF_ERROR(logical_device->syms()->vkCreateShaderModule(
-                         *logical_device, &create_info,
-                         logical_device->allocator(), out_shader_module),
-                     "vkCreateShaderModule");
-
+static iree_status_t iree_hal_vulkan_shader_module_flatbuffer_verify(
+    const iree_hal_vulkan_device_properties_t* device_properties,
+    iree_hal_vulkan_ShaderModuleDef_table_t shader_module_def) {
+  if (!shader_module_def) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "shader module is NULL");
+  }
+  flatbuffers_uint32_vec_t spirv_code_vec =
+      iree_hal_vulkan_ShaderModuleDef_spirv_code_get(shader_module_def);
+  if (!spirv_code_vec || flatbuffers_uint32_vec_len(spirv_code_vec) == 0) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "shader module spirv_code is empty");
+  }
   return iree_ok_status();
 }
 
-static void iree_hal_vulkan_destroy_shader_module(
-    VkDeviceHandle* logical_device, VkShaderModule handle) {
-  if (handle == VK_NULL_HANDLE) return;
-  logical_device->syms()->vkDestroyShaderModule(*logical_device, handle,
-                                                logical_device->allocator());
-}
-
-static iree_status_t iree_hal_vulkan_create_pipelines(
-    VkDeviceHandle* logical_device, VkPipelineCache pipeline_cache,
-    const iree_hal_executable_params_t* executable_params,
-    iree_hal_spirv_ExecutableDef_table_t executable_def,
-    VkShaderModule* shader_modules, iree_host_size_t pipeline_count,
-    iree_hal_vulkan_entry_point_t* out_entry_points) {
-  IREE_TRACE_SCOPE();
-  uint8_t* scratch_memory = NULL;
-  size_t create_info_size =
-      pipeline_count * sizeof(VkComputePipelineCreateInfo);
-  size_t spec_map_size =
-      executable_params->constant_count * sizeof(VkSpecializationMapEntry);
-  size_t subgroup_control_size =
-      pipeline_count *
-      sizeof(VkPipelineShaderStageRequiredSubgroupSizeCreateInfo);
-  IREE_RETURN_IF_ERROR(iree_allocator_malloc(
-      logical_device->host_allocator(),
-      create_info_size + spec_map_size + subgroup_control_size,
-      (void**)&scratch_memory));
-  VkComputePipelineCreateInfo* create_infos =
-      (VkComputePipelineCreateInfo*)scratch_memory;
-  VkSpecializationMapEntry* spec_map_entries =
-      (VkSpecializationMapEntry*)(scratch_memory + create_info_size);
-  VkPipelineShaderStageRequiredSubgroupSizeCreateInfo* subgroup_control_entries =
-      (VkPipelineShaderStageRequiredSubgroupSizeCreateInfo*)(scratch_memory +
-                                                             create_info_size +
-                                                             spec_map_size);
-
-  VkSpecializationInfo spec_info;
-  memset(&spec_info, 0, sizeof(spec_info));
-  spec_info.mapEntryCount = executable_params->constant_count;
-  spec_info.pMapEntries = spec_map_entries;
-  spec_info.dataSize = executable_params->constant_count * sizeof(uint32_t);
-  spec_info.pData = executable_params->constants;
-  for (iree_host_size_t i = 0; i < executable_params->constant_count; ++i) {
-    spec_map_entries[i].constantID = i;
-    spec_map_entries[i].offset = i * sizeof(uint32_t);
-    spec_map_entries[i].size = sizeof(uint32_t);
+static iree_status_t iree_hal_vulkan_pipeline_layout_flatbuffer_verify(
+    const iree_hal_vulkan_device_properties_t* device_properties,
+    iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec,
+    iree_hal_vulkan_PipelineLayoutDef_table_t pipeline_layout_def) {
+  if (!pipeline_layout_def) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "pipeline layout is NULL");
   }
 
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_spirv_ExecutableDef_entry_points_get(executable_def);
-  flatbuffers_uint32_vec_t shader_module_indices_vec =
-      iree_hal_spirv_ExecutableDef_shader_module_indices_get(executable_def);
-  flatbuffers_uint32_vec_t subgroup_sizes_vec =
-      iree_hal_spirv_ExecutableDef_subgroup_sizes_get(executable_def);
-  for (iree_host_size_t entry_ordinal = 0; entry_ordinal < pipeline_count;
-       ++entry_ordinal) {
-    iree_hal_pipeline_layout_t* pipeline_layout =
-        executable_params->pipeline_layouts[entry_ordinal];
-    iree_hal_pipeline_layout_retain(pipeline_layout);
-    out_entry_points[entry_ordinal].layout = pipeline_layout;
-
-    VkComputePipelineCreateInfo* create_info = &create_infos[entry_ordinal];
-    create_info->sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
-    create_info->pNext = NULL;
-    create_info->flags = 0;
-    if (!iree_all_bits_set(
-            executable_params->caching_mode,
-            IREE_HAL_EXECUTABLE_CACHING_MODE_ALLOW_OPTIMIZATION)) {
-      create_info->flags |= VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT;
+  // Basic descriptor set verification based on device limits. We don't know all
+  // of the ways this can fail here but can provide better error messages when
+  // limits are exceeded instead of relying on the optional validation layers
+  // during execution.
+  flatbuffers_uint32_vec_t descriptor_set_layout_ordinals_vec =
+      iree_hal_vulkan_PipelineLayoutDef_descriptor_set_layout_ordinals_get(
+          pipeline_layout_def);
+  if (!descriptor_set_layout_ordinals_vec ||
+      flatbuffers_uint32_vec_len(descriptor_set_layout_ordinals_vec) == 0) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "pipeline layout has no descriptor sets");
+  }
+  uint32_t uniform_descriptor_count = 0;
+  uint32_t storage_descriptor_count = 0;
+  for (iree_host_size_t i = 0;
+       i < flatbuffers_uint32_vec_len(descriptor_set_layout_ordinals_vec);
+       ++i) {
+    uint32_t descriptor_set_layout_ordinal =
+        flatbuffers_uint32_vec_at(descriptor_set_layout_ordinals_vec, i);
+    if (descriptor_set_layout_ordinal >=
+        iree_hal_vulkan_DescriptorSetLayoutDef_vec_len(
+            descriptor_set_layouts_vec)) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "pipeline layout references an invalid descriptor set ordinal %u",
+          descriptor_set_layout_ordinal);
     }
-    if (entry_ordinal == 0) {
-      create_info->flags |= VK_PIPELINE_CREATE_ALLOW_DERIVATIVES_BIT;
-    } else {
-      create_info->flags |= VK_PIPELINE_CREATE_DERIVATIVE_BIT;
-    }
-    create_info->layout =
-        iree_hal_vulkan_native_pipeline_layout_handle(pipeline_layout);
-    create_info->basePipelineHandle = VK_NULL_HANDLE;
-    create_info->basePipelineIndex = 0;
-
-    VkPipelineShaderStageCreateInfo* stage_create_info = &create_info->stage;
-    stage_create_info->sType =
-        VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
-    stage_create_info->flags = 0;
-    stage_create_info->stage = VK_SHADER_STAGE_COMPUTE_BIT;
-    uint32_t shader_module_index =
-        flatbuffers_uint32_vec_at(shader_module_indices_vec, entry_ordinal);
-    // We have verified that shader_module_index is within the range.
-    stage_create_info->module = shader_modules[shader_module_index];
-    stage_create_info->pName =
-        flatbuffers_string_vec_at(entry_points_vec, entry_ordinal);
-    stage_create_info->pSpecializationInfo = &spec_info;
-
-    // If subgroup size is not 0, request the said subgroup size via
-    // VK_EXT_subgroup_size_control (promoted to core since v1.3).
-    stage_create_info->pNext = NULL;
-    if (subgroup_sizes_vec) {
-      if (uint32_t subgroup_size =
-              flatbuffers_uint32_vec_at(subgroup_sizes_vec, entry_ordinal)) {
-        subgroup_control_entries[entry_ordinal].sType =
-            VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO;
-        subgroup_control_entries[entry_ordinal].pNext = NULL;
-        subgroup_control_entries[entry_ordinal].requiredSubgroupSize =
-            subgroup_size;
-        stage_create_info->pNext = &subgroup_control_entries[entry_ordinal];
+    iree_hal_vulkan_DescriptorSetLayoutDef_table_t set_layout_def =
+        iree_hal_vulkan_DescriptorSetLayoutDef_vec_at(
+            descriptor_set_layouts_vec, descriptor_set_layout_ordinal);
+    iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_t bindings_vec =
+        iree_hal_vulkan_DescriptorSetLayoutDef_bindings_get(set_layout_def);
+    for (iree_host_size_t j = 0;
+         j <
+         iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_len(bindings_vec);
+         ++j) {
+      iree_hal_vulkan_DescriptorSetLayoutBindingDef_table_t binding_def =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_at(bindings_vec, j);
+      uint32_t descriptor_count =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_descriptor_count_get(
+              binding_def);
+      iree_hal_vulkan_VkDescriptorType_enum_t type =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_descriptor_type_get(
+              binding_def);
+      uint32_t stage_flags =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_stage_flags_get(
+              binding_def);
+      switch (type) {
+        case iree_hal_vulkan_VkDescriptorType_UNIFORM_BUFFER:
+          uniform_descriptor_count += descriptor_count;
+          break;
+        case iree_hal_vulkan_VkDescriptorType_STORAGE_BUFFER:
+          storage_descriptor_count += descriptor_count;
+          break;
+        default:
+          return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                                  "pipeline layout set[%" PRIhsz
+                                  "] binding[%" PRIhsz
+                                  "] has an unsupported descriptor_type",
+                                  i, j);
+      }
+      // For now we limit to just COMPUTE. If we support other pipeline types in
+      // the future we can expand these.
+      if (stage_flags != VK_SHADER_STAGE_COMPUTE_BIT &&
+          stage_flags != VK_SHADER_STAGE_ALL) {
+        return iree_make_status(
+            IREE_STATUS_INVALID_ARGUMENT,
+            "pipeline layout set[%" PRIhsz "] binding[%" PRIhsz
+            "] has invalid stage flags; must be VK_SHADER_STAGE_COMPUTE_BIT",
+            i, j);
       }
     }
   }
+  if (uniform_descriptor_count + storage_descriptor_count == 0) {
+    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                            "pipeline layout has no declared descriptor "
+                            "bindings and must have at least one");
+  } else if (uniform_descriptor_count >
+             device_properties->limits
+                 .max_per_stage_descriptor_uniform_buffers) {
+    return iree_make_status(
+        IREE_STATUS_INVALID_ARGUMENT,
+        "pipeline layout exceeds device maximum uniform "
+        "buffer limit %u by using %u uniform descriptors",
+        device_properties->limits.max_per_stage_descriptor_uniform_buffers,
+        uniform_descriptor_count);
+  } else if (storage_descriptor_count >
+             device_properties->limits
+                 .max_per_stage_descriptor_storage_buffers) {
+    return iree_make_status(
+        IREE_STATUS_INVALID_ARGUMENT,
+        "pipeline layout exceeds device maximum storage "
+        "buffer limit %u by using %u storage descriptors",
+        device_properties->limits.max_per_stage_descriptor_storage_buffers,
+        storage_descriptor_count);
+  }
 
-  VkPipeline* pipelines =
-      (VkPipeline*)iree_alloca(pipeline_count * sizeof(VkPipeline));
-  iree_status_t status = VK_RESULT_TO_STATUS(
-      logical_device->syms()->vkCreateComputePipelines(
-          *logical_device, pipeline_cache, (uint32_t)pipeline_count,
-          create_infos, logical_device->allocator(), pipelines),
-      "vkCreateComputePipelines");
-  if (iree_status_is_ok(status)) {
-    for (iree_host_size_t i = 0; i < pipeline_count; ++i) {
-      out_entry_points[i].pipeline = pipelines[i];
+  iree_hal_vulkan_PushConstantRange_vec_t push_constant_ranges_vec =
+      iree_hal_vulkan_PipelineLayoutDef_push_constant_ranges_get(
+          pipeline_layout_def);
+  for (iree_host_size_t i = 0;
+       i < iree_hal_vulkan_PushConstantRange_vec_len(push_constant_ranges_vec);
+       ++i) {
+    const iree_hal_vulkan_PushConstantRange_t* push_constant_range =
+        iree_hal_vulkan_PushConstantRange_vec_at(push_constant_ranges_vec, i);
 
-      // Set pipeline name for tooling.
-      if (PFN_vkSetDebugUtilsObjectNameEXT set_name =
-              logical_device->syms()->vkSetDebugUtilsObjectNameEXT) {
-        VkDebugUtilsObjectNameInfoEXT name_info = {};
-        name_info.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
-        name_info.pNext = NULL;
-        name_info.objectHandle = (uint64_t)pipelines[i];
-        name_info.objectType = VK_OBJECT_TYPE_PIPELINE;
-        name_info.pObjectName = flatbuffers_string_vec_at(entry_points_vec, i);
-        set_name(*logical_device, &name_info);
-      }
+    // For now we limit to just COMPUTE. If we support other pipeline types in
+    // the future we can expand these.
+    if (push_constant_range->stage_flags != VK_SHADER_STAGE_COMPUTE_BIT &&
+        push_constant_range->stage_flags != VK_SHADER_STAGE_ALL) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "pipeline layout push_constant_ranges[%" PRIhsz
+                              "] has invalid stage flags; "
+                              "must be VK_SHADER_STAGE_COMPUTE_BIT",
+                              i);
+    }
+
+    // Ensure the push constant range is in-bounds. This is additional early
+    // verification that is otherwise (probably) performed by the driver.
+    uint64_t range_end = (uint64_t)push_constant_range->offset +
+                         (uint64_t)push_constant_range->size;
+    if (range_end > device_properties->limits.max_push_constants_size) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "pipeline layout push_constant_ranges[%" PRIhsz
+          "] (offset=%u, size=%u) "
+          "exceeds device limit %u",
+          i, push_constant_range->offset, push_constant_range->size,
+          device_properties->limits.max_push_constants_size);
     }
   }
 
-  iree_allocator_free(logical_device->host_allocator(), scratch_memory);
-  return status;
-}
-
-static void iree_hal_vulkan_destroy_pipeline(VkDeviceHandle* logical_device,
-                                             VkPipeline handle) {
-  IREE_TRACE_SCOPE();
-  if (handle == VK_NULL_HANDLE) return;
-  logical_device->syms()->vkDestroyPipeline(*logical_device, handle,
-                                            logical_device->allocator());
+  return iree_ok_status();
 }
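The device-limit checks above reduce to a small pure predicate over the tallied descriptor counts and the push constant range. A sketch under stated assumptions (`device_limits` and `layout_fits_limits` are hypothetical names condensing the verifier's checks, not IREE APIs):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical condensed limits, mirroring iree_hal_vulkan_device_limits_t.
struct device_limits {
  uint32_t max_per_stage_descriptor_uniform_buffers;
  uint32_t max_per_stage_descriptor_storage_buffers;
  uint32_t max_push_constants_size;
};

// Returns true if the tallied descriptor counts and a push constant range fit
// within the device limits -- the same checks the flatbuffer verifier performs
// before any pipeline layouts are created.
bool layout_fits_limits(const device_limits& limits,
                        uint32_t uniform_descriptor_count,
                        uint32_t storage_descriptor_count,
                        uint32_t push_constant_offset,
                        uint32_t push_constant_size) {
  // At least one binding must be declared.
  if (uniform_descriptor_count + storage_descriptor_count == 0) return false;
  if (uniform_descriptor_count >
      limits.max_per_stage_descriptor_uniform_buffers) {
    return false;
  }
  if (storage_descriptor_count >
      limits.max_per_stage_descriptor_storage_buffers) {
    return false;
  }
  // Widen to 64 bits so a hostile offset+size pair cannot wrap around.
  uint64_t range_end = (uint64_t)push_constant_offset + push_constant_size;
  return range_end <= limits.max_push_constants_size;
}
```

Performing these checks once at executable load time gives actionable error messages instead of deferring failures to the (optional) validation layers at dispatch time.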
 
 // Verifies the structure of the FlatBuffer so that we can avoid doing so during
 // runtime. There are still some conditions we must be aware of (such as omitted
 // names on functions with internal linkage), however we shouldn't need to
 // bounds check anything within the FlatBuffer after this succeeds.
-static iree_status_t iree_hal_spirv_executable_flatbuffer_verify(
-    iree_const_byte_span_t flatbuffer_data,
-    iree_host_size_t expected_entry_point_count) {
+static iree_status_t iree_hal_vulkan_executable_flatbuffer_verify(
+    const iree_hal_vulkan_device_properties_t* device_properties,
+    iree_const_byte_span_t flatbuffer_data) {
   if (!flatbuffer_data.data || flatbuffer_data.data_length < 16) {
     return iree_make_status(
         IREE_STATUS_INVALID_ARGUMENT,
@@ -216,7 +209,7 @@
   // Run flatcc generated verification. This ensures all pointers are in-bounds
   // and that we can safely walk the file, but not that the actual contents of
   // the FlatBuffer meet our expectations.
-  int verify_ret = iree_hal_spirv_ExecutableDef_verify_as_root(
+  int verify_ret = iree_hal_vulkan_ExecutableDef_verify_as_root(
       flatbuffer_data.data, flatbuffer_data.data_length);
   if (verify_ret != flatcc_verify_ok) {
     return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
@@ -224,91 +217,660 @@
                             flatcc_verify_error_string(verify_ret));
   }
 
-  iree_hal_spirv_ExecutableDef_table_t executable_def =
-      iree_hal_spirv_ExecutableDef_as_root(flatbuffer_data.data);
+  iree_hal_vulkan_ExecutableDef_table_t executable_def =
+      iree_hal_vulkan_ExecutableDef_as_root(flatbuffer_data.data);
 
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_spirv_ExecutableDef_entry_points_get(executable_def);
-  size_t entry_point_count = flatbuffers_string_vec_len(entry_points_vec);
-  if (entry_point_count != expected_entry_point_count) {
-    return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                            "executable provides %zu entry points but caller "
-                            "provided %" PRIhsz "; must match",
-                            entry_point_count, expected_entry_point_count);
+  iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec =
+      iree_hal_vulkan_ExecutableDef_descriptor_set_layouts_get(executable_def);
+  iree_hal_vulkan_PipelineLayoutDef_vec_t pipeline_layouts_vec =
+      iree_hal_vulkan_ExecutableDef_pipeline_layouts_get(executable_def);
+  for (iree_host_size_t i = 0;
+       i < iree_hal_vulkan_PipelineLayoutDef_vec_len(pipeline_layouts_vec);
+       ++i) {
+    iree_hal_vulkan_PipelineLayoutDef_table_t pipeline_layout_def =
+        iree_hal_vulkan_PipelineLayoutDef_vec_at(pipeline_layouts_vec, i);
+    IREE_RETURN_IF_ERROR(
+        iree_hal_vulkan_pipeline_layout_flatbuffer_verify(
+            device_properties, descriptor_set_layouts_vec, pipeline_layout_def),
+        "pipeline_layouts[%" PRIhsz "]", i);
   }
 
-  for (size_t i = 0; i < entry_point_count; ++i) {
-    if (!flatbuffers_string_len(
-            flatbuffers_string_vec_at(entry_points_vec, i))) {
+  iree_hal_vulkan_ShaderModuleDef_vec_t shader_modules_vec =
+      iree_hal_vulkan_ExecutableDef_shader_modules_get(executable_def);
+  for (iree_host_size_t i = 0;
+       i < iree_hal_vulkan_ShaderModuleDef_vec_len(shader_modules_vec); ++i) {
+    iree_hal_vulkan_ShaderModuleDef_table_t shader_module_def =
+        iree_hal_vulkan_ShaderModuleDef_vec_at(shader_modules_vec, i);
+    IREE_RETURN_IF_ERROR(iree_hal_vulkan_shader_module_flatbuffer_verify(
+                             device_properties, shader_module_def),
+                         "shader_modules[%" PRIhsz "]", i);
+  }
+
+  iree_hal_vulkan_PipelineDef_vec_t pipelines_vec =
+      iree_hal_vulkan_ExecutableDef_pipelines_get(executable_def);
+  for (iree_host_size_t i = 0;
+       i < iree_hal_vulkan_PipelineDef_vec_len(pipelines_vec); ++i) {
+    iree_hal_vulkan_PipelineDef_table_t export_def =
+        iree_hal_vulkan_PipelineDef_vec_at(pipelines_vec, i);
+    if (!export_def) continue;
+
+    uint32_t shader_module_ordinal =
+        iree_hal_vulkan_PipelineDef_shader_module_ordinal_get(export_def);
+    if (shader_module_ordinal >=
+        iree_hal_vulkan_ShaderModuleDef_vec_len(shader_modules_vec)) {
+      return iree_make_status(
+          IREE_STATUS_INVALID_ARGUMENT,
+          "pipelines[%" PRIhsz "] shader_module_ordinal is out of bounds", i);
+    }
+
+    if (flatbuffers_string_len(
+            iree_hal_vulkan_PipelineDef_entry_point_get(export_def)) == 0) {
       return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "executable entry point %zu has no name", i);
+                              "pipelines[%" PRIhsz "] entry_point is empty", i);
     }
-  }
 
-  flatbuffers_uint32_vec_t subgroup_sizes_vec =
-      iree_hal_spirv_ExecutableDef_subgroup_sizes_get(executable_def);
-  if (subgroup_sizes_vec) {
-    size_t subgroup_sizes_count = flatbuffers_vec_len(subgroup_sizes_vec);
-    if (subgroup_sizes_count != expected_entry_point_count) {
+    uint32_t pipeline_layout_ordinal =
+        iree_hal_vulkan_PipelineDef_pipeline_layout_ordinal_get(export_def);
+    if (pipeline_layout_ordinal >=
+        iree_hal_vulkan_PipelineLayoutDef_vec_len(pipeline_layouts_vec)) {
       return iree_make_status(
           IREE_STATUS_INVALID_ARGUMENT,
-          "executable has %" PRIhsz
-          " entry points but %zu subgroup sizes are defined",
-          expected_entry_point_count, subgroup_sizes_count);
+          "pipelines[%" PRIhsz "] pipeline_layout_ordinal is out of bounds", i);
     }
-  }
 
-  iree_hal_spirv_ShaderModuleDef_vec_t shader_modules_vec =
-      iree_hal_spirv_ExecutableDef_shader_modules_get(executable_def);
-  size_t shader_module_count = flatbuffers_vec_len(shader_modules_vec);
-  if (shader_module_count == 0) {
-    return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                            "executable provides no shader modules");
+    IREE_RETURN_IF_ERROR(iree_hal_debug_verify_export_def(
+        iree_hal_vulkan_PipelineDef_debug_info_get(export_def)));
   }
-  for (size_t i = 0; i < shader_module_count; ++i) {
-    iree_hal_spirv_ShaderModuleDef_table_t shader_module =
-        iree_hal_spirv_ShaderModuleDef_vec_at(shader_modules_vec, i);
-    size_t code_size = flatbuffers_uint32_vec_len(
-        iree_hal_spirv_ShaderModuleDef_code_get(shader_module));
-    if (code_size == 0) {
-      return iree_make_status(
-          IREE_STATUS_INVALID_ARGUMENT,
-          "executable SPIR-V code in shader module #%zu is missing", i);
-    }
-  }
-
-  flatbuffers_uint32_vec_t shader_module_indices_vec =
-      iree_hal_spirv_ExecutableDef_shader_module_indices_get(executable_def);
-  size_t shader_module_index_count =
-      flatbuffers_vec_len(shader_module_indices_vec);
-  if (shader_module_index_count != expected_entry_point_count) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "executable has %" PRIhsz
-        " entry points but %zu shader module indices are defined",
-        expected_entry_point_count, shader_module_index_count);
-  }
-  for (size_t i = 0; i < shader_module_index_count; ++i) {
-    uint32_t index = flatbuffers_uint32_vec_at(shader_module_indices_vec, i);
-    if (index >= shader_module_count) {
-      return iree_make_status(
-          IREE_STATUS_INVALID_ARGUMENT,
-          "executable entry point shader module index %u out of range; "
-          "executable only has %zu total shader modules",
-          index, shader_module_count);
-    }
-  }
-
-  // TODO: verify source locations, stage locations, and source files.
 
   return iree_ok_status();
 }
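The cross-reference checks in the loop above all follow one pattern: an ordinal is valid only when strictly less than the referenced vector's length. A distilled sketch (the `pipeline_def` struct and `first_invalid_pipeline` helper are hypothetical, not IREE APIs):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mirror of PipelineDef's cross-references; illustrative only.
struct pipeline_def {
  uint32_t shader_module_ordinal;
  uint32_t pipeline_layout_ordinal;
  std::string entry_point;
};

// Returns the index of the first pipeline whose references are invalid, or -1
// when all pass. Note the strict bound: an ordinal equal to the vector length
// is already out of bounds.
int first_invalid_pipeline(const std::vector<pipeline_def>& pipelines,
                           size_t shader_module_count,
                           size_t pipeline_layout_count) {
  for (size_t i = 0; i < pipelines.size(); ++i) {
    const pipeline_def& p = pipelines[i];
    if (p.shader_module_ordinal >= shader_module_count ||
        p.pipeline_layout_ordinal >= pipeline_layout_count ||
        p.entry_point.empty()) {
      return (int)i;
    }
  }
  return -1;
}
```

Because this verification runs once when the executable is loaded, pipeline creation and dispatch can index into the shader module and layout vectors without repeating bounds checks.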
 
+//===----------------------------------------------------------------------===//
+// Descriptor Set Layouts
+//===----------------------------------------------------------------------===//
+
+static void iree_hal_vulkan_release_descriptor_set_layouts(
+    VkDeviceHandle* logical_device,
+    iree_host_size_t descriptor_set_layout_count,
+    iree_hal_vulkan_descriptor_set_layout_t** descriptor_set_layouts) {
+  for (iree_host_size_t i = 0; i < descriptor_set_layout_count; ++i) {
+    iree_hal_vulkan_descriptor_set_layout_release(descriptor_set_layouts[i]);
+  }
+  iree_allocator_free(logical_device->host_allocator(), descriptor_set_layouts);
+}
+
+// Creates a descriptor set layout based on the flatbuffer definition.
+static iree_status_t iree_hal_vulkan_create_descriptor_set_layout(
+    VkDeviceHandle* logical_device,
+    iree_hal_vulkan_DescriptorSetLayoutDef_table_t descriptor_set_layout_def,
+    iree_hal_vulkan_descriptor_set_layout_t** out_descriptor_set_layout) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(descriptor_set_layout_def);
+  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
+  *out_descriptor_set_layout = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_t bindings_vec =
+      iree_hal_vulkan_DescriptorSetLayoutDef_bindings_get(
+          descriptor_set_layout_def);
+  iree_host_size_t binding_count =
+      iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_len(bindings_vec);
+
+  VkDescriptorSetLayoutBinding* bindings = NULL;
+  if (binding_count > 0) {
+    // TODO(benvanik): avoid this allocation if possible (inline_array).
+    IREE_RETURN_IF_ERROR(iree_allocator_malloc(
+        logical_device->host_allocator(),
+        binding_count * sizeof(VkDescriptorSetLayoutBinding),
+        (void**)&bindings));
+    for (iree_host_size_t i = 0; i < binding_count; ++i) {
+      iree_hal_vulkan_DescriptorSetLayoutBindingDef_table_t binding_def =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_vec_at(bindings_vec, i);
+      VkDescriptorSetLayoutBinding* binding = &bindings[i];
+      binding->binding =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_binding_get(
+              binding_def);
+      binding->descriptorType = static_cast<VkDescriptorType>(
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_descriptor_type_get(
+              binding_def));
+      binding->descriptorCount =
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_descriptor_count_get(
+              binding_def);
+      binding->stageFlags = static_cast<VkShaderStageFlags>(
+          iree_hal_vulkan_DescriptorSetLayoutBindingDef_stage_flags_get(
+              binding_def));
+      binding->pImmutableSamplers = NULL;
+    }
+  }
+
+  VkDescriptorSetLayoutCreateFlags flags = 0;
+  iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout = NULL;
+  iree_status_t status = iree_hal_vulkan_descriptor_set_layout_create(
+      logical_device, flags, binding_count, bindings, &descriptor_set_layout);
+
+  iree_allocator_free(logical_device->host_allocator(), bindings);
+
+  if (iree_status_is_ok(status)) {
+    *out_descriptor_set_layout = descriptor_set_layout;
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
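The translation loop above is a one-to-one copy from the serialized binding defs into the Vulkan create-info structs. A sketch with stand-in types (`binding_def`, `vk_binding`, and `translate_bindings` are illustrative stand-ins for the flatcc accessors and `VkDescriptorSetLayoutBinding`, not the real types):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real code reads flatcc accessors and fills
// VkDescriptorSetLayoutBinding.
struct binding_def {
  uint32_t binding;
  uint32_t descriptor_type;   // e.g. VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
  uint32_t descriptor_count;
  uint32_t stage_flags;       // e.g. VK_SHADER_STAGE_COMPUTE_BIT
};
struct vk_binding {
  uint32_t binding;
  uint32_t descriptorType;
  uint32_t descriptorCount;
  uint32_t stageFlags;
  const void* pImmutableSamplers;
};

// Translates serialized binding defs into API-facing layout bindings, one per
// def, mirroring the loop in iree_hal_vulkan_create_descriptor_set_layout.
// pImmutableSamplers is always NULL as compute layouts use no samplers here.
std::vector<vk_binding> translate_bindings(
    const std::vector<binding_def>& defs) {
  std::vector<vk_binding> out;
  out.reserve(defs.size());
  for (const binding_def& def : defs) {
    out.push_back({def.binding, def.descriptor_type, def.descriptor_count,
                   def.stage_flags, nullptr});
  }
  return out;
}
```

Since the flatbuffer was verified at load time the translation itself needs no validation, keeping layout creation a straight copy.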
+
+// Creates all descriptor set layouts specified and returns a temporary heap
+// array with them in the same order. Callers must use
+// iree_hal_vulkan_release_descriptor_set_layouts when done with the array to
+// release the resources.
+static iree_status_t iree_hal_vulkan_create_descriptor_set_layouts(
+    VkDeviceHandle* logical_device,
+    iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec,
+    iree_host_size_t* out_descriptor_set_layout_count,
+    iree_hal_vulkan_descriptor_set_layout_t*** out_descriptor_set_layouts) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(descriptor_set_layouts_vec);
+  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout_count);
+  IREE_ASSERT_ARGUMENT(out_descriptor_set_layouts);
+  *out_descriptor_set_layout_count = 0;
+  *out_descriptor_set_layouts = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_host_size_t descriptor_set_layout_count =
+      iree_hal_vulkan_DescriptorSetLayoutDef_vec_len(
+          descriptor_set_layouts_vec);
+  iree_hal_vulkan_descriptor_set_layout_t** descriptor_set_layouts = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(
+              logical_device->host_allocator(),
+              descriptor_set_layout_count * sizeof(descriptor_set_layouts[0]),
+              (void**)&descriptor_set_layouts));
+
+  iree_status_t status = iree_ok_status();
+  for (iree_host_size_t i = 0; i < descriptor_set_layout_count; ++i) {
+    iree_hal_vulkan_DescriptorSetLayoutDef_table_t descriptor_set_layout_def =
+        iree_hal_vulkan_DescriptorSetLayoutDef_vec_at(
+            descriptor_set_layouts_vec, i);
+    status = iree_hal_vulkan_create_descriptor_set_layout(
+        logical_device, descriptor_set_layout_def, &descriptor_set_layouts[i]);
+    if (!iree_status_is_ok(status)) {
+      status = iree_status_annotate_f(status,
+                                      "descriptor_set_layouts[%" PRIhsz "]", i);
+      break;
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_descriptor_set_layout_count = descriptor_set_layout_count;
+    *out_descriptor_set_layouts = descriptor_set_layouts;
+  } else {
+    iree_hal_vulkan_release_descriptor_set_layouts(
+        logical_device, descriptor_set_layout_count, descriptor_set_layouts);
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
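The create-all-or-release-everything pattern above (create each element, annotate the failing array index, and unwind on error) recurs for descriptor set layouts, pipeline layouts, shader modules, and pipelines. A minimal, self-contained sketch of the same control flow, using hypothetical `resource_t`/`create_resource` stand-ins rather than the real IREE status APIs:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
  int id;
} resource_t;

// Simulated creator that fails at a chosen ordinal so the cleanup path can
// be exercised; the real code calls into Vulkan and wraps VkResult in an
// iree_status_t annotated with the failing index.
static int create_resource(int ordinal, int fail_at, resource_t** out) {
  *out = NULL;
  if (ordinal == fail_at) return -1;
  *out = (resource_t*)malloc(sizeof(resource_t));
  (*out)->id = ordinal;
  return 0;
}

// Creates all resources or none: on the first failure every resource created
// so far is released before returning, mirroring the loop in
// iree_hal_vulkan_create_descriptor_set_layouts.
static int create_all(int count, int fail_at, resource_t** resources) {
  for (int i = 0; i < count; ++i) {
    if (create_resource(i, fail_at, &resources[i]) != 0) {
      fprintf(stderr, "resources[%d] failed\n", i);
      for (int j = 0; j < i; ++j) free(resources[j]);
      return -1;
    }
  }
  return 0;
}
```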
+
+//===----------------------------------------------------------------------===//
+// Pipeline Layouts
+//===----------------------------------------------------------------------===//
+
+// Creates a pipeline layout from the flatbuffer definition using the descriptor
+// set layouts provided.
+static iree_status_t iree_hal_vulkan_create_pipeline_layout(
+    VkDeviceHandle* logical_device,
+    iree_host_size_t descriptor_set_layout_count,
+    iree_hal_vulkan_descriptor_set_layout_t** descriptor_set_layouts,
+    iree_hal_vulkan_PipelineLayoutDef_table_t pipeline_layout_def,
+    iree_hal_vulkan_pipeline_layout_t** out_pipeline_layout) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(pipeline_layout_def);
+  IREE_ASSERT_ARGUMENT(descriptor_set_layouts);
+  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
+  *out_pipeline_layout = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_hal_vulkan_PushConstantRange_vec_t push_constant_ranges =
+      iree_hal_vulkan_PipelineLayoutDef_push_constant_ranges_get(
+          pipeline_layout_def);
+  iree_host_size_t push_constant_range_count =
+      iree_hal_vulkan_PushConstantRange_vec_len(push_constant_ranges);
+  const VkPushConstantRange* push_constant_range_ptr = NULL;
+  if (push_constant_range_count > 0) {
+    static_assert(sizeof(iree_hal_vulkan_PushConstantRange) ==
+                      sizeof(VkPushConstantRange),
+                  "expecting to overlay VkPushConstantRange");
+    push_constant_range_ptr =
+        (const VkPushConstantRange*)iree_hal_vulkan_PushConstantRange_vec_at(
+            push_constant_ranges, 0);
+  }
+
+  flatbuffers_uint32_vec_t descriptor_set_layout_ordinals_vec =
+      iree_hal_vulkan_PipelineLayoutDef_descriptor_set_layout_ordinals_get(
+          pipeline_layout_def);
+  iree_host_size_t selected_set_layouts_count =
+      flatbuffers_uint32_vec_len(descriptor_set_layout_ordinals_vec);
+  iree_hal_vulkan_descriptor_set_layout_t** selected_set_layouts =
+      (iree_hal_vulkan_descriptor_set_layout_t**)iree_alloca(
+          selected_set_layouts_count *
+          sizeof(iree_hal_vulkan_descriptor_set_layout_t*));
+  for (iree_host_size_t i = 0; i < selected_set_layouts_count; ++i) {
+    uint32_t ordinal =
+        flatbuffers_uint32_vec_at(descriptor_set_layout_ordinals_vec, i);
+    selected_set_layouts[i] = descriptor_set_layouts[ordinal];
+  }
+
+  iree_status_t status = iree_hal_vulkan_pipeline_layout_create(
+      logical_device, push_constant_range_count, push_constant_range_ptr,
+      selected_set_layouts_count, selected_set_layouts, out_pipeline_layout);
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
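The `static_assert` above guards a zero-copy overlay: the flatbuffer-resident push constant range array is reinterpreted directly as the `VkPushConstantRange*` the Vulkan API expects. A minimal sketch of the idea, using hypothetical mirror structs (neither is the real flatcc-generated type nor the real Vulkan type); the trick is only valid because the flatbuffer struct is declared with the exact field order and widths of the Vulkan struct:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical flatbuffer-side struct, field-for-field compatible with the
// Vulkan struct below.
typedef struct {
  uint32_t stage_flags;  // mirrors VkPushConstantRange.stageFlags
  uint32_t offset;       // mirrors VkPushConstantRange.offset
  uint32_t size;         // mirrors VkPushConstantRange.size
} fb_push_constant_range_t;

// Stand-in for VkPushConstantRange.
typedef struct {
  uint32_t stageFlags;
  uint32_t offset;
  uint32_t size;
} vk_push_constant_range_t;

// Compile-time guard: if the layouts ever diverge the build breaks instead of
// silently reinterpreting garbage at runtime.
static_assert(sizeof(fb_push_constant_range_t) ==
                  sizeof(vk_push_constant_range_t),
              "expecting to overlay VkPushConstantRange");

// Zero-copy reinterpretation of the flatbuffer array as the array type the
// Vulkan API consumes; no per-element conversion or allocation is needed.
static const vk_push_constant_range_t* overlay_ranges(
    const fb_push_constant_range_t* ranges) {
  return (const vk_push_constant_range_t*)ranges;
}
```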
+
+static void iree_hal_vulkan_release_pipeline_layouts(
+    VkDeviceHandle* logical_device, iree_host_size_t pipeline_layout_count,
+    iree_hal_vulkan_pipeline_layout_t** pipeline_layouts) {
+  IREE_TRACE_ZONE_BEGIN(z0);
+  for (iree_host_size_t i = 0; i < pipeline_layout_count; ++i) {
+    iree_hal_vulkan_pipeline_layout_release(pipeline_layouts[i]);
+  }
+  iree_allocator_free(logical_device->host_allocator(), pipeline_layouts);
+  IREE_TRACE_ZONE_END(z0);
+}
+
+// Creates all pipeline layouts specified and returns a temporary heap array
+// with them in the same order. Callers must use
+// iree_hal_vulkan_release_pipeline_layouts when done with the array to
+// release the resources.
+static iree_status_t iree_hal_vulkan_create_pipeline_layouts(
+    VkDeviceHandle* logical_device,
+    iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec,
+    iree_hal_vulkan_PipelineLayoutDef_vec_t pipeline_layouts_vec,
+    iree_host_size_t* out_pipeline_layout_count,
+    iree_hal_vulkan_pipeline_layout_t*** out_pipeline_layouts) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(descriptor_set_layouts_vec);
+  IREE_ASSERT_ARGUMENT(pipeline_layouts_vec);
+  IREE_ASSERT_ARGUMENT(out_pipeline_layout_count);
+  IREE_ASSERT_ARGUMENT(out_pipeline_layouts);
+  *out_pipeline_layout_count = 0;
+  *out_pipeline_layouts = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  // Create a temporary descriptor set layout list to retain the layouts while
+  // creating pipeline layouts. The created pipeline layouts will retain the
+  // descriptor set layouts for as long as they are live even once we free the
+  // list below.
+  iree_host_size_t descriptor_set_layout_count = 0;
+  iree_hal_vulkan_descriptor_set_layout_t** descriptor_set_layouts = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_vulkan_create_descriptor_set_layouts(
+              logical_device, descriptor_set_layouts_vec,
+              &descriptor_set_layout_count, &descriptor_set_layouts));
+
+  iree_host_size_t pipeline_layout_count =
+      iree_hal_vulkan_PipelineLayoutDef_vec_len(pipeline_layouts_vec);
+  iree_hal_vulkan_pipeline_layout_t** pipeline_layouts = NULL;
+  iree_status_t status =
+      iree_allocator_malloc(logical_device->host_allocator(),
+                            pipeline_layout_count * sizeof(pipeline_layouts[0]),
+                            (void**)&pipeline_layouts);
+
+  if (iree_status_is_ok(status)) {
+    for (iree_host_size_t i = 0; i < pipeline_layout_count; ++i) {
+      iree_hal_vulkan_PipelineLayoutDef_table_t pipeline_layout_def =
+          iree_hal_vulkan_PipelineLayoutDef_vec_at(pipeline_layouts_vec, i);
+      status = iree_hal_vulkan_create_pipeline_layout(
+          logical_device, descriptor_set_layout_count, descriptor_set_layouts,
+          pipeline_layout_def, &pipeline_layouts[i]);
+      if (!iree_status_is_ok(status)) {
+        status =
+            iree_status_annotate_f(status, "pipeline_layouts[%" PRIhsz "]", i);
+        break;
+      }
+    }
+  }
+
+  // Release temporary descriptor set layouts; pipeline layouts retain them as
+  // needed.
+  iree_hal_vulkan_release_descriptor_set_layouts(
+      logical_device, descriptor_set_layout_count, descriptor_set_layouts);
+
+  if (iree_status_is_ok(status)) {
+    *out_pipeline_layout_count = pipeline_layout_count;
+    *out_pipeline_layouts = pipeline_layouts;
+  } else {
+    iree_hal_vulkan_release_pipeline_layouts(
+        logical_device, pipeline_layout_count, pipeline_layouts);
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+//===----------------------------------------------------------------------===//
+// Shader Modules
+//===----------------------------------------------------------------------===//
+
+static void iree_hal_vulkan_release_shader_modules(
+    VkDeviceHandle* logical_device, iree_host_size_t shader_module_count,
+    VkShaderModule* shader_modules) {
+  IREE_TRACE_ZONE_BEGIN(z0);
+  for (iree_host_size_t i = 0; i < shader_module_count; ++i) {
+    if (shader_modules[i] != VK_NULL_HANDLE) {
+      logical_device->syms()->vkDestroyShaderModule(
+          *logical_device, shader_modules[i], logical_device->allocator());
+    }
+  }
+  iree_allocator_free(logical_device->host_allocator(), shader_modules);
+  IREE_TRACE_ZONE_END(z0);
+}
+
+// Creates a VkShaderModule from the given flatbuffer definition.
+// This usually spends quite a bit of blocking time in the driver.
+static iree_status_t iree_hal_vulkan_create_shader_module(
+    VkDeviceHandle* logical_device,
+    iree_hal_vulkan_ShaderModuleDef_table_t shader_module_def,
+    VkShaderModule* out_shader_module) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(shader_module_def);
+  IREE_ASSERT_ARGUMENT(out_shader_module);
+  *out_shader_module = VK_NULL_HANDLE;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  VkShaderModuleCreateInfo create_info;
+  create_info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
+  create_info.pNext = NULL;
+  create_info.flags = 0;
+
+  flatbuffers_uint32_vec_t spirv_code_vec =
+      iree_hal_vulkan_ShaderModuleDef_spirv_code_get(shader_module_def);
+  create_info.codeSize =
+      flatbuffers_uint32_vec_len(spirv_code_vec) * sizeof(uint32_t);
+  create_info.pCode = (const uint32_t*)spirv_code_vec;
+
+  VkShaderModule shader_module = VK_NULL_HANDLE;
+  iree_status_t status =
+      VK_RESULT_TO_STATUS(logical_device->syms()->vkCreateShaderModule(
+                              *logical_device, &create_info,
+                              logical_device->allocator(), &shader_module),
+                          "vkCreateShaderModule");
+
+  if (iree_status_is_ok(status)) {
+    *out_shader_module = shader_module;
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+// Creates all shader modules specified and returns a temporary heap array with
+// them in the same order. Callers must use
+// iree_hal_vulkan_release_shader_modules when done with the array to release
+// the resources.
+static iree_status_t iree_hal_vulkan_create_shader_modules(
+    VkDeviceHandle* logical_device,
+    iree_hal_vulkan_ShaderModuleDef_vec_t shader_modules_vec,
+    iree_host_size_t* out_shader_module_count,
+    VkShaderModule** out_shader_modules) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(shader_modules_vec);
+  IREE_ASSERT_ARGUMENT(out_shader_module_count);
+  IREE_ASSERT_ARGUMENT(out_shader_modules);
+  *out_shader_module_count = 0;
+  *out_shader_modules = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_host_size_t shader_module_count =
+      iree_hal_vulkan_ShaderModuleDef_vec_len(shader_modules_vec);
+  VkShaderModule* shader_modules = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(logical_device->host_allocator(),
+                                shader_module_count * sizeof(shader_modules[0]),
+                                (void**)&shader_modules));
+
+  iree_status_t status = iree_ok_status();
+  for (iree_host_size_t i = 0; i < shader_module_count; ++i) {
+    iree_hal_vulkan_ShaderModuleDef_table_t shader_module_def =
+        iree_hal_vulkan_ShaderModuleDef_vec_at(shader_modules_vec, i);
+    status = iree_hal_vulkan_create_shader_module(
+        logical_device, shader_module_def, &shader_modules[i]);
+    if (!iree_status_is_ok(status)) {
+      status = iree_status_annotate_f(status, "shader_modules[%" PRIhsz "]", i);
+      break;
+    }
+  }
+
+  if (iree_status_is_ok(status)) {
+    *out_shader_module_count = shader_module_count;
+    *out_shader_modules = shader_modules;
+  } else {
+    iree_hal_vulkan_release_shader_modules(logical_device, shader_module_count,
+                                           shader_modules);
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+//===----------------------------------------------------------------------===//
+// Pipelines
+//===----------------------------------------------------------------------===//
+
+// Creates a pipeline from the set of available pipeline layouts and shader
+// modules and stores it into |out_pipeline|.
+//
+// NOTE: vkCreateComputePipelines can create multiple pipelines in one call but
+// doesn't speed up creation on any known driver; we process one at a time so
+// that we can produce better error messages and multithread the pipeline
+// creation ourselves.
+static iree_status_t iree_hal_vulkan_create_pipeline(
+    VkDeviceHandle* logical_device, VkPipelineCache pipeline_cache,
+    const iree_hal_executable_params_t* executable_params,
+    const VkSpecializationInfo* specialization_info,
+    iree_hal_vulkan_pipeline_layout_t** pipeline_layouts,
+    VkShaderModule* shader_modules,
+    iree_hal_vulkan_PipelineDef_table_t pipeline_def,
+    iree_hal_vulkan_pipeline_t* out_pipeline) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(pipeline_layouts);
+  IREE_ASSERT_ARGUMENT(shader_modules);
+  IREE_ASSERT_ARGUMENT(out_pipeline);
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  flatbuffers_string_t entry_point =
+      iree_hal_vulkan_PipelineDef_entry_point_get(pipeline_def);
+  IREE_TRACE_ZONE_APPEND_TEXT(z0, entry_point);
+
+  uint32_t shader_module_ordinal =
+      iree_hal_vulkan_PipelineDef_shader_module_ordinal_get(pipeline_def);
+  VkShaderModule shader_module = shader_modules[shader_module_ordinal];
+  uint32_t pipeline_layout_ordinal =
+      iree_hal_vulkan_PipelineDef_pipeline_layout_ordinal_get(pipeline_def);
+  iree_hal_vulkan_pipeline_layout_t* pipeline_layout =
+      pipeline_layouts[pipeline_layout_ordinal];
+
+  VkComputePipelineCreateInfo create_info;
+  memset(&create_info, 0, sizeof(create_info));
+  create_info.sType = VK_STRUCTURE_TYPE_COMPUTE_PIPELINE_CREATE_INFO;
+  create_info.pNext = NULL;
+  create_info.flags = 0;
+  if (!iree_all_bits_set(executable_params->caching_mode,
+                         IREE_HAL_EXECUTABLE_CACHING_MODE_ALLOW_OPTIMIZATION)) {
+    create_info.flags |= VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT;
+  }
+  create_info.layout = iree_hal_vulkan_pipeline_layout_handle(pipeline_layout);
+  create_info.basePipelineHandle = VK_NULL_HANDLE;
+  create_info.basePipelineIndex = 0;
+
+  VkPipelineShaderStageCreateInfo* stage_create_info = &create_info.stage;
+  stage_create_info->sType =
+      VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
+  stage_create_info->pNext = NULL;
+  stage_create_info->flags = 0;
+  stage_create_info->stage = VK_SHADER_STAGE_COMPUTE_BIT;
+  stage_create_info->module = shader_module;
+  stage_create_info->pName = entry_point;
+  stage_create_info->pSpecializationInfo = specialization_info;
+
+  // If a non-zero subgroup size is specified, request that subgroup size via
+  // VK_EXT_subgroup_size_control (promoted to core in Vulkan 1.3).
+  VkPipelineShaderStageRequiredSubgroupSizeCreateInfo subgroup_size_info;
+  memset(&subgroup_size_info, 0, sizeof(subgroup_size_info));
+  if (iree_hal_vulkan_PipelineDef_subgroup_size_is_present(pipeline_def)) {
+    if (uint32_t subgroup_size =
+            iree_hal_vulkan_PipelineDef_subgroup_size(pipeline_def)) {
+      subgroup_size_info.sType =
+          VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_REQUIRED_SUBGROUP_SIZE_CREATE_INFO;
+      subgroup_size_info.pNext = NULL;
+      subgroup_size_info.requiredSubgroupSize = subgroup_size;
+      stage_create_info->pNext = &subgroup_size_info;
+    }
+  }
+
+  // Create the pipeline. This may fail if the shader module is invalid or the
+  // pipeline layout does not match the shader's expectations.
+  iree_status_t status = VK_RESULT_TO_STATUS(
+      logical_device->syms()->vkCreateComputePipelines(
+          *logical_device, pipeline_cache, 1, &create_info,
+          logical_device->allocator(), &out_pipeline->handle),
+      "vkCreateComputePipelines");
+
+  // Retain the pipeline layout for as long as the pipeline is live.
+  out_pipeline->layout = pipeline_layout;
+  iree_hal_vulkan_pipeline_layout_retain(out_pipeline->layout);
+
+  // Set pipeline name for tooling.
+  if (iree_status_is_ok(status)) {
+    if (PFN_vkSetDebugUtilsObjectNameEXT set_name =
+            logical_device->syms()->vkSetDebugUtilsObjectNameEXT) {
+      VkDebugUtilsObjectNameInfoEXT name_info = {};
+      name_info.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
+      name_info.pNext = NULL;
+      name_info.objectHandle = (uint64_t)out_pipeline->handle;
+      name_info.objectType = VK_OBJECT_TYPE_PIPELINE;
+      name_info.pObjectName =
+          iree_hal_vulkan_PipelineDef_entry_point_get(pipeline_def);
+      set_name(*logical_device, &name_info);
+    }
+  }
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+static void iree_hal_vulkan_destroy_pipeline(
+    VkDeviceHandle* logical_device, iree_hal_vulkan_pipeline_t* pipeline) {
+  IREE_TRACE_ZONE_BEGIN(z0);
+  if (pipeline->handle != VK_NULL_HANDLE) {
+    logical_device->syms()->vkDestroyPipeline(*logical_device, pipeline->handle,
+                                              logical_device->allocator());
+  }
+  iree_hal_vulkan_pipeline_layout_release(pipeline->layout);
+  IREE_TRACE_ZONE_END(z0);
+}
+
+// Creates all pipelines in the flatbuffer and stores them directly into
+// the caller-allocated |pipelines| array. Upon failure the caller is
+// responsible for releasing partially initialized pipelines.
+//
+// NOTE: this function is designed as a top-level flatbuffer->VkPipeline[] entry
+// point for future multi-threaded pipeline creation. Today we do everything
+// serially but could farm out to an iree_loop_t.
+static iree_status_t iree_hal_vulkan_create_pipelines(
+    VkDeviceHandle* logical_device, VkPipelineCache pipeline_cache,
+    const iree_hal_executable_params_t* executable_params,
+    iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec,
+    iree_hal_vulkan_PipelineLayoutDef_vec_t pipeline_layouts_vec,
+    iree_hal_vulkan_ShaderModuleDef_vec_t shader_modules_vec,
+    iree_hal_vulkan_PipelineDef_vec_t pipelines_vec,
+    iree_hal_vulkan_pipeline_t* pipelines) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(descriptor_set_layouts_vec);
+  IREE_ASSERT_ARGUMENT(pipeline_layouts_vec);
+  IREE_ASSERT_ARGUMENT(shader_modules_vec);
+  IREE_ASSERT_ARGUMENT(pipelines_vec);
+  IREE_ASSERT_ARGUMENT(pipelines);
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  // Create a temporary pipeline layout list to retain the layouts while
+  // creating pipelines. The created pipelines will retain the pipeline layouts
+  // (and transitively the descriptor set layouts) for as long as they are live
+  // even once we free the list below. This is usually a much smaller set than
+  // the total number of pipelines (~5-10 for 1000 pipelines) so we split this
+  // from the pipeline creation.
+  iree_host_size_t pipeline_layout_count = 0;
+  iree_hal_vulkan_pipeline_layout_t** pipeline_layouts = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_hal_vulkan_create_pipeline_layouts(
+              logical_device, descriptor_set_layouts_vec, pipeline_layouts_vec,
+              &pipeline_layout_count, &pipeline_layouts));
+
+  // Create all shader modules used by pipelines into a temporary array.
+  // The shader modules are only required during pipeline creation and are then
+  // discarded.
+  iree_host_size_t shader_module_count = 0;
+  VkShaderModule* shader_modules = NULL;
+  iree_status_t status = iree_hal_vulkan_create_shader_modules(
+      logical_device, shader_modules_vec, &shader_module_count,
+      &shader_modules);
+
+  // Prepare specialization entries used across all pipelines.
+  VkSpecializationMapEntry* specialization_map_entries = NULL;
+  VkSpecializationInfo specialization_info;
+  memset(&specialization_info, 0, sizeof(specialization_info));
+  if (iree_status_is_ok(status) && executable_params->constant_count) {
+    status = iree_allocator_malloc(logical_device->host_allocator(),
+                                   executable_params->constant_count *
+                                       sizeof(specialization_map_entries[0]),
+                                   (void**)&specialization_map_entries);
+  }
+  if (iree_status_is_ok(status)) {
+    specialization_info.mapEntryCount = executable_params->constant_count;
+    specialization_info.pMapEntries = specialization_map_entries;
+    specialization_info.dataSize =
+        executable_params->constant_count * sizeof(uint32_t);
+    specialization_info.pData = executable_params->constants;
+    for (iree_host_size_t i = 0; i < executable_params->constant_count; ++i) {
+      specialization_map_entries[i].constantID = i;
+      specialization_map_entries[i].offset = i * sizeof(uint32_t);
+      specialization_map_entries[i].size = sizeof(uint32_t);
+    }
+  }
+
+  // Create pipelines in-place in the output storage using the temporary
+  // pipeline layouts array. The pipeline layouts will be retained as needed.
+  if (iree_status_is_ok(status)) {
+    for (iree_host_size_t i = 0;
+         i < iree_hal_vulkan_PipelineDef_vec_len(pipelines_vec); ++i) {
+      iree_hal_vulkan_PipelineDef_table_t pipeline_def =
+          iree_hal_vulkan_PipelineDef_vec_at(pipelines_vec, i);
+      status = iree_hal_vulkan_create_pipeline(
+          logical_device, pipeline_cache, executable_params,
+          &specialization_info, pipeline_layouts, shader_modules, pipeline_def,
+          &pipelines[i]);
+      if (!iree_status_is_ok(status)) {
+        status = iree_status_annotate_f(status, "pipelines[%" PRIhsz "]", i);
+        break;
+      }
+    }
+  }
+
+  iree_allocator_free(logical_device->host_allocator(),
+                      specialization_map_entries);
+  iree_hal_vulkan_release_shader_modules(logical_device, shader_module_count,
+                                         shader_modules);
+  iree_hal_vulkan_release_pipeline_layouts(
+      logical_device, pipeline_layout_count, pipeline_layouts);
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
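The specialization setup above builds one `VkSpecializationMapEntry` per 32-bit executable constant, with constant ID `i` living at byte offset `i * 4` in the packed constants blob. A self-contained sketch of that mapping, using local mirror structs (hypothetical stand-ins so the example compiles without `vulkan.h`; the real code fills `VkSpecializationMapEntry`/`VkSpecializationInfo`):

```c
#include <stddef.h>
#include <stdint.h>

// Stand-ins for VkSpecializationMapEntry / VkSpecializationInfo.
typedef struct {
  uint32_t constantID;
  uint32_t offset;
  uint32_t size;
} spec_map_entry_t;

typedef struct {
  uint32_t map_entry_count;
  const spec_map_entry_t* map_entries;
  size_t data_size;
  const void* data;
} spec_info_t;

// One entry per uint32 constant: constant ID i is read from byte offset i*4
// of the packed constants blob. |entries| must have room for |count| entries
// and both |constants| and |entries| must outlive |info|.
void build_specialization_info(const uint32_t* constants, uint32_t count,
                               spec_map_entry_t* entries, spec_info_t* info) {
  for (uint32_t i = 0; i < count; ++i) {
    entries[i].constantID = i;
    entries[i].offset = i * (uint32_t)sizeof(uint32_t);
    entries[i].size = sizeof(uint32_t);
  }
  info->map_entry_count = count;
  info->map_entries = entries;
  info->data_size = count * sizeof(uint32_t);
  info->data = constants;
}
```

Because the same `spec_info_t` is shared across every pipeline in the executable, it is built once up front rather than per pipeline.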
+
+//===----------------------------------------------------------------------===//
+// iree_hal_vulkan_native_executable_t
+//===----------------------------------------------------------------------===//
+
 typedef struct iree_hal_vulkan_native_executable_t {
   iree_hal_resource_t resource;
   VkDeviceHandle* logical_device;
-  iree_host_size_t entry_point_count;
-  iree_hal_vulkan_entry_point_t entry_points[];
+  iree_host_size_t pipeline_count;
+  iree_hal_vulkan_pipeline_t pipelines[];
 } iree_hal_vulkan_native_executable_t;
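The executable struct ends in a C99 flexible array member, and the allocation below additionally reserves trailing bytes past the pipelines array for tracing export info, all in a single malloc. A minimal sketch of that layout, using hypothetical `pipeline_t`/`executable_t` stand-ins rather than the real IREE types:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  uint64_t handle;
} pipeline_t;

typedef struct {
  size_t pipeline_count;
  pipeline_t pipelines[];  // flexible array member sized at allocation time
} executable_t;

// Allocates the header, the pipelines array, and |trailing_size| extra bytes
// (used by the real code for per-pipeline debug info copies) in one
// zero-initialized block.
executable_t* executable_allocate(size_t pipeline_count,
                                  size_t trailing_size) {
  size_t total_size = sizeof(executable_t) +
                      pipeline_count * sizeof(pipeline_t) + trailing_size;
  executable_t* executable = (executable_t*)calloc(1, total_size);
  if (executable) executable->pipeline_count = pipeline_count;
  return executable;
}

// The trailing region begins immediately after the last pipeline slot.
void* executable_trailing_storage(executable_t* executable) {
  return (uint8_t*)executable->pipelines +
         executable->pipeline_count * sizeof(pipeline_t);
}
```

A single allocation keeps the pipelines and their debug info contiguous and means teardown is one `free` regardless of pipeline count.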
 
 namespace {
@@ -336,126 +898,82 @@
 
   // Verify and fetch the executable FlatBuffer wrapper.
   IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_spirv_executable_flatbuffer_verify(
-              executable_params->executable_data,
-              executable_params->pipeline_layout_count));
-  iree_hal_spirv_ExecutableDef_table_t executable_def =
-      iree_hal_spirv_ExecutableDef_as_root(
+      z0, iree_hal_vulkan_executable_flatbuffer_verify(
+              &logical_device->supported_properties(),
+              executable_params->executable_data));
+  iree_hal_vulkan_ExecutableDef_table_t executable_def =
+      iree_hal_vulkan_ExecutableDef_as_root(
           executable_params->executable_data.data);
 
-  // Allocate space for Vulkan shader module handles.
-  iree_hal_spirv_ShaderModuleDef_vec_t shader_modules_vec =
-      iree_hal_spirv_ExecutableDef_shader_modules_get(executable_def);
-  size_t shader_module_count = flatbuffers_vec_len(shader_modules_vec);
-  VkShaderModule* shader_modules = NULL;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_allocator_malloc(host_allocator,
-                                shader_module_count * sizeof(VkShaderModule),
-                                (void**)&shader_modules));
+  iree_hal_vulkan_PipelineDef_vec_t pipelines_vec =
+      iree_hal_vulkan_ExecutableDef_pipelines_get(executable_def);
+  iree_host_size_t pipeline_count =
+      iree_hal_vulkan_PipelineDef_vec_len(pipelines_vec);
 
-  // Create all shader modules.
-  // TODO: perform the shader module creation in multiple threaded manner.
-  iree_status_t status = iree_ok_status();
-  for (size_t i = 0; i < shader_module_count; ++i) {
-    iree_hal_spirv_ShaderModuleDef_table_t shader_module =
-        iree_hal_spirv_ShaderModuleDef_vec_at(shader_modules_vec, i);
-    flatbuffers_uint32_vec_t code_vec =
-        iree_hal_spirv_ShaderModuleDef_code_get(shader_module);
-    size_t code_size = flatbuffers_uint32_vec_len(code_vec) * sizeof(uint32_t);
-    status = iree_hal_vulkan_create_shader_module(
-        logical_device, iree_make_const_byte_span(code_vec, code_size),
-        &shader_modules[i]);
-    if (!iree_status_is_ok(status)) break;
-  }
-
-  // Create pipelines for each entry point.
-  flatbuffers_string_vec_t entry_points_vec =
-      iree_hal_spirv_ExecutableDef_entry_points_get(executable_def);
-  iree_host_size_t entry_point_count =
-      flatbuffers_string_vec_len(entry_points_vec);
+  // Calculate the total size of the export debug info across all pipelines.
+  // This is only required when tracing so that we can store copies of the info
+  // (entry point names, source locations, etc) as the flatbuffer storing it
+  // may be released while the executable is still live.
+  iree_host_size_t total_export_info_length = 0;
+  IREE_TRACE({
+    for (iree_host_size_t i = 0; i < pipeline_count; ++i) {
+      iree_hal_vulkan_PipelineDef_table_t pipeline_def =
+          iree_hal_vulkan_PipelineDef_vec_at(pipelines_vec, i);
+      total_export_info_length += iree_hal_debug_calculate_export_info_size(
+          iree_hal_vulkan_PipelineDef_debug_info_get(pipeline_def));
+    }
+  });
 
   iree_hal_vulkan_native_executable_t* executable = NULL;
-  if (iree_status_is_ok(status)) {
-    status = iree_allocator_malloc(
-        host_allocator,
-        sizeof(*executable) +
-            entry_point_count * sizeof(*executable->entry_points),
-        (void**)&executable);
-  }
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_vulkan_native_executable_vtable,
-                                 &executable->resource);
-    executable->logical_device = logical_device;
-    executable->entry_point_count = entry_point_count;
-    memset(executable->entry_points, 0,
-           entry_point_count * sizeof(*executable->entry_points));
-  }
-  if (iree_status_is_ok(status)) {
-    status = iree_hal_vulkan_create_pipelines(
-        logical_device, pipeline_cache, executable_params, executable_def,
-        shader_modules, executable->entry_point_count,
-        executable->entry_points);
-  }
-  // Pipelines are created and we don't need the shader modules anymore.
-  // Note that if error happens before, we also destroy the shader modules here.
-  for (size_t i = 0; i < shader_module_count; ++i) {
-    iree_hal_vulkan_destroy_shader_module(logical_device, shader_modules[i]);
-  }
+  const iree_host_size_t total_size =
+      sizeof(*executable) + pipeline_count * sizeof(executable->pipelines[0]) +
+      total_export_info_length;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0,
+      iree_allocator_malloc(host_allocator, total_size, (void**)&executable));
+  iree_hal_resource_initialize(&iree_hal_vulkan_native_executable_vtable,
+                               &executable->resource);
+  executable->logical_device = logical_device;
+  executable->pipeline_count = pipeline_count;
+  memset(executable->pipelines, 0,
+         pipeline_count * sizeof(executable->pipelines[0]));
 
-  if (iree_status_is_ok(status)) {
-    flatbuffers_string_vec_t entry_points_vec =
-        iree_hal_spirv_ExecutableDef_entry_points_get(executable_def);
-    for (iree_host_size_t i = 0; i < entry_point_count; ++i) {
-      flatbuffers_string_t name =
-          flatbuffers_string_vec_at(entry_points_vec, i);
-      executable->entry_points[i].name =
-          iree_make_string_view(name, flatbuffers_string_len(name));
-      IREE_TRACE_ZONE_APPEND_TEXT(z0, name);
-    }
-  }
+  // Publish any embedded source files to the tracing infrastructure.
+  iree_hal_debug_publish_source_files(
+      iree_hal_vulkan_ExecutableDef_source_files_get(executable_def));
 
-#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
-  if (iree_status_is_ok(status)) {
-    if (iree_hal_spirv_ExecutableDef_source_locations_is_present(
-            executable_def)) {
-      iree_hal_spirv_FileLineLocDef_vec_t source_locations_vec =
-          iree_hal_spirv_ExecutableDef_source_locations_get(executable_def);
-      for (iree_host_size_t i = 0; i < entry_point_count; ++i) {
-        executable->entry_points[i].source_location =
-            iree_hal_spirv_FileLineLocDef_vec_at(source_locations_vec, i);
-      }
-    }
-    if (iree_hal_spirv_ExecutableDef_stage_locations_is_present(
-            executable_def)) {
-      iree_hal_spirv_StageLocationsDef_vec_t stage_locations_vec =
-          iree_hal_spirv_ExecutableDef_stage_locations_get(executable_def);
-      for (iree_host_size_t i = 0; i < entry_point_count; ++i) {
-        iree_hal_spirv_StageLocationsDef_table_t stage_locations =
-            iree_hal_spirv_StageLocationsDef_vec_at(stage_locations_vec, i);
-        executable->entry_points[i].stage_locations =
-            iree_hal_spirv_StageLocationsDef_locations_get(stage_locations);
-      }
-    }
+  // Create one pipeline per exported function.
+  iree_hal_vulkan_DescriptorSetLayoutDef_vec_t descriptor_set_layouts_vec =
+      iree_hal_vulkan_ExecutableDef_descriptor_set_layouts_get(executable_def);
+  iree_hal_vulkan_PipelineLayoutDef_vec_t pipeline_layouts_vec =
+      iree_hal_vulkan_ExecutableDef_pipeline_layouts_get(executable_def);
+  iree_hal_vulkan_ShaderModuleDef_vec_t shader_modules_vec =
+      iree_hal_vulkan_ExecutableDef_shader_modules_get(executable_def);
+  iree_status_t status = iree_hal_vulkan_create_pipelines(
+      logical_device, pipeline_cache, executable_params,
+      descriptor_set_layouts_vec, pipeline_layouts_vec, shader_modules_vec,
+      pipelines_vec, executable->pipelines);
 
-    // Publish any embedded source files to the tracing infrastructure.
-    if (iree_hal_spirv_ExecutableDef_source_files_is_present(executable_def)) {
-      iree_hal_spirv_SourceFileDef_vec_t source_files_vec =
-          iree_hal_spirv_ExecutableDef_source_files_get(executable_def);
-      for (iree_host_size_t i = 0;
-           i < iree_hal_spirv_SourceFileDef_vec_len(source_files_vec); ++i) {
-        iree_hal_spirv_SourceFileDef_table_t source_file =
-            iree_hal_spirv_SourceFileDef_vec_at(source_files_vec, i);
-        flatbuffers_string_t path =
-            iree_hal_spirv_SourceFileDef_path_get(source_file);
-        flatbuffers_uint8_vec_t content =
-            iree_hal_spirv_SourceFileDef_content_get(source_file);
-        IREE_TRACE_PUBLISH_SOURCE_FILE(path, flatbuffers_string_len(path),
-                                       content,
-                                       flatbuffers_uint8_vec_len(content));
+  // Populate tracing info for each pipeline.
+  if (iree_status_is_ok(status)) {
+    IREE_TRACE({
+      iree_hal_debug_export_info_t* export_infos =
+          (iree_hal_debug_export_info_t*)((uint8_t*)executable->pipelines +
+                                          pipeline_count *
+                                              sizeof(executable->pipelines[0]));
+      for (iree_host_size_t i = 0; i < pipeline_count; ++i) {
+        iree_hal_vulkan_PipelineDef_table_t pipeline_def =
+            iree_hal_vulkan_PipelineDef_vec_at(pipelines_vec, i);
+        iree_hal_vulkan_pipeline_t* pipeline = &executable->pipelines[i];
+        iree_hal_debug_copy_export_info(
+            iree_hal_vulkan_PipelineDef_debug_info_get(pipeline_def),
+            &export_infos[i]);
+        pipeline->source_location.file_name = export_infos[i].source_filename;
+        pipeline->source_location.line = export_infos[i].source_line;
+        pipeline->source_location.func_name = export_infos[i].function_name;
       }
-    }
+    });
   }
-#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
 
   if (iree_status_is_ok(status)) {
     *out_executable = (iree_hal_executable_t*)executable;
@@ -475,79 +993,25 @@
       executable->logical_device->host_allocator();
   IREE_TRACE_ZONE_BEGIN(z0);
 
-  for (iree_host_size_t i = 0; i < executable->entry_point_count; ++i) {
+  for (iree_host_size_t i = 0; i < executable->pipeline_count; ++i) {
     iree_hal_vulkan_destroy_pipeline(executable->logical_device,
-                                     executable->entry_points[i].pipeline);
-    iree_hal_pipeline_layout_release(executable->entry_points[i].layout);
+                                     &executable->pipelines[i]);
   }
   iree_allocator_free(host_allocator, executable);
 
   IREE_TRACE_ZONE_END(z0);
 }
 
-void iree_hal_vulkan_native_executable_entry_point_source_location(
-    iree_hal_executable_t* base_executable, iree_host_size_t entry_ordinal,
-    iree_hal_vulkan_source_location_t* out_source_location) {
+iree_status_t iree_hal_vulkan_native_executable_lookup_pipeline(
+    iree_hal_executable_t* base_executable, uint32_t entry_ordinal,
+    const iree_hal_vulkan_pipeline_t** out_pipeline) {
   iree_hal_vulkan_native_executable_t* executable =
       iree_hal_vulkan_native_executable_cast(base_executable);
-  memset(out_source_location, 0, sizeof(*out_source_location));
-  if (entry_ordinal >= executable->entry_point_count) {
-    return;
-  }
-  const iree_hal_vulkan_entry_point_t* entry_point =
-      &executable->entry_points[entry_ordinal];
-
-  out_source_location->func_name = entry_point->name;
-
-#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
-  iree_hal_spirv_FileLineLocDef_table_t source_location =
-      entry_point->source_location;
-  if (entry_point->stage_locations) {
-    for (size_t i = 0; i < iree_hal_spirv_StageLocationDef_vec_len(
-                               entry_point->stage_locations);
-         ++i) {
-      iree_hal_spirv_StageLocationDef_table_t stage_location =
-          iree_hal_spirv_StageLocationDef_vec_at(entry_point->stage_locations,
-                                                 i);
-      // TODO(benvanik): a way to select what location is chosen. For now we
-      // just pick the first one.
-      source_location =
-          iree_hal_spirv_StageLocationDef_location_get(stage_location);
-      break;
-    }
-  }
-  if (source_location) {
-    flatbuffers_string_t filename =
-        iree_hal_spirv_FileLineLocDef_filename_get(source_location);
-    out_source_location->file_name =
-        iree_make_string_view(filename, flatbuffers_string_len(filename));
-    out_source_location->line =
-        iree_hal_spirv_FileLineLocDef_line_get(source_location);
-  } else {
-    out_source_location->file_name = out_source_location->func_name;
-    out_source_location->line = 0;
-  }
-#else
-  out_source_location->file_name = out_source_location->func_name;
-  out_source_location->line = 0;
-#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
-}
-
-iree_status_t iree_hal_vulkan_native_executable_pipeline_for_entry_point(
-    iree_hal_executable_t* base_executable, iree_host_size_t entry_ordinal,
-    VkPipeline* out_pipeline_handle,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  iree_hal_vulkan_native_executable_t* executable =
-      iree_hal_vulkan_native_executable_cast(base_executable);
-  if (entry_ordinal >= executable->entry_point_count) {
+  if (entry_ordinal >= executable->pipeline_count) {
     return iree_make_status(IREE_STATUS_OUT_OF_RANGE,
-                            "invalid entry point ordinal %" PRIhsz,
-                            entry_ordinal);
+                            "invalid entry point ordinal %u", entry_ordinal);
   }
-  *out_pipeline_handle = executable->entry_points[entry_ordinal].pipeline;
-  if (out_pipeline_layout) {
-    *out_pipeline_layout = executable->entry_points[entry_ordinal].layout;
-  }
+  *out_pipeline = &executable->pipelines[entry_ordinal];
   return iree_ok_status();
 }
 
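The tracing path above carves the `iree_hal_debug_export_info_t` records out of the same allocation as the `pipelines` array, so one `iree_allocator_free` tears down both. A minimal standalone sketch of that trailing-array layout, using hypothetical `demo_*` names and plain C (no IREE or Vulkan dependencies), under the assumption that the trailing element type's alignment is satisfied at the computed offset (the real code must mind this too):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins for the pipeline and export-info records.
typedef struct { int handle; } demo_pipeline_t;
typedef struct { char name[16]; } demo_export_info_t;

typedef struct {
  size_t pipeline_count;
  demo_pipeline_t* pipelines;  // points into the same allocation
} demo_executable_t;

// One allocation sized for the struct, the pipeline array, and the trailing
// export-info array; freeing the struct frees everything at once.
demo_executable_t* demo_executable_create(size_t pipeline_count) {
  size_t total_size = sizeof(demo_executable_t) +
                      pipeline_count * sizeof(demo_pipeline_t) +
                      pipeline_count * sizeof(demo_export_info_t);
  demo_executable_t* executable = (demo_executable_t*)calloc(1, total_size);
  if (!executable) return NULL;
  executable->pipeline_count = pipeline_count;
  executable->pipelines = (demo_pipeline_t*)(executable + 1);
  return executable;
}

// Mirrors the pointer arithmetic above: the export infos start immediately
// after the last pipeline element.
demo_export_info_t* demo_executable_export_infos(
    demo_executable_t* executable) {
  return (demo_export_info_t*)((uint8_t*)executable->pipelines +
                               executable->pipeline_count *
                                   sizeof(executable->pipelines[0]));
}
```

The payoff is the one in the destroy path above: cleanup is a single `free`, with no per-record bookkeeping.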
diff --git a/runtime/src/iree/hal/drivers/vulkan/native_executable.h b/runtime/src/iree/hal/drivers/vulkan/native_executable.h
index 248db1d..071f4f7 100644
--- a/runtime/src/iree/hal/drivers/vulkan/native_executable.h
+++ b/runtime/src/iree/hal/drivers/vulkan/native_executable.h
@@ -14,6 +14,7 @@
 #include "iree/base/api.h"
 #include "iree/hal/api.h"
 #include "iree/hal/drivers/vulkan/handle_util.h"
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
 
 #ifdef __cplusplus
 extern "C" {
@@ -25,6 +26,12 @@
   iree_string_view_t func_name;
 } iree_hal_vulkan_source_location_t;
 
+typedef struct iree_hal_vulkan_pipeline_t {
+  VkPipeline handle;
+  iree_hal_vulkan_pipeline_layout_t* layout;
+  IREE_TRACE(iree_hal_vulkan_source_location_t source_location;)
+} iree_hal_vulkan_pipeline_t;
+
 // Creates a wrapper for one or more VkPipelines that are sourced from the same
 // IREE executable. Each of the pipelines shares the same shader module
 // and differs only by the entry point into the shader module it references.
@@ -34,17 +41,10 @@
     const iree_hal_executable_params_t* executable_params,
     iree_hal_executable_t** out_executable);
 
-// Returns the source location for the given entry point. May be empty if not
-// available.
-void iree_hal_vulkan_native_executable_entry_point_source_location(
-    iree_hal_executable_t* executable, iree_host_size_t entry_ordinal,
-    iree_hal_vulkan_source_location_t* out_source_location);
-
-// Returns the cached VkPipeline for the given executable |entry_ordinal|.
-iree_status_t iree_hal_vulkan_native_executable_pipeline_for_entry_point(
-    iree_hal_executable_t* executable, iree_host_size_t entry_ordinal,
-    VkPipeline* out_pipeline_handle,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
+// Returns the pipeline for the given |entry_ordinal| in the |executable|.
+iree_status_t iree_hal_vulkan_native_executable_lookup_pipeline(
+    iree_hal_executable_t* executable, uint32_t entry_ordinal,
+    const iree_hal_vulkan_pipeline_t** out_pipeline);
 
 #ifdef __cplusplus
 }  // extern "C"
diff --git a/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.cc b/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.cc
deleted file mode 100644
index 958376e..0000000
--- a/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.cc
+++ /dev/null
@@ -1,327 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
-
-#include <cstddef>
-#include <cstdint>
-
-#include "iree/base/api.h"
-#include "iree/hal/drivers/vulkan/dynamic_symbol_tables.h"
-#include "iree/hal/drivers/vulkan/dynamic_symbols.h"
-#include "iree/hal/drivers/vulkan/extensibility_util.h"
-#include "iree/hal/drivers/vulkan/status_util.h"
-#include "iree/hal/drivers/vulkan/util/ref_ptr.h"
-
-using namespace iree::hal::vulkan;
-
-//===----------------------------------------------------------------------===//
-// iree_hal_vulkan_native_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_vulkan_native_descriptor_set_layout_t {
-  iree_hal_resource_t resource;
-  VkDeviceHandle* logical_device;
-  VkDescriptorSetLayout handle;
-} iree_hal_vulkan_native_descriptor_set_layout_t;
-
-namespace {
-extern const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_vulkan_native_descriptor_set_layout_vtable;
-}  // namespace
-
-static iree_hal_vulkan_native_descriptor_set_layout_t*
-iree_hal_vulkan_native_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value,
-                       &iree_hal_vulkan_native_descriptor_set_layout_vtable);
-  return (iree_hal_vulkan_native_descriptor_set_layout_t*)base_value;
-}
-
-static iree_status_t iree_hal_vulkan_create_descriptor_set_layout(
-    VkDeviceHandle* logical_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    VkDescriptorSetLayout* out_handle) {
-  VkDescriptorSetLayoutCreateInfo create_info;
-  create_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
-  create_info.pNext = NULL;
-  create_info.flags = 0;
-
-  VkDescriptorSetLayoutBinding* native_bindings = NULL;
-  if (binding_count > 0) {
-    if (logical_device->enabled_extensions().push_descriptors) {
-      // Note that we can *only* use push descriptor sets if we set this create
-      // flag. If push descriptors aren't supported we emulate them with normal
-      // descriptors so it's fine to have kPushOnly without support.
-      // Also we only enable this when there are at least one binding in it.
-      // (We can have dummy descriptor sets without any bindings for builtin
-      // executables.)
-      create_info.flags |=
-          VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR;
-    }
-
-    // TODO(benvanik): avoid this allocation if possible (inline_array).
-    IREE_RETURN_IF_ERROR(iree_allocator_malloc(
-        logical_device->host_allocator(),
-        binding_count * sizeof(VkDescriptorSetLayoutBinding),
-        (void**)&native_bindings));
-    for (iree_host_size_t i = 0; i < binding_count; ++i) {
-      VkDescriptorSetLayoutBinding* native_binding = &native_bindings[i];
-      native_binding->binding = bindings[i].binding;
-      native_binding->descriptorType =
-          static_cast<VkDescriptorType>(bindings[i].type);
-      native_binding->descriptorCount = 1;
-      native_binding->stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
-      native_binding->pImmutableSamplers = NULL;
-    }
-  }
-  create_info.bindingCount = (uint32_t)binding_count;
-  create_info.pBindings = native_bindings;
-
-  iree_status_t status =
-      VK_RESULT_TO_STATUS(logical_device->syms()->vkCreateDescriptorSetLayout(
-                              *logical_device, &create_info,
-                              logical_device->allocator(), out_handle),
-                          "vkCreateDescriptorSetLayout");
-
-  iree_allocator_free(logical_device->host_allocator(), native_bindings);
-  return status;
-}
-
-static void iree_hal_vulkan_destroy_descriptor_set_layout(
-    VkDeviceHandle* logical_device, VkDescriptorSetLayout handle) {
-  if (handle == VK_NULL_HANDLE) return;
-  logical_device->syms()->vkDestroyDescriptorSetLayout(
-      *logical_device, handle, logical_device->allocator());
-}
-
-iree_status_t iree_hal_vulkan_native_descriptor_set_layout_create(
-    VkDeviceHandle* logical_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(logical_device);
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  *out_descriptor_set_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  VkDescriptorSetLayout handle = VK_NULL_HANDLE;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_vulkan_create_descriptor_set_layout(
-              logical_device, flags, binding_count, bindings, &handle));
-
-  iree_hal_vulkan_native_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  iree_status_t status = iree_allocator_malloc(logical_device->host_allocator(),
-                                               sizeof(*descriptor_set_layout),
-                                               (void**)&descriptor_set_layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(
-        &iree_hal_vulkan_native_descriptor_set_layout_vtable,
-        &descriptor_set_layout->resource);
-    descriptor_set_layout->logical_device = logical_device;
-    descriptor_set_layout->handle = handle;
-    *out_descriptor_set_layout =
-        (iree_hal_descriptor_set_layout_t*)descriptor_set_layout;
-  } else {
-    iree_hal_vulkan_destroy_descriptor_set_layout(logical_device, handle);
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_vulkan_native_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  iree_hal_vulkan_native_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_vulkan_native_descriptor_set_layout_cast(
-          base_descriptor_set_layout);
-  iree_allocator_t host_allocator =
-      descriptor_set_layout->logical_device->host_allocator();
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_vulkan_destroy_descriptor_set_layout(
-      descriptor_set_layout->logical_device, descriptor_set_layout->handle);
-  iree_allocator_free(host_allocator, descriptor_set_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-VkDescriptorSetLayout iree_hal_vulkan_native_descriptor_set_layout_handle(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout) {
-  iree_hal_vulkan_native_descriptor_set_layout_t* descriptor_set_layout =
-      iree_hal_vulkan_native_descriptor_set_layout_cast(
-          base_descriptor_set_layout);
-  return descriptor_set_layout->handle;
-}
-
-namespace {
-const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_vulkan_native_descriptor_set_layout_vtable = {
-        /*.destroy=*/iree_hal_vulkan_native_descriptor_set_layout_destroy,
-};
-}  // namespace
-
-//===----------------------------------------------------------------------===//
-// iree_hal_vulkan_native_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_vulkan_native_pipeline_layout_t {
-  iree_hal_resource_t resource;
-  VkDeviceHandle* logical_device;
-  VkPipelineLayout handle;
-  iree_host_size_t set_layout_count;
-  iree_hal_descriptor_set_layout_t* set_layouts[];
-} iree_hal_vulkan_native_pipeline_layout_t;
-
-namespace {
-extern const iree_hal_pipeline_layout_vtable_t
-    iree_hal_vulkan_native_pipeline_layout_vtable;
-}  // namespace
-
-static iree_hal_vulkan_native_pipeline_layout_t*
-iree_hal_vulkan_native_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value,
-                       &iree_hal_vulkan_native_pipeline_layout_vtable);
-  return (iree_hal_vulkan_native_pipeline_layout_t*)base_value;
-}
-
-static iree_status_t iree_hal_vulkan_create_pipeline_layout(
-    iree::hal::vulkan::VkDeviceHandle* logical_device,
-    iree_host_size_t push_constant_count, iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    VkPipelineLayout* out_handle) {
-  VkDescriptorSetLayout* set_layout_handles =
-      (VkDescriptorSetLayout*)iree_alloca(set_layout_count *
-                                          sizeof(VkDescriptorSetLayout));
-  for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-    set_layout_handles[i] =
-        iree_hal_vulkan_native_descriptor_set_layout_handle(set_layouts[i]);
-  }
-
-  VkPushConstantRange push_constant_ranges[1];
-  push_constant_ranges[0].stageFlags = VK_SHADER_STAGE_COMPUTE_BIT;
-  push_constant_ranges[0].offset = 0;
-  push_constant_ranges[0].size =
-      (uint32_t)(push_constant_count * sizeof(uint32_t));
-
-  VkPipelineLayoutCreateInfo create_info;
-  create_info.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
-  create_info.pNext = nullptr;
-  create_info.flags = 0;
-  create_info.setLayoutCount = (uint32_t)set_layout_count;
-  create_info.pSetLayouts = set_layout_handles;
-  create_info.pushConstantRangeCount = push_constant_count > 0 ? 1 : 0;
-  create_info.pPushConstantRanges = push_constant_ranges;
-
-  return VK_RESULT_TO_STATUS(logical_device->syms()->vkCreatePipelineLayout(
-                                 *logical_device, &create_info,
-                                 logical_device->allocator(), out_handle),
-                             "vkCreatePipelineLayout");
-}
-
-static void iree_hal_vulkan_destroy_pipeline_layout(
-    VkDeviceHandle* logical_device, VkPipelineLayout handle) {
-  if (handle == VK_NULL_HANDLE) return;
-  logical_device->syms()->vkDestroyPipelineLayout(*logical_device, handle,
-                                                  logical_device->allocator());
-}
-
-iree_status_t iree_hal_vulkan_native_pipeline_layout_create(
-    iree::hal::vulkan::VkDeviceHandle* logical_device,
-    iree_host_size_t push_constant_count, iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(logical_device);
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  *out_pipeline_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  VkPipelineLayout handle = VK_NULL_HANDLE;
-  IREE_RETURN_AND_END_ZONE_IF_ERROR(
-      z0, iree_hal_vulkan_create_pipeline_layout(
-              logical_device, push_constant_count, set_layout_count,
-              set_layouts, &handle));
-
-  iree_hal_vulkan_native_pipeline_layout_t* pipeline_layout = NULL;
-  iree_host_size_t total_size =
-      sizeof(*pipeline_layout) +
-      set_layout_count * sizeof(*pipeline_layout->set_layouts);
-  iree_status_t status = iree_allocator_malloc(
-      logical_device->host_allocator(), total_size, (void**)&pipeline_layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_vulkan_native_pipeline_layout_vtable,
-                                 &pipeline_layout->resource);
-    pipeline_layout->logical_device = logical_device;
-    pipeline_layout->handle = handle;
-    pipeline_layout->set_layout_count = set_layout_count;
-    for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-      pipeline_layout->set_layouts[i] = set_layouts[i];
-      iree_hal_descriptor_set_layout_retain(set_layouts[i]);
-    }
-    *out_pipeline_layout = (iree_hal_pipeline_layout_t*)pipeline_layout;
-  } else {
-    iree_hal_vulkan_destroy_pipeline_layout(logical_device, handle);
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_vulkan_native_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_vulkan_native_pipeline_layout_t* pipeline_layout =
-      iree_hal_vulkan_native_pipeline_layout_cast(base_pipeline_layout);
-  iree_allocator_t host_allocator =
-      pipeline_layout->logical_device->host_allocator();
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_vulkan_destroy_pipeline_layout(pipeline_layout->logical_device,
-                                          pipeline_layout->handle);
-  for (iree_host_size_t i = 0; i < pipeline_layout->set_layout_count; ++i) {
-    iree_hal_descriptor_set_layout_release(pipeline_layout->set_layouts[i]);
-  }
-  iree_allocator_free(host_allocator, pipeline_layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-VkPipelineLayout iree_hal_vulkan_native_pipeline_layout_handle(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_vulkan_native_pipeline_layout_t* pipeline_layout =
-      iree_hal_vulkan_native_pipeline_layout_cast(base_pipeline_layout);
-  return pipeline_layout->handle;
-}
-
-iree_host_size_t iree_hal_vulkan_native_pipeline_layout_set_count(
-    iree_hal_pipeline_layout_t* base_pipeline_layout) {
-  iree_hal_vulkan_native_pipeline_layout_t* pipeline_layout =
-      iree_hal_vulkan_native_pipeline_layout_cast(base_pipeline_layout);
-  return pipeline_layout->set_layout_count;
-}
-
-iree_hal_descriptor_set_layout_t* iree_hal_vulkan_native_pipeline_layout_set(
-    iree_hal_pipeline_layout_t* base_pipeline_layout,
-    iree_host_size_t set_index) {
-  iree_hal_vulkan_native_pipeline_layout_t* pipeline_layout =
-      iree_hal_vulkan_native_pipeline_layout_cast(base_pipeline_layout);
-  if (IREE_UNLIKELY(set_index >= pipeline_layout->set_layout_count)) {
-    return NULL;
-  }
-  return pipeline_layout->set_layouts[set_index];
-}
-
-namespace {
-const iree_hal_pipeline_layout_vtable_t
-    iree_hal_vulkan_native_pipeline_layout_vtable = {
-        /*.destroy=*/iree_hal_vulkan_native_pipeline_layout_destroy,
-};
-}  // namespace
diff --git a/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.h b/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.h
deleted file mode 100644
index 2c68059..0000000
--- a/runtime/src/iree/hal/drivers/vulkan/native_pipeline_layout.h
+++ /dev/null
@@ -1,66 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_DRIVERS_VULKAN_NATIVE_PIPELINE_LAYOUT_H_
-#define IREE_HAL_DRIVERS_VULKAN_NATIVE_PIPELINE_LAYOUT_H_
-
-// clang-format off: must be included before all other headers.
-#include "iree/hal/drivers/vulkan/vulkan_headers.h"
-// clang-format on
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-#include "iree/hal/drivers/vulkan/handle_util.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-//===----------------------------------------------------------------------===//
-// iree_hal_vulkan_native_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a native Vulkan VkDescriptorSetLayout object.
-iree_status_t iree_hal_vulkan_native_descriptor_set_layout_create(
-    iree::hal::vulkan::VkDeviceHandle* logical_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-// Returns the native Vulkan VkDescriptorSetLayout handle.
-VkDescriptorSetLayout iree_hal_vulkan_native_descriptor_set_layout_handle(
-    iree_hal_descriptor_set_layout_t* base_descriptor_set_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_vulkan_native_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Creates a VkPipelineLayout-based pipeline layout composed of one or more
-// descriptor set layouts.
-iree_status_t iree_hal_vulkan_native_pipeline_layout_create(
-    iree::hal::vulkan::VkDeviceHandle* logical_device,
-    iree_host_size_t push_constant_count, iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-// Returns the native VkPipelineLayout handle for the pipeline layout.
-VkPipelineLayout iree_hal_vulkan_native_pipeline_layout_handle(
-    iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the total number of descriptor sets within the layout.
-iree_host_size_t iree_hal_vulkan_native_pipeline_layout_set_count(
-    iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Returns the descriptor set layout with the given |set_index|.
-iree_hal_descriptor_set_layout_t* iree_hal_vulkan_native_pipeline_layout_set(
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t set_index);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_DRIVERS_VULKAN_NATIVE_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.cc b/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.cc
new file mode 100644
index 0000000..efa0043
--- /dev/null
+++ b/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.cc
@@ -0,0 +1,236 @@
+// Copyright 2020 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "iree/hal/drivers/vulkan/pipeline_layout.h"
+
+#include <cstddef>
+#include <cstdint>
+
+#include "iree/base/api.h"
+#include "iree/hal/drivers/vulkan/dynamic_symbol_tables.h"
+#include "iree/hal/drivers/vulkan/dynamic_symbols.h"
+#include "iree/hal/drivers/vulkan/extensibility_util.h"
+#include "iree/hal/drivers/vulkan/status_util.h"
+#include "iree/hal/drivers/vulkan/util/ref_ptr.h"
+
+using namespace iree::hal::vulkan;
+
+//===----------------------------------------------------------------------===//
+// iree_hal_vulkan_descriptor_set_layout_t
+//===----------------------------------------------------------------------===//
+
+iree_status_t iree_hal_vulkan_descriptor_set_layout_create(
+    VkDeviceHandle* logical_device, VkDescriptorSetLayoutCreateFlags flags,
+    iree_host_size_t binding_count,
+    const VkDescriptorSetLayoutBinding* bindings,
+    iree_hal_vulkan_descriptor_set_layout_t** out_descriptor_set_layout) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
+  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
+  *out_descriptor_set_layout = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout = NULL;
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(logical_device->host_allocator(),
+                                sizeof(*descriptor_set_layout),
+                                (void**)&descriptor_set_layout));
+  iree_atomic_ref_count_init(&descriptor_set_layout->ref_count);
+  descriptor_set_layout->logical_device = logical_device;
+  descriptor_set_layout->handle = VK_NULL_HANDLE;
+
+  VkDescriptorSetLayoutCreateInfo create_info;
+  create_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
+  create_info.pNext = NULL;
+
+  create_info.flags = flags;
+  if (binding_count > 0) {
+    if (logical_device->enabled_extensions().push_descriptors) {
+      // Note that we can *only* use push descriptor sets if we set this create
+      // flag. If push descriptors aren't supported we emulate them with normal
+      // descriptors so it's fine to have kPushOnly without support.
+      // Also, we only enable this when there is at least one binding in it.
+      // (We can have dummy descriptor sets without any bindings for builtin
+      // executables.)
+      create_info.flags |=
+          VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR;
+    }
+  }
+
+  create_info.bindingCount = (uint32_t)binding_count;
+  create_info.pBindings = bindings;
+
+  iree_status_t status = VK_RESULT_TO_STATUS(
+      logical_device->syms()->vkCreateDescriptorSetLayout(
+          *logical_device, &create_info, logical_device->allocator(),
+          &descriptor_set_layout->handle),
+      "vkCreateDescriptorSetLayout");
+
+  if (iree_status_is_ok(status)) {
+    *out_descriptor_set_layout = descriptor_set_layout;
+  } else {
+    iree_hal_vulkan_descriptor_set_layout_release(descriptor_set_layout);
+  }
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+static void iree_hal_vulkan_descriptor_set_layout_destroy(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout) {
+  VkDeviceHandle* logical_device = descriptor_set_layout->logical_device;
+  iree_allocator_t host_allocator = logical_device->host_allocator();
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  if (descriptor_set_layout->handle != VK_NULL_HANDLE) {
+    logical_device->syms()->vkDestroyDescriptorSetLayout(
+        *logical_device, descriptor_set_layout->handle,
+        logical_device->allocator());
+  }
+
+  iree_allocator_free(host_allocator, descriptor_set_layout);
+
+  IREE_TRACE_ZONE_END(z0);
+}
+
+void iree_hal_vulkan_descriptor_set_layout_retain(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout) {
+  if (descriptor_set_layout) {
+    iree_atomic_ref_count_inc(&descriptor_set_layout->ref_count);
+  }
+}
+
+void iree_hal_vulkan_descriptor_set_layout_release(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout) {
+  if (descriptor_set_layout &&
+      iree_atomic_ref_count_dec(&descriptor_set_layout->ref_count) == 1) {
+    iree_hal_vulkan_descriptor_set_layout_destroy(descriptor_set_layout);
+  }
+}
+
+VkDescriptorSetLayout iree_hal_vulkan_descriptor_set_layout_handle(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout) {
+  return descriptor_set_layout->handle;
+}
+
+//===----------------------------------------------------------------------===//
+// iree_hal_vulkan_pipeline_layout_t
+//===----------------------------------------------------------------------===//
+
+iree_status_t iree_hal_vulkan_pipeline_layout_create(
+    iree::hal::vulkan::VkDeviceHandle* logical_device,
+    iree_host_size_t push_constant_range_count,
+    const VkPushConstantRange* push_constant_ranges,
+    iree_host_size_t set_layout_count,
+    iree_hal_vulkan_descriptor_set_layout_t* const* set_layouts,
+    iree_hal_vulkan_pipeline_layout_t** out_pipeline_layout) {
+  IREE_ASSERT_ARGUMENT(logical_device);
+  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
+  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
+  *out_pipeline_layout = NULL;
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  iree_hal_vulkan_pipeline_layout_t* pipeline_layout = NULL;
+  const iree_host_size_t total_size =
+      sizeof(*pipeline_layout) +
+      set_layout_count * sizeof(*pipeline_layout->set_layouts);
+  IREE_RETURN_AND_END_ZONE_IF_ERROR(
+      z0, iree_allocator_malloc(logical_device->host_allocator(), total_size,
+                                (void**)&pipeline_layout));
+  iree_atomic_ref_count_init(&pipeline_layout->ref_count);
+  pipeline_layout->logical_device = logical_device;
+  pipeline_layout->handle = VK_NULL_HANDLE;
+  pipeline_layout->set_layout_count = set_layout_count;
+  for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
+    pipeline_layout->set_layouts[i] = set_layouts[i];
+    iree_hal_vulkan_descriptor_set_layout_retain(set_layouts[i]);
+  }
+
+  VkDescriptorSetLayout* set_layout_handles =
+      (VkDescriptorSetLayout*)iree_alloca(set_layout_count *
+                                          sizeof(VkDescriptorSetLayout));
+  for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
+    set_layout_handles[i] = set_layouts[i]->handle;
+  }
+
+  VkPipelineLayoutCreateInfo create_info;
+  create_info.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
+  create_info.pNext = nullptr;
+  create_info.flags = 0;
+  create_info.setLayoutCount = (uint32_t)set_layout_count;
+  create_info.pSetLayouts = set_layout_handles;
+  create_info.pushConstantRangeCount = (uint32_t)push_constant_range_count;
+  create_info.pPushConstantRanges = push_constant_ranges;
+
+  iree_status_t status = VK_RESULT_TO_STATUS(
+      logical_device->syms()->vkCreatePipelineLayout(
+          *logical_device, &create_info, logical_device->allocator(),
+          &pipeline_layout->handle),
+      "vkCreatePipelineLayout");
+
+  if (iree_status_is_ok(status)) {
+    *out_pipeline_layout = pipeline_layout;
+  } else {
+    iree_hal_vulkan_pipeline_layout_release(pipeline_layout);
+  }
+
+  IREE_TRACE_ZONE_END(z0);
+  return status;
+}
+
+static void iree_hal_vulkan_pipeline_layout_destroy(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout) {
+  VkDeviceHandle* logical_device = pipeline_layout->logical_device;
+  iree_allocator_t host_allocator = logical_device->host_allocator();
+  IREE_TRACE_ZONE_BEGIN(z0);
+
+  if (pipeline_layout->handle != VK_NULL_HANDLE) {
+    logical_device->syms()->vkDestroyPipelineLayout(
+        *logical_device, pipeline_layout->handle, logical_device->allocator());
+  }
+
+  for (iree_host_size_t i = 0; i < pipeline_layout->set_layout_count; ++i) {
+    iree_hal_vulkan_descriptor_set_layout_release(
+        pipeline_layout->set_layouts[i]);
+  }
+
+  iree_allocator_free(host_allocator, pipeline_layout);
+
+  IREE_TRACE_ZONE_END(z0);
+}
+
+void iree_hal_vulkan_pipeline_layout_retain(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout) {
+  if (pipeline_layout) {
+    iree_atomic_ref_count_inc(&pipeline_layout->ref_count);
+  }
+}
+
+void iree_hal_vulkan_pipeline_layout_release(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout) {
+  if (pipeline_layout &&
+      iree_atomic_ref_count_dec(&pipeline_layout->ref_count) == 1) {
+    iree_hal_vulkan_pipeline_layout_destroy(pipeline_layout);
+  }
+}
+
+VkPipelineLayout iree_hal_vulkan_pipeline_layout_handle(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout) {
+  return pipeline_layout->handle;
+}
+
+iree_host_size_t iree_hal_vulkan_pipeline_layout_set_count(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout) {
+  return pipeline_layout->set_layout_count;
+}
+
+iree_hal_vulkan_descriptor_set_layout_t* iree_hal_vulkan_pipeline_layout_set(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout,
+    iree_host_size_t set_index) {
+  if (IREE_UNLIKELY(set_index >= pipeline_layout->set_layout_count)) {
+    return NULL;
+  }
+  return pipeline_layout->set_layouts[set_index];
+}
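The retain/release pair in the hunk above is the standard intrusive reference-counting pattern: `release` destroys the object when it drops the last reference (the decrement returns the previous count, so `== 1` means "we just hit zero"). A minimal standalone sketch of the same idea, using plain C11 atomics rather than IREE's `iree_atomic_ref_count_*` wrappers (names here are illustrative only):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct object_t {
  atomic_int ref_count;
  // ... payload fields would follow ...
} object_t;

static object_t* object_create(void) {
  object_t* obj = (object_t*)calloc(1, sizeof(object_t));
  if (!obj) return NULL;
  atomic_store(&obj->ref_count, 1);  // caller holds the initial reference
  return obj;
}

static void object_retain(object_t* obj) {
  if (obj) atomic_fetch_add(&obj->ref_count, 1);
}

static void object_release(object_t* obj) {
  // fetch_sub returns the *previous* value: the caller that takes the count
  // from 1 to 0 is the one responsible for destruction.
  if (obj && atomic_fetch_sub(&obj->ref_count, 1) == 1) {
    free(obj);
  }
}
```

This matches the `iree_atomic_ref_count_dec(...) == 1` check in `iree_hal_vulkan_pipeline_layout_release` above.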
diff --git a/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.h b/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.h
new file mode 100644
index 0000000..e9d2183
--- /dev/null
+++ b/runtime/src/iree/hal/drivers/vulkan/pipeline_layout.h
@@ -0,0 +1,98 @@
+// Copyright 2020 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#ifndef IREE_HAL_DRIVERS_VULKAN_PIPELINE_LAYOUT_H_
+#define IREE_HAL_DRIVERS_VULKAN_PIPELINE_LAYOUT_H_
+
+// clang-format off: must be included before all other headers.
+#include "iree/hal/drivers/vulkan/vulkan_headers.h"
+// clang-format on
+
+#include "iree/base/api.h"
+#include "iree/hal/api.h"
+#include "iree/hal/drivers/vulkan/handle_util.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+//===----------------------------------------------------------------------===//
+// iree_hal_vulkan_descriptor_set_layout_t
+//===----------------------------------------------------------------------===//
+
+typedef struct iree_hal_vulkan_descriptor_set_layout_t {
+  iree_atomic_ref_count_t ref_count;
+  iree::hal::vulkan::VkDeviceHandle* logical_device;
+  VkDescriptorSetLayout handle;
+} iree_hal_vulkan_descriptor_set_layout_t;
+
+// Creates a native Vulkan VkDescriptorSetLayout object.
+iree_status_t iree_hal_vulkan_descriptor_set_layout_create(
+    iree::hal::vulkan::VkDeviceHandle* logical_device,
+    VkDescriptorSetLayoutCreateFlags flags, iree_host_size_t binding_count,
+    const VkDescriptorSetLayoutBinding* bindings,
+    iree_hal_vulkan_descriptor_set_layout_t** out_descriptor_set_layout);
+
+// Retains the given |descriptor_set_layout| for the caller.
+void iree_hal_vulkan_descriptor_set_layout_retain(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout);
+
+// Releases the given |descriptor_set_layout| from the caller.
+void iree_hal_vulkan_descriptor_set_layout_release(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout);
+
+// Returns the native Vulkan VkDescriptorSetLayout handle.
+VkDescriptorSetLayout iree_hal_vulkan_descriptor_set_layout_handle(
+    iree_hal_vulkan_descriptor_set_layout_t* descriptor_set_layout);
+
+//===----------------------------------------------------------------------===//
+// iree_hal_vulkan_pipeline_layout_t
+//===----------------------------------------------------------------------===//
+
+typedef struct iree_hal_vulkan_pipeline_layout_t {
+  iree_atomic_ref_count_t ref_count;
+  iree::hal::vulkan::VkDeviceHandle* logical_device;
+  VkPipelineLayout handle;
+  iree_host_size_t set_layout_count;
+  iree_hal_vulkan_descriptor_set_layout_t* set_layouts[];
+} iree_hal_vulkan_pipeline_layout_t;
+
+// Creates a VkPipelineLayout-based pipeline layout composed of one or more
+// descriptor set layouts.
+iree_status_t iree_hal_vulkan_pipeline_layout_create(
+    iree::hal::vulkan::VkDeviceHandle* logical_device,
+    iree_host_size_t push_constant_range_count,
+    const VkPushConstantRange* push_constant_ranges,
+    iree_host_size_t set_layout_count,
+    iree_hal_vulkan_descriptor_set_layout_t* const* set_layouts,
+    iree_hal_vulkan_pipeline_layout_t** out_pipeline_layout);
+
+// Retains the given |pipeline_layout| for the caller.
+void iree_hal_vulkan_pipeline_layout_retain(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout);
+
+// Releases the given |pipeline_layout| from the caller.
+void iree_hal_vulkan_pipeline_layout_release(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout);
+
+// Returns the native VkPipelineLayout handle for the pipeline layout.
+VkPipelineLayout iree_hal_vulkan_pipeline_layout_handle(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout);
+
+// Returns the total number of descriptor sets within the layout.
+iree_host_size_t iree_hal_vulkan_pipeline_layout_set_count(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout);
+
+// Returns the descriptor set layout with the given |set_index|.
+iree_hal_vulkan_descriptor_set_layout_t* iree_hal_vulkan_pipeline_layout_set(
+    iree_hal_vulkan_pipeline_layout_t* pipeline_layout,
+    iree_host_size_t set_index);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+
+#endif  // IREE_HAL_DRIVERS_VULKAN_PIPELINE_LAYOUT_H_
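The `set_layouts[]` member at the end of `iree_hal_vulkan_pipeline_layout_t` above is a C flexible array member: the header fields and the variable-length layout list live in one allocation sized at create time, avoiding a second heap allocation and an extra pointer chase. A hypothetical standalone sketch of the allocation pattern:

```c
#include <stdlib.h>
#include <string.h>

// Illustrative struct only: header fields plus a trailing flexible array
// member whose storage immediately follows the struct in memory.
typedef struct layout_t {
  size_t element_count;
  int elements[];  // flexible array member (C99)
} layout_t;

static layout_t* layout_create(const int* elements, size_t element_count) {
  // One allocation covers the fixed header and the trailing array.
  layout_t* layout =
      (layout_t*)malloc(sizeof(layout_t) + element_count * sizeof(int));
  if (!layout) return NULL;
  layout->element_count = element_count;
  memcpy(layout->elements, elements, element_count * sizeof(int));
  return layout;
}
```

A single `free(layout)` releases both the header and the array, which is why `iree_hal_vulkan_pipeline_layout_destroy` above needs only one `iree_allocator_free` call.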
diff --git a/runtime/src/iree/hal/drivers/vulkan/vulkan_device.cc b/runtime/src/iree/hal/drivers/vulkan/vulkan_device.cc
index 2df2f6d..3b7192d 100644
--- a/runtime/src/iree/hal/drivers/vulkan/vulkan_device.cc
+++ b/runtime/src/iree/hal/drivers/vulkan/vulkan_device.cc
@@ -24,7 +24,6 @@
 #include "iree/hal/drivers/vulkan/handle_util.h"
 #include "iree/hal/drivers/vulkan/native_allocator.h"
 #include "iree/hal/drivers/vulkan/native_event.h"
-#include "iree/hal/drivers/vulkan/native_pipeline_layout.h"
 #include "iree/hal/drivers/vulkan/native_semaphore.h"
 #include "iree/hal/drivers/vulkan/nop_executable_cache.h"
 #include "iree/hal/drivers/vulkan/status_util.h"
@@ -851,10 +850,10 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_vulkan_get_device_properties(
+static iree_status_t iree_hal_vulkan_query_device_properties(
     DynamicSymbols* instance_syms, VkPhysicalDevice physical_device,
-    iree_hal_vulkan_device_properties_t* device_properties) {
-  memset(device_properties, 0, sizeof(*device_properties));
+    iree_hal_vulkan_device_properties_t* out_properties) {
+  memset(out_properties, 0, sizeof(*out_properties));
 
   VkPhysicalDeviceFeatures2 physical_device_features;
   memset(&physical_device_features, 0, sizeof(physical_device_features));
@@ -941,40 +940,40 @@
                                                 &physical_device_properties);
 
   if (shader_float16_int8_features.shaderFloat16) {
-    device_properties->compute_float |= 0x1u;
+    out_properties->compute_float |= 0x1u;
   }
   if (physical_device_features.features.shaderFloat64) {
-    device_properties->compute_float |= 0x2u;
+    out_properties->compute_float |= 0x2u;
   }
   if (shader_float16_int8_features.shaderInt8) {
-    device_properties->compute_int |= 0x1u;
+    out_properties->compute_int |= 0x1u;
   }
   if (physical_device_features.features.shaderInt16) {
-    device_properties->compute_int |= 0x2u;
+    out_properties->compute_int |= 0x2u;
   }
   if (physical_device_features.features.shaderInt64) {
-    device_properties->compute_int |= 0x4u;
+    out_properties->compute_int |= 0x4u;
   }
   if (supported_8bit_storage_features.storageBuffer8BitAccess &&
       supported_8bit_storage_features.uniformAndStorageBuffer8BitAccess) {
-    device_properties->storage |= 0x1u;
+    out_properties->storage |= 0x1u;
   }
   if (supported_16bit_storage_features.storageBuffer16BitAccess &&
       supported_16bit_storage_features.uniformAndStorageBuffer16BitAccess) {
-    device_properties->storage |= 0x2u;
+    out_properties->storage |= 0x2u;
   }
 
   if (iree_all_bits_set(subgroup_properties.supportedOperations,
                         VK_SUBGROUP_FEATURE_SHUFFLE_BIT)) {
-    device_properties->subgroup |= 0x1u;
+    out_properties->subgroup |= 0x1u;
   }
   if (iree_all_bits_set(subgroup_properties.supportedOperations,
                         VK_SUBGROUP_FEATURE_ARITHMETIC_BIT)) {
-    device_properties->subgroup |= 0x2u;
+    out_properties->subgroup |= 0x2u;
   }
 
   if (dot_product_features.shaderIntegerDotProduct) {
-    device_properties->dot_product |= 0x1u;
+    out_properties->dot_product |= 0x1u;
   }
 
   if (coop_matrix_features.cooperativeMatrix &&
@@ -999,7 +998,7 @@
           p->BType == VK_COMPONENT_TYPE_FLOAT16_KHR) {
         if (p->CType == VK_COMPONENT_TYPE_FLOAT16_KHR) {
           if (p->MSize == 16 && p->NSize == 16 && p->KSize == 16) {
-            device_properties->cooperative_matrix |= 0x1u;
+            out_properties->cooperative_matrix |= 0x1u;
           }
         }
       }
@@ -1007,9 +1006,18 @@
   }
 
   if (address_features.bufferDeviceAddress) {
-    device_properties->address |= 0x1u;
+    out_properties->address |= 0x1u;
   }
 
+  out_properties->limits.max_push_constants_size =
+      physical_device_properties.properties.limits.maxPushConstantsSize;
+  out_properties->limits.max_per_stage_descriptor_uniform_buffers =
+      physical_device_properties.properties.limits
+          .maxPerStageDescriptorUniformBuffers;
+  out_properties->limits.max_per_stage_descriptor_storage_buffers =
+      physical_device_properties.properties.limits
+          .maxPerStageDescriptorStorageBuffers;
+
   return iree_ok_status();
 }
 
@@ -1277,7 +1285,7 @@
   }
 
   iree_hal_vulkan_device_properties_t device_properties;
-  IREE_RETURN_IF_ERROR(iree_hal_vulkan_get_device_properties(
+  IREE_RETURN_IF_ERROR(iree_hal_vulkan_query_device_properties(
       instance_syms, physical_device, &device_properties));
 
   auto logical_device = new VkDeviceHandle(
@@ -1350,9 +1358,9 @@
   iree_hal_vulkan_device_extensions_t enabled_device_extensions =
       iree_hal_vulkan_infer_enabled_device_extensions(device_syms.get());
 
-  // We can still retrieve the correct device properties though.
+  // We can retrieve the device properties and limits from the wrapped handle.
   iree_hal_vulkan_device_properties_t device_properties;
-  IREE_RETURN_IF_ERROR(iree_hal_vulkan_get_device_properties(
+  IREE_RETURN_IF_ERROR(iree_hal_vulkan_query_device_properties(
       device_syms.get(), physical_device, &device_properties));
 
   iree_hal_vulkan_features_t enabled_features = 0;
@@ -1571,18 +1579,6 @@
       device->builtin_executables, &device->block_pool, out_command_buffer);
 }
 
-static iree_status_t iree_hal_vulkan_device_create_descriptor_set_layout(
-    iree_hal_device_t* base_device,
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  iree_hal_vulkan_device_t* device = iree_hal_vulkan_device_cast(base_device);
-  return iree_hal_vulkan_native_descriptor_set_layout_create(
-      device->logical_device, flags, binding_count, bindings,
-      out_descriptor_set_layout);
-}
-
 static iree_status_t iree_hal_vulkan_device_create_event(
     iree_hal_device_t* base_device, iree_hal_queue_affinity_t queue_affinity,
     iree_hal_event_flags_t flags, iree_hal_event_t** out_event) {
@@ -1614,17 +1610,6 @@
       iree_hal_device_host_allocator(base_device), out_file);
 }
 
-static iree_status_t iree_hal_vulkan_device_create_pipeline_layout(
-    iree_hal_device_t* base_device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  iree_hal_vulkan_device_t* device = iree_hal_vulkan_device_cast(base_device);
-  return iree_hal_vulkan_native_pipeline_layout_create(
-      device->logical_device, push_constants, set_layout_count, set_layouts,
-      out_pipeline_layout);
-}
-
 static iree_status_t iree_hal_vulkan_device_create_semaphore(
     iree_hal_device_t* base_device, uint64_t initial_value,
     iree_hal_semaphore_flags_t flags, iree_hal_semaphore_t** out_semaphore) {
@@ -1913,14 +1898,10 @@
     /*.query_i64=*/iree_hal_vulkan_device_query_i64,
     /*.create_channel=*/iree_hal_vulkan_device_create_channel,
     /*.create_command_buffer=*/iree_hal_vulkan_device_create_command_buffer,
-    /*.create_descriptor_set_layout=*/
-    iree_hal_vulkan_device_create_descriptor_set_layout,
     /*.create_event=*/iree_hal_vulkan_device_create_event,
     /*.create_executable_cache=*/
     iree_hal_vulkan_device_create_executable_cache,
     /*.import_file=*/iree_hal_vulkan_device_import_file,
-    /*.create_pipeline_layout=*/
-    iree_hal_vulkan_device_create_pipeline_layout,
     /*.create_semaphore=*/iree_hal_vulkan_device_create_semaphore,
     /*.query_semaphore_compatibility=*/
     iree_hal_vulkan_device_query_semaphore_compatibility,
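The renamed `iree_hal_vulkan_query_device_properties` accumulates capabilities by OR-ing one bit per supported feature into compact fields, which can later be tested with an all-bits-set check. A reduced sketch of that pattern (feature names here are illustrative, not the real query):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct properties_t {
  uint32_t compute_int;  // bit 0: int8, bit 1: int16 (illustrative encoding)
} properties_t;

static bool all_bits_set(uint32_t value, uint32_t bits) {
  return (value & bits) == bits;
}

static void query_properties(properties_t* out_properties, bool has_int8,
                             bool has_int16) {
  out_properties->compute_int = 0;
  if (has_int8) out_properties->compute_int |= 0x1u;
  if (has_int16) out_properties->compute_int |= 0x2u;
}
```

Consumers can then ask "are all of these capabilities present?" with a single mask test instead of per-feature booleans.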
diff --git a/runtime/src/iree/hal/executable_cache.c b/runtime/src/iree/hal/executable_cache.c
index abd3b67..c34bcff 100644
--- a/runtime/src/iree/hal/executable_cache.c
+++ b/runtime/src/iree/hal/executable_cache.c
@@ -56,8 +56,6 @@
     iree_hal_executable_t** out_executable) {
   IREE_ASSERT_ARGUMENT(executable_cache);
   IREE_ASSERT_ARGUMENT(executable_params);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(out_executable);
   *out_executable = NULL;
   IREE_TRACE_ZONE_BEGIN(z0);
diff --git a/runtime/src/iree/hal/executable_cache.h b/runtime/src/iree/hal/executable_cache.h
index 435f01d..0f2e2d2 100644
--- a/runtime/src/iree/hal/executable_cache.h
+++ b/runtime/src/iree/hal/executable_cache.h
@@ -12,7 +12,6 @@
 
 #include "iree/base/api.h"
 #include "iree/hal/executable.h"
-#include "iree/hal/pipeline_layout.h"
 #include "iree/hal/resource.h"
 
 #ifdef __cplusplus
@@ -92,16 +91,6 @@
   // to any executable created using it still held by the caller.
   iree_const_byte_span_t executable_data;
 
-  // TODO(#18154): drop pipeline layouts with simplified bindings. Allowed to be
-  // empty for now on targets that support simplified bindings.
-  //
-  // A set of pipeline layouts for each entry point in the executable.
-  // The order matches that produced by the compiler. As multiple entry points
-  // may share the same layout some entries in this list may reference the same
-  // pipeline layout objects.
-  iree_host_size_t pipeline_layout_count;
-  iree_hal_pipeline_layout_t* const* pipeline_layouts;
-
   // Executable-level constants table used to perform runtime specialization
   // when information is not available statically during compilation. The
   // compiler defines the contents of the table, how they are populated, and
@@ -178,10 +167,6 @@
 // will be used to either lookup a previously prepared executable in the cache
 // or prepare a new one.
 //
-// Each entry point in the executable requires a corresponding value in
-// |pipeline_layouts| defining the layout used by the entry point. If multiple
-// entry points use the same layouts they can reuse the same values.
-//
 // Depending on the driver preparation may take a non-trivial amount of time
 // (such as when JITing/etc). As the cache is internally synchronized callers
 // can issue preparation requests from multiple threads - even for the same
diff --git a/runtime/src/iree/hal/local/BUILD.bazel b/runtime/src/iree/hal/local/BUILD.bazel
index 66ccfd6..c05b52a 100644
--- a/runtime/src/iree/hal/local/BUILD.bazel
+++ b/runtime/src/iree/hal/local/BUILD.bazel
@@ -122,14 +122,12 @@
     srcs = [
         "inline_command_buffer.c",
         "local_executable_cache.c",
-        "local_pipeline_layout.c",
     ],
     hdrs = [
         "executable_loader.h",
         "inline_command_buffer.h",
         "local_executable.h",
         "local_executable_cache.h",
-        "local_pipeline_layout.h",
     ],
     deps = [
         ":executable_environment",
diff --git a/runtime/src/iree/hal/local/CMakeLists.txt b/runtime/src/iree/hal/local/CMakeLists.txt
index 2a2c657..50f640b 100644
--- a/runtime/src/iree/hal/local/CMakeLists.txt
+++ b/runtime/src/iree/hal/local/CMakeLists.txt
@@ -138,11 +138,9 @@
     "inline_command_buffer.h"
     "local_executable.h"
     "local_executable_cache.h"
-    "local_pipeline_layout.h"
   SRCS
     "inline_command_buffer.c"
     "local_executable_cache.c"
-    "local_pipeline_layout.c"
   DEPS
     ::executable_environment
     ::executable_library
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul.mlir b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul.mlir
index fafa77b..07df2a2 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul.mlir
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul.mlir
@@ -19,12 +19,10 @@
 //    --binding=4xf32=0,0,0,0
 
 // lhs * rhs => dst / s0b0 * s0b1 => s0b2
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // A single executable source definition is allowed per translation in this mode
@@ -47,9 +45,9 @@
   // exported.
   builtin.module {
     func.func @elementwise_mul() {
-      %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %dst = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
+      %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %dst = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
       %workgroup_size_x = hal.interface.workgroup.size[0] : index
       %workgroup_id_x = hal.interface.workgroup.id[0] : index
       %workgroup_count_x = hal.interface.workgroup.count[0] : index
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_32.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_32.so
index ef4b031..eddc3f7 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_32.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_32.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_64.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_64.so
index d2f011a..c7e7a3a 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_64.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_arm_64.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_32.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_32.so
index 3bc1bd2..4ae367d 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_32.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_32.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_64.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_64.so
index deb3e0a..374ca8f 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_64.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_riscv_64.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_32.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_32.so
index 7f9eee1..6e6f9e9 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_32.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_32.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_64.so b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_64.so
index 21b7a01..5a9ec26 100644
--- a/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_64.so
+++ b/runtime/src/iree/hal/local/elf/testdata/elementwise_mul_x86_64.so
Binary files differ
diff --git a/runtime/src/iree/hal/local/executable_library.h b/runtime/src/iree/hal/local/executable_library.h
index d45b477..bf2e172 100644
--- a/runtime/src/iree/hal/local/executable_library.h
+++ b/runtime/src/iree/hal/local/executable_library.h
@@ -87,12 +87,12 @@
 // or some semantic versioning we track in whatever spec we end up having.
 typedef uint32_t iree_hal_executable_library_version_t;
 
-#define IREE_HAL_EXECUTABLE_LIBRARY_VERSION_0_4 0x00000004u
+#define IREE_HAL_EXECUTABLE_LIBRARY_VERSION_0_5 0x00000005u
 
 // The latest version of the library API; can be used to populate the
 // iree_hal_executable_library_header_t::version when building libraries.
 #define IREE_HAL_EXECUTABLE_LIBRARY_VERSION_LATEST \
-  IREE_HAL_EXECUTABLE_LIBRARY_VERSION_0_4
+  IREE_HAL_EXECUTABLE_LIBRARY_VERSION_0_5
 
 // A header present at the top of all versions of the library API used by the
 // runtime to ensure version compatibility.
@@ -279,8 +279,8 @@
   uint32_t workgroup_size_y;
   uint16_t workgroup_size_z;
 
-  // Total number of available 4 byte push constant values in |push_constants|.
-  uint16_t push_constant_count;
+  // Total number of available 4 byte push constant values in |constants|.
+  uint16_t constant_count;
 
   // Total workgroup count for the dispatch. This is sourced from either the
   // original dispatch call (for iree_hal_command_buffer_dispatch) or the
@@ -299,8 +299,8 @@
   // used (known at compile-time).
   uint8_t binding_count;
 
-  // |push_constant_count| values.
-  const uint32_t* push_constants;
+  // |constant_count| values.
+  const uint32_t* constants;
   // Base pointers to each binding buffer.
   void* const* binding_ptrs;
   // The length of each binding in bytes, 1:1 with |binding_ptrs|.
@@ -392,9 +392,11 @@
   uint8_t constant_count;
   // Total number of bindings used by the dispatch.
   uint8_t binding_count;
-  // TODO(#18189): add ~8 uint64_t fields for binding bits (readonly/indirect).
+  // Unused padding for the structure. Must be 0.
+  uint32_t reserved_0;
+  // Unused. Must be 0.
+  uint64_t reserved_1[8];
 } iree_hal_executable_dispatch_attrs_v0_t;
-static_assert(sizeof(iree_hal_executable_dispatch_attrs_v0_t) == 4, "uint32_t");
 
 // Source location information for a dispatch function indicating what code was
 // used to generate it. This only represents a single source snapshot, of which
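The new `reserved_0`/`reserved_1` fields in `iree_hal_executable_dispatch_attrs_v0_t` above are a forward-compatibility device: a v0 consumer requires them to be zero, so a future producer that starts using them is detected instead of silently misinterpreted. A hypothetical trimmed-down sketch of that validation (the real struct has additional leading fields not shown here):

```c
#include <stdbool.h>
#include <stdint.h>

// Illustrative reduced attrs struct; field names mirror the diff above but
// this is not the full ABI definition.
typedef struct demo_attrs_t {
  uint8_t constant_count;
  uint8_t binding_count;
  uint32_t reserved_0;     // must be 0 in v0
  uint64_t reserved_1[8];  // must be 0 in v0
} demo_attrs_t;

static bool demo_attrs_valid_v0(const demo_attrs_t* attrs) {
  // Reject any attrs that set bits this version does not understand.
  if (attrs->reserved_0 != 0) return false;
  for (int i = 0; i < 8; ++i) {
    if (attrs->reserved_1[i] != 0) return false;
  }
  return true;
}
```

This is why the comments in the diff insist the reserved fields "Must be 0": zero is the only value every version agrees on.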
diff --git a/runtime/src/iree/hal/local/executable_library_benchmark.c b/runtime/src/iree/hal/local/executable_library_benchmark.c
index d87149d..8988b5d 100644
--- a/runtime/src/iree/hal/local/executable_library_benchmark.c
+++ b/runtime/src/iree/hal/local/executable_library_benchmark.c
@@ -17,7 +17,6 @@
 #include "iree/hal/local/executable_loader.h"
 #include "iree/hal/local/loaders/registration/init.h"
 #include "iree/hal/local/local_executable.h"
-#include "iree/hal/local/local_pipeline_layout.h"
 #include "iree/hal/local/plugins/registration/init.h"
 #include "iree/testing/benchmark.h"
 
@@ -49,53 +48,47 @@
 IREE_FLAG(int32_t, max_concurrency, 1,
           "Maximum available concurrency exposed to the dispatch.");
 
-// Total number of bindings we (currently) allow any executable to have.
-#define IREE_HAL_LOCAL_MAX_TOTAL_BINDING_COUNT \
-  (IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT *   \
-   IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT)
-
 // Parsed parameters from flags.
 // Used to construct the dispatch parameters for the benchmark invocation.
 struct {
-  int32_t push_constant_count;
+  int32_t constant_count;
   union {
     uint32_t ui32;
-  } push_constants[IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT];
+  } constants[IREE_HAL_EXECUTABLE_MAX_CONSTANT_COUNT];
 
   int32_t binding_count;
-  iree_string_view_t bindings[IREE_HAL_LOCAL_MAX_TOTAL_BINDING_COUNT];
+  iree_string_view_t bindings[IREE_HAL_EXECUTABLE_MAX_BINDING_COUNT];
 } dispatch_params = {
-    .push_constant_count = 0,
+    .constant_count = 0,
     .binding_count = 0,
 };
 
-static iree_status_t parse_push_constant(iree_string_view_t flag_name,
-                                         void* storage,
-                                         iree_string_view_t value) {
-  IREE_ASSERT_LE(dispatch_params.push_constant_count + 1,
-                 IREE_ARRAYSIZE(dispatch_params.push_constants),
+static iree_status_t parse_constant(iree_string_view_t flag_name, void* storage,
+                                    iree_string_view_t value) {
+  IREE_ASSERT_LE(dispatch_params.constant_count + 1,
+                 IREE_ARRAYSIZE(dispatch_params.constants),
                  "too many push constants");
-  dispatch_params.push_constants[dispatch_params.push_constant_count++].ui32 =
+  dispatch_params.constants[dispatch_params.constant_count++].ui32 =
       atoi(value.data);
   return iree_ok_status();
 }
-static void print_push_constant(iree_string_view_t flag_name, void* storage,
-                                FILE* file) {
-  if (dispatch_params.push_constant_count == 0) {
+static void print_constant(iree_string_view_t flag_name, void* storage,
+                           FILE* file) {
+  if (dispatch_params.constant_count == 0) {
     fprintf(file, "# --%.*s=[integer value]\n", (int)flag_name.size,
             flag_name.data);
     return;
   }
-  for (int32_t i = 0; i < dispatch_params.push_constant_count; ++i) {
+  for (int32_t i = 0; i < dispatch_params.constant_count; ++i) {
     fprintf(file, "--%.*s=%u", (int)flag_name.size, flag_name.data,
-            dispatch_params.push_constants[i].ui32);
-    if (i < dispatch_params.push_constant_count - 1) {
+            dispatch_params.constants[i].ui32);
+    if (i < dispatch_params.constant_count - 1) {
       fprintf(file, "\n");
     }
   }
 }
-IREE_FLAG_CALLBACK(parse_push_constant, print_push_constant, &dispatch_params,
-                   push_constant_callback,
+IREE_FLAG_CALLBACK(parse_constant, print_constant, &dispatch_params,
+                   constant_callback,
                    "Appends a uint32_t push constant value.\n");
 
 static iree_status_t parse_binding(iree_string_view_t flag_name, void* storage,
@@ -165,12 +158,6 @@
                                                host_allocator, &file_contents));
   executable_params.executable_data = file_contents->const_buffer;
 
-  // Setup the layouts defining how each entry point is interpreted.
-  // NOTE: we know for the embedded library loader that this is not required.
-  // Other loaders may need it in which case it'll have to be provided.
-  executable_params.pipeline_layout_count = 0;
-  executable_params.pipeline_layouts = NULL;
-
   // Perform the load, which will fail if the executable cannot be loaded or
   // there was an issue with the layouts.
   iree_hal_executable_t* executable = NULL;
@@ -201,9 +188,9 @@
   IREE_RETURN_IF_ERROR(iree_hal_allocator_create_heap(
       iree_make_cstring_view("benchmark"), host_allocator, host_allocator,
       &heap_allocator));
-  iree_hal_buffer_view_t* buffer_views[IREE_HAL_LOCAL_MAX_TOTAL_BINDING_COUNT];
-  void* binding_ptrs[IREE_HAL_LOCAL_MAX_TOTAL_BINDING_COUNT];
-  size_t binding_lengths[IREE_HAL_LOCAL_MAX_TOTAL_BINDING_COUNT];
+  iree_hal_buffer_view_t* buffer_views[IREE_HAL_EXECUTABLE_MAX_BINDING_COUNT];
+  void* binding_ptrs[IREE_HAL_EXECUTABLE_MAX_BINDING_COUNT];
+  size_t binding_lengths[IREE_HAL_EXECUTABLE_MAX_BINDING_COUNT];
   for (iree_host_size_t i = 0; i < dispatch_params.binding_count; ++i) {
     IREE_RETURN_IF_ERROR(
         iree_hal_buffer_view_parse(dispatch_params.bindings[i], /*device=*/NULL,
@@ -229,8 +216,8 @@
       .workgroup_size_y = FLAG_workgroup_size_y,
       .workgroup_size_z = FLAG_workgroup_size_z,
       .max_concurrency = FLAG_max_concurrency,
-      .push_constant_count = dispatch_params.push_constant_count,
-      .push_constants = &dispatch_params.push_constants[0].ui32,
+      .constant_count = dispatch_params.constant_count,
+      .constants = &dispatch_params.constants[0].ui32,
       .binding_count = dispatch_params.binding_count,
       .binding_ptrs = binding_ptrs,
       .binding_lengths = binding_lengths,
diff --git a/runtime/src/iree/hal/local/executable_library_benchmark.md b/runtime/src/iree/hal/local/executable_library_benchmark.md
index 1e7d59a..07bfa31 100644
--- a/runtime/src/iree/hal/local/executable_library_benchmark.md
+++ b/runtime/src/iree/hal/local/executable_library_benchmark.md
@@ -205,7 +205,7 @@
 5. Look up in the IR to see the values of push constants, if required:
 
 ```mlir
-  hal.command_buffer.push_constants<%cmd : !hal.command_buffer>
+  hal.command_buffer.constants<%cmd : !hal.command_buffer>
       layout(%0 : !hal.pipeline_layout)
       offset(0)
       values(%c1, %c2, %c3, %c4) : i32, i32, i32, i32
@@ -216,8 +216,8 @@
 things like this but in cases where you know the meaning you can provide values:
 
 ```
---push_constant=1
---push_constant=2
---push_constant=3
---push_constant=4
+--constant=1
+--constant=2
+--constant=3
+--constant=4
 ```
diff --git a/runtime/src/iree/hal/local/executable_library_demo.c b/runtime/src/iree/hal/local/executable_library_demo.c
index 300d645..bb03027 100644
--- a/runtime/src/iree/hal/local/executable_library_demo.c
+++ b/runtime/src/iree/hal/local/executable_library_demo.c
@@ -22,17 +22,17 @@
 // communication between invocations must use the buffer bindings for I/O.
 //
 // This is a simple scalar addition:
-//    binding[1] = binding[0] + push_constant[0]
+//    binding[1] = binding[0] + constant[0]
 static int dispatch_tile_a(
     const iree_hal_executable_environment_v0_t* environment,
     const iree_hal_executable_dispatch_state_v0_t* dispatch_state,
     const iree_hal_executable_workgroup_state_v0_t* workgroup_state) {
-  const dispatch_tile_a_push_constants_t* push_constants =
-      (const dispatch_tile_a_push_constants_t*)dispatch_state->push_constants;
+  const dispatch_tile_a_constants_t* constants =
+      (const dispatch_tile_a_constants_t*)dispatch_state->constants;
   const float* src = ((const float*)dispatch_state->binding_ptrs[0]);
   float* dst = ((float*)dispatch_state->binding_ptrs[1]);
   const uint32_t x = workgroup_state->workgroup_id_x;
-  dst[x] = src[x] + push_constants->f0;
+  dst[x] = src[x] + constants->f0;
   return 0;
 }
 
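The renamed demo dispatch above reads its scalar operand from the flat constants array via a typed overlay (`dispatch_tile_a_constants_t`) and computes `binding[1] = binding[0] + constant[0]` per workgroup. A standalone sketch of those semantics, with the per-workgroup invocations collapsed into a loop (names are illustrative only):

```c
#include <stdint.h>

// Union overlay mirroring the demo's constants struct: raw uint32_t words on
// one side, typed fields on the other (anonymous struct requires C11).
typedef union {
  uint32_t values[1];
  struct {
    float f0;
  };
} demo_constants_t;

// Each workgroup invocation x computes dst[x] = src[x] + constants->f0.
static void demo_dispatch(const demo_constants_t* constants, const float* src,
                          float* dst, uint32_t workgroup_count_x) {
  for (uint32_t x = 0; x < workgroup_count_x; ++x) {
    dst[x] = src[x] + constants->f0;
  }
}
```

In the real library each `x` iteration is a separate `dispatch_tile_a` call driven by `workgroup_state->workgroup_id_x`, so invocations must stay independent.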
diff --git a/runtime/src/iree/hal/local/executable_library_demo.h b/runtime/src/iree/hal/local/executable_library_demo.h
index f458768..1ebcfe9 100644
--- a/runtime/src/iree/hal/local/executable_library_demo.h
+++ b/runtime/src/iree/hal/local/executable_library_demo.h
@@ -26,14 +26,14 @@
   struct {
     float f0;
   };
-} dispatch_tile_a_push_constants_t;
+} dispatch_tile_a_constants_t;
 
 // Returns a simple demo library with the following structure:
 //
 // Name: 'demo_library'
 //
 // [0] 'dispatch_tile_a': matmul+div
-//       push constants: 1 (dispatch_tile_a_push_constants_t)
+//       push constants: 1 (dispatch_tile_a_constants_t)
 //       bindings: 2
 //         [0] = R
 //         [1] = W
diff --git a/runtime/src/iree/hal/local/executable_library_test.c b/runtime/src/iree/hal/local/executable_library_test.c
index cb35448..b6ffc36 100644
--- a/runtime/src/iree/hal/local/executable_library_test.c
+++ b/runtime/src/iree/hal/local/executable_library_test.c
@@ -58,9 +58,9 @@
   // to specify (no buffer pointer indirection) and more efficient to access
   // (static struct offset address calculation, all fit in a few cache lines,
   // etc). They are limited in capacity, though, so only <=64(ish) are usable.
-  dispatch_tile_a_push_constants_t push_constants;
-  memset(&push_constants, 0, sizeof(push_constants));
-  push_constants.f0 = 5.0f;
+  dispatch_tile_a_constants_t constants;
+  memset(&constants, 0, sizeof(constants));
+  constants.f0 = 5.0f;
 
   // Setup the two buffer bindings the entry point is expecting.
   // They only need to remain valid for the duration of the invocation and all
@@ -90,8 +90,8 @@
       .workgroup_size_y = 1,
       .workgroup_size_z = 1,
       .max_concurrency = 1,
-      .push_constant_count = IREE_ARRAYSIZE(push_constants.values),
-      .push_constants = push_constants.values,
+      .constant_count = IREE_ARRAYSIZE(constants.values),
+      .constants = constants.values,
       .binding_count = IREE_ARRAYSIZE(binding_ptrs),
       .binding_ptrs = binding_ptrs,
       .binding_lengths = binding_lengths,
diff --git a/runtime/src/iree/hal/local/executable_library_util.c b/runtime/src/iree/hal/local/executable_library_util.c
index b2d1165..18c806e 100644
--- a/runtime/src/iree/hal/local/executable_library_util.c
+++ b/runtime/src/iree/hal/local/executable_library_util.c
@@ -17,19 +17,6 @@
                         IREE_HAL_EXECUTABLE_CACHING_MODE_DISABLE_VERIFICATION);
   if (disable_verification) return iree_ok_status();
 
-  // Check that there's one pipeline layout per export. Multiple exports may
-  // share the same layout but it still needs to be declared.
-  // NOTE: pipeline layouts are optional but if provided must be consistent.
-  if (executable_params->pipeline_layout_count > 0) {
-    if (library->exports.count != executable_params->pipeline_layout_count) {
-      return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                              "executable provides %u entry points but caller "
-                              "provided %" PRIhsz "; must match",
-                              library->exports.count,
-                              executable_params->pipeline_layout_count);
-    }
-  }
-
   // Check to make sure that the constant table has values for all constants.
   if (library->constants.count != executable_params->constant_count) {
     return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
diff --git a/runtime/src/iree/hal/local/executable_loader.c b/runtime/src/iree/hal/local/executable_loader.c
index cd51a6f..843b0cd 100644
--- a/runtime/src/iree/hal/local/executable_loader.c
+++ b/runtime/src/iree/hal/local/executable_loader.c
@@ -99,8 +99,6 @@
     iree_host_size_t worker_capacity, iree_hal_executable_t** out_executable) {
   IREE_ASSERT_ARGUMENT(executable_loader);
   IREE_ASSERT_ARGUMENT(executable_params);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(!executable_params->executable_data.data_length ||
                        executable_params->executable_data.data);
   IREE_ASSERT_ARGUMENT(out_executable);
diff --git a/runtime/src/iree/hal/local/inline_command_buffer.c b/runtime/src/iree/hal/local/inline_command_buffer.c
index 2e0465c..7ea8513 100644
--- a/runtime/src/iree/hal/local/inline_command_buffer.c
+++ b/runtime/src/iree/hal/local/inline_command_buffer.c
@@ -16,7 +16,6 @@
 #include "iree/base/internal/math.h"
 #include "iree/hal/local/executable_library.h"
 #include "iree/hal/local/local_executable.h"
-#include "iree/hal/local/local_pipeline_layout.h"
 
 //===----------------------------------------------------------------------===//
 // iree_hal_inline_command_buffer_t
@@ -28,26 +27,9 @@
   iree_allocator_t host_allocator;
 
   struct {
-    // TODO(#18189): remove legacy bindings state.
-    //
-    // A flattened list of all available descriptor set bindings.
-    // As descriptor sets are pushed/bound the bindings will be updated to
-    // represent the fully-translated binding data pointer.
-    void* full_bindings[IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT *
-                        IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT];
-    size_t full_binding_lengths[IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT *
-                                IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT];
-
-    // TODO(#18189): remove legacy push constant state.
-    //
-    // All available push constants updated each time push_constants is called.
-    // Reset only with the command buffer and otherwise will maintain its values
-    // during recording to allow for partial push_constants updates.
-    uint32_t push_constants[IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT];
-
     // Cached and initialized dispatch state reused for all dispatches.
     // Individual dispatches must populate the dynamically changing fields like
-    // push_constant_count and binding_count.
+    // constant_count and binding_count.
     iree_alignas(64) iree_hal_executable_dispatch_state_v0_t dispatch_state;
     // Persistent storage for binding pointers used by dispatch_state.
     void* binding_ptr_storage[IREE_HAL_EXECUTABLE_MAX_BINDING_COUNT];
@@ -77,7 +59,6 @@
   // Setup the cached dispatch state pointers that don't change.
   iree_hal_executable_dispatch_state_v0_t* dispatch_state =
       &command_buffer->state.dispatch_state;
-  dispatch_state->push_constants = command_buffer->state.push_constants;
   dispatch_state->binding_ptrs = command_buffer->state.binding_ptr_storage;
   dispatch_state->binding_lengths =
       command_buffer->state.binding_length_storage;
@@ -363,193 +344,9 @@
                           "collectives not yet implemented on CPU");
 }
 
-//===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_push_constants
-//===----------------------------------------------------------------------===//
-// NOTE: command buffer state change only; enqueues no tasks.
-
-static iree_status_t iree_hal_inline_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_inline_command_buffer_t* command_buffer =
-      iree_hal_inline_command_buffer_cast(base_command_buffer);
-
-  if (IREE_UNLIKELY(offset + values_length >=
-                    sizeof(command_buffer->state.push_constants))) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "push constant range %" PRIhsz " (length=%" PRIhsz
-                            ") out of range",
-                            offset, values_length);
-  }
-
-  memcpy((uint8_t*)&command_buffer->state.push_constants + offset, values,
-         values_length);
-
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_push_descriptor_set
-//===----------------------------------------------------------------------===//
-// NOTE: command buffer state change only; enqueues no tasks.
-
-static iree_status_t iree_hal_inline_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  iree_hal_inline_command_buffer_t* command_buffer =
-      iree_hal_inline_command_buffer_cast(base_command_buffer);
-
-  if (IREE_UNLIKELY(set >= IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT)) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "set %u out of bounds", set);
-  }
-
-  iree_host_size_t binding_base =
-      set * IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT;
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    if (IREE_UNLIKELY(bindings[i].ordinal >=
-                      IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT)) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "buffer binding index out of bounds");
-    }
-    iree_host_size_t binding_ordinal = binding_base + bindings[i].ordinal;
-
-    // TODO(benvanik): track mapping so we can properly map/unmap/flush/etc.
-    iree_hal_buffer_mapping_t buffer_mapping = {{0}};
-    if (bindings[i].buffer) {
-      IREE_RETURN_IF_ERROR(iree_hal_buffer_map_range(
-          bindings[i].buffer, IREE_HAL_MAPPING_MODE_PERSISTENT,
-          IREE_HAL_MEMORY_ACCESS_ANY, bindings[i].offset, bindings[i].length,
-          &buffer_mapping));
-    }
-    command_buffer->state.full_bindings[binding_ordinal] =
-        buffer_mapping.contents.data;
-    command_buffer->state.full_binding_lengths[binding_ordinal] =
-        buffer_mapping.contents.data_length;
-  }
-
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_command_buffer_dispatch
-//===----------------------------------------------------------------------===//
-
 static iree_status_t iree_hal_inline_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_inline_command_buffer_t* command_buffer =
-      iree_hal_inline_command_buffer_cast(base_command_buffer);
-
-  iree_hal_local_executable_t* local_executable =
-      iree_hal_local_executable_cast(executable);
-  if (IREE_UNLIKELY(!local_executable->pipeline_layouts)) {
-    return iree_make_status(
-        IREE_STATUS_FAILED_PRECONDITION,
-        "layouts not provided during executable creation; cannot dispatch");
-  }
-
-  iree_hal_local_pipeline_layout_t* local_layout =
-      (iree_hal_local_pipeline_layout_t*)
-          local_executable->pipeline_layouts[entry_point];
-  iree_host_size_t local_memory_size =
-      local_executable->dispatch_attrs
-          ? local_executable->dispatch_attrs[entry_point].local_memory_pages *
-                IREE_HAL_EXECUTABLE_WORKGROUP_LOCAL_MEMORY_PAGE_SIZE
-          : 0;
-
-  // Update the ID of the processor we are running on.
-  // We don't know how much time has passed since we last updated as we are
-  // running inline with the user program; if we knew we were going to be
-  // handling a batch of dispatches we could reduce the amount of times we call
-  // this - but that's what the task system is for.
-  iree_hal_inline_command_buffer_update_processor_id(command_buffer);
-
-  iree_hal_executable_dispatch_state_v0_t* dispatch_state =
-      &command_buffer->state.dispatch_state;
-
-  // TODO(benvanik): expose on API or keep fixed on executable.
-  dispatch_state->workgroup_size_x = 1;
-  dispatch_state->workgroup_size_y = 1;
-  dispatch_state->workgroup_size_z = 1;
-  dispatch_state->workgroup_count_x = workgroup_x;
-  dispatch_state->workgroup_count_y = workgroup_y;
-  dispatch_state->workgroup_count_z = workgroup_z;
-
-  // Single-threaded.
-  dispatch_state->max_concurrency = 1;
-
-  // Push constants are pulled directly from the command buffer state, but we
-  // only allow the dispatch to read what we know is initialized based on the
-  // layout.
-  dispatch_state->push_constant_count = local_layout->push_constants;
-  dispatch_state->push_constants = command_buffer->state.push_constants;
-
-  // Produce the dense binding list based on the declared bindings used.
-  // This allows us to change the descriptor sets and bindings counts supported
-  // in the HAL independent of any executable as each executable just gets the
-  // flat dense list and doesn't care about our descriptor set stuff.
-  //
-  // Note that we are just directly setting the binding data pointers here with
-  // no ownership/retaining/etc - it's part of the HAL contract that buffers are
-  // kept valid for the duration they may be in use.
-  iree_hal_local_binding_mask_t used_binding_mask = local_layout->used_bindings;
-  iree_host_size_t used_binding_count =
-      iree_math_count_ones_u64(used_binding_mask);
-  dispatch_state->binding_count = used_binding_count;
-  void** binding_ptrs = (void**)dispatch_state->binding_ptrs;
-  size_t* binding_lengths = (size_t*)dispatch_state->binding_lengths;
-  iree_host_size_t binding_base = 0;
-  for (iree_host_size_t i = 0; i < used_binding_count; ++i) {
-    int mask_offset = iree_math_count_trailing_zeros_u64(used_binding_mask);
-    int binding_ordinal = binding_base + mask_offset;
-    binding_base += mask_offset + 1;
-    used_binding_mask = iree_shr(used_binding_mask, mask_offset + 1);
-    binding_ptrs[i] = command_buffer->state.full_bindings[binding_ordinal];
-    if (!binding_ptrs[i]) {
-      return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                              "(flat) binding %d is NULL", binding_ordinal);
-    }
-    binding_lengths[i] =
-        command_buffer->state.full_binding_lengths[binding_ordinal];
-  }
-
-  // TODO(benvanik): plumb through an arena or fixed-size reservation to use.
-  // For now when deploying to devices where you want something like the
-  // inline command buffer you probably don't want 256KB of transient memory
-  // getting allocated and retained implicitly - this should be a compiler
-  // option. For now we just malloc here to make things work and strongly
-  // encourage the kind of user who wants synchronous inline execution to not
-  // also want tons of scratch memory.
-  iree_byte_span_t local_memory = iree_make_byte_span(NULL, local_memory_size);
-  if (local_memory_size > 0) {
-    IREE_RETURN_IF_ERROR(iree_allocator_malloc(command_buffer->host_allocator,
-                                               local_memory_size,
-                                               (void**)&local_memory.data));
-  }
-
-  // Since we are running on a borrowed thread, we know nothing about the
-  // floating point state. Reset it.
-  iree_fpu_state_t fpu_state =
-      iree_fpu_state_push(IREE_FPU_STATE_FLAG_FLUSH_DENORMALS_TO_ZERO);
-  iree_status_t status = iree_hal_local_executable_issue_dispatch_inline(
-      local_executable, entry_point, dispatch_state,
-      command_buffer->state.processor_id, local_memory);
-  iree_fpu_state_pop(fpu_state);
-
-  if (local_memory.data) {
-    iree_allocator_free(command_buffer->host_allocator, local_memory.data);
-  }
-  return status;
-}
-
-static iree_status_t iree_hal_inline_command_buffer_dispatch2(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   iree_hal_inline_command_buffer_t* command_buffer =
@@ -600,8 +397,8 @@
         (uint32_t)dispatch_attrs.constant_count,
         constants.data_length / sizeof(uint32_t));
   }
-  dispatch_state->push_constant_count = dispatch_attrs.constant_count;
-  dispatch_state->push_constants = (const uint32_t*)constants.data;
+  dispatch_state->constant_count = dispatch_attrs.constant_count;
+  dispatch_state->constants = (const uint32_t*)constants.data;
 
   // Produce the dense binding list based on the declared bindings used.
   //
@@ -676,23 +473,6 @@
 static iree_status_t iree_hal_inline_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  // TODO(benvanik): track mapping so we can properly map/unmap/flush/etc.
-  iree_hal_buffer_mapping_t buffer_mapping = {{0}};
-  IREE_RETURN_IF_ERROR(iree_hal_buffer_map_range(
-      workgroups_ref.buffer, IREE_HAL_MAPPING_MODE_PERSISTENT,
-      IREE_HAL_MEMORY_ACCESS_READ, workgroups_ref.offset, 3 * sizeof(uint32_t),
-      &buffer_mapping));
-  iree_hal_vec3_t workgroup_count =
-      *(const iree_hal_vec3_t*)buffer_mapping.contents.data;
-  return iree_hal_inline_command_buffer_dispatch(
-      base_command_buffer, executable, entry_point, workgroup_count.x,
-      workgroup_count.y, workgroup_count.z, flags);
-}
-
-static iree_status_t iree_hal_inline_command_buffer_dispatch2_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
     iree_hal_buffer_ref_list_t bindings, iree_hal_dispatch_flags_t flags) {
   // TODO(benvanik): track mapping so we can properly map/unmap/flush/etc.
@@ -703,7 +483,7 @@
       &buffer_mapping));
   iree_hal_vec3_t workgroup_count =
       *(const iree_hal_vec3_t*)buffer_mapping.contents.data;
-  return iree_hal_inline_command_buffer_dispatch2(
+  return iree_hal_inline_command_buffer_dispatch(
       base_command_buffer, executable, entry_point, workgroup_count.value,
       constants, bindings, flags);
 }
@@ -728,11 +508,6 @@
         .update_buffer = iree_hal_inline_command_buffer_update_buffer,
         .copy_buffer = iree_hal_inline_command_buffer_copy_buffer,
         .collective = iree_hal_inline_command_buffer_collective,
-        .push_constants = iree_hal_inline_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_inline_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_inline_command_buffer_dispatch,
         .dispatch_indirect = iree_hal_inline_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_inline_command_buffer_dispatch2,
-        .dispatch2_indirect = iree_hal_inline_command_buffer_dispatch2_indirect,
 };
diff --git a/runtime/src/iree/hal/local/loaders/embedded_elf_loader.c b/runtime/src/iree/hal/local/loaders/embedded_elf_loader.c
index 2a328b7..7cc1863 100644
--- a/runtime/src/iree/hal/local/loaders/embedded_elf_loader.c
+++ b/runtime/src/iree/hal/local/loaders/embedded_elf_loader.c
@@ -35,8 +35,6 @@
     const iree_hal_executable_library_header_t** header;
     const iree_hal_executable_library_v0_t* v0;
   } library;
-
-  iree_hal_pipeline_layout_t* layouts[];
 } iree_hal_elf_executable_t;
 
 static const iree_hal_local_executable_vtable_t iree_hal_elf_executable_vtable;
@@ -91,8 +89,6 @@
   IREE_ASSERT_ARGUMENT(executable_params);
   IREE_ASSERT_ARGUMENT(executable_params->executable_data.data &&
                        executable_params->executable_data.data_length);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(!executable_params->constant_count ||
                        executable_params->constants);
   IREE_ASSERT_ARGUMENT(out_executable);
@@ -105,24 +101,18 @@
   iree_hal_elf_executable_t* executable = NULL;
   iree_host_size_t total_size =
       sizeof(*executable) +
-      executable_params->pipeline_layout_count * sizeof(*executable->layouts) +
       executable_params->constant_count * sizeof(*executable_params->constants);
   iree_status_t status =
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
   if (iree_status_is_ok(status)) {
-    iree_hal_local_executable_initialize(
-        &iree_hal_elf_executable_vtable,
-        executable_params->pipeline_layout_count,
-        executable_params->pipeline_layouts, &executable->layouts[0],
-        host_allocator, &executable->base);
+    iree_hal_local_executable_initialize(&iree_hal_elf_executable_vtable,
+                                         host_allocator, &executable->base);
   }
 
   // Copy executable constants so we own them.
   if (iree_status_is_ok(status) && executable_params->constant_count > 0) {
     uint32_t* target_constants =
-        (uint32_t*)((uint8_t*)executable + sizeof(*executable) +
-                    executable_params->pipeline_layout_count *
-                        sizeof(*executable->layouts));
+        (uint32_t*)((uint8_t*)executable + sizeof(*executable));
     memcpy(target_constants, executable_params->constants,
            executable_params->constant_count *
                sizeof(*executable_params->constants));
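With the pipeline-layout array gone, each loader's allocation reduces to the executable struct plus its owned constants in a single block, with the constants stored immediately after the struct. A minimal sketch of that trailing-storage pattern (`demo_executable_t` is a made-up stand-in, not the real `iree_hal_elf_executable_t`):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Stand-in executable type: one malloc covers both the struct and the
// constants it owns, so teardown is a single free().
typedef struct {
  size_t constant_count;
  const uint32_t* constants;  // points into this same allocation
} demo_executable_t;

static demo_executable_t* demo_executable_create(const uint32_t* constants,
                                                 size_t constant_count) {
  size_t total_size =
      sizeof(demo_executable_t) + constant_count * sizeof(uint32_t);
  demo_executable_t* executable = (demo_executable_t*)malloc(total_size);
  if (!executable) return NULL;
  // Constants live immediately after the struct; copy so we own them
  // independent of the caller's lifetime.
  uint32_t* target_constants =
      (uint32_t*)((uint8_t*)executable + sizeof(*executable));
  memcpy(target_constants, constants, constant_count * sizeof(uint32_t));
  executable->constant_count = constant_count;
  executable->constants = target_constants;
  return executable;
}
```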
diff --git a/runtime/src/iree/hal/local/loaders/static_library_loader.c b/runtime/src/iree/hal/local/loaders/static_library_loader.c
index a1486c1..8f27b38 100644
--- a/runtime/src/iree/hal/local/loaders/static_library_loader.c
+++ b/runtime/src/iree/hal/local/loaders/static_library_loader.c
@@ -30,8 +30,6 @@
     const iree_hal_executable_library_header_t** header;
     const iree_hal_executable_library_v0_t* v0;
   } library;
-
-  iree_hal_pipeline_layout_t* layouts[];
 } iree_hal_static_executable_t;
 
 static const iree_hal_local_executable_vtable_t
@@ -49,8 +47,6 @@
     const iree_hal_executable_import_provider_t import_provider,
     iree_allocator_t host_allocator, iree_hal_executable_t** out_executable) {
   IREE_ASSERT_ARGUMENT(executable_params);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(!executable_params->constant_count ||
                        executable_params->constants);
   IREE_ASSERT_ARGUMENT(library_header);
@@ -61,16 +57,12 @@
   iree_hal_static_executable_t* executable = NULL;
   iree_host_size_t total_size =
       sizeof(*executable) +
-      executable_params->pipeline_layout_count * sizeof(*executable->layouts) +
       executable_params->constant_count * sizeof(*executable_params->constants);
   iree_status_t status =
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
   if (iree_status_is_ok(status)) {
-    iree_hal_local_executable_initialize(
-        &iree_hal_static_executable_vtable,
-        executable_params->pipeline_layout_count,
-        executable_params->pipeline_layouts, &executable->layouts[0],
-        host_allocator, &executable->base);
+    iree_hal_local_executable_initialize(&iree_hal_static_executable_vtable,
+                                         host_allocator, &executable->base);
     executable->library.header = library_header;
     executable->identifier = iree_make_cstring_view((*library_header)->name);
     executable->base.dispatch_attrs = executable->library.v0->exports.attrs;
@@ -79,9 +71,7 @@
   // Copy executable constants so we own them.
   if (iree_status_is_ok(status) && executable_params->constant_count > 0) {
     uint32_t* target_constants =
-        (uint32_t*)((uint8_t*)executable + sizeof(*executable) +
-                    executable_params->pipeline_layout_count *
-                        sizeof(*executable->layouts));
+        (uint32_t*)((uint8_t*)executable + sizeof(*executable));
     memcpy(target_constants, executable_params->constants,
            executable_params->constant_count *
                sizeof(*executable_params->constants));
diff --git a/runtime/src/iree/hal/local/loaders/system_library_loader.c b/runtime/src/iree/hal/local/loaders/system_library_loader.c
index 6e065a7..23734e9 100644
--- a/runtime/src/iree/hal/local/loaders/system_library_loader.c
+++ b/runtime/src/iree/hal/local/loaders/system_library_loader.c
@@ -88,8 +88,6 @@
     const iree_hal_executable_library_header_t** header;
     const iree_hal_executable_library_v0_t* v0;
   } library;
-
-  iree_hal_pipeline_layout_t* layouts[];
 } iree_hal_system_executable_t;
 
 static const iree_hal_local_executable_vtable_t
@@ -217,8 +215,6 @@
   IREE_ASSERT_ARGUMENT(executable_params);
   IREE_ASSERT_ARGUMENT(executable_params->executable_data.data &&
                        executable_params->executable_data.data_length);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(!executable_params->constant_count ||
                        executable_params->constants);
   IREE_ASSERT_ARGUMENT(out_executable);
@@ -228,24 +224,18 @@
   iree_hal_system_executable_t* executable = NULL;
   iree_host_size_t total_size =
       sizeof(*executable) +
-      executable_params->pipeline_layout_count * sizeof(*executable->layouts) +
       executable_params->constant_count * sizeof(*executable_params->constants);
   iree_status_t status =
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
   if (iree_status_is_ok(status)) {
-    iree_hal_local_executable_initialize(
-        &iree_hal_system_executable_vtable,
-        executable_params->pipeline_layout_count,
-        executable_params->pipeline_layouts, &executable->layouts[0],
-        host_allocator, &executable->base);
+    iree_hal_local_executable_initialize(&iree_hal_system_executable_vtable,
+                                         host_allocator, &executable->base);
   }
 
   // Copy executable constants so we own them.
   if (iree_status_is_ok(status) && executable_params->constant_count > 0) {
     uint32_t* target_constants =
-        (uint32_t*)((uint8_t*)executable + sizeof(*executable) +
-                    executable_params->pipeline_layout_count *
-                        sizeof(*executable->layouts));
+        (uint32_t*)((uint8_t*)executable + sizeof(*executable));
     memcpy(target_constants, executable_params->constants,
            executable_params->constant_count *
                sizeof(*executable_params->constants));
diff --git a/runtime/src/iree/hal/local/loaders/vmvx_module_loader.c b/runtime/src/iree/hal/local/loaders/vmvx_module_loader.c
index 2675f8e..78663ba 100644
--- a/runtime/src/iree/hal/local/loaders/vmvx_module_loader.c
+++ b/runtime/src/iree/hal/local/loaders/vmvx_module_loader.c
@@ -221,8 +221,6 @@
   IREE_ASSERT_ARGUMENT(instance);
   IREE_ASSERT_ARGUMENT(bytecode_module);
   IREE_ASSERT_ARGUMENT(executable_params);
-  IREE_ASSERT_ARGUMENT(!executable_params->pipeline_layout_count ||
-                       executable_params->pipeline_layouts);
   IREE_ASSERT_ARGUMENT(out_executable);
   *out_executable = NULL;
   IREE_TRACE_ZONE_BEGIN(z0);
@@ -230,30 +228,17 @@
   // NOTE: pipeline layouts are optional but if provided must be consistent.
   iree_host_size_t entry_count =
       iree_vm_module_signature(bytecode_module).export_function_count;
-  if (executable_params->pipeline_layout_count > 0 &&
-      entry_count != executable_params->pipeline_layout_count) {
-    return iree_make_status(IREE_STATUS_FAILED_PRECONDITION,
-                            "executable provides %" PRIhsz
-                            " entry points but caller "
-                            "provided %" PRIhsz "; must match",
-                            entry_count,
-                            executable_params->pipeline_layout_count);
-  }
 
   iree_hal_vmvx_executable_t* executable = NULL;
   const iree_host_size_t entry_fn_ordinals_size =
       iree_host_align(entry_count * sizeof(*executable->entry_fn_ordinals), 8);
   const iree_host_size_t dispatch_attrs_size = iree_host_align(
       entry_count * sizeof(*executable->base.dispatch_attrs), 8);
-  const iree_host_size_t pipeline_layouts_size =
-      iree_host_align(executable_params->pipeline_layout_count *
-                          sizeof(iree_hal_pipeline_layout_t*),
-                      8);
   const iree_host_size_t worker_states_size =
       iree_host_align(worker_capacity * sizeof(*executable->worker_states), 8);
-  const iree_host_size_t total_size =
-      sizeof(*executable) + entry_fn_ordinals_size + dispatch_attrs_size +
-      pipeline_layouts_size + worker_states_size;
+  const iree_host_size_t total_size = sizeof(*executable) +
+                                      entry_fn_ordinals_size +
+                                      dispatch_attrs_size + worker_states_size;
   iree_status_t status =
       iree_allocator_malloc(host_allocator, total_size, (void**)&executable);
   iree_hal_executable_dispatch_attrs_v0_t* dispatch_attrs = NULL;
@@ -262,14 +247,8 @@
         (uint8_t*)executable + sizeof(*executable) + entry_fn_ordinals_size;
     dispatch_attrs = (iree_hal_executable_dispatch_attrs_v0_t*)ptr;
     ptr += dispatch_attrs_size;
-    iree_hal_pipeline_layout_t** pipeline_layouts_ptr =
-        (iree_hal_pipeline_layout_t**)ptr;
-    ptr += pipeline_layouts_size;
-    iree_hal_local_executable_initialize(
-        &iree_hal_vmvx_executable_vtable,
-        executable_params->pipeline_layout_count,
-        executable_params->pipeline_layouts, pipeline_layouts_ptr,
-        host_allocator, &executable->base);
+    iree_hal_local_executable_initialize(&iree_hal_vmvx_executable_vtable,
+                                         host_allocator, &executable->base);
     executable->base.dispatch_attrs = dispatch_attrs;
 
     executable->worker_capacity = worker_capacity;
@@ -455,9 +434,8 @@
   iree_vm_buffer_t constants_buffer;
   iree_vm_buffer_initialize(
       IREE_VM_BUFFER_ACCESS_ORIGIN_HOST,
-      iree_make_byte_span(
-          (void*)dispatch_state->push_constants,
-          sizeof(uint32_t) * dispatch_state->push_constant_count),
+      iree_make_byte_span((void*)dispatch_state->constants,
+                          sizeof(uint32_t) * dispatch_state->constant_count),
       iree_allocator_null(), &constants_buffer);
 
   // Prepare call argument buffer. We've verified the signature on creation and
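The VMVX loader carves several 8-byte-aligned regions (entry ordinals, dispatch attrs, worker states) out of one allocation by summing pre-aligned sizes. The alignment helper it relies on behaves like this sketch (`host_align` is a stand-in for `iree_host_align`, assuming a power-of-two alignment):

```c
#include <stddef.h>

// Rounds |value| up to the next multiple of |alignment|; |alignment|
// must be a power of two for the mask trick to be valid.
static size_t host_align(size_t value, size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}
```

Because each region's size is aligned before it is added to the running total, each region's base offset within the block is itself 8-byte aligned, so the carved-out pointers can be used directly as typed arrays.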
diff --git a/runtime/src/iree/hal/local/local_executable.c b/runtime/src/iree/hal/local/local_executable.c
index 9565e48..b9c8d04 100644
--- a/runtime/src/iree/hal/local/local_executable.c
+++ b/runtime/src/iree/hal/local/local_executable.c
@@ -10,21 +10,11 @@
 
 void iree_hal_local_executable_initialize(
     const iree_hal_local_executable_vtable_t* vtable,
-    iree_host_size_t pipeline_layout_count,
-    iree_hal_pipeline_layout_t* const* source_pipeline_layouts,
-    iree_hal_pipeline_layout_t** target_pipeline_layouts,
     iree_allocator_t host_allocator,
     iree_hal_local_executable_t* out_base_executable) {
   iree_hal_resource_initialize(vtable, &out_base_executable->resource);
   out_base_executable->host_allocator = host_allocator;
 
-  out_base_executable->pipeline_layout_count = pipeline_layout_count;
-  out_base_executable->pipeline_layouts = target_pipeline_layouts;
-  for (iree_host_size_t i = 0; i < pipeline_layout_count; ++i) {
-    target_pipeline_layouts[i] = source_pipeline_layouts[i];
-    iree_hal_pipeline_layout_retain(source_pipeline_layouts[i]);
-  }
-
   // Function attributes are optional and populated by the parent type.
   out_base_executable->dispatch_attrs = NULL;
 
@@ -34,12 +24,7 @@
 }
 
 void iree_hal_local_executable_deinitialize(
-    iree_hal_local_executable_t* base_executable) {
-  for (iree_host_size_t i = 0; i < base_executable->pipeline_layout_count;
-       ++i) {
-    iree_hal_pipeline_layout_release(base_executable->pipeline_layouts[i]);
-  }
-}
+    iree_hal_local_executable_t* base_executable) {}
 
 iree_hal_local_executable_t* iree_hal_local_executable_cast(
     iree_hal_executable_t* base_value) {
@@ -78,7 +63,7 @@
     int xyz_string_length =
         snprintf(xyz_string, IREE_ARRAYSIZE(xyz_string), "%ux%ux%u",
                  workgroup_count_x, workgroup_count_y, workgroup_count_z);
-    IREE_TRACE_ZONE_APPEND_TEXT_STRING_VIEW(z0, xyz_string, xyz_string_length);
+    IREE_TRACE_ZONE_APPEND_TEXT(z0, xyz_string, xyz_string_length);
   });
 #endif  // IREE_HAL_VERBOSE_TRACING_ENABLE
 
diff --git a/runtime/src/iree/hal/local/local_executable.h b/runtime/src/iree/hal/local/local_executable.h
index b6e2445..a9eb027 100644
--- a/runtime/src/iree/hal/local/local_executable.h
+++ b/runtime/src/iree/hal/local/local_executable.h
@@ -19,16 +19,6 @@
   iree_hal_resource_t resource;
   iree_allocator_t host_allocator;
 
-  // Optional pipeline layout
-  // Not all users require the layouts (such as when directly calling executable
-  // functions) and in those cases they can be omitted. Users routing through
-  // the HAL command buffer APIs will usually require them.
-  //
-  // TODO(benvanik): make this a flag we set and can query instead - poking into
-  // this from dispatch code is a layering violation.
-  iree_host_size_t pipeline_layout_count;
-  iree_hal_pipeline_layout_t** pipeline_layouts;
-
   // Defines per-entry point how much workgroup local memory is required.
   // Contains entries with 0 to indicate no local memory is required or >0 in
   // units of IREE_HAL_EXECUTABLE_WORKGROUP_LOCAL_MEMORY_PAGE_SIZE for the
@@ -50,14 +40,8 @@
 } iree_hal_local_executable_vtable_t;
 
 // Initializes the local executable base type.
-//
-// Callers must allocate memory for |target_pipeline_layouts| with at least
-// `pipeline_layout_count * sizeof(*target_pipeline_layouts)` bytes.
 void iree_hal_local_executable_initialize(
     const iree_hal_local_executable_vtable_t* vtable,
-    iree_host_size_t pipeline_layout_count,
-    iree_hal_pipeline_layout_t* const* source_pipeline_layouts,
-    iree_hal_pipeline_layout_t** target_pipeline_layouts,
     iree_allocator_t host_allocator,
     iree_hal_local_executable_t* out_base_executable);
 
diff --git a/runtime/src/iree/hal/local/local_pipeline_layout.c b/runtime/src/iree/hal/local/local_pipeline_layout.c
deleted file mode 100644
index d1b3cf6..0000000
--- a/runtime/src/iree/hal/local/local_pipeline_layout.c
+++ /dev/null
@@ -1,180 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/local/local_pipeline_layout.h"
-
-#include <stddef.h>
-#include <string.h>
-
-//===----------------------------------------------------------------------===//
-// iree_hal_local_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_local_descriptor_set_layout_vtable;
-
-iree_hal_local_descriptor_set_layout_t*
-iree_hal_local_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value,
-                       &iree_hal_local_descriptor_set_layout_vtable);
-  return (iree_hal_local_descriptor_set_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_local_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  *out_descriptor_set_layout = NULL;
-  if (binding_count > IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "binding count %" PRIhsz " over the limit of %d",
-                            binding_count,
-                            IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT);
-  }
-
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_hal_local_descriptor_set_layout_t* layout = NULL;
-  iree_host_size_t total_size =
-      sizeof(*layout) + binding_count * sizeof(*layout->bindings);
-  iree_status_t status =
-      iree_allocator_malloc(host_allocator, total_size, (void**)&layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_local_descriptor_set_layout_vtable,
-                                 &layout->resource);
-    layout->host_allocator = host_allocator;
-    layout->flags = flags;
-    layout->binding_count = binding_count;
-    memcpy(layout->bindings, bindings,
-           binding_count * sizeof(iree_hal_descriptor_set_layout_binding_t));
-    *out_descriptor_set_layout = (iree_hal_descriptor_set_layout_t*)layout;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_local_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* base_layout) {
-  iree_hal_local_descriptor_set_layout_t* layout =
-      iree_hal_local_descriptor_set_layout_cast(base_layout);
-  iree_allocator_t host_allocator = layout->host_allocator;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_allocator_free(host_allocator, layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-static const iree_hal_descriptor_set_layout_vtable_t
-    iree_hal_local_descriptor_set_layout_vtable = {
-        .destroy = iree_hal_local_descriptor_set_layout_destroy,
-};
-
-//===----------------------------------------------------------------------===//
-// iree_hal_local_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_local_pipeline_layout_vtable;
-
-iree_hal_local_pipeline_layout_t* iree_hal_local_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value) {
-  IREE_HAL_ASSERT_TYPE(base_value, &iree_hal_local_pipeline_layout_vtable);
-  return (iree_hal_local_pipeline_layout_t*)base_value;
-}
-
-iree_status_t iree_hal_local_pipeline_layout_create(
-    iree_host_size_t push_constants, iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  *out_pipeline_layout = NULL;
-  if (set_layout_count > IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT) {
-    return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                            "set layout count %" PRIhsz " over the limit of %d",
-                            set_layout_count,
-                            IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT);
-  }
-  if (push_constants > IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT) {
-    return iree_make_status(
-        IREE_STATUS_INVALID_ARGUMENT,
-        "push constant count %" PRIhsz " over the limit of %d", push_constants,
-        IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT);
-  }
-
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  iree_host_size_t total_size =
-      sizeof(iree_hal_local_pipeline_layout_t) +
-      set_layout_count * sizeof(iree_hal_descriptor_set_layout_t*);
-
-  iree_hal_local_pipeline_layout_t* layout = NULL;
-  iree_status_t status =
-      iree_allocator_malloc(host_allocator, total_size, (void**)&layout);
-  if (iree_status_is_ok(status)) {
-    iree_hal_resource_initialize(&iree_hal_local_pipeline_layout_vtable,
-                                 &layout->resource);
-    layout->host_allocator = host_allocator;
-    layout->push_constants = push_constants;
-    layout->used_bindings = 0;
-    layout->read_only_bindings = 0;
-    layout->set_layout_count = set_layout_count;
-    for (iree_host_size_t i = 0; i < set_layout_count; ++i) {
-      layout->set_layouts[i] = set_layouts[i];
-      iree_hal_descriptor_set_layout_retain(layout->set_layouts[i]);
-
-      iree_hal_local_descriptor_set_layout_t* local_set_layout =
-          iree_hal_local_descriptor_set_layout_cast(set_layouts[i]);
-      for (iree_host_size_t j = 0; j < local_set_layout->binding_count; ++j) {
-        // Track that this binding is used in the sparse set.
-        const iree_hal_local_binding_mask_t binding_bit =
-            1ull << (i * IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT + j);
-        layout->used_bindings |= binding_bit;
-
-        // Track which bindings are read-only so we can protect memory and
-        // verify usage.
-        const iree_hal_descriptor_set_layout_binding_t* binding =
-            &local_set_layout->bindings[j];
-        if (iree_all_bits_set(binding->flags,
-                              IREE_HAL_DESCRIPTOR_FLAG_READ_ONLY)) {
-          layout->read_only_bindings |= binding_bit;
-        }
-      }
-    }
-    *out_pipeline_layout = (iree_hal_pipeline_layout_t*)layout;
-  }
-
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-static void iree_hal_local_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* base_layout) {
-  iree_hal_local_pipeline_layout_t* layout =
-      iree_hal_local_pipeline_layout_cast(base_layout);
-  iree_allocator_t host_allocator = layout->host_allocator;
-  IREE_TRACE_ZONE_BEGIN(z0);
-
-  for (iree_host_size_t i = 0; i < layout->set_layout_count; ++i) {
-    iree_hal_descriptor_set_layout_release(layout->set_layouts[i]);
-  }
-  iree_allocator_free(host_allocator, layout);
-
-  IREE_TRACE_ZONE_END(z0);
-}
-
-static const iree_hal_pipeline_layout_vtable_t
-    iree_hal_local_pipeline_layout_vtable = {
-        .destroy = iree_hal_local_pipeline_layout_destroy,
-};
diff --git a/runtime/src/iree/hal/local/local_pipeline_layout.h b/runtime/src/iree/hal/local/local_pipeline_layout.h
deleted file mode 100644
index adc57e6..0000000
--- a/runtime/src/iree/hal/local/local_pipeline_layout.h
+++ /dev/null
@@ -1,79 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_LOCAL_LOCAL_PIPELINE_LAYOUT_H_
-#define IREE_HAL_LOCAL_LOCAL_PIPELINE_LAYOUT_H_
-
-#include <stdint.h>
-
-#include "iree/base/api.h"
-#include "iree/hal/api.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-//===----------------------------------------------------------------------===//
-// iree_hal_local_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-#define IREE_HAL_LOCAL_MAX_DESCRIPTOR_BINDING_COUNT 32
-
-typedef struct iree_hal_local_descriptor_set_layout_t {
-  iree_hal_resource_t resource;
-  iree_allocator_t host_allocator;
-  iree_hal_descriptor_set_layout_flags_t flags;
-  iree_host_size_t binding_count;
-  iree_hal_descriptor_set_layout_binding_t bindings[];
-} iree_hal_local_descriptor_set_layout_t;
-
-iree_status_t iree_hal_local_descriptor_set_layout_create(
-    iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_allocator_t host_allocator,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-iree_hal_local_descriptor_set_layout_t*
-iree_hal_local_descriptor_set_layout_cast(
-    iree_hal_descriptor_set_layout_t* base_value);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_local_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-#define IREE_HAL_LOCAL_MAX_DESCRIPTOR_SET_COUNT 2
-#define IREE_HAL_LOCAL_MAX_PUSH_CONSTANT_COUNT 64
-
-typedef uint64_t iree_hal_local_binding_mask_t;
-
-#define IREE_HAL_LOCAL_BINDING_MASK_BITS \
-  (sizeof(iree_hal_local_binding_mask_t) * 8)
-
-typedef struct iree_hal_local_pipeline_layout_t {
-  iree_hal_resource_t resource;
-  iree_allocator_t host_allocator;
-  iree_host_size_t push_constants;
-  iree_hal_local_binding_mask_t used_bindings;
-  iree_hal_local_binding_mask_t read_only_bindings;
-  iree_host_size_t set_layout_count;
-  iree_hal_descriptor_set_layout_t* set_layouts[];
-} iree_hal_local_pipeline_layout_t;
-
-iree_status_t iree_hal_local_pipeline_layout_create(
-    iree_host_size_t push_constants, iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_allocator_t host_allocator,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-iree_hal_local_pipeline_layout_t* iree_hal_local_pipeline_layout_cast(
-    iree_hal_pipeline_layout_t* base_value);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_LOCAL_LOCAL_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/pipeline_layout.c b/runtime/src/iree/hal/pipeline_layout.c
deleted file mode 100644
index 3153d48..0000000
--- a/runtime/src/iree/hal/pipeline_layout.c
+++ /dev/null
@@ -1,72 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#include "iree/hal/pipeline_layout.h"
-
-#include <stddef.h>
-
-#include "iree/hal/detail.h"
-#include "iree/hal/device.h"
-#include "iree/hal/resource.h"
-
-//===----------------------------------------------------------------------===//
-// iree_hal_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-#define _VTABLE_DISPATCH(descriptor_set_layout, method_name) \
-  IREE_HAL_VTABLE_DISPATCH(descriptor_set_layout,            \
-                           iree_hal_descriptor_set_layout, method_name)
-
-IREE_HAL_API_RETAIN_RELEASE(descriptor_set_layout);
-
-IREE_API_EXPORT iree_status_t iree_hal_descriptor_set_layout_create(
-    iree_hal_device_t* device, iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout) {
-  IREE_ASSERT_ARGUMENT(device);
-  IREE_ASSERT_ARGUMENT(!binding_count || bindings);
-  IREE_ASSERT_ARGUMENT(out_descriptor_set_layout);
-  *out_descriptor_set_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-  iree_status_t status = IREE_HAL_VTABLE_DISPATCH(device, iree_hal_device,
-                                                  create_descriptor_set_layout)(
-      device, flags, binding_count, bindings, out_descriptor_set_layout);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-#undef _VTABLE_DISPATCH
-
-//===----------------------------------------------------------------------===//
-// iree_hal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-#define _VTABLE_DISPATCH(pipeline_layout, method_name)                \
-  IREE_HAL_VTABLE_DISPATCH(pipeline_layout, iree_hal_pipeline_layout, \
-                           method_name)
-
-IREE_HAL_API_RETAIN_RELEASE(pipeline_layout);
-
-IREE_API_EXPORT iree_status_t iree_hal_pipeline_layout_create(
-    iree_hal_device_t* device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout) {
-  IREE_ASSERT_ARGUMENT(device);
-  IREE_ASSERT_ARGUMENT(!set_layout_count || set_layouts);
-  IREE_ASSERT_ARGUMENT(out_pipeline_layout);
-  *out_pipeline_layout = NULL;
-  IREE_TRACE_ZONE_BEGIN(z0);
-  iree_status_t status =
-      IREE_HAL_VTABLE_DISPATCH(device, iree_hal_device, create_pipeline_layout)(
-          device, push_constants, set_layout_count, set_layouts,
-          out_pipeline_layout);
-  IREE_TRACE_ZONE_END(z0);
-  return status;
-}
-
-#undef _VTABLE_DISPATCH
diff --git a/runtime/src/iree/hal/pipeline_layout.h b/runtime/src/iree/hal/pipeline_layout.h
deleted file mode 100644
index d090868..0000000
--- a/runtime/src/iree/hal/pipeline_layout.h
+++ /dev/null
@@ -1,171 +0,0 @@
-// Copyright 2020 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-#ifndef IREE_HAL_PIPELINE_LAYOUT_H_
-#define IREE_HAL_PIPELINE_LAYOUT_H_
-
-#include <stdbool.h>
-#include <stdint.h>
-
-#include "iree/base/api.h"
-#include "iree/hal/resource.h"
-
-#ifdef __cplusplus
-extern "C" {
-#endif  // __cplusplus
-
-typedef struct iree_hal_device_t iree_hal_device_t;
-
-//===----------------------------------------------------------------------===//
-// Types and Enums
-//===----------------------------------------------------------------------===//
-
-// A bitmask of flags controlling the behavior of a descriptor set.
-enum iree_hal_descriptor_set_layout_flag_bits_t {
-  IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE = 0u,
-
-  // Indicates the descriptor sets are 'bindless' and passed via implementation-
-  // specific parameter buffers stored in memory instead of API-level calls.
-  // Ignored by implementations that don't have a concept of indirect bindings.
-  IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_INDIRECT = 1u << 0,
-};
-typedef uint32_t iree_hal_descriptor_set_layout_flags_t;
-
-// Specifies the type of a descriptor in a descriptor set.
-typedef enum iree_hal_descriptor_type_e {
-  IREE_HAL_DESCRIPTOR_TYPE_UNIFORM_BUFFER = 6,
-  IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER = 7,
-} iree_hal_descriptor_type_t;
-
-// A bitmask of flags controlling the behavior of a descriptor.
-enum iree_hal_descriptor_flag_bits_t {
-  IREE_HAL_DESCRIPTOR_FLAG_NONE = 0u,
-
-  // Indicates that the binding is treated as immutable within all dispatches
-  // using it.
-  IREE_HAL_DESCRIPTOR_FLAG_READ_ONLY = 1u << 0,
-
-  // Indicates the descriptor is 'bindless' and passed via implementation-
-  // specific parameter buffers stored in memory instead of API-level calls.
-  // Ignored by implementations that don't have a concept of indirect bindings.
-  IREE_HAL_DESCRIPTOR_FLAG_INDIRECT = 1u << 1,
-};
-typedef uint32_t iree_hal_descriptor_flags_t;
-
-// Specifies a descriptor set layout binding.
-//
-// Maps to VkDescriptorSetLayoutBinding.
-typedef struct iree_hal_descriptor_set_layout_binding_t {
-  // The binding number of this entry and corresponds to a resource of the
-  // same binding number in the executable interface.
-  uint32_t binding;
-  // Specifies which type of resource descriptors are used for this binding.
-  iree_hal_descriptor_type_t type;
-  // Specifies how the descriptor is used.
-  iree_hal_descriptor_flags_t flags;
-} iree_hal_descriptor_set_layout_binding_t;
-
-//===----------------------------------------------------------------------===//
-// iree_hal_descriptor_set_layout_t
-//===----------------------------------------------------------------------===//
-
-// Opaque handle to a descriptor set layout object.
-// A "descriptor" is effectively a bound memory range and each dispatch can use
-// one or more "descriptor sets" to access their I/O memory. A "descriptor set
-// layout" defines the types and usage semantics of the descriptors that make up
-// one set. Implementations can use this to verify program correctness and
-// accelerate reservation/allocation/computation of descriptor-related
-// operations.
-//
-// Maps to VkDescriptorSetLayout:
-// https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkDescriptorSetLayout.html
-typedef struct iree_hal_descriptor_set_layout_t
-    iree_hal_descriptor_set_layout_t;
-
-// Creates a descriptor set layout with the given bindings.
-IREE_API_EXPORT iree_status_t iree_hal_descriptor_set_layout_create(
-    iree_hal_device_t* device, iree_hal_descriptor_set_layout_flags_t flags,
-    iree_host_size_t binding_count,
-    const iree_hal_descriptor_set_layout_binding_t* bindings,
-    iree_hal_descriptor_set_layout_t** out_descriptor_set_layout);
-
-// Retains the given |descriptor_set_layout| for the caller.
-IREE_API_EXPORT void iree_hal_descriptor_set_layout_retain(
-    iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-// Releases the given |descriptor_set_layout| from the caller.
-IREE_API_EXPORT void iree_hal_descriptor_set_layout_release(
-    iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-// Defines the resource binding layout used by an executable.
-// A "descriptor" is effectively a bound memory range and each dispatch can use
-// one or more "descriptor sets" to access their I/O memory. A "descriptor set
-// layout" defines the types and usage semantics of the descriptors that make up
-// one set. An "pipeline layout" defines all of the set layouts that will be
-// used when dispatching. Implementations can use this to verify program
-// correctness and accelerate reservation/allocation/computation of
-// descriptor-related operations.
-//
-// Executables can share the same layout even if they do not use all of the
-// resources referenced by descriptor sets referenced by the layout. Doing so
-// allows for more efficient binding as bound descriptor sets can be reused when
-// command buffer executable bindings change.
-//
-// Maps to VkPipelineLayout:
-// https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPipelineLayout.html
-typedef struct iree_hal_pipeline_layout_t iree_hal_pipeline_layout_t;
-
-// Creates an pipeline layout composed of the given descriptor set layouts.
-// The returned pipeline layout can be used by multiple executables with the
-// same compatible resource binding layouts.
-IREE_API_EXPORT iree_status_t iree_hal_pipeline_layout_create(
-    iree_hal_device_t* device, iree_host_size_t push_constants,
-    iree_host_size_t set_layout_count,
-    iree_hal_descriptor_set_layout_t* const* set_layouts,
-    iree_hal_pipeline_layout_t** out_pipeline_layout);
-
-// Retains the given |pipeline_layout| for the caller.
-IREE_API_EXPORT void iree_hal_pipeline_layout_retain(
-    iree_hal_pipeline_layout_t* pipeline_layout);
-
-// Releases the given |pipeline_layout| from the caller.
-IREE_API_EXPORT void iree_hal_pipeline_layout_release(
-    iree_hal_pipeline_layout_t* pipeline_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_descriptor_set_layout_t implementation details
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_descriptor_set_layout_vtable_t {
-  void(IREE_API_PTR* destroy)(
-      iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-} iree_hal_descriptor_set_layout_vtable_t;
-IREE_HAL_ASSERT_VTABLE_LAYOUT(iree_hal_descriptor_set_layout_vtable_t);
-
-IREE_API_EXPORT void iree_hal_descriptor_set_layout_destroy(
-    iree_hal_descriptor_set_layout_t* descriptor_set_layout);
-
-//===----------------------------------------------------------------------===//
-// iree_hal_pipeline_layout_t implementation details
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_pipeline_layout_vtable_t {
-  void(IREE_API_PTR* destroy)(iree_hal_pipeline_layout_t* pipeline_layout);
-} iree_hal_pipeline_layout_vtable_t;
-IREE_HAL_ASSERT_VTABLE_LAYOUT(iree_hal_pipeline_layout_vtable_t);
-
-IREE_API_EXPORT void iree_hal_pipeline_layout_destroy(
-    iree_hal_pipeline_layout_t* pipeline_layout);
-
-#ifdef __cplusplus
-}  // extern "C"
-#endif  // __cplusplus
-
-#endif  // IREE_HAL_PIPELINE_LAYOUT_H_
diff --git a/runtime/src/iree/hal/utils/BUILD.bazel b/runtime/src/iree/hal/utils/BUILD.bazel
index 1531601..0ad6080 100644
--- a/runtime/src/iree/hal/utils/BUILD.bazel
+++ b/runtime/src/iree/hal/utils/BUILD.bazel
@@ -26,16 +26,6 @@
 )
 
 iree_runtime_cc_library(
-    name = "debug_allocator",
-    srcs = ["debug_allocator.c"],
-    hdrs = ["debug_allocator.h"],
-    deps = [
-        "//runtime/src/iree/base",
-        "//runtime/src/iree/hal",
-    ],
-)
-
-iree_runtime_cc_library(
     name = "collective_batch",
     srcs = ["collective_batch.c"],
     hdrs = ["collective_batch.h"],
@@ -59,6 +49,16 @@
 )
 
 iree_runtime_cc_library(
+    name = "debug_allocator",
+    srcs = ["debug_allocator.c"],
+    hdrs = ["debug_allocator.h"],
+    deps = [
+        "//runtime/src/iree/base",
+        "//runtime/src/iree/hal",
+    ],
+)
+
+iree_runtime_cc_library(
     name = "deferred_command_buffer",
     srcs = ["deferred_command_buffer.c"],
     hdrs = ["deferred_command_buffer.h"],
@@ -71,6 +71,17 @@
 )
 
 iree_runtime_cc_library(
+    name = "executable_debug_info",
+    srcs = ["executable_debug_info.c"],
+    hdrs = ["executable_debug_info.h"],
+    deps = [
+        "//runtime/src/iree/base",
+        "//runtime/src/iree/base/internal/flatcc:parsing",
+        "//runtime/src/iree/schemas:executable_debug_info_c_fbs",
+    ],
+)
+
+iree_runtime_cc_library(
     name = "file_cache",
     srcs = ["file_cache.c"],
     hdrs = ["file_cache.h"],
diff --git a/runtime/src/iree/hal/utils/CMakeLists.txt b/runtime/src/iree/hal/utils/CMakeLists.txt
index fed7908..4bb9c8c 100644
--- a/runtime/src/iree/hal/utils/CMakeLists.txt
+++ b/runtime/src/iree/hal/utils/CMakeLists.txt
@@ -27,19 +27,6 @@
 
 iree_cc_library(
   NAME
-    debug_allocator
-  HDRS
-    "debug_allocator.h"
-  SRCS
-    "debug_allocator.c"
-  DEPS
-    iree::base
-    iree::hal
-  PUBLIC
-)
-
-iree_cc_library(
-  NAME
     collective_batch
   HDRS
     "collective_batch.h"
@@ -69,6 +56,19 @@
 
 iree_cc_library(
   NAME
+    debug_allocator
+  HDRS
+    "debug_allocator.h"
+  SRCS
+    "debug_allocator.c"
+  DEPS
+    iree::base
+    iree::hal
+  PUBLIC
+)
+
+iree_cc_library(
+  NAME
     deferred_command_buffer
   HDRS
     "deferred_command_buffer.h"
@@ -84,6 +84,20 @@
 
 iree_cc_library(
   NAME
+    executable_debug_info
+  HDRS
+    "executable_debug_info.h"
+  SRCS
+    "executable_debug_info.c"
+  DEPS
+    iree::base
+    iree::base::internal::flatcc::parsing
+    iree::schemas::executable_debug_info_c_fbs
+  PUBLIC
+)
+
+iree_cc_library(
+  NAME
     file_cache
   HDRS
     "file_cache.h"
diff --git a/runtime/src/iree/hal/utils/deferred_command_buffer.c b/runtime/src/iree/hal/utils/deferred_command_buffer.c
index a1c92bf..7206254 100644
--- a/runtime/src/iree/hal/utils/deferred_command_buffer.c
+++ b/runtime/src/iree/hal/utils/deferred_command_buffer.c
@@ -23,10 +23,6 @@
   IREE_HAL_CMD_UPDATE_BUFFER,
   IREE_HAL_CMD_COPY_BUFFER,
   IREE_HAL_CMD_COLLECTIVE,
-  IREE_HAL_CMD_PUSH_CONSTANTS,
-  IREE_HAL_CMD_PUSH_DESCRIPTOR_SET,
-  IREE_HAL_CMD_DISPATCH,
-  IREE_HAL_CMD_DISPATCH_INDIRECT,
   IREE_HAL_CMD_DISPATCH2,
   IREE_HAL_CMD_DISPATCH2_INDIRECT,
 } iree_hal_cmd_type_t;
@@ -668,199 +664,10 @@
 }
 
 //===----------------------------------------------------------------------===//
-// IREE_HAL_CMD_PUSH_CONSTANTS
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cmd_push_constants_t {
-  iree_hal_cmd_header_t header;
-  iree_hal_pipeline_layout_t* pipeline_layout;
-  iree_host_size_t offset;
-  iree_host_size_t values_length;
-  uint8_t values[];
-} iree_hal_cmd_push_constants_t;
-
-static iree_status_t iree_hal_deferred_command_buffer_push_constants(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, iree_host_size_t offset,
-    const void* values, iree_host_size_t values_length) {
-  iree_hal_deferred_command_buffer_t* command_buffer =
-      iree_hal_deferred_command_buffer_cast(base_command_buffer);
-  iree_hal_cmd_list_t* cmd_list = &command_buffer->cmd_list;
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &pipeline_layout));
-  iree_hal_cmd_push_constants_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_cmd_list_append_command(
-      cmd_list, IREE_HAL_CMD_PUSH_CONSTANTS,
-      sizeof(*cmd) + sizeof(cmd->values[0]) * values_length, (void**)&cmd));
-  cmd->pipeline_layout = pipeline_layout;
-  cmd->offset = offset;
-  cmd->values_length = values_length;
-  memcpy(cmd->values, values, sizeof(cmd->values[0]) * values_length);
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_deferred_command_buffer_apply_push_constants(
-    iree_hal_command_buffer_t* target_command_buffer,
-    iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_push_constants_t* cmd) {
-  return iree_hal_command_buffer_push_constants(
-      target_command_buffer, cmd->pipeline_layout, cmd->offset, cmd->values,
-      cmd->values_length);
-}
-
-//===----------------------------------------------------------------------===//
-// IREE_HAL_CMD_PUSH_DESCRIPTOR_SET
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cmd_push_descriptor_set_t {
-  iree_hal_cmd_header_t header;
-  iree_hal_pipeline_layout_t* pipeline_layout;
-  uint32_t set;
-  iree_host_size_t binding_count;
-  iree_hal_buffer_ref_t bindings[];
-} iree_hal_cmd_push_descriptor_set_t;
-
-static iree_status_t iree_hal_deferred_command_buffer_push_descriptor_set(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_pipeline_layout_t* pipeline_layout, uint32_t set,
-    iree_host_size_t binding_count, const iree_hal_buffer_ref_t* bindings) {
-  iree_hal_deferred_command_buffer_t* command_buffer =
-      iree_hal_deferred_command_buffer_cast(base_command_buffer);
-  iree_hal_cmd_list_t* cmd_list = &command_buffer->cmd_list;
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &pipeline_layout));
-  iree_hal_cmd_push_descriptor_set_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_cmd_list_append_command(
-      cmd_list, IREE_HAL_CMD_PUSH_DESCRIPTOR_SET,
-      sizeof(*cmd) + sizeof(cmd->bindings[0]) * binding_count, (void**)&cmd));
-  cmd->pipeline_layout = pipeline_layout;
-  cmd->set = set;
-  cmd->binding_count = binding_count;
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    iree_hal_buffer_ref_t binding = bindings[i];
-    cmd->bindings[i] = binding;
-    if (binding.buffer) {
-      IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-          command_buffer->resource_set, 1, &binding.buffer));
-    }
-  }
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_deferred_command_buffer_apply_push_descriptor_set(
-    iree_hal_command_buffer_t* target_command_buffer,
-    iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_push_descriptor_set_t* cmd) {
-  iree_hal_buffer_ref_t* binding_refs = (iree_hal_buffer_ref_t*)iree_alloca(
-      cmd->binding_count * sizeof(iree_hal_buffer_ref_t));
-  for (iree_host_size_t i = 0; i < cmd->binding_count; ++i) {
-    IREE_RETURN_IF_ERROR(iree_hal_buffer_binding_table_resolve_ref(
-        binding_table, cmd->bindings[i], &binding_refs[i]));
-  }
-  return iree_hal_command_buffer_push_descriptor_set(
-      target_command_buffer, cmd->pipeline_layout, cmd->set, cmd->binding_count,
-      binding_refs);
-}
-
-//===----------------------------------------------------------------------===//
-// IREE_HAL_CMD_DISPATCH
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cmd_dispatch_t {
-  iree_hal_cmd_header_t header;
-  iree_hal_executable_t* executable;
-  int32_t entry_point;
-  uint32_t workgroup_x;
-  uint32_t workgroup_y;
-  uint32_t workgroup_z;
-  iree_hal_dispatch_flags_t flags;
-} iree_hal_cmd_dispatch_t;
-
-static iree_status_t iree_hal_deferred_command_buffer_dispatch(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    uint32_t workgroup_x, uint32_t workgroup_y, uint32_t workgroup_z,
-    iree_hal_dispatch_flags_t flags) {
-  iree_hal_deferred_command_buffer_t* command_buffer =
-      iree_hal_deferred_command_buffer_cast(base_command_buffer);
-  iree_hal_cmd_list_t* cmd_list = &command_buffer->cmd_list;
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, 1, &executable));
-  iree_hal_cmd_dispatch_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_cmd_list_append_command(
-      cmd_list, IREE_HAL_CMD_DISPATCH, sizeof(*cmd), (void**)&cmd));
-  cmd->executable = executable;
-  cmd->entry_point = entry_point;
-  cmd->workgroup_x = workgroup_x;
-  cmd->workgroup_y = workgroup_y;
-  cmd->workgroup_z = workgroup_z;
-  cmd->flags = flags;
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch(
-    iree_hal_command_buffer_t* target_command_buffer,
-    iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_dispatch_t* cmd) {
-  return iree_hal_command_buffer_dispatch(
-      target_command_buffer, cmd->executable, cmd->entry_point,
-      cmd->workgroup_x, cmd->workgroup_y, cmd->workgroup_z, cmd->flags);
-}
-
-//===----------------------------------------------------------------------===//
-// IREE_HAL_CMD_DISPATCH_INDIRECT
-//===----------------------------------------------------------------------===//
-
-typedef struct iree_hal_cmd_dispatch_indirect_t {
-  iree_hal_cmd_header_t header;
-  iree_hal_executable_t* executable;
-  int32_t entry_point;
-  iree_hal_buffer_ref_t workgroups_ref;
-  iree_hal_dispatch_flags_t flags;
-} iree_hal_cmd_dispatch_indirect_t;
-
-static iree_status_t iree_hal_deferred_command_buffer_dispatch_indirect(
-    iree_hal_command_buffer_t* base_command_buffer,
-    iree_hal_executable_t* executable, int32_t entry_point,
-    iree_hal_buffer_ref_t workgroups_ref, iree_hal_dispatch_flags_t flags) {
-  iree_hal_deferred_command_buffer_t* command_buffer =
-      iree_hal_deferred_command_buffer_cast(base_command_buffer);
-  iree_hal_cmd_list_t* cmd_list = &command_buffer->cmd_list;
-  iree_host_size_t resource_count = 0;
-  const void* resources[2] = {NULL, NULL};
-  resources[resource_count++] = executable;
-  if (workgroups_ref.buffer) {
-    resources[resource_count++] = workgroups_ref.buffer;
-  }
-  IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
-      command_buffer->resource_set, resource_count, resources));
-  iree_hal_cmd_dispatch_indirect_t* cmd = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_cmd_list_append_command(
-      cmd_list, IREE_HAL_CMD_DISPATCH_INDIRECT, sizeof(*cmd), (void**)&cmd));
-  cmd->executable = executable;
-  cmd->entry_point = entry_point;
-  cmd->workgroups_ref = workgroups_ref;
-  cmd->flags = flags;
-  return iree_ok_status();
-}
-
-static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch_indirect(
-    iree_hal_command_buffer_t* target_command_buffer,
-    iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_dispatch_indirect_t* cmd) {
-  iree_hal_buffer_ref_t workgroups_ref;
-  IREE_RETURN_IF_ERROR(iree_hal_buffer_binding_table_resolve_ref(
-      binding_table, cmd->workgroups_ref, &workgroups_ref));
-  return iree_hal_command_buffer_dispatch_indirect(
-      target_command_buffer, cmd->executable, cmd->entry_point, workgroups_ref,
-      cmd->flags);
-}
-
-//===----------------------------------------------------------------------===//
 // IREE_HAL_CMD_DISPATCH2
 //===----------------------------------------------------------------------===//
 
-typedef struct iree_hal_cmd_dispatch2_t {
+typedef struct iree_hal_cmd_dispatch_t {
   iree_hal_cmd_header_t header;
   iree_hal_executable_t* executable;
   int32_t entry_point;
@@ -868,9 +675,9 @@
   iree_const_byte_span_t constants;
   iree_hal_buffer_ref_list_t bindings;
   iree_hal_dispatch_flags_t flags;
-} iree_hal_cmd_dispatch2_t;
+} iree_hal_cmd_dispatch_t;
 
-static iree_status_t iree_hal_deferred_command_buffer_dispatch2(
+static iree_status_t iree_hal_deferred_command_buffer_dispatch(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     const uint32_t workgroup_count[3], iree_const_byte_span_t constants,
@@ -880,7 +687,7 @@
   IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
       command_buffer->resource_set, 1, &executable));
 
-  iree_hal_cmd_dispatch2_t* cmd = NULL;
+  iree_hal_cmd_dispatch_t* cmd = NULL;
   iree_host_size_t total_size =
       sizeof(*cmd) + iree_host_align(constants.data_length, iree_max_align_t) +
       bindings.count * sizeof(bindings.values[0]);
@@ -910,10 +717,10 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch2(
+static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch(
     iree_hal_command_buffer_t* target_command_buffer,
     iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_dispatch2_t* cmd) {
+    const iree_hal_cmd_dispatch_t* cmd) {
   iree_hal_buffer_ref_t* binding_refs = (iree_hal_buffer_ref_t*)iree_alloca(
       cmd->bindings.count * sizeof(iree_hal_buffer_ref_t));
   for (iree_host_size_t i = 0; i < cmd->bindings.count; ++i) {
@@ -924,7 +731,7 @@
       .count = cmd->bindings.count,
       .values = binding_refs,
   };
-  return iree_hal_command_buffer_dispatch2(
+  return iree_hal_command_buffer_dispatch(
       target_command_buffer, cmd->executable, cmd->entry_point,
       cmd->workgroup_count, cmd->constants, binding_ref_list, cmd->flags);
 }
@@ -933,7 +740,7 @@
 // IREE_HAL_CMD_DISPATCH2_INDIRECT
 //===----------------------------------------------------------------------===//
 
-typedef struct iree_hal_cmd_dispatch2_indirect_t {
+typedef struct iree_hal_cmd_dispatch_indirect_t {
   iree_hal_cmd_header_t header;
   iree_hal_executable_t* executable;
   int32_t entry_point;
@@ -941,9 +748,9 @@
   iree_const_byte_span_t constants;
   iree_hal_buffer_ref_list_t bindings;
   iree_hal_dispatch_flags_t flags;
-} iree_hal_cmd_dispatch2_indirect_t;
+} iree_hal_cmd_dispatch_indirect_t;
 
-static iree_status_t iree_hal_deferred_command_buffer_dispatch2_indirect(
+static iree_status_t iree_hal_deferred_command_buffer_dispatch_indirect(
     iree_hal_command_buffer_t* base_command_buffer,
     iree_hal_executable_t* executable, int32_t entry_point,
     iree_hal_buffer_ref_t workgroups_ref, iree_const_byte_span_t constants,
@@ -960,7 +767,7 @@
   IREE_RETURN_IF_ERROR(iree_hal_resource_set_insert(
       command_buffer->resource_set, resource_count, resources));
 
-  iree_hal_cmd_dispatch2_indirect_t* cmd = NULL;
+  iree_hal_cmd_dispatch_indirect_t* cmd = NULL;
   iree_host_size_t total_size =
       sizeof(*cmd) + iree_host_align(constants.data_length, iree_max_align_t) +
       bindings.count * sizeof(bindings.values[0]);
@@ -990,10 +797,10 @@
   return iree_ok_status();
 }
 
-static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch2_indirect(
+static iree_status_t iree_hal_deferred_command_buffer_apply_dispatch_indirect(
     iree_hal_command_buffer_t* target_command_buffer,
     iree_hal_buffer_binding_table_t binding_table,
-    const iree_hal_cmd_dispatch2_indirect_t* cmd) {
+    const iree_hal_cmd_dispatch_indirect_t* cmd) {
   iree_hal_buffer_ref_t workgroups_ref;
   IREE_RETURN_IF_ERROR(iree_hal_buffer_binding_table_resolve_ref(
       binding_table, cmd->workgroups_ref, &workgroups_ref));
@@ -1007,7 +814,7 @@
       .count = cmd->bindings.count,
       .values = binding_refs,
   };
-  return iree_hal_command_buffer_dispatch2_indirect(
+  return iree_hal_command_buffer_dispatch_indirect(
       target_command_buffer, cmd->executable, cmd->entry_point, workgroups_ref,
       cmd->constants, binding_ref_list, cmd->flags);
 }
@@ -1035,18 +842,10 @@
         iree_hal_deferred_command_buffer_apply_copy_buffer,
     [IREE_HAL_CMD_COLLECTIVE] = (iree_hal_cmd_apply_fn_t)
         iree_hal_deferred_command_buffer_apply_collective,
-    [IREE_HAL_CMD_PUSH_CONSTANTS] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_push_constants,
-    [IREE_HAL_CMD_PUSH_DESCRIPTOR_SET] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_push_descriptor_set,
-    [IREE_HAL_CMD_DISPATCH] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_dispatch,
-    [IREE_HAL_CMD_DISPATCH_INDIRECT] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_dispatch_indirect,
     [IREE_HAL_CMD_DISPATCH2] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_dispatch2,
+        iree_hal_deferred_command_buffer_apply_dispatch,
     [IREE_HAL_CMD_DISPATCH2_INDIRECT] = (iree_hal_cmd_apply_fn_t)
-        iree_hal_deferred_command_buffer_apply_dispatch2_indirect,
+        iree_hal_deferred_command_buffer_apply_dispatch_indirect,
 };
 
 IREE_API_EXPORT iree_status_t iree_hal_deferred_command_buffer_apply(
@@ -1100,12 +899,6 @@
         .update_buffer = iree_hal_deferred_command_buffer_update_buffer,
         .copy_buffer = iree_hal_deferred_command_buffer_copy_buffer,
         .collective = iree_hal_deferred_command_buffer_collective,
-        .push_constants = iree_hal_deferred_command_buffer_push_constants,
-        .push_descriptor_set =
-            iree_hal_deferred_command_buffer_push_descriptor_set,
         .dispatch = iree_hal_deferred_command_buffer_dispatch,
         .dispatch_indirect = iree_hal_deferred_command_buffer_dispatch_indirect,
-        .dispatch2 = iree_hal_deferred_command_buffer_dispatch2,
-        .dispatch2_indirect =
-            iree_hal_deferred_command_buffer_dispatch2_indirect,
 };
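The removed `push_constants`/`push_descriptor_set` commands above are replaced by a single dispatch record that carries its constants and bindings inline: one variable-size allocation holds the command struct followed by the aligned constant words and then the binding list, which is what makes replay stateless. A minimal self-contained sketch of that trailing-storage layout, using hypothetical stand-in types rather than the real IREE structs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical mirror of the deferred command buffer's recording layout: the
// command header, the (aligned) constant data, and the binding list all live
// in one variable-size allocation so replay needs no extra state or lookups.
typedef struct {
  uint32_t workgroup_count[3];
  size_t constant_count;      // number of uint32_t constants
  const uint32_t* constants;  // points into the trailing storage
  size_t binding_count;
  const uint64_t* bindings;   // stand-in for iree_hal_buffer_ref_t entries
} cmd_dispatch_t;

static size_t align_up(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

static cmd_dispatch_t* cmd_dispatch_create(const uint32_t workgroup_count[3],
                                           const uint32_t* constants,
                                           size_t constant_count,
                                           const uint64_t* bindings,
                                           size_t binding_count) {
  // Size the allocation exactly as the diff does: struct + aligned constants
  // + binding list.
  size_t constants_size =
      align_up(constant_count * sizeof(uint32_t), _Alignof(max_align_t));
  size_t total_size = sizeof(cmd_dispatch_t) + constants_size +
                      binding_count * sizeof(uint64_t);
  cmd_dispatch_t* cmd = (cmd_dispatch_t*)malloc(total_size);
  assert(cmd);
  memcpy(cmd->workgroup_count, workgroup_count, sizeof(cmd->workgroup_count));
  uint8_t* storage = (uint8_t*)cmd + sizeof(*cmd);
  memcpy(storage, constants, constant_count * sizeof(uint32_t));
  cmd->constant_count = constant_count;
  cmd->constants = (const uint32_t*)storage;
  storage += constants_size;
  memcpy(storage, bindings, binding_count * sizeof(uint64_t));
  cmd->binding_count = binding_count;
  cmd->bindings = (const uint64_t*)storage;
  return cmd;
}
```

Because each record is self-contained, applying it against a binding table at submission time needs only the record itself, matching the stateless replay in `iree_hal_deferred_command_buffer_apply_dispatch`.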
diff --git a/runtime/src/iree/hal/utils/executable_debug_info.c b/runtime/src/iree/hal/utils/executable_debug_info.c
new file mode 100644
index 0000000..d0a8432
--- /dev/null
+++ b/runtime/src/iree/hal/utils/executable_debug_info.c
@@ -0,0 +1,136 @@
+// Copyright 2024 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "iree/hal/utils/executable_debug_info.h"
+
+static iree_status_t iree_hal_debug_verify_string_nonempty(
+    const char* field_name, flatbuffers_string_t value) {
+  if (flatbuffers_string_len(value) == 0) {
+    return iree_make_status(
+        IREE_STATUS_INVALID_ARGUMENT,
+        "expected debug info field `%s` to contain a non-empty string value",
+        field_name);
+  }
+  return iree_ok_status();
+}
+
+static iree_status_t iree_hal_debug_verify_FileLineLocDef(
+    iree_hal_debug_FileLineLocDef_table_t def) {
+  if (!def) return iree_ok_status();
+  return iree_hal_debug_verify_string_nonempty(
+      "filename", iree_hal_debug_FileLineLocDef_filename_get(def));
+}
+
+iree_status_t iree_hal_debug_verify_export_def(
+    iree_hal_debug_ExportDef_table_t export_def) {
+  if (!export_def) return iree_ok_status();
+
+  IREE_RETURN_IF_ERROR(iree_hal_debug_verify_FileLineLocDef(
+      iree_hal_debug_ExportDef_location_get(export_def)));
+
+  iree_hal_debug_StageLocationDef_vec_t stage_locations_vec =
+      iree_hal_debug_ExportDef_stage_locations_get(export_def);
+  for (iree_host_size_t i = 0;
+       i < iree_hal_debug_StageLocationDef_vec_len(stage_locations_vec); ++i) {
+    iree_hal_debug_StageLocationDef_table_t stage_location_def =
+        iree_hal_debug_StageLocationDef_vec_at(stage_locations_vec, i);
+    if (!stage_location_def) {
+      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
+                              "stage_locations[%" PRIhsz "] has NULL value", i);
+    }
+    IREE_RETURN_IF_ERROR(iree_hal_debug_verify_string_nonempty(
+                             "stage", iree_hal_debug_StageLocationDef_stage_get(
+                                          stage_location_def)),
+                         "verifying stage_locations[%" PRIhsz "]", i);
+    IREE_RETURN_IF_ERROR(
+        iree_hal_debug_verify_FileLineLocDef(
+            iree_hal_debug_StageLocationDef_location_get(stage_location_def)),
+        "verifying stage_locations[%" PRIhsz "]", i);
+  }
+
+  return iree_ok_status();
+}
+
+// TODO(benvanik): a way to select what location is chosen. For now we just
+// pick the first stage location if present and otherwise use the source
+// location.
+static iree_hal_debug_FileLineLocDef_table_t
+iree_hal_debug_select_source_location(
+    iree_hal_debug_ExportDef_table_t export_def) {
+  iree_hal_debug_StageLocationDef_vec_t stage_locations_vec =
+      iree_hal_debug_ExportDef_stage_locations_get(export_def);
+  if (iree_hal_debug_StageLocationDef_vec_len(stage_locations_vec) > 0) {
+    iree_hal_debug_StageLocationDef_table_t stage_location_def =
+        iree_hal_debug_StageLocationDef_vec_at(stage_locations_vec, 0);
+    return iree_hal_debug_StageLocationDef_location_get(stage_location_def);
+  }
+  return iree_hal_debug_ExportDef_location_get(export_def);
+}
+
+iree_host_size_t iree_hal_debug_calculate_export_info_size(
+    iree_hal_debug_ExportDef_table_t export_def) {
+  if (!export_def) return 0;
+
+  iree_host_size_t total_size = sizeof(iree_hal_debug_export_info_t);
+  total_size +=
+      flatbuffers_string_len(iree_hal_debug_ExportDef_name_get(export_def));
+
+  iree_hal_debug_FileLineLocDef_table_t location_def =
+      iree_hal_debug_select_source_location(export_def);
+  if (location_def) {
+    total_size += flatbuffers_string_len(
+        iree_hal_debug_FileLineLocDef_filename_get(location_def));
+  }
+
+  return total_size;
+}
+
+void iree_hal_debug_copy_export_info(
+    iree_hal_debug_ExportDef_table_t export_def,
+    iree_hal_debug_export_info_t* out_info) {
+  memset(out_info, 0, sizeof(*out_info));
+  if (!export_def) return;
+
+  char* ptr = (char*)out_info + sizeof(*out_info);
+
+  flatbuffers_string_t name = iree_hal_debug_ExportDef_name_get(export_def);
+  if (name) {
+    size_t name_length = flatbuffers_string_len(name);
+    memcpy(ptr, name, name_length);
+    out_info->function_name = iree_make_string_view(ptr, name_length);
+    ptr += name_length;
+  }
+
+  iree_hal_debug_FileLineLocDef_table_t location_def =
+      iree_hal_debug_select_source_location(export_def);
+  if (location_def) {
+    flatbuffers_string_t filename =
+        iree_hal_debug_FileLineLocDef_filename_get(location_def);
+    size_t filename_length = flatbuffers_string_len(filename);
+    memcpy(ptr, filename, filename_length);
+    out_info->source_filename = iree_make_string_view(ptr, filename_length);
+    ptr += filename_length;
+    out_info->source_line =
+        iree_hal_debug_FileLineLocDef_line_get(location_def);
+  }
+}
+
+void iree_hal_debug_publish_source_files(
+    iree_hal_debug_SourceFileDef_vec_t source_files_vec) {
+  if (!source_files_vec) return;
+#if IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
+  for (iree_host_size_t i = 0;
+       i < iree_hal_debug_SourceFileDef_vec_len(source_files_vec); ++i) {
+    iree_hal_debug_SourceFileDef_table_t source_file =
+        iree_hal_debug_SourceFileDef_vec_at(source_files_vec, i);
+    if (!source_file) continue;
+    flatbuffers_string_t path =
+        iree_hal_debug_SourceFileDef_path_get(source_file);
+    flatbuffers_uint8_vec_t content =
+        iree_hal_debug_SourceFileDef_content_get(source_file);
+    IREE_TRACE_PUBLISH_SOURCE_FILE(path, flatbuffers_string_len(path), content,
+                                   flatbuffers_uint8_vec_len(content));
+  }
+#endif  // IREE_TRACING_FEATURES & IREE_TRACING_FEATURE_INSTRUMENTATION
+}
diff --git a/runtime/src/iree/hal/utils/executable_debug_info.h b/runtime/src/iree/hal/utils/executable_debug_info.h
new file mode 100644
index 0000000..48c3bb9
--- /dev/null
+++ b/runtime/src/iree/hal/utils/executable_debug_info.h
@@ -0,0 +1,57 @@
+// Copyright 2024 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#ifndef IREE_HAL_UTILS_EXECUTABLE_DEBUG_INFO_H_
+#define IREE_HAL_UTILS_EXECUTABLE_DEBUG_INFO_H_
+
+#include "iree/base/api.h"
+
+// flatcc schemas:
+#include "iree/base/internal/flatcc/parsing.h"
+#include "iree/schemas/executable_debug_info_reader.h"
+#include "iree/schemas/executable_debug_info_verifier.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif  // __cplusplus
+
+// Verifies per-export debug info is valid.
+// Executables using debug info must call this as part of their verification.
+iree_status_t iree_hal_debug_verify_export_def(
+    iree_hal_debug_ExportDef_table_t export_def);
+
+// Basic debug information referencing allocated host memory.
+typedef struct iree_hal_debug_export_info_t {
+  iree_string_view_t function_name;
+  iree_string_view_t source_filename;
+  uint32_t source_line;
+} iree_hal_debug_export_info_t;
+
+// Returns the size in bytes required to store a copy of the export debug info.
+// Callers should allocate this amount of memory to populate with
+// iree_hal_debug_copy_export_info.
+iree_host_size_t iree_hal_debug_calculate_export_info_size(
+    iree_hal_debug_ExportDef_table_t export_def);
+
+// Copies the given export flatbuffer data into caller-allocated storage of at
+// least the size calculated by iree_hal_debug_calculate_export_info_size.
+// The copied strings point into that storage, are decoupled from the
+// flatbuffer memory, and remain valid until the caller frees the storage.
+void iree_hal_debug_copy_export_info(
+    iree_hal_debug_ExportDef_table_t export_def,
+    iree_hal_debug_export_info_t* out_info);
+
+// Publishes the given source files to any attached debug/trace providers.
+// This must be called before emitting any debug/trace events that reference
+// the contained files.
+void iree_hal_debug_publish_source_files(
+    iree_hal_debug_SourceFileDef_vec_t source_files_vec);
+
+#ifdef __cplusplus
+}  // extern "C"
+#endif  // __cplusplus
+
+#endif  // IREE_HAL_UTILS_EXECUTABLE_DEBUG_INFO_H_
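The `calculate_export_info_size`/`copy_export_info` pair above follows a two-phase pattern: measure the required trailing string storage, let the caller allocate once, then copy with the string views fixed up to point at the trailing bytes so the flatbuffer can be discarded. A self-contained sketch of a caller using such an API, with hypothetical stand-in types (`fake_export_def_t`, `string_view_t`) in place of the generated flatbuffer tables:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins; the real types come from the generated
// executable_debug_info schema headers and iree/base string views.
typedef struct {
  const char* name;
  const char* filename;
  unsigned line;
} fake_export_def_t;
typedef struct {
  const char* data;
  size_t size;
} string_view_t;
typedef struct {
  string_view_t function_name;
  string_view_t source_filename;
  unsigned source_line;
} export_info_t;

// Phase 1: measure how much trailing storage the strings need.
static size_t calculate_export_info_size(const fake_export_def_t* def) {
  return sizeof(export_info_t) + strlen(def->name) + strlen(def->filename);
}

// Phase 2: copy into caller-provided storage; the string views point at the
// trailing bytes, so the source def can be freed after this returns.
static void copy_export_info(const fake_export_def_t* def, export_info_t* out) {
  char* ptr = (char*)out + sizeof(*out);
  size_t name_len = strlen(def->name);
  memcpy(ptr, def->name, name_len);
  out->function_name = (string_view_t){ptr, name_len};
  ptr += name_len;
  size_t file_len = strlen(def->filename);
  memcpy(ptr, def->filename, file_len);
  out->source_filename = (string_view_t){ptr, file_len};
  out->source_line = def->line;
}
```

The caller allocates `calculate_export_info_size(&def)` bytes, calls `copy_export_info`, and can then drop the flatbuffer while keeping the info struct alive for tracing.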
diff --git a/runtime/src/iree/hal/utils/stream_tracing.h b/runtime/src/iree/hal/utils/stream_tracing.h
index 9314df2..16a319e 100644
--- a/runtime/src/iree/hal/utils/stream_tracing.h
+++ b/runtime/src/iree/hal/utils/stream_tracing.h
@@ -52,10 +52,10 @@
 } iree_hal_stream_tracing_context_event_list_t;
 
 typedef enum iree_hal_stream_tracing_verbosity_e {
-  IREE_HAL_TRACING_VERBOSITY_OFF = 0,
-  IREE_HAL_TRACING_VERBOSITY_COARSE,
-  IREE_HAL_TRACING_VERBOSITY_FINE,
-  IREE_HAL_TRACING_VERBOSITY_MAX
+  IREE_HAL_STREAM_TRACING_VERBOSITY_OFF = 0,
+  IREE_HAL_STREAM_TRACING_VERBOSITY_COARSE,
+  IREE_HAL_STREAM_TRACING_VERBOSITY_FINE,
+  IREE_HAL_STREAM_TRACING_VERBOSITY_MAX
 } iree_hal_stream_tracing_verbosity_t;
 
 typedef struct iree_hal_stream_tracing_device_interface_vtable_t
diff --git a/runtime/src/iree/modules/hal/exports.inl b/runtime/src/iree/modules/hal/exports.inl
index 8d44445..46e7f17 100644
--- a/runtime/src/iree/modules/hal/exports.inl
+++ b/runtime/src/iree/modules/hal/exports.inl
@@ -50,21 +50,14 @@
 EXPORT_FN("command_buffer.collective", iree_hal_module_command_buffer_collective, rriiiirrIIIII, v)
 EXPORT_FN("command_buffer.copy_buffer", iree_hal_module_command_buffer_copy_buffer, riirIrII, v)
 EXPORT_FN("command_buffer.create", iree_hal_module_command_buffer_create, riiIi, r)
-// TODO(#18154): replace base dispatch with new `2` versions.
-EXPORT_FN("command_buffer.dispatch", iree_hal_module_command_buffer_dispatch, rriiiiI, v)
-EXPORT_FN("command_buffer.dispatch.indirect", iree_hal_module_command_buffer_dispatch_indirect, rriirII, v)
-EXPORT_FN_CUSTOM("command_buffer.dispatch2", iree_hal_module_command_buffer_dispatch2, rriiiiICiDCiirIID, v)
-EXPORT_FN_CUSTOM("command_buffer.dispatch2.indirect", iree_hal_module_command_buffer_dispatch2_indirect, rriirIICiDCiirIID, v)
+EXPORT_FN_CUSTOM("command_buffer.dispatch", iree_hal_module_command_buffer_dispatch, rriiiiICiDCiirIID, v)
+EXPORT_FN_CUSTOM("command_buffer.dispatch.indirect", iree_hal_module_command_buffer_dispatch_indirect, rriirIICiDCiirIID, v)
 EXPORT_FN("command_buffer.end_debug_group", iree_hal_module_command_buffer_end_debug_group, r, v)
 EXPORT_FN("command_buffer.execution_barrier", iree_hal_module_command_buffer_execution_barrier, riii, v)
 EXPORT_FN("command_buffer.fill_buffer", iree_hal_module_command_buffer_fill_buffer, rrIIiii, v)
 EXPORT_FN("command_buffer.finalize", iree_hal_module_command_buffer_finalize, r, v)
-EXPORT_FN("command_buffer.push_constants", iree_hal_module_command_buffer_push_constants, rriCiD, v)
-EXPORT_FN("command_buffer.push_descriptor_set", iree_hal_module_command_buffer_push_descriptor_set, rriCiirIID, v)
 EXPORT_FN("command_buffer.update_buffer", iree_hal_module_command_buffer_update_buffer, rrIrIIi, v)
 
-EXPORT_FN("descriptor_set_layout.create", iree_hal_module_descriptor_set_layout_create, riCiiiD, r)
-
 EXPORT_FN("device.allocator", iree_hal_module_device_allocator, r, r)
 EXPORT_FN("device.query.i64", iree_hal_module_device_query_i64, rrr, iI)
 EXPORT_FN("device.queue.alloca", iree_hal_module_device_queue_alloca, rIrriiiI, r)
@@ -80,9 +73,7 @@
 
 EXPORT_FN("ex.file.from_memory", iree_hal_module_ex_file_from_memory, rIirIIi, r)
 
-// TODO(#18154): replace base executable create with new `2` versions.
-EXPORT_FN("executable.create", iree_hal_module_executable_create, rrrrCrD, r)
-EXPORT_FN("executable.create2", iree_hal_module_executable_create2, rrrr, r)
+EXPORT_FN("executable.create", iree_hal_module_executable_create, rrrr, r)
 
 EXPORT_FN("fence.await", iree_hal_module_fence_await, iCrD, i)
 EXPORT_FN("fence.create", iree_hal_module_fence_create, ri, r)
@@ -91,6 +82,4 @@
 EXPORT_FN("fence.query", iree_hal_module_fence_query, r, i)
 EXPORT_FN("fence.signal", iree_hal_module_fence_signal, r, v)
 
-EXPORT_FN("pipeline_layout.create", iree_hal_module_pipeline_layout_create, riCrD, r)
-
 // clang-format on
diff --git a/runtime/src/iree/modules/hal/loader/module.c b/runtime/src/iree/modules/hal/loader/module.c
index 0126421..94baa95 100644
--- a/runtime/src/iree/modules/hal/loader/module.c
+++ b/runtime/src/iree/modules/hal/loader/module.c
@@ -199,8 +199,6 @@
   executable_params.executable_format = executable_format_str;
   executable_params.executable_data = iree_make_const_byte_span(
       executable_data->data.data, executable_data->data.data_length);
-  executable_params.pipeline_layout_count = 0;
-  executable_params.pipeline_layouts = NULL;
   executable_params.constant_count = constant_count;
   executable_params.constants = constants;
 
@@ -224,8 +222,8 @@
     };
     iree_vm_abi_riiii_t params;
   };
-  iree_vm_size_t push_constant_count;
-  const uint32_t* push_constants;
+  iree_vm_size_t constant_count;
+  const uint32_t* constants;
   iree_vm_size_t binding_count;
   const iree_vm_abi_rII_t* bindings;
 } iree_hal_loader_dispatch_args_t;
@@ -266,13 +264,13 @@
       .workgroup_size_x = 1,
       .workgroup_size_y = 1,
       .workgroup_size_z = 1,
-      .push_constant_count = args->push_constant_count,
+      .constant_count = args->constant_count,
       .workgroup_count_x = args->workgroup_x,
       .workgroup_count_y = args->workgroup_y,
       .workgroup_count_z = args->workgroup_z,
       .max_concurrency = 1,
       .binding_count = args->binding_count,
-      .push_constants = args->push_constants,
+      .constants = args->constants,
       .binding_ptrs = binding_ptrs,
       .binding_lengths = binding_lengths,
   };
@@ -304,13 +302,12 @@
       .params = *(const iree_vm_abi_riiii_t*)args_storage.data,
   };
   if (args_ok) {
-    const uint8_t* push_constants_ptr = args_storage.data + sizeof(args.params);
-    args.push_constant_count = *(const iree_vm_size_t*)push_constants_ptr;
-    args.push_constants =
-        (const uint32_t*)(push_constants_ptr + sizeof(iree_vm_size_t));
+    const uint8_t* constants_ptr = args_storage.data + sizeof(args.params);
+    args.constant_count = *(const iree_vm_size_t*)constants_ptr;
+    args.constants = (const uint32_t*)(constants_ptr + sizeof(iree_vm_size_t));
     const uint8_t* bindings_ptr =
-        push_constants_ptr + sizeof(iree_vm_size_t) +
-        args.push_constant_count * sizeof(args.push_constants[0]);
+        constants_ptr + sizeof(iree_vm_size_t) +
+        args.constant_count * sizeof(args.constants[0]);
     args.binding_count = *(const iree_vm_size_t*)bindings_ptr;
     args.bindings =
         (const iree_vm_abi_rII_t*)(bindings_ptr + sizeof(iree_vm_size_t));
diff --git a/runtime/src/iree/modules/hal/module.c b/runtime/src/iree/modules/hal/module.c
index f3cac5e..695b1ad 100644
--- a/runtime/src/iree/modules/hal/module.c
+++ b/runtime/src/iree/modules/hal/module.c
@@ -32,8 +32,8 @@
 // Module type definitions
 //===----------------------------------------------------------------------===//
 
-#define IREE_HAL_MODULE_VERSION_0_4 0x00000004u
-#define IREE_HAL_MODULE_VERSION_LATEST IREE_HAL_MODULE_VERSION_0_4
+#define IREE_HAL_MODULE_VERSION_0_5 0x00000005u
+#define IREE_HAL_MODULE_VERSION_LATEST IREE_HAL_MODULE_VERSION_0_5
 
 typedef struct iree_hal_module_t {
   iree_allocator_t host_allocator;
@@ -854,97 +854,6 @@
                                             send_ref, recv_ref, element_count);
 }
 
-IREE_VM_ABI_EXPORT(iree_hal_module_command_buffer_push_constants,  //
-                   iree_hal_module_state_t,                        //
-                   rriCiD, v) {
-  iree_hal_command_buffer_t* command_buffer = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_command_buffer_check_deref(args->r0, &command_buffer));
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_pipeline_layout_check_deref(args->r1, &pipeline_layout));
-  iree_vm_size_t offset = (iree_vm_size_t)args->i2;
-  iree_host_size_t value_count = args->a3_count;
-  const uint32_t* values = (const uint32_t*)&args->a3[0].i0;
-
-  return iree_hal_command_buffer_push_constants(
-      command_buffer, pipeline_layout, offset * sizeof(uint32_t), values,
-      value_count * sizeof(uint32_t));
-}
-
-IREE_VM_ABI_EXPORT(iree_hal_module_command_buffer_push_descriptor_set,  //
-                   iree_hal_module_state_t,                             //
-                   rriCiirIID, v) {
-  iree_hal_command_buffer_t* command_buffer = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_command_buffer_check_deref(args->r0, &command_buffer));
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_pipeline_layout_check_deref(args->r1, &pipeline_layout));
-  iree_vm_size_t set = args->i2;
-
-  iree_host_size_t binding_count = args->a3_count;
-  if (IREE_UNLIKELY(binding_count >
-                    IREE_HAL_MODULE_MAX_DESCRIPTOR_BINDING_COUNT)) {
-    return iree_make_status(
-        IREE_STATUS_OUT_OF_RANGE, "binding count %" PRIhsz " > %" PRIhsz,
-        binding_count, IREE_HAL_MODULE_MAX_DESCRIPTOR_BINDING_COUNT);
-  }
-  iree_hal_buffer_ref_t* bindings = (iree_hal_buffer_ref_t*)iree_alloca(
-      binding_count * sizeof(iree_hal_buffer_ref_t));
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    bindings[i].ordinal = (uint32_t)args->a3[i].i0;
-    bindings[i].buffer_slot = (uint32_t)args->a3[i].i1;
-    IREE_RETURN_IF_ERROR(iree_hal_buffer_check_deref_or_null(
-        args->a3[i].r2, &bindings[i].buffer));
-    bindings[i].offset = iree_hal_cast_device_size(args->a3[i].i3);
-    bindings[i].length = iree_hal_cast_device_size(args->a3[i].i4);
-  }
-
-  return iree_hal_command_buffer_push_descriptor_set(
-      command_buffer, pipeline_layout, set, binding_count, bindings);
-}
-
-IREE_VM_ABI_EXPORT(iree_hal_module_command_buffer_dispatch,  //
-                   iree_hal_module_state_t,                  //
-                   rriiiiI, v) {
-  iree_hal_command_buffer_t* command_buffer = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_command_buffer_check_deref(args->r0, &command_buffer));
-  iree_hal_executable_t* executable = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_executable_check_deref(args->r1, &executable));
-  uint32_t entry_point = (uint32_t)args->i2;
-  uint32_t workgroup_x = (uint32_t)args->i3;
-  uint32_t workgroup_y = (uint32_t)args->i4;
-  uint32_t workgroup_z = (uint32_t)args->i5;
-  iree_hal_dispatch_flags_t flags = (iree_hal_dispatch_flags_t)args->i6;
-
-  return iree_hal_command_buffer_dispatch(command_buffer, executable,
-                                          entry_point, workgroup_x, workgroup_y,
-                                          workgroup_z, flags);
-}
-
-IREE_VM_ABI_EXPORT(iree_hal_module_command_buffer_dispatch_indirect,  //
-                   iree_hal_module_state_t,                           //
-                   rriirII, v) {
-  iree_hal_command_buffer_t* command_buffer = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_hal_command_buffer_check_deref(args->r0, &command_buffer));
-  iree_hal_executable_t* executable = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_executable_check_deref(args->r1, &executable));
-  uint32_t entry_point = (uint32_t)args->i2;
-  uint32_t workgroups_buffer_slot = (uint32_t)args->i3;
-  iree_device_size_t workgroups_offset = iree_hal_cast_device_size(args->i5);
-  iree_hal_buffer_ref_t workgroups_ref = iree_hal_make_indirect_buffer_ref(
-      workgroups_buffer_slot, workgroups_offset, 3 * sizeof(uint32_t));
-  IREE_RETURN_IF_ERROR(
-      iree_hal_buffer_check_deref_or_null(args->r4, &workgroups_ref.buffer));
-  iree_hal_dispatch_flags_t flags = (iree_hal_dispatch_flags_t)args->i6;
-
-  return iree_hal_command_buffer_dispatch_indirect(
-      command_buffer, executable, entry_point, workgroups_ref, flags);
-}
-
 // Argument signature: rriiiiICiDCiirIID
 typedef struct {
   union {
@@ -961,11 +870,11 @@
   const uint32_t* constants;
   iree_vm_size_t binding_count;
   const iree_vm_abi_iirII_t* bindings;
-} iree_hal_module_command_buffer_dispatch2_args_t;
-static iree_status_t iree_hal_module_command_buffer_dispatch2(
+} iree_hal_module_command_buffer_dispatch_args_t;
+static iree_status_t iree_hal_module_command_buffer_dispatch(
     iree_vm_stack_t* IREE_RESTRICT stack, void* IREE_RESTRICT module,
     iree_hal_module_state_t* IREE_RESTRICT state,
-    const iree_hal_module_command_buffer_dispatch2_args_t* IREE_RESTRICT args) {
+    const iree_hal_module_command_buffer_dispatch_args_t* IREE_RESTRICT args) {
   iree_hal_command_buffer_t* command_buffer = NULL;
   IREE_RETURN_IF_ERROR(iree_hal_command_buffer_check_deref(args->command_buffer,
                                                            &command_buffer));
@@ -988,7 +897,7 @@
   for (iree_host_size_t i = 0; i < bindings.count; ++i) {
     iree_hal_buffer_ref_t* binding =
         (iree_hal_buffer_ref_t*)&bindings.values[i];
-    binding->ordinal = 0;
+    binding->reserved = 0;
     binding->buffer_slot = (uint32_t)args->bindings[i].i1;
     IREE_RETURN_IF_ERROR(iree_hal_buffer_check_deref_or_null(
         args->bindings[i].r2, &binding->buffer));
@@ -996,13 +905,13 @@
     binding->length = iree_hal_cast_device_size(args->bindings[i].i4);
   }
 
-  return iree_hal_command_buffer_dispatch2(
+  return iree_hal_command_buffer_dispatch(
       command_buffer, executable, args->entry_point, args->workgroup_count,
       iree_make_const_byte_span(args->constants,
                                 args->constant_count * sizeof(uint32_t)),
       bindings, (iree_hal_dispatch_flags_t)args->flags);
 }
-static iree_status_t iree_hal_module_command_buffer_dispatch2_shim(
+static iree_status_t iree_hal_module_command_buffer_dispatch_shim(
     iree_vm_stack_t* IREE_RESTRICT stack, iree_vm_native_function_flags_t flags,
     iree_byte_span_t args_storage, iree_byte_span_t rets_storage,
     iree_vm_native_function_target2_t target_fn, void* IREE_RESTRICT module,
@@ -1016,7 +925,7 @@
     // Can't fit even with zero lengths.
     args_ok = false;
   }
-  iree_hal_module_command_buffer_dispatch2_args_t args = {
+  iree_hal_module_command_buffer_dispatch_args_t args = {
       .params = *(const iree_vm_abi_rriiiiI_t*)args_storage.data,
   };
   if (args_ok) {
@@ -1039,9 +948,9 @@
                             "argument/result signature mismatch");
   }
   IREE_ASSERT(target_fn == (iree_vm_native_function_target2_t)
-                               iree_hal_module_command_buffer_dispatch2);
-  return iree_hal_module_command_buffer_dispatch2(stack, module, module_state,
-                                                  &args);
+                               iree_hal_module_command_buffer_dispatch);
+  return iree_hal_module_command_buffer_dispatch(stack, module, module_state,
+                                                 &args);
 }
 
 // Argument signature: rriirIICiDCiirIID
@@ -1062,12 +971,12 @@
   const uint32_t* constants;
   iree_vm_size_t binding_count;
   const iree_vm_abi_iirII_t* bindings;
-} iree_hal_module_command_buffer_dispatch2_indirect_args_t;
-static iree_status_t iree_hal_module_command_buffer_dispatch2_indirect(
+} iree_hal_module_command_buffer_dispatch_indirect_args_t;
+static iree_status_t iree_hal_module_command_buffer_dispatch_indirect(
     iree_vm_stack_t* IREE_RESTRICT stack, void* IREE_RESTRICT module,
     iree_hal_module_state_t* IREE_RESTRICT state,
-    const iree_hal_module_command_buffer_dispatch2_indirect_args_t*
-        IREE_RESTRICT args) {
+    const iree_hal_module_command_buffer_dispatch_indirect_args_t* IREE_RESTRICT
+        args) {
   iree_hal_command_buffer_t* command_buffer = NULL;
   IREE_RETURN_IF_ERROR(iree_hal_command_buffer_check_deref(args->command_buffer,
                                                            &command_buffer));
@@ -1102,13 +1011,13 @@
     binding->length = iree_hal_cast_device_size(args->bindings[i].i4);
   }
 
-  return iree_hal_command_buffer_dispatch2_indirect(
+  return iree_hal_command_buffer_dispatch_indirect(
       command_buffer, executable, args->entry_point, workgroups_ref,
       iree_make_const_byte_span(args->constants,
                                 args->constant_count * sizeof(uint32_t)),
       bindings, (iree_hal_dispatch_flags_t)args->flags);
 }
-static iree_status_t iree_hal_module_command_buffer_dispatch2_indirect_shim(
+static iree_status_t iree_hal_module_command_buffer_dispatch_indirect_shim(
     iree_vm_stack_t* IREE_RESTRICT stack, iree_vm_native_function_flags_t flags,
     iree_byte_span_t args_storage, iree_byte_span_t rets_storage,
     iree_vm_native_function_target2_t target_fn, void* IREE_RESTRICT module,
@@ -1122,7 +1031,7 @@
     // Can't fit even with zero lengths.
     args_ok = false;
   }
-  iree_hal_module_command_buffer_dispatch2_indirect_args_t args = {
+  iree_hal_module_command_buffer_dispatch_indirect_args_t args = {
       .params = *(const iree_vm_abi_rriirII_t*)args_storage.data,
   };
   if (args_ok) {
@@ -1146,44 +1055,9 @@
   }
   IREE_ASSERT(target_fn ==
               (iree_vm_native_function_target2_t)
-                  iree_hal_module_command_buffer_dispatch2_indirect);
-  return iree_hal_module_command_buffer_dispatch2_indirect(stack, module,
-                                                           module_state, &args);
-}
-
-//===----------------------------------------------------------------------===//
-// iree_hal_descriptor_set_layout
-//===----------------------------------------------------------------------===//
-
-IREE_VM_ABI_EXPORT(iree_hal_module_descriptor_set_layout_create,  //
-                   iree_hal_module_state_t,                       //
-                   riCiiiD, r) {
-  iree_hal_device_t* device = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_device_check_deref(args->r0, &device));
-  iree_hal_descriptor_set_layout_flags_t flags =
-      (iree_hal_descriptor_set_layout_flags_t)args->i1;
-
-  iree_host_size_t binding_count = args->a2_count;
-  if (IREE_UNLIKELY(binding_count >
-                    IREE_HAL_MODULE_MAX_DESCRIPTOR_BINDING_COUNT)) {
-    return iree_make_status(
-        IREE_STATUS_OUT_OF_RANGE, "binding count %" PRIhsz " > %" PRIhsz,
-        binding_count, IREE_HAL_MODULE_MAX_DESCRIPTOR_BINDING_COUNT);
-  }
-  iree_hal_descriptor_set_layout_binding_t* bindings =
-      (iree_hal_descriptor_set_layout_binding_t*)iree_alloca(
-          binding_count * sizeof(iree_hal_descriptor_set_layout_binding_t));
-  for (iree_host_size_t i = 0; i < binding_count; ++i) {
-    bindings[i].binding = (uint32_t)args->a2[i].i0;
-    bindings[i].type = (iree_hal_descriptor_type_t)args->a2[i].i1;
-    bindings[i].flags = (iree_hal_descriptor_flags_t)args->a2[i].i2;
-  }
-
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_descriptor_set_layout_create(
-      device, flags, binding_count, bindings, &descriptor_set_layout));
-  rets->r0 = iree_hal_descriptor_set_layout_move_ref(descriptor_set_layout);
-  return iree_ok_status();
+                  iree_hal_module_command_buffer_dispatch_indirect);
+  return iree_hal_module_command_buffer_dispatch_indirect(stack, module,
+                                                          module_state, &args);
 }
 
 //===----------------------------------------------------------------------===//
@@ -1428,75 +1302,6 @@
 
 IREE_VM_ABI_EXPORT(iree_hal_module_executable_create,  //
                    iree_hal_module_state_t,            //
-                   rrrrCrD, r) {
-  iree_hal_device_t* device = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_device_check_deref(args->r0, &device));
-  iree_vm_buffer_t* executable_format = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_vm_buffer_check_deref(args->r1, &executable_format));
-  iree_string_view_t executable_format_str =
-      iree_vm_buffer_as_string(executable_format);
-  iree_vm_buffer_t* executable_data = NULL;
-  IREE_RETURN_IF_ERROR(iree_vm_buffer_check_deref(args->r2, &executable_data));
-  iree_host_size_t constant_count = 0;
-  const uint32_t* constants = NULL;
-  if (iree_vm_buffer_isa(args->r3)) {
-    iree_vm_buffer_t* constant_buffer = NULL;
-    IREE_RETURN_IF_ERROR(
-        iree_vm_buffer_check_deref(args->r3, &constant_buffer));
-    if (constant_buffer->data.data_length % 4 != 0) {
-      return iree_make_status(IREE_STATUS_INVALID_ARGUMENT,
-                              "constant buffer data must contain 4-byte "
-                              "elements but data length is %" PRIhsz,
-                              constant_buffer->data.data_length);
-    }
-    constant_count = constant_buffer->data.data_length / sizeof(uint32_t);
-    constants = (const uint32_t*)constant_buffer->data.data;
-  }
-
-  iree_hal_executable_cache_t* executable_cache = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_module_state_lookup_executable_cache(
-      state, device, &executable_cache));
-
-  iree_host_size_t pipeline_layout_count = args->a4_count;
-  iree_hal_pipeline_layout_t** pipeline_layouts = NULL;
-  IREE_RETURN_IF_ERROR(
-      iree_allocator_malloc(state->host_allocator,
-                            pipeline_layout_count * sizeof(pipeline_layouts[0]),
-                            (void**)&pipeline_layouts));
-  iree_status_t status = iree_ok_status();
-  for (iree_host_size_t i = 0; i < pipeline_layout_count; ++i) {
-    status = iree_hal_pipeline_layout_check_deref(args->a4[i].r0,
-                                                  &pipeline_layouts[i]);
-    if (!iree_status_is_ok(status)) break;
-  }
-
-  iree_hal_executable_t* executable = NULL;
-  if (iree_status_is_ok(status)) {
-    iree_hal_executable_params_t executable_params;
-    iree_hal_executable_params_initialize(&executable_params);
-    executable_params.caching_mode |=
-        executable_data->access == IREE_VM_BUFFER_ACCESS_ORIGIN_MODULE
-            ? IREE_HAL_EXECUTABLE_CACHING_MODE_ALIAS_PROVIDED_DATA
-            : 0;
-    executable_params.executable_format = executable_format_str;
-    executable_params.executable_data = iree_make_const_byte_span(
-        executable_data->data.data, executable_data->data.data_length);
-    executable_params.pipeline_layout_count = pipeline_layout_count;
-    executable_params.pipeline_layouts = pipeline_layouts;
-    executable_params.constant_count = constant_count;
-    executable_params.constants = constants;
-    status = iree_hal_executable_cache_prepare_executable(
-        executable_cache, &executable_params, &executable);
-  }
-
-  iree_allocator_free(state->host_allocator, pipeline_layouts);
-  rets->r0 = iree_hal_executable_move_ref(executable);
-  return status;
-}
-
-IREE_VM_ABI_EXPORT(iree_hal_module_executable_create2,  //
-                   iree_hal_module_state_t,             //
                    rrrr, r) {
   iree_hal_device_t* device = NULL;
   IREE_RETURN_IF_ERROR(iree_hal_device_check_deref(args->r0, &device));
@@ -1875,29 +1680,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-// iree_hal_pipeline_layout_t
-//===----------------------------------------------------------------------===//
-
-IREE_VM_ABI_EXPORT(iree_hal_module_pipeline_layout_create,  //
-                   iree_hal_module_state_t,                 //
-                   riCrD, r) {
-  iree_hal_device_t* device = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_device_check_deref(args->r0, &device));
-  int32_t push_constants = (int32_t)args->i1;
-  iree_host_size_t set_layout_count = 0;
-  iree_hal_descriptor_set_layout_t** set_layouts = NULL;
-  IREE_VM_ABI_VLA_STACK_DEREF(args, a2_count, a2,
-                              iree_hal_descriptor_set_layout, 32,
-                              &set_layout_count, &set_layouts);
-
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_pipeline_layout_create(
-      device, push_constants, set_layout_count, set_layouts, &pipeline_layout));
-  rets->r0 = iree_hal_pipeline_layout_move_ref(pipeline_layout);
-  return iree_ok_status();
-}
-
-//===----------------------------------------------------------------------===//
 // VM module interface implementation
 //===----------------------------------------------------------------------===//
 
diff --git a/runtime/src/iree/modules/hal/types.c b/runtime/src/iree/modules/hal/types.c
index 52ce5a2..ebc8545 100644
--- a/runtime/src/iree/modules/hal/types.c
+++ b/runtime/src/iree/modules/hal/types.c
@@ -16,13 +16,9 @@
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_channel, iree_hal_channel_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_command_buffer,
                              iree_hal_command_buffer_t);
-IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_descriptor_set_layout,
-                             iree_hal_descriptor_set_layout_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_device, iree_hal_device_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_event, iree_hal_event_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_executable, iree_hal_executable_t);
-IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_pipeline_layout,
-                             iree_hal_pipeline_layout_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_fence, iree_hal_fence_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_file, iree_hal_file_t);
 IREE_VM_DEFINE_TYPE_ADAPTERS(iree_hal_semaphore, iree_hal_semaphore_t);
@@ -87,10 +83,6 @@
   IREE_VM_REGISTER_HAL_C_TYPE(
       instance, iree_hal_command_buffer_t, "hal.command_buffer",
       iree_hal_command_buffer_destroy, iree_hal_command_buffer_registration);
-  IREE_VM_REGISTER_HAL_C_TYPE(instance, iree_hal_descriptor_set_layout_t,
-                              "hal.descriptor_set_layout",
-                              iree_hal_descriptor_set_layout_destroy,
-                              iree_hal_descriptor_set_layout_registration);
   IREE_VM_REGISTER_HAL_C_TYPE(instance, iree_hal_device_t, "hal.device",
                               iree_hal_device_destroy,
                               iree_hal_device_registration);
@@ -103,9 +95,6 @@
   IREE_VM_REGISTER_HAL_C_TYPE(instance, iree_hal_file_t, "hal.file",
                               iree_hal_file_destroy,
                               iree_hal_file_registration);
-  IREE_VM_REGISTER_HAL_C_TYPE(
-      instance, iree_hal_pipeline_layout_t, "hal.pipeline_layout",
-      iree_hal_pipeline_layout_destroy, iree_hal_pipeline_layout_registration);
   IREE_VM_REGISTER_HAL_C_TYPE(instance, iree_hal_semaphore_t, "hal.semaphore",
                               iree_hal_semaphore_destroy,
                               iree_hal_semaphore_registration);
@@ -166,9 +155,6 @@
   IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_command_buffer_t,
                              "hal.command_buffer",
                              iree_hal_command_buffer_registration);
-  IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_descriptor_set_layout_t,
-                             "hal.descriptor_set_layout",
-                             iree_hal_descriptor_set_layout_registration);
   IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_device_t, "hal.device",
                              iree_hal_device_registration);
   IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_event_t, "hal.event",
@@ -177,9 +163,6 @@
                              iree_hal_fence_registration);
   IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_file_t, "hal.file",
                              iree_hal_file_registration);
-  IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_pipeline_layout_t,
-                             "hal.pipeline_layout",
-                             iree_hal_pipeline_layout_registration);
   IREE_VM_RESOLVE_HAL_C_TYPE(instance, iree_hal_semaphore_t, "hal.semaphore",
                              iree_hal_semaphore_registration);
 
diff --git a/runtime/src/iree/modules/hal/types.h b/runtime/src/iree/modules/hal/types.h
index f546901..404f134 100644
--- a/runtime/src/iree/modules/hal/types.h
+++ b/runtime/src/iree/modules/hal/types.h
@@ -19,8 +19,6 @@
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_channel, iree_hal_channel_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_command_buffer,
                               iree_hal_command_buffer_t);
-IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_descriptor_set_layout,
-                              iree_hal_descriptor_set_layout_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_device, iree_hal_device_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_event, iree_hal_event_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_executable, iree_hal_executable_t);
@@ -28,8 +26,6 @@
                               iree_hal_executable_cache_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_fence, iree_hal_fence_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_file, iree_hal_file_t);
-IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_pipeline_layout,
-                              iree_hal_pipeline_layout_t);
 IREE_VM_DECLARE_TYPE_ADAPTERS(iree_hal_semaphore, iree_hal_semaphore_t);
 
 #ifdef __cplusplus
diff --git a/runtime/src/iree/schemas/BUILD.bazel b/runtime/src/iree/schemas/BUILD.bazel
index 294c793..a8fbfca 100644
--- a/runtime/src/iree/schemas/BUILD.bazel
+++ b/runtime/src/iree/schemas/BUILD.bazel
@@ -30,39 +30,53 @@
     name = "cuda_executable_def_c_fbs",
     srcs = ["cuda_executable_def.fbs"],
     flatcc_args = FLATCC_ARGS,
+    includes = ["executable_debug_info.fbs"],
 )
 
 iree_flatbuffer_c_library(
-    name = "rocm_executable_def_c_fbs",
-    srcs = ["rocm_executable_def.fbs"],
+    name = "executable_debug_info_c_fbs",
+    srcs = ["executable_debug_info.fbs"],
     flatcc_args = FLATCC_ARGS,
 )
 
 iree_flatbuffer_c_library(
+    name = "hip_executable_def_c_fbs",
+    srcs = ["hip_executable_def.fbs"],
+    flatcc_args = FLATCC_ARGS,
+    includes = ["executable_debug_info.fbs"],
+)
+
+iree_flatbuffer_c_library(
     name = "metal_executable_def_c_fbs",
     srcs = ["metal_executable_def.fbs"],
     flatcc_args = FLATCC_ARGS,
+    includes = ["executable_debug_info.fbs"],
 )
 
 iree_flatbuffer_c_library(
-    name = "spirv_executable_def_c_fbs",
-    srcs = ["spirv_executable_def.fbs"],
+    name = "vulkan_executable_def_c_fbs",
+    srcs = ["vulkan_executable_def.fbs"],
     flatcc_args = FLATCC_ARGS,
+    includes = ["executable_debug_info.fbs"],
 )
 
 iree_flatbuffer_c_library(
-    name = "wgsl_executable_def_c_fbs",
-    srcs = ["wgsl_executable_def.fbs"],
+    name = "webgpu_executable_def_c_fbs",
+    srcs = ["webgpu_executable_def.fbs"],
     flatcc_args = FLATCC_ARGS,
+    includes = ["executable_debug_info.fbs"],
 )
 
 iree_build_test(
     name = "schema_build_test",
     targets = [
         ":bytecode_module_def_c_fbs",
+        ":cuda_executable_def_c_fbs",
+        ":executable_debug_info_c_fbs",
+        ":hip_executable_def_c_fbs",
         ":metal_executable_def_c_fbs",
-        ":spirv_executable_def_c_fbs",
-        ":wgsl_executable_def_c_fbs",
+        ":vulkan_executable_def_c_fbs",
+        ":webgpu_executable_def_c_fbs",
     ],
 )
 
diff --git a/runtime/src/iree/schemas/CMakeLists.txt b/runtime/src/iree/schemas/CMakeLists.txt
index 776616e..574b2ca 100644
--- a/runtime/src/iree/schemas/CMakeLists.txt
+++ b/runtime/src/iree/schemas/CMakeLists.txt
@@ -33,14 +33,16 @@
     "--builder"
     "--verifier"
     "--json"
+  INCLUDES
+    "executable_debug_info.fbs"
   PUBLIC
 )
 
 flatbuffer_c_library(
   NAME
-    rocm_executable_def_c_fbs
+    executable_debug_info_c_fbs
   SRCS
-    "rocm_executable_def.fbs"
+    "executable_debug_info.fbs"
   FLATCC_ARGS
     "--reader"
     "--builder"
@@ -51,6 +53,21 @@
 
 flatbuffer_c_library(
   NAME
+    hip_executable_def_c_fbs
+  SRCS
+    "hip_executable_def.fbs"
+  FLATCC_ARGS
+    "--reader"
+    "--builder"
+    "--verifier"
+    "--json"
+  INCLUDES
+    "executable_debug_info.fbs"
+  PUBLIC
+)
+
+flatbuffer_c_library(
+  NAME
     metal_executable_def_c_fbs
   SRCS
     "metal_executable_def.fbs"
@@ -59,32 +76,38 @@
     "--builder"
     "--verifier"
     "--json"
+  INCLUDES
+    "executable_debug_info.fbs"
   PUBLIC
 )
 
 flatbuffer_c_library(
   NAME
-    spirv_executable_def_c_fbs
+    vulkan_executable_def_c_fbs
   SRCS
-    "spirv_executable_def.fbs"
+    "vulkan_executable_def.fbs"
   FLATCC_ARGS
     "--reader"
     "--builder"
     "--verifier"
     "--json"
+  INCLUDES
+    "executable_debug_info.fbs"
   PUBLIC
 )
 
 flatbuffer_c_library(
   NAME
-    wgsl_executable_def_c_fbs
+    webgpu_executable_def_c_fbs
   SRCS
-    "wgsl_executable_def.fbs"
+    "webgpu_executable_def.fbs"
   FLATCC_ARGS
     "--reader"
     "--builder"
     "--verifier"
     "--json"
+  INCLUDES
+    "executable_debug_info.fbs"
   PUBLIC
 )
 
diff --git a/runtime/src/iree/schemas/cuda_executable_def.fbs b/runtime/src/iree/schemas/cuda_executable_def.fbs
index df78a7d..afb2834 100644
--- a/runtime/src/iree/schemas/cuda_executable_def.fbs
+++ b/runtime/src/iree/schemas/cuda_executable_def.fbs
@@ -4,47 +4,68 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
+include "iree/schemas/executable_debug_info.fbs";
+
 namespace iree.hal.cuda;
 
-// 'CUDA Executable'.
-file_identifier "CUDA";
-file_extension "cuda";
+// 'CUDA v1 Executable'.
+file_identifier "CDA1";
+file_extension "cda1";
 
-// A struct for the kernel block size along each dimensions.
-struct BlockSizeDef {
+// A struct for the kernel block size along each dimension.
+struct BlockDims {
   x:uint32;
   y:uint32;
   z:uint32;
 }
 
-// Source code location denoted by a file name and line within that file.
-table FileLineLocDef {
-  filename:string;
-  line:int32;
+// Describes the behavior of each binding.
+enum BindingBits:uint64 (bit_flags) {
+  READ_ONLY = 0,  // 1u << 0
+  INDIRECT = 1,  // 1u << 1
+}
+
+// Information about an exported function on the executable.
+table ExportDef {
+  // Ordinal of the module containing the kernel entry point in the
+  // executable modules list.
+  module_ordinal:uint32;
+
+  // String name of the exported kernel function in the module.
+  kernel_name:string;
+
+  // Grid block dimensions for the export.
+  block_dims:BlockDims;
+
+  // Size of dynamic shared memory per block.
+  block_shared_memory_size:uint32;
+
+  // Total number of 32-bit push constants used by the export.
+  constant_count:uint32;
+
+  // Flags for each binding; the list length is the total binding count.
+  binding_flags:[BindingBits];
+
+  // Optional debug information related to the export.
+  debug_info:iree.hal.debug.ExportDef;
+}
+
+// A library containing one or more exported functions.
+table ModuleDef {
+  // PTX image.
+  ptx_image:string;
 }
 
 table ExecutableDef {
-  // A map of entry point ordinals to string names as used in the shader
-  // library.
-  entry_points:[string];
+  // Exported functions in canonical executable entry point order.
+  exports:[ExportDef];
 
-  // Block sizes for each entry point.
-  //
-  // Currently the thread group size/block size is decided during code gen but
-  // in CUDA it is set by the runtime.
-  block_sizes:[BlockSizeDef];
-  // Size of dynamic shared memory.
-  shared_memory_size:[uint32];
+  // A list of all kernel modules used by the executable.
+  // Exports index into this list; multiple exports may use the same module.
+  modules:[ModuleDef];
 
-  // PTX string of the module.
-  ptx_image:string;
-
-  // TODO(thomasraoux): Add potential cuBin binary specialized for some targets.
-
-  // A map of entry point ordinals to source locations.
-  // This information is optional and may be used by debuggers and profilers to
-  // associate executable entry points with the source that generated them.
-  source_locations:[FileLineLocDef];
+  // Embedded source files sorted ascending by path.
+  source_files:[iree.hal.debug.SourceFileDef];
 }
 
 root_type ExecutableDef;
diff --git a/runtime/src/iree/schemas/executable_debug_info.fbs b/runtime/src/iree/schemas/executable_debug_info.fbs
new file mode 100644
index 0000000..7388a6f
--- /dev/null
+++ b/runtime/src/iree/schemas/executable_debug_info.fbs
@@ -0,0 +1,41 @@
+// Copyright 2024 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+namespace iree.hal.debug;
+
+// Source code location denoted by a file name and line within that file.
+table FileLineLocDef {
+  filename:string;
+  line:int32;
+}
+
+// Source location keyed by a string compilation stage name.
+table StageLocationDef {
+  stage:string;
+  location:FileLineLocDef;
+}
+
+// Debug information for an exported function.
+// Empty/omitted if the compilation debug level is 0.
+table ExportDef {
+  // Original export name from the producer tool.
+  name:string;
+
+  // Source location in the canonical form to be presented in most tooling.
+  // Generally included with compilation debug level >= 1.
+  location:FileLineLocDef;
+
+  // Table of source locations keyed by compilation stage name.
+  // Sorted ascending by stage name.
+  // Generally included with compilation debug level >= 3.
+  stage_locations:[StageLocationDef];
+}
+
+// An embedded source file referenced by locations in the file.
+table SourceFileDef {
+  path:string;
+  content:[uint8];
+}
diff --git a/runtime/src/iree/schemas/hip_executable_def.fbs b/runtime/src/iree/schemas/hip_executable_def.fbs
new file mode 100644
index 0000000..79a8642
--- /dev/null
+++ b/runtime/src/iree/schemas/hip_executable_def.fbs
@@ -0,0 +1,71 @@
+// Copyright 2021 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+include "iree/schemas/executable_debug_info.fbs";
+
+namespace iree.hal.hip;
+
+// 'HIP v1 Executable'.
+file_identifier "HIP1";
+file_extension "hip1";
+
+// A struct for the kernel block size along each dimension.
+struct BlockDims {
+  x:uint32;
+  y:uint32;
+  z:uint32;
+}
+
+// Describes the behavior of each binding.
+enum BindingBits:uint64 (bit_flags) {
+  READ_ONLY = 0,  // 1u << 0
+  INDIRECT = 1,  // 1u << 1
+}
+
+// Information about an exported function on the executable.
+table ExportDef {
+  // Ordinal of the module containing the kernel entry point in the
+  // executable modules list.
+  module_ordinal:uint32;
+
+  // String name of the exported kernel function in the module.
+  kernel_name:string;
+
+  // Grid block dimensions for the export.
+  block_dims:BlockDims;
+
+  // Size of dynamic shared memory per block.
+  block_shared_memory_size:uint32;
+
+  // Total number of 32-bit push constants used by the export.
+  constant_count:uint32;
+
+  // Flags for each binding; the list length is the total binding count.
+  binding_flags:[BindingBits];
+
+  // Optional debug information related to the export.
+  debug_info:iree.hal.debug.ExportDef;
+}
+
+// A library containing one or more exported functions.
+table ModuleDef {
+  // HSACO image.
+  hsaco_image:string;
+}
+
+table ExecutableDef {
+  // Exported functions in canonical executable entry point order.
+  exports:[ExportDef];
+
+  // A list of all kernel modules used by the executable.
+  // Exports index into this list; multiple exports may use the same module.
+  modules:[ModuleDef];
+
+  // Embedded source files sorted ascending by path.
+  source_files:[iree.hal.debug.SourceFileDef];
+}
+
+root_type ExecutableDef;
diff --git a/runtime/src/iree/schemas/metal_executable_def.fbs b/runtime/src/iree/schemas/metal_executable_def.fbs
index dc72781..fbee2ad 100644
--- a/runtime/src/iree/schemas/metal_executable_def.fbs
+++ b/runtime/src/iree/schemas/metal_executable_def.fbs
@@ -4,44 +4,105 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
+include "iree/schemas/executable_debug_info.fbs";
+
 namespace iree.hal.metal;
 
-// 'Metal Executable'.
-file_identifier "MTLE";
-file_extension "mtle";
+// 'Metal v1 Executable'.
+file_identifier "MTL1";
+file_extension "mtl1";
 
-// A struct for Metal threadgroup size along each dimension.
+// Defines the threadgroup size along each dimension.
 struct ThreadgroupSize {
   x:uint32;
   y:uint32;
   z:uint32;
 }
 
+// Describes the behavior of each buffer binding.
+// Used to populate MTLPipelineBufferDescriptor.
+enum BindingBits:uint64 (bit_flags) {
+  IMMUTABLE = 0,  // 1u << 0
+}
+
+// A compute pipeline (roughly MTLComputePipelineDescriptor).
+table PipelineDef {
+  // Ordinal of the MTLLibrary containing the entry point in the executable
+  // libraries list.
+  library_ordinal:uint32;
+
+  // String name of the entry point MTLFunction in the library.
+  entry_point:string;
+
+  // The maximum number of threads in a threadgroup that can be dispatched to
+  // the compute function. Omit for the default as determined at runtime.
+  // See: maxTotalThreadsPerThreadgroup
+  max_threads_per_threadgroup:uint32;
+
+  // Threadgroup size passed to dispatch commands.
+  threadgroup_size:ThreadgroupSize;
+
+  // Indicates whether the threadgroup size is always a multiple of the thread
+  // execution width.
+  // See: threadGroupSizeIsMultipleOfThreadExecutionWidth
+  threadgroup_size_aligned:bool;
+
+  // Total number of 32-bit constant words passed to dispatch operations.
+  constant_count:uint32;
+
+  // Flags for each buffer binding in the order passed to dispatch operations.
+  binding_flags:[BindingBits];
+
+  // Optional debug information related to the export.
+  debug_info:iree.hal.debug.ExportDef;
+}
+
+// MSL (Metal Shading Language) source code.
+table MSLSourceDef {
+  // MTLLanguageVersion enum indicating the version to interpret the code as.
+  // When omitted the default (latest) version will be used.
+  version:uint32;
+
+  // Source text.
+  code:string;
+
+  // TODO: add compilation options we want to control from the compiler:
+  // https://developer.apple.com/documentation/metal/mtlcompileoptions/4354201-mathmode
+  // https://developer.apple.com/documentation/metal/mtlcompileoptions/4354200-mathfloatingpointfunctions
+  // https://developer.apple.com/documentation/metal/mtlcompileoptions/3564462-preserveinvariance
+}
+
+// A MTLLibrary containing one or more functions.
+table LibraryDef {
+  // Optional MSL (Metal Shading Language) textual source code.
+  // May be provided even if metallib binaries are present in order to support
+  // fallback compilation on new devices.
+  source:MSLSourceDef;
+
+  // Precompiled Metal library.
+  // https://developer.apple.com/documentation/metal/shader_libraries/metal_libraries/building_a_shader_library_by_precompiling_source_files
+  metallib:string;
+
+  // Split debug symbols for the precompiled Metal library.
+  // https://developer.apple.com/documentation/metal/shader_libraries/metal_libraries/generating_and_loading_a_metal_library_symbol_file
+  metallibsym:string;
+}
+
 // A Metal shader library and runtime pipeline state description.
 // This information is used to create MTLLibrary, MTLFunction and pipeline
 // state objects.
 table ExecutableDef {
-  // A map of entry point ordinals to string names as used in the shader
-  // library.
-  entry_points:[string];
+  // Exported functions in canonical executable entry point order.
+  // Each creates a single MTLComputePipelineState.
+  pipelines:[PipelineDef];
 
-  // Threadgroup sizes for each entry point.
-  //
-  // We need this because workgroup size is directly baked inside SPIR-V code,
-  // but in Metal it's specified when dispatching workload. So when cross
-  // compiling SPIR-V to MSL, we need to persist the information here so that
-  // later it can be used for dispatching.
-  // TODO(antiagainst): support SPIR-V specialization constant.
-  threadgroup_sizes:[ThreadgroupSize];
+  // A list of libraries hosting various entry points. Each library contains at
+  // least one entry point.
+  // This list may not have the same size as the pipelines list.
+  libraries:[LibraryDef];
 
-  // Shader content can be provided as either a serialized library or in the
-  // form of source code strings.
-
-  // Serialized Metal shader library.
-  // TODO(#14047): enable linking and consolidate into one library.
-  shader_libraries:[string];
-  // Original Metal shader source code.
-  shader_sources:[string];
+  // Embedded source files sorted ascending by path.
+  source_files:[iree.hal.debug.SourceFileDef];
 }
 
 root_type ExecutableDef;
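For a concrete picture of the new layout, a hand-written JSON sketch of an `ExecutableDef` instance (as flatc's JSON text format would render it) might look like the following; all field values here are illustrative, not compiler output:

```json
{
  "pipelines": [{
    "library_ordinal": 0,
    "entry_point": "main_dispatch_0",
    "threadgroup_size": {"x": 32, "y": 1, "z": 1},
    "constant_count": 4,
    "binding_flags": ["IMMUTABLE"]
  }],
  "libraries": [{
    "source": {"code": "kernel void main_dispatch_0(...) {}"}
  }]
}
```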
diff --git a/runtime/src/iree/schemas/rocm_executable_def.fbs b/runtime/src/iree/schemas/rocm_executable_def.fbs
deleted file mode 100644
index 6df6d02..0000000
--- a/runtime/src/iree/schemas/rocm_executable_def.fbs
+++ /dev/null
@@ -1,73 +0,0 @@
-// Copyright 2021 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-namespace iree.hal.rocm;
-
-// 'ROCM Executable'.
-file_identifier "ROCM";
-file_extension "rocm";
-
-// A struct for the kernel block size along each dimensions.
-struct BlockSizeDef {
-  x:uint32;
-  y:uint32;
-  z:uint32;
-}
-
-// A struct for a source code location that consists of a file name and
-// a line number within that file.
-table FileLineLocDef {
-  filename:string;
-  line:int32;
-}
-
-// Source location keyed by a string compilation stage name.
-table StageLocationDef {
-  stage:string;
-  location:FileLineLocDef;
-}
-
-// Table of stage locations sorted in ascending order by stage name.
-table StageLocationsDef {
-  locations:[StageLocationDef];
-}
-
-// An embedded source file referenced by locations in the file.
-table SourceFileDef {
-  path:string;
-  content:[uint8];
-}
-
-table ExecutableDef {
-  // A map of entry point ordinals to string names as used in the shader
-  // library.
-  entry_points:[string];
-
-  // Block sizes for each entry point.
-  // This list has the same size as the entry_points list.
-  block_sizes:[BlockSizeDef];
-
-  // Size of dynamic shared memory.
-  // This list has the same size as the entry_points list.
-  shared_memory_sizes:[uint32];
-
-  // HSACO string of the module.
-  hsaco_image:string;
-
-  // A map of entry point ordinals to source locations.
-  // This information is optional and may be used by debuggers and profilers to
-  // associate executable entry points with the source that generated them.
-  source_locations:[FileLineLocDef];
-
-  // Table of source locations per entry point keyed by a string compilation
-  // stage name. Sorted ascending by name.
-  stage_locations:[StageLocationsDef];
-
-  // Embedded source files sorted ascending by path.
-  source_files:[SourceFileDef];
-}
-
-root_type ExecutableDef;
diff --git a/runtime/src/iree/schemas/spirv_executable_def.fbs b/runtime/src/iree/schemas/spirv_executable_def.fbs
deleted file mode 100644
index 4eaea8f..0000000
--- a/runtime/src/iree/schemas/spirv_executable_def.fbs
+++ /dev/null
@@ -1,76 +0,0 @@
-// Copyright 2019 The IREE Authors
-//
-// Licensed under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-namespace iree.hal.spirv;
-
-// 'SPIR-V Executable'.
-file_identifier "SPVE";
-file_extension "spve";
-
-table ShaderModuleDef {
-  // SPIR-V code blob.
-  code:[uint32];
-}
-
-// Source code location denoted by a file name and line within that file.
-table FileLineLocDef {
-  filename:string;
-  line:int32;
-}
-
-// Source location keyed by a string compilation stage name.
-table StageLocationDef {
-  stage:string;
-  location:FileLineLocDef;
-}
-
-// Table of stage locations sorted in ascending order by stage name.
-table StageLocationsDef {
-  locations:[StageLocationDef];
-}
-
-// An embedded source file referenced by locations in the file.
-table SourceFileDef {
-  path:string;
-  content:[uint8];
-}
-
-// A SPIR-V shader module and runtime pipeline layout description.
-// This information is used to create the VkShaderModule, VkPipelineLayout, and
-// any required VkDescriptorSetLayouts.
-table ExecutableDef {
-  // A map of entry point ordinals to string names as used in the shader module.
-  entry_points:[string];
-
-  // A list of required subgroup sizes for each entry point. 0 means no
-  // requirement.
-  // This list has the same size as the entry_points list.
-  subgroup_sizes:[uint32];
-
-  // A map of entry point ordinals to the indices of the containing shader
-  // modules (the following field).
-  // This list has the same size as the entry_points list.
-  shader_module_indices:[uint32];
-
-  // A list of shader modules hosting various entry points. Each shader module
-  // contains at least one entry point.
-  // This list may not have the same size as the entry_points list.
-  shader_modules:[ShaderModuleDef];
-
-  // A map of entry point ordinals to source locations.
-  // This information is optional and may be used by debuggers and profilers to
-  // associate executable entry points with the source that generated them.
-  source_locations:[FileLineLocDef];
-
-  // Table of source locations per entry point keyed by a string compilation
-  // stage name. Sorted ascending by name.
-  stage_locations:[StageLocationsDef];
-
-  // Embedded source files sorted ascending by path.
-  source_files:[SourceFileDef];
-}
-
-root_type ExecutableDef;
diff --git a/runtime/src/iree/schemas/vulkan_executable_def.fbs b/runtime/src/iree/schemas/vulkan_executable_def.fbs
new file mode 100644
index 0000000..476e0d3
--- /dev/null
+++ b/runtime/src/iree/schemas/vulkan_executable_def.fbs
@@ -0,0 +1,109 @@
+// Copyright 2019 The IREE Authors
+//
+// Licensed under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+include "iree/schemas/executable_debug_info.fbs";
+
+namespace iree.hal.vulkan;
+
+// 'Vulkan v1 Executable'.
+file_identifier "VKE1";
+file_extension "vk1";
+
+// Direct overlay of the VkPushConstantRange struct.
+struct PushConstantRange {
+  stage_flags:uint32;  // VkShaderStageFlags
+  offset:uint32;
+  size:uint32;
+}
+
+// Direct overlay of the VkDescriptorType enum.
+enum VkDescriptorType:uint32 {
+  SAMPLER = 0,                              // VK_DESCRIPTOR_TYPE_SAMPLER
+  UNIFORM_BUFFER = 6,                       // VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
+  STORAGE_BUFFER = 7,                       // VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
+}
+
+// Subset of VkDescriptorSetLayoutBinding.
+table DescriptorSetLayoutBindingDef {
+  binding:uint32;
+  descriptor_type:VkDescriptorType;
+  descriptor_count:uint32;
+  stage_flags:uint32;  // VkShaderStageFlags
+}
+
+// Used to create a VkDescriptorSetLayout.
+// This is the minimum required information. Additional flags and values are
+// provided by the runtime as they may vary based on available extensions or
+// implementation options.
+table DescriptorSetLayoutDef {
+  bindings:[DescriptorSetLayoutBindingDef];
+}
+
+// Used to create a VkPipelineLayout.
+// This is the minimum required information. Additional flags and values are
+// provided by the runtime as they may vary based on available extensions or
+// implementation options. Some fields may be used only by the implementation
+// to convey semantics rather than concrete values.
+table PipelineLayoutDef {
+  descriptor_set_layout_ordinals:[uint32];
+  push_constant_ranges:[PushConstantRange];
+}
+
+// Used to create a VkShaderModule.
+table ShaderModuleDef {
+  // SPIR-V code blob.
+  spirv_code:[uint32];
+}
+
+// Information about an exported function on the executable represented as a
+// VkPipeline in Vulkan.
+table PipelineDef {
+  // Ordinal of the shader module containing the entry point in the executable
+  // shader module list.
+  shader_module_ordinal:uint32;
+
+  // String name of the entry point function in the shader module.
+  entry_point:string;
+
+  // Ordinal of the pipeline layout used by the entry point in the executable
+  // pipeline layouts list.
+  pipeline_layout_ordinal:uint32;
+
+  // Required subgroup size as used for VK_EXT_subgroup_size_control, if any.
+  // Omitting or setting to zero indicates no size is specified.
+  subgroup_size:uint32;
+
+  // Optional debug information related to the export.
+  debug_info:iree.hal.debug.ExportDef;
+}
+
+// A SPIR-V shader module and runtime pipeline layout description.
+// This information is used to create the VkShaderModule, VkPipelineLayout, and
+// any required VkDescriptorSetLayouts.
+table ExecutableDef {
+  // Exported functions in canonical executable entry point order.
+  // Each creates a single VkPipeline.
+  pipelines:[PipelineDef];
+
+  // A list of descriptor set layouts used by the pipeline_layouts in this def.
+  // Pipeline layouts reference into the list.
+  descriptor_set_layouts:[DescriptorSetLayoutDef];
+
+  // A list of pipeline layouts. Exports reference layouts in this list and
+  // multiple exports present in multiple shader modules may share layouts.
+  // This list may not have the same size as the pipelines list.
+  pipeline_layouts:[PipelineLayoutDef];
+
+  // A list of shader modules hosting various entry points. Each shader module
+  // contains at least one entry point.
+  // This list may not have the same size as the pipelines list.
+  shader_modules:[ShaderModuleDef];
+
+  // Embedded source files sorted ascending by path.
+  source_files:[iree.hal.debug.SourceFileDef];
+}
+
+root_type ExecutableDef;
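The new Vulkan schema replaces runtime-managed HAL layout objects with ordinal indirection inside the executable: each PipelineDef names a pipeline layout, which in turn names descriptor set layouts, so layouts can be shared across exports. A minimal sketch of that resolution, using plain dicts to stand in for the deserialized flatbuffer tables (field names follow the schema; the data is illustrative):

```python
# Hypothetical in-memory model of a deserialized iree.hal.vulkan.ExecutableDef.
executable = {
    "descriptor_set_layouts": [
        {"bindings": [
            {"binding": 0, "descriptor_type": "STORAGE_BUFFER", "descriptor_count": 1},
            {"binding": 1, "descriptor_type": "STORAGE_BUFFER", "descriptor_count": 1},
        ]},
    ],
    "pipeline_layouts": [
        {"descriptor_set_layout_ordinals": [0],
         "push_constant_ranges": [{"offset": 0, "size": 4}]},
    ],
    "pipelines": [
        {"shader_module_ordinal": 0, "entry_point": "simple_mul",
         "pipeline_layout_ordinal": 0, "subgroup_size": 0},
    ],
}

def resolve_layout(executable, pipeline_ordinal):
    """Follows the ordinal chain from a pipeline to its descriptor set layouts."""
    pipeline = executable["pipelines"][pipeline_ordinal]
    layout = executable["pipeline_layouts"][pipeline["pipeline_layout_ordinal"]]
    return [executable["descriptor_set_layouts"][i]
            for i in layout["descriptor_set_layout_ordinals"]]

sets = resolve_layout(executable, 0)
```

Because the indirection is contained within one executable, deduplication of identical layouts across executables now depends on the compiler linking executables together rather than on the runtime.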
diff --git a/runtime/src/iree/schemas/wgsl_executable_def.fbs b/runtime/src/iree/schemas/webgpu_executable_def.fbs
similarity index 74%
rename from runtime/src/iree/schemas/wgsl_executable_def.fbs
rename to runtime/src/iree/schemas/webgpu_executable_def.fbs
index 79c821f..accdaa1 100644
--- a/runtime/src/iree/schemas/wgsl_executable_def.fbs
+++ b/runtime/src/iree/schemas/webgpu_executable_def.fbs
@@ -4,18 +4,20 @@
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
-namespace iree.hal.wgsl;
+include "iree/schemas/executable_debug_info.fbs";
 
-// 'WGSL Executable'.
-file_identifier "WGSL";
-file_extension "wgsl";
+namespace iree.hal.webgpu;
+
+// 'WGSL v1 Executable'.
+file_identifier "WGS1";
+file_extension "wgs1";
 
 // Contents of one WGPUShaderModule, possibly with multiple entry points.
 // Entry points have the name "dN" where N is the executable-wide entry point
 // ordinal.
 table ShaderModuleDef {
   // WGSL source code.
-  code:string;
+  wgsl_source:string;
 
   // Optional `source-map-v3` format source map.
   source_map:string;
@@ -28,6 +30,9 @@
   // A mapping of executable entry point ordinals to the shader module in which
   // they reside.
   entry_points:[uint];
+
+  // Embedded source files sorted ascending by path.
+  source_files:[iree.hal.debug.SourceFileDef];
 }
 
 root_type ExecutableDef;
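In the WebGPU schema the executable-wide entry point ordinal N selects a shader module via `entry_points[N]` and maps to the entry point named `dN` inside that module. A small sketch of the lookup (field names per the schema; data illustrative):

```python
def resolve_entry_point(executable, ordinal):
    """Returns (module_index, entry_point_name) for an executable-wide ordinal."""
    module_index = executable["entry_points"][ordinal]
    return module_index, f"d{ordinal}"

executable = {
    "shader_modules": [{"wgsl_source": "..."}, {"wgsl_source": "..."}],
    # Ordinals 0 and 1 live in module 0; ordinal 2 lives in module 1.
    "entry_points": [0, 0, 1],
}
```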
diff --git a/runtime/src/iree/task/task.c b/runtime/src/iree/task/task.c
index 3146b1d..ae4fbf9 100644
--- a/runtime/src/iree/task/task.c
+++ b/runtime/src/iree/task/task.c
@@ -592,7 +592,7 @@
     int xyz_string_length =
         snprintf(xyz_string, IREE_ARRAYSIZE(xyz_string), "%ux%ux%u",
                  workgroup_count[0], workgroup_count[1], workgroup_count[2]);
-    IREE_TRACE_ZONE_APPEND_TEXT_STRING_VIEW(z0, xyz_string, xyz_string_length);
+    IREE_TRACE_ZONE_APPEND_TEXT(z0, xyz_string, xyz_string_length);
   });
 #endif  // IREE_HAL_VERBOSE_TRACING_ENABLE
 
diff --git a/runtime/src/iree/vm/bytecode/dispatch.c b/runtime/src/iree/vm/bytecode/dispatch.c
index b8352e5..d3aea37 100644
--- a/runtime/src/iree/vm/bytecode/dispatch.c
+++ b/runtime/src/iree/vm/bytecode/dispatch.c
@@ -633,8 +633,13 @@
       stack, call.function, cconv_arguments, call.arguments, cconv_results,
       &current_frame, &regs));
 
-  return iree_vm_bytecode_dispatch(stack, module, current_frame, regs,
-                                   call.results);
+  iree_status_t status = iree_vm_bytecode_dispatch(stack, module, current_frame,
+                                                   regs, call.results);
+  if (!iree_status_is_ok(status) && !iree_status_is_deferred(status)) {
+    // Balance the external_enter on failure.
+    IREE_IGNORE_ERROR(iree_vm_stack_function_leave(stack));
+  }
+  return status;
 }
 
 iree_status_t iree_vm_bytecode_dispatch_resume(
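The dispatch.c change keeps the VM stack balanced: the frame pushed by the external enter must be popped when dispatch fails, but retained when the status is deferred so that `iree_vm_bytecode_dispatch_resume` can pick the frame back up later. A language-agnostic sketch of the pattern (hypothetical names, a plain list standing in for the VM stack):

```python
def call_with_balanced_stack(stack, dispatch):
    """Pops the frame pushed on entry if the body fails and will not resume."""
    stack.append("external_frame")  # mirrors the external function enter
    status = dispatch()             # "ok", "deferred", or an error string
    if status not in ("ok", "deferred"):
        # Balance the enter on failure; deferred calls keep their frame so
        # a later resume can continue from it.
        stack.pop()
    return status
```

Without the pop on the error path, every failed external call would leak one stack frame.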
diff --git a/samples/custom_dispatch/cpu/embedded/example_hal.mlir b/samples/custom_dispatch/cpu/embedded/example_hal.mlir
index 23d15e4..64b246b 100644
--- a/samples/custom_dispatch/cpu/embedded/example_hal.mlir
+++ b/samples/custom_dispatch/cpu/embedded/example_hal.mlir
@@ -41,16 +41,19 @@
 // These can come from compiler flags and multiple targets can be supported
 // It's possible, for example, to support targeting multiple devices in the same
 // compiled binary (CPU + Vulkan, etc).
-#cpu_target = #hal.device.target<"llvm-cpu", [
+#cpu_target = #hal.device.target<"local", [
   #x86_64_target
 ]> : !hal.device
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  <0, bindings = [
-      <0, storage_buffer, ReadOnly>,
-      <1, storage_buffer, ReadOnly>,
-      <2, storage_buffer>
-  ]>
+#pipeline_layout_0 = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
+
+#pipeline_layout_1 = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 module @example attributes {hal.device.targets = [#cpu_target]} {
@@ -89,19 +92,7 @@
       // The ordinal must be assigned by the user and unique for the executable.
       // The layout defines the required bindings and push constants and can be
       // thought of as the function signature.
-      hal.executable.export public @simple_mul ordinal(0) layout(#pipeline_layout) attributes {
-        // Bindings are automatically inferred when possible as part of the
-        // ABI but can be overridden if the user wants to use features such
-        // as sparse bindings or multiple descriptor sets. To do so the
-        // `hal.interface.bindings` attribute can be added to a dispatch op
-        // as follows mapping tensor operands/results to the pipeline layout
-        // sets/bindings:
-        hal.interface.bindings = [
-          #hal.interface.binding<0, 0>,
-          #hal.interface.binding<0, 1>,
-          #hal.interface.binding<0, 2>
-        ]
-      } {
+      hal.executable.export public @simple_mul ordinal(0) layout(#pipeline_layout_0) {
       ^bb0(%device: !hal.device, %workload: index):
         // This host function is used to compute the XYZ workgroup count
         // dispatched at runtime. It can query the %device for capabilities
@@ -114,7 +105,7 @@
       }
 
       // Similar to the above but in-place by using a read/write binding.
-      hal.executable.export public @simple_mul_inplace ordinal(1) layout(#pipeline_layout) {
+      hal.executable.export public @simple_mul_inplace ordinal(1) layout(#pipeline_layout_1) {
       ^bb0(%device: !hal.device, %workload: index):
         %x = affine.apply affine_map<()[s0] -> (s0 ceildiv 64)>()[%workload]
         %c1 = arith.constant 1 : index
@@ -168,7 +159,7 @@
           %c0 = arith.constant 0 : index
 
           // Push constants representing primitive operands can be loaded here.
-          %dim_i32 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
+          %dim_i32 = hal.interface.constant.load layout(#pipeline_layout_0) ordinal(0) : i32
           %dim = arith.index_castui %dim_i32 : i32 to index
 
           // This function is invoked once per workgroup so determine where this
@@ -179,9 +170,9 @@
           %tid = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_x]
 
           // Bindings are accessed by reference.
-          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
-          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
-          %binding2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout_0) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout_0) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding2 = hal.interface.binding.subspan layout(#pipeline_layout_0) binding(2) alignment(64) offset(%c0) : memref<?xf32>{%dim}
 
           // Call the externally defined C function with an (almost) plain C
           // calling convention (see above for details about the mess memrefs
@@ -206,15 +197,15 @@
         func.func @simple_mul_inplace() {
           %c0 = arith.constant 0 : index
 
-          %dim_i32 = hal.interface.constant.load layout(#pipeline_layout) ordinal(0) : i32
+          %dim_i32 = hal.interface.constant.load layout(#pipeline_layout_1) ordinal(0) : i32
           %dim = arith.index_castui %dim_i32 : i32 to index
 
           %workgroup_id_x = hal.interface.workgroup.id[0] : index
           %tid = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_x]
 
           // Same as above but note that we're treating %binding1 as read/write.
-          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
-          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout_1) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout_1) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
 
           func.call @simple_mul_inplace_workgroup(%binding0, %binding1, %dim, %tid) : (memref<?xf32>, memref<?xf32>, index, index) -> ()
 
@@ -266,10 +257,6 @@
     // to allow user-controlled overrides of the dispatches, custom selection
     // logic based on runtime parameters, etc. In general, though, the above
     // automatic selection should be used.
-    //
-    // Note that we don't declare the hal.interface.bindings and let them be
-    // inferred - this only works when either specifying the variant that has
-    // a pipeline layout defined or all variants have the same pipeline layouts.
     %2 = flow.dispatch @executable::@x86_64::@simple_mul_inplace[%dim](%dim_i32, %0, %1) : (i32, tensor<?xf32>{%dim}, tensor<?xf32>{%dim}) -> %1{%dim}
 
     // CHECK: 8xf32=96 96 96 96 96 96 96 96
diff --git a/samples/custom_dispatch/cpu/embedded/example_stream.mlir b/samples/custom_dispatch/cpu/embedded/example_stream.mlir
index 910a007..9ca8773 100644
--- a/samples/custom_dispatch/cpu/embedded/example_stream.mlir
+++ b/samples/custom_dispatch/cpu/embedded/example_stream.mlir
@@ -45,7 +45,7 @@
 // These can come from compiler flags and multiple targets can be supported
 // It's possible, for example, to support targeting multiple devices in the same
 // compiled binary (CPU + Vulkan, etc).
-#cpu_target = #hal.device.target<"llvm-cpu", [
+#cpu_target = #hal.device.target<"local", [
   #arm_64_target,
   #x86_64_target
 ]> : !hal.device
diff --git a/samples/custom_dispatch/cpu/embedded/example_transform.mlir b/samples/custom_dispatch/cpu/embedded/example_transform.mlir
index 858052c..970c6b3 100644
--- a/samples/custom_dispatch/cpu/embedded/example_transform.mlir
+++ b/samples/custom_dispatch/cpu/embedded/example_transform.mlir
@@ -26,7 +26,7 @@
 // multiple targets, but this example is maintaining an implicit requirement
 // that the custom kernel being spliced in is supported by the target device,
 // hence we only support llvm-cpu here.
-#cpu_target = #hal.device.target<"llvm-cpu", [
+#cpu_target = #hal.device.target<"local", [
   #x86_64_target
 ]> : !hal.device
 
diff --git a/samples/custom_dispatch/cpu/embedded/example_transform_spec.mlir b/samples/custom_dispatch/cpu/embedded/example_transform_spec.mlir
index 999539c..159c95f 100644
--- a/samples/custom_dispatch/cpu/embedded/example_transform_spec.mlir
+++ b/samples/custom_dispatch/cpu/embedded/example_transform_spec.mlir
@@ -16,16 +16,14 @@
 // These can come from compiler flags and multiple targets can be supported
 // It's possible, for example, to support targeting multiple devices in the same
 // compiled binary (CPU + Vulkan, etc).
-#cpu_target = #hal.device.target<"llvm-cpu", [
+#cpu_target = #hal.device.target<"local", [
   #x86_64_target
 ]>
 
-#pipeline_layout = #hal.pipeline.layout<push_constants = 1, sets = [
-  <0, bindings = [
-      <0, storage_buffer, ReadOnly>,
-      <1, storage_buffer, ReadOnly>,
-      <2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 1, bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 module attributes {transform.with_named_sequence} {
@@ -56,9 +54,9 @@
           %workgroup_id_x = hal.interface.workgroup.id[0] : index
           %tid = affine.apply affine_map<()[s0] -> (s0 * 64)>()[%workgroup_id_x]
 
-          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
-          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
-          %binding2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(64) offset(%c0) : memref<?xf32>{%dim}
+          %binding2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(64) offset(%c0) : memref<?xf32>{%dim}
 
           func.call @simple_mul_abs_negate_workgroup(%binding0, %binding1, %binding2, %dim, %tid) : (memref<?xf32>, memref<?xf32>, memref<?xf32>, index, index) -> ()
           return
diff --git a/samples/custom_dispatch/cpu/embedded/functions.c b/samples/custom_dispatch/cpu/embedded/functions.c
index a962666..495fd6d 100644
--- a/samples/custom_dispatch/cpu/embedded/functions.c
+++ b/samples/custom_dispatch/cpu/embedded/functions.c
@@ -36,12 +36,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 void simple_mul_workgroup(
@@ -64,11 +62,9 @@
 // `rhs *= lhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 void simple_mul_inplace_workgroup(
@@ -89,12 +85,10 @@
 // `ret = -|lhs * rhs|`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 void simple_mul_abs_negate_workgroup(
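With the flattened ABI each dispatch carries a constant count and one ordered binding list rather than set/binding pairs. A Python sketch of how a host-side loop might drive a workgroup function under this convention (illustrative only, not the actual runtime dispatch code; `dim` is the single constant and the bindings arrive as a flat list):

```python
def simple_mul_workgroup(binding0, binding1, binding2, dim, tid):
    # One workgroup processes 64 contiguous elements starting at tid.
    for i in range(tid, min(tid + 64, dim)):
        binding2[i] = binding0[i] * binding1[i]

def dispatch(workgroup_count_x, constants, bindings):
    # constants[0] is the element count; bindings are an ordered flat list
    # matching the #hal.pipeline.layout binding order.
    dim = constants[0]
    for workgroup_id_x in range(workgroup_count_x):
        tid = workgroup_id_x * 64
        simple_mul_workgroup(*bindings, dim, tid)

lhs = [2.0] * 8
rhs = [3.0] * 8
out = [0.0] * 8
dispatch(1, [8], [lhs, rhs, out])
```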
diff --git a/samples/custom_dispatch/cpu/mlp_plugin/mlp.mlir b/samples/custom_dispatch/cpu/mlp_plugin/mlp.mlir
index 2aa5943..f2aa6ae 100644
--- a/samples/custom_dispatch/cpu/mlp_plugin/mlp.mlir
+++ b/samples/custom_dispatch/cpu/mlp_plugin/mlp.mlir
@@ -1,4 +1,4 @@
-// RUN: iree-compile --iree-preprocessing-transform-spec-filename=%p/mlp_spec.mlir  %s | \
+// RUN: iree-compile --iree-preprocessing-transform-spec-filename=%p/mlp_spec.mlir %s | \
 // RUN: iree-run-module --device=local-sync \
 // RUN:     --executable_plugin=$IREE_BINARY_DIR/samples/custom_dispatch/cpu/mlp_plugin/mlp_plugin$IREE_DYLIB_EXT \
 // RUN:     --module=- \
@@ -25,7 +25,6 @@
 
 #map = affine_map<(d0, d1) -> (d0, d1)>
 module @example attributes {hal.device.targets = [#cpu_target]} {
-
   // CHECK-LABEL: EXEC @mlp_invocation
   //       CHECK: [Plugin]: M = 2, N = 2, K = 2, doRelu = 1
   //       CHECK: 2x2xf32=[-12 -0][-0 -12]
diff --git a/samples/custom_dispatch/cpu/mlp_plugin/mlp_plugin.c b/samples/custom_dispatch/cpu/mlp_plugin/mlp_plugin.c
index 88417a6..4fe4c4e 100644
--- a/samples/custom_dispatch/cpu/mlp_plugin/mlp_plugin.c
+++ b/samples/custom_dispatch/cpu/mlp_plugin/mlp_plugin.c
@@ -32,12 +32,10 @@
 // `ret = mlp(lhs, rhs)`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 //
diff --git a/samples/custom_dispatch/cpu/plugin/standalone_plugin.c b/samples/custom_dispatch/cpu/plugin/standalone_plugin.c
index 038666f..0305fb1 100644
--- a/samples/custom_dispatch/cpu/plugin/standalone_plugin.c
+++ b/samples/custom_dispatch/cpu/plugin/standalone_plugin.c
@@ -24,12 +24,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 //
diff --git a/samples/custom_dispatch/cpu/plugin/system_plugin.c b/samples/custom_dispatch/cpu/plugin/system_plugin.c
index 816daac..86ac02f 100644
--- a/samples/custom_dispatch/cpu/plugin/system_plugin.c
+++ b/samples/custom_dispatch/cpu/plugin/system_plugin.c
@@ -42,12 +42,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // With a workgroup size of 64x1x1.
 //
diff --git a/samples/custom_dispatch/cuda/kernels/README.md b/samples/custom_dispatch/cuda/kernels/README.md
index ac4e99d..51decc8 100644
--- a/samples/custom_dispatch/cuda/kernels/README.md
+++ b/samples/custom_dispatch/cuda/kernels/README.md
@@ -71,12 +71,10 @@
       ]
     }>
     hal.executable.export public @simple_mul ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer, ReadOnly>,
-              <2, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) attributes {workgroup_size = [64 : index, 1 : index, 1 : index]} {
     ^bb0(%device: !hal.device, %workload: index):
       %x = affine.apply affine_map<()[s0] -> (s0 ceildiv 64)>()[%workload]
diff --git a/samples/custom_dispatch/cuda/kernels/example.mlir b/samples/custom_dispatch/cuda/kernels/example.mlir
index 62e49c6..aaba1af 100644
--- a/samples/custom_dispatch/cuda/kernels/example.mlir
+++ b/samples/custom_dispatch/cuda/kernels/example.mlir
@@ -75,27 +75,14 @@
     // The layout defines the required bindings and push constants and can be
     // thought of as the function signature.
     hal.executable.export public @simple_mul ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer, ReadOnly>,
-              <2, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) attributes {
       // Certain backends (like CUDA) require a workgroup size (aka block
       // size) to be defined ahead of time.
-      workgroup_size = [64 : index, 1 : index, 1 : index],
-      // Bindings are automatically inferred when possible as part of the ABI
-      // but can be overridden if the user wants to use features such as sparse
-      // bindings or multiple descriptor sets. To do so the
-      // `hal.interface.bindings` attribute can be added to a dispatch op as
-      // follows mapping tensor operands/results to the pipeline layout
-      // sets/bindings:
-      hal.interface.bindings = [
-        #hal.interface.binding<0, 0>,
-        #hal.interface.binding<0, 1>,
-        #hal.interface.binding<0, 2>
-      ]
+      workgroup_size = [64 : index, 1 : index, 1 : index]
     } {
     ^bb0(%device: !hal.device, %workload: index):
       // This host function is used to compute the XYZ workgroup count
@@ -110,11 +97,9 @@
 
     // Similar to the above but in-place by using a read/write binding.
     hal.executable.export public @simple_mul_inplace ordinal(1)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) attributes {
       workgroup_size = [64 : index, 1 : index, 1 : index]
     } {
@@ -153,10 +138,6 @@
     %1 = arith.addf %0, %arg1 : tensor<?xf32>
 
     // Dispatch an in-place `rhs *= lhs` kernel.
-    //
-    // Note that we don't declare the hal.interface.bindings and let them be
-    // inferred - this only works when either specifying the variant that has
-    // a pipeline layout defined or all variants have the same pipeline layouts.
     %2 = flow.dispatch @executable::@simple_mul_inplace[%dim](%dim_i32, %0, %1) : (i32, tensor<?xf32>{%dim}, tensor<?xf32>{%dim}) -> %1{%dim}
 
     // CHECK: 8xf32=96 96 96 96 96 96 96 96
diff --git a/samples/custom_dispatch/cuda/kernels/kernels.cu b/samples/custom_dispatch/cuda/kernels/kernels.cu
index 250f676..8bca00e 100644
--- a/samples/custom_dispatch/cuda/kernels/kernels.cu
+++ b/samples/custom_dispatch/cuda/kernels/kernels.cu
@@ -36,12 +36,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // workgroup_size = [64 : index, 1 : index, 1 : index]
 extern "C" __global__ void simple_mul(const float* __restrict__ binding0,
@@ -56,11 +54,9 @@
 // `rhs *= lhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // workgroup_size = [64 : index, 1 : index, 1 : index]
 extern "C" __global__ void simple_mul_inplace(
diff --git a/samples/custom_dispatch/hip/kernels/example.mlir b/samples/custom_dispatch/hip/kernels/example.mlir
index 6a2148c..a24a999 100644
--- a/samples/custom_dispatch/hip/kernels/example.mlir
+++ b/samples/custom_dispatch/hip/kernels/example.mlir
@@ -66,27 +66,14 @@
     // The layout defines the required bindings and push constants and can be
     // thought of as the function signature.
     hal.executable.export public @simple_mul ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer, ReadOnly>,
-              <2, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) attributes {
       // Certain backends (like ROCM) require a workgroup size (aka block
       // size) to be defined ahead of time.
-      workgroup_size = [64 : index, 1 : index, 1 : index],
-      // Bindings are automatically inferred when possible as part of the ABI
-      // but can be overridden if the user wants to use features such as sparse
-      // bindings or multiple descriptor sets. To do so the
-      // `hal.interface.bindings` attribute can be added to a dispatch op as
-      // follows mapping tensor operands/results to the pipeline layout
-      // sets/bindings:
-      hal.interface.bindings = [
-        #hal.interface.binding<0, 0>,
-        #hal.interface.binding<0, 1>,
-        #hal.interface.binding<0, 2>
-      ]
+      workgroup_size = [64 : index, 1 : index, 1 : index]
     } {
     ^bb0(%device: !hal.device, %workload: index):
       // This host function is used to compute the XYZ workgroup count
@@ -101,11 +88,9 @@
 
     // Similar to the above but in-place by using a read/write binding.
     hal.executable.export public @simple_mul_inplace ordinal(1)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) attributes {
       workgroup_size = [64 : index, 1 : index, 1 : index]
     } {
diff --git a/samples/custom_dispatch/hip/kernels/kernels.cu b/samples/custom_dispatch/hip/kernels/kernels.cu
index 87e29ee..cc4762b 100644
--- a/samples/custom_dispatch/hip/kernels/kernels.cu
+++ b/samples/custom_dispatch/hip/kernels/kernels.cu
@@ -38,12 +38,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // workgroup_size = [64 : index, 1 : index, 1 : index]
 extern "C" __global__ void simple_mul(const float* __restrict__ binding0,
@@ -58,11 +56,9 @@
 // `rhs *= lhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 // workgroup_size = [64 : index, 1 : index, 1 : index]
 extern "C" __global__ void simple_mul_inplace(
diff --git a/samples/custom_dispatch/vulkan/shaders/README.md b/samples/custom_dispatch/vulkan/shaders/README.md
index fb7406b..7e4129f 100644
--- a/samples/custom_dispatch/vulkan/shaders/README.md
+++ b/samples/custom_dispatch/vulkan/shaders/README.md
@@ -75,12 +75,10 @@
       ]
     }>
     hal.executable.export public @simple_mul ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer, ReadOnly>,
-              <2, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) {
     ^bb0(%device: !hal.device, %workload: index):
       %x = affine.apply affine_map<()[s0] -> (s0 ceildiv 64)>()[%workload]
diff --git a/samples/custom_dispatch/vulkan/shaders/example.mlir b/samples/custom_dispatch/vulkan/shaders/example.mlir
index d9cb5e1..17e2e45 100644
--- a/samples/custom_dispatch/vulkan/shaders/example.mlir
+++ b/samples/custom_dispatch/vulkan/shaders/example.mlir
@@ -76,25 +76,11 @@
     // The layout defines the required bindings and push constants and can be
     // thought of as the function signature.
     hal.executable.export public @main ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer, ReadOnly>,
-              <2, storage_buffer>
-          ]>
-        ]>) attributes {
-          // Bindings are automatically inferred when possible as part of the
-          // ABI but can be overridden if the user wants to use features such as
-          // sparse bindings or multiple descriptor sets. To do so the
-          // `hal.interface.bindings` attribute can be added to an export op as
-          // follows mapping tensor operands/results to the pipeline layout
-          // sets/bindings:
-          hal.interface.bindings = [
-            #hal.interface.binding<0, 0>,
-            #hal.interface.binding<0, 1>,
-            #hal.interface.binding<0, 2>
-          ]
-        } {
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
+        ]>) {
     ^bb0(%device: !hal.device, %workload: index):
       // This host function is used to compute the XYZ workgroup count
       // dispatched at runtime. It can query the %device for capabilities
@@ -119,11 +105,9 @@
   } {
     // Similar to the above but in-place by using a read/write binding.
     hal.executable.export public @main ordinal(0)
-        layout(#hal.pipeline.layout<push_constants = 1, sets = [
-          <0, bindings = [
-              <0, storage_buffer, ReadOnly>,
-              <1, storage_buffer>
-          ]>
+        layout(#hal.pipeline.layout<constants = 1, bindings = [
+          #hal.pipeline.binding<storage_buffer, ReadOnly>,
+          #hal.pipeline.binding<storage_buffer>
         ]>) {
     ^bb0(%device: !hal.device, %workload: index):
       %x = affine.apply affine_map<()[s0] -> (s0 ceildiv 64)>()[%workload]
diff --git a/samples/custom_dispatch/vulkan/shaders/example_inline.mlir b/samples/custom_dispatch/vulkan/shaders/example_inline.mlir
index 2882134..d6ef84f 100644
--- a/samples/custom_dispatch/vulkan/shaders/example_inline.mlir
+++ b/samples/custom_dispatch/vulkan/shaders/example_inline.mlir
@@ -67,24 +67,11 @@
       }
       // The layout defines the required bindings and push constants and can be
       // thought of as the function signature.
-      layout(#hal.pipeline.layout<push_constants = 1, sets = [
-        <0, bindings = [
-            <0, storage_buffer, ReadOnly>,
-            <1, storage_buffer, ReadOnly>,
-            <2, storage_buffer>
-        ]>
+      layout(#hal.pipeline.layout<constants = 1, bindings = [
+        #hal.pipeline.binding<storage_buffer, ReadOnly>,
+        #hal.pipeline.binding<storage_buffer, ReadOnly>,
+        #hal.pipeline.binding<storage_buffer>
       ]>)
-      // Bindings are automatically inferred when possible as part of the ABI
-      // but can be overridden if the user wants to use features such as sparse
-      // bindings or multiple descriptor sets. To do so the
-      // `hal.interface.bindings` attribute can be added to a dispatch op as
-      // follows mapping tensor operands/results to the pipeline layout
-      // sets/bindings:
-      bindings([
-        #hal.interface.binding<0, 0>,
-        #hal.interface.binding<0, 1>,
-        #hal.interface.binding<0, 2>
-      ])
       // Object files linked into the executable.
       // Certain backends (today) support either wholesale definition or linking
       // of partial objects for imports used by generated code. Each compilation
diff --git a/samples/custom_dispatch/vulkan/shaders/example_transform_spec.mlir b/samples/custom_dispatch/vulkan/shaders/example_transform_spec.mlir
index 8e23206..6d2cba8 100644
--- a/samples/custom_dispatch/vulkan/shaders/example_transform_spec.mlir
+++ b/samples/custom_dispatch/vulkan/shaders/example_transform_spec.mlir
@@ -36,15 +36,13 @@
         %c1_0 = arith.constant 1 : index
         hal.return %c1_0, %c1_0, %c1_0 : index, index, index
       }
-      layout(#hal.pipeline.layout<push_constants = 1, sets = [
-        <0, bindings = [
-            <0, storage_buffer, ReadOnly>,
-            <1, storage_buffer>
-        ]>
+      layout(#hal.pipeline.layout<constants = 1, bindings = [
+        #hal.pipeline.binding<storage_buffer, ReadOnly>,
+        #hal.pipeline.binding<storage_buffer>
       ]>)
       bindings([
-        #hal.interface.binding<0, 0>,
-        #hal.interface.binding<0, 1>
+        #hal.interface.binding<0>,
+        #hal.interface.binding<1>
       ])
       objects({
         #spirv_target ordinal(0) = [
diff --git a/samples/custom_dispatch/vulkan/shaders/one_workgroup_argmax_subgroup_f32.glsl b/samples/custom_dispatch/vulkan/shaders/one_workgroup_argmax_subgroup_f32.glsl
index 5a9c6f6..b5a563a 100644
--- a/samples/custom_dispatch/vulkan/shaders/one_workgroup_argmax_subgroup_f32.glsl
+++ b/samples/custom_dispatch/vulkan/shaders/one_workgroup_argmax_subgroup_f32.glsl
@@ -7,11 +7,9 @@
 // `ret = argmax(in)`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 
 #version 450 core
diff --git a/samples/custom_dispatch/vulkan/shaders/simple_mul.glsl b/samples/custom_dispatch/vulkan/shaders/simple_mul.glsl
index ec40146..f2418a6 100644
--- a/samples/custom_dispatch/vulkan/shaders/simple_mul.glsl
+++ b/samples/custom_dispatch/vulkan/shaders/simple_mul.glsl
@@ -7,12 +7,10 @@
 // `ret = lhs * rhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer, ReadOnly>,
-//       <2, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 
 #version 450
diff --git a/samples/custom_dispatch/vulkan/shaders/simple_mul_inplace.glsl b/samples/custom_dispatch/vulkan/shaders/simple_mul_inplace.glsl
index adc6d82..b24d9a2 100644
--- a/samples/custom_dispatch/vulkan/shaders/simple_mul_inplace.glsl
+++ b/samples/custom_dispatch/vulkan/shaders/simple_mul_inplace.glsl
@@ -7,11 +7,9 @@
 // `rhs *= lhs`
 //
 // Conforms to ABI:
-// #hal.pipeline.layout<push_constants = 1, sets = [
-//   <0, bindings = [
-//       <0, storage_buffer, ReadOnly>,
-//       <1, storage_buffer>
-//   ]>
+// #hal.pipeline.layout<constants = 1, bindings = [
+//   #hal.pipeline.binding<storage_buffer, ReadOnly>,
+//   #hal.pipeline.binding<storage_buffer>
 // ]>
 
 #version 450
diff --git a/samples/transform_dialect/example_module.mlir b/samples/transform_dialect/example_module.mlir
index 2fb3498..40f2fee 100644
--- a/samples/transform_dialect/example_module.mlir
+++ b/samples/transform_dialect/example_module.mlir
@@ -29,9 +29,19 @@
   compute = fp32|int32, storage = b32, subgroup = none, dot = none, mma = [], subgroup_size_choices = [64, 64],
   max_workgroup_sizes = [128, 128, 64], max_thread_count_per_workgroup = 128, max_workgroup_memory_bytes = 16384, max_workgroup_counts = [65535, 65535, 65535]>>
 
-#pipeline_layout_0 = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>
-#pipeline_layout_1 = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer, ReadOnly>, <2, storage_buffer>]>]>
-#pipeline_layout_2 = #hal.pipeline.layout<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, ReadOnly>, <1, storage_buffer>]>]>
+#pipeline_layout_0 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
+#pipeline_layout_1 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
+#pipeline_layout_2 = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer, ReadOnly>,
+  #hal.pipeline.binding<storage_buffer>
+]>
 
 module attributes {
   hal.device.targets = [
@@ -52,8 +62,8 @@
       builtin.module {
         func.func @example_module_dispatch_0_generic_80_f32() {
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout_0) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<80xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout_0) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<80xf32>>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout_0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<80xf32>>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout_0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<80xf32>>
           %2 = flow.dispatch.tensor.load %0, offsets = [0], sizes = [80], strides = [1] : !flow.dispatch.tensor<readonly:tensor<80xf32>> -> tensor<80xf32>
           %3 = tensor.empty() : tensor<80xf32>
           %4 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%2 : tensor<80xf32>) outs(%3 : tensor<80xf32>) {
@@ -77,9 +87,9 @@
       builtin.module {
         func.func @example_module_dispatch_1_matmul_16x16x5_f32() {
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout_1) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x5xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout_1) set(0) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5x16xf32>>
-          %2 = hal.interface.binding.subspan layout(#pipeline_layout_1) set(0) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<16x16xf32>>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout_1) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x5xf32>>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout_1) binding(1) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<5x16xf32>>
+          %2 = hal.interface.binding.subspan layout(#pipeline_layout_1) binding(2) alignment(64) offset(%c0) : !flow.dispatch.tensor<readwrite:tensor<16x16xf32>>
           %3 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [16, 5], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16x5xf32>> -> tensor<16x5xf32>
           %4 = flow.dispatch.tensor.load %1, offsets = [0, 0], sizes = [5, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<5x16xf32>> -> tensor<5x16xf32>
           %5 = flow.dispatch.tensor.load %2, offsets = [0, 0], sizes = [16, 16], strides = [1, 1] : !flow.dispatch.tensor<readwrite:tensor<16x16xf32>> -> tensor<16x16xf32>
@@ -100,8 +110,8 @@
       builtin.module {
         func.func @example_module_dispatch_2_generic_16x16_f32() {
           %c0 = arith.constant 0 : index
-          %0 = hal.interface.binding.subspan layout(#pipeline_layout_2) set(0) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16xf32>>
-          %1 = hal.interface.binding.subspan layout(#pipeline_layout_2) set(0) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
+          %0 = hal.interface.binding.subspan layout(#pipeline_layout_2) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : !flow.dispatch.tensor<readonly:tensor<16x16xf32>>
+          %1 = hal.interface.binding.subspan layout(#pipeline_layout_2) binding(1) alignment(64) offset(%c0) : !flow.dispatch.tensor<writeonly:tensor<16xf32>>
           %2 = flow.dispatch.tensor.load %0, offsets = [0, 0], sizes = [16, 16], strides = [1, 1] : !flow.dispatch.tensor<readonly:tensor<16x16xf32>> -> tensor<16x16xf32>
           %3 = tensor.empty() : tensor<16xf32>
           %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%2 : tensor<16x16xf32>) outs(%3 : tensor<16xf32>) {
diff --git a/tests/compiler_driver/executable_benchmarks.mlir b/tests/compiler_driver/executable_benchmarks.mlir
index b1ee0ec..f1eb326 100644
--- a/tests/compiler_driver/executable_benchmarks.mlir
+++ b/tests/compiler_driver/executable_benchmarks.mlir
@@ -18,4 +18,4 @@
 // CHECK: vm.rodata private @abs_dispatch_0_vmvx_bytecode_fb
 // CHECK: vm.func private @abs_dispatch_0_vmvx_bytecode_fb_abs_dispatch_0{{.+}}(%arg0: i32)
 // CHECK-SAME: iree.reflection = {iree.benchmark = "dispatch"}
-// CHECK: vm.call @hal.command_buffer.dispatch
+// CHECK: vm.call.variadic @hal.command_buffer.dispatch
diff --git a/tests/compiler_driver/hal_executable.mlir b/tests/compiler_driver/hal_executable.mlir
index d3e9ed0..2e7dc2c 100644
--- a/tests/compiler_driver/hal_executable.mlir
+++ b/tests/compiler_driver/hal_executable.mlir
@@ -7,12 +7,10 @@
 // push constants available and the descriptor sets and their bindings.
 // Push constants are dense (0..N) while the sets/bindings are sparse and may
 // contain unused or omitted entries.
-#pipeline_layout = #hal.pipeline.layout<push_constants = 2, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<constants = 2, bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 
 // A single executable source definition is allowed per translation in this mode
@@ -39,9 +37,9 @@
       // Bindings are dereferenced by their set/binding ordinal and may have a
       // byte offset from the base of the descriptor. Alignment information when
       // available can help code generation emit better loads/stores.
-      %s0b0 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %s0b1 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) offset(%offset) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %s0b2 = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
+      %s0b0 = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %s0b1 = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) offset(%offset) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %s0b2 = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
 
       // Workgroup information can be queried from the interface.
       %workgroup_id_x = hal.interface.workgroup.id[0] : index
diff --git a/tests/compiler_driver/streams.mlir b/tests/compiler_driver/streams.mlir
index b1fe335..03ebbc3 100644
--- a/tests/compiler_driver/streams.mlir
+++ b/tests/compiler_driver/streams.mlir
@@ -54,7 +54,7 @@
 // CHECK: vm.func private @simple_mul
 func.func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32> {
   %c4 = arith.constant 4 : index
-  // CHECK: vm.call @hal.command_buffer.dispatch
+  // CHECK: vm.call.variadic @hal.command_buffer.dispatch
   %ret0 = flow.dispatch @executable_0::@dispatch[%c4](%arg0, %arg1) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
   return %ret0 : tensor<4xf32>
 }
@@ -101,7 +101,7 @@
 // CHECK: vm.func private @simple_mul_inplace
 func.func @simple_mul_inplace(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32> {
   %c4 = arith.constant 4 : index
-  // CHECK: vm.call @hal.command_buffer.dispatch
+  // CHECK: vm.call.variadic @hal.command_buffer.dispatch
   %ret0 = flow.dispatch @executable_1::@dispatch[%c4](%arg0, %arg1) : (tensor<4xf32>, tensor<4xf32>) -> %arg0
   return %ret0 : tensor<4xf32>
 }
@@ -155,7 +155,7 @@
   %arg0_dim0 = tensor.dim %arg0, %c0 : tensor<?xf32>
   // CHECK: vm.call @hal.buffer_view.dim
   %arg1_dim0 = tensor.dim %arg1, %c0 : tensor<?xf32>
-  // CHECK: vm.call @hal.command_buffer.dispatch
+  // CHECK: vm.call.variadic @hal.command_buffer.dispatch
   %ret0 = flow.dispatch @executable_2::@dispatch[%arg0_dim0](%arg0, %arg0_dim0, %arg1, %arg1_dim0) : (tensor<?xf32>{%arg0_dim0}, index, tensor<?xf32>{%arg1_dim0}, index) -> tensor<?xf32>{%arg0_dim0}
   return %ret0 : tensor<?xf32>
 }
diff --git a/tests/e2e/stablehlo_ops/CMakeLists.txt b/tests/e2e/stablehlo_ops/CMakeLists.txt
index 4ebc7b5..d5e67db 100644
--- a/tests/e2e/stablehlo_ops/CMakeLists.txt
+++ b/tests/e2e/stablehlo_ops/CMakeLists.txt
@@ -657,8 +657,8 @@
     "divide.mlir"
     "dot.mlir"
     "dot_general.mlir"
-    "dynamic_slice.mlir"
-    "dynamic_update_slice.mlir"
+    # "dynamic_slice.mlir"  # TODO(#13702): update WebGPU to simplified bindings.
+    # "dynamic_update_slice.mlir"  # TODO(#13702): update WebGPU to simplified bindings.
     "exponential.mlir"
     "exponential_fp16.mlir"
     "exponential_minus_one.mlir"
@@ -685,8 +685,8 @@
     "rng_uniform.mlir"
     "round.mlir"
     "rsqrt.mlir"
-    "scatter.mlir"
-    "scatter_dynamic.mlir"
+    # "scatter.mlir"  # TODO(#13702): update WebGPU to simplified bindings.
+    # "scatter_dynamic.mlir"  # TODO(#13702): update WebGPU to simplified bindings.
     "select.mlir"
     "shape_assertion.mlir"
     "sine.mlir"
diff --git a/tools/iree-benchmark-executable-main.c b/tools/iree-benchmark-executable-main.c
index da015df..f5cfb4a 100644
--- a/tools/iree-benchmark-executable-main.c
+++ b/tools/iree-benchmark-executable-main.c
@@ -43,41 +43,29 @@
 // dynamically growable.
 #define IREE_HAL_MAX_EXECUTABLE_CONSTANT_COUNT 512
 // Total number of push constants we (currently) allow any executable to have.
-#define IREE_HAL_MAX_PUSH_CONSTANT_COUNT 64
-// Maximum number of descriptor sets in an pipeline layout.
-#define IREE_HAL_MAX_DESCRIPTOR_SET_COUNT 2
+#define IREE_HAL_MAX_CONSTANT_COUNT 64
 // Total number of bindings we (currently) allow any executable to have.
-#define IREE_HAL_MAX_TOTAL_BINDING_COUNT \
-  (IREE_HAL_MAX_DESCRIPTOR_SET_COUNT * 32)
+#define IREE_HAL_MAX_BINDING_COUNT 64
 
 // Parsed dispatch parameters from flags.
 // Used to construct the dispatch parameters for the benchmark invocation.
 struct {
-  int32_t set_count;
-  struct {
-    // For now we only track the binding counts and assume they are all storage
-    // buffers. When we support more types we'll need an encoding.
-    int32_t binding_count;
-  } sets[IREE_HAL_MAX_DESCRIPTOR_SET_COUNT];
-
   int32_t executable_constant_count;
   union {
     uint32_t ui32;
   } executable_constants[IREE_HAL_MAX_EXECUTABLE_CONSTANT_COUNT];
 
-  int32_t push_constant_count;
+  int32_t constant_count;
   union {
     uint32_t ui32;
-  } push_constants[IREE_HAL_MAX_PUSH_CONSTANT_COUNT];
+  } constants[IREE_HAL_MAX_CONSTANT_COUNT];
 
   int32_t binding_count;
-  iree_string_view_t binding_specs[IREE_HAL_MAX_TOTAL_BINDING_COUNT];
-  char binding_cconv[IREE_HAL_MAX_TOTAL_BINDING_COUNT];
-  iree_hal_descriptor_set_layout_binding_t
-      binding_layouts[IREE_HAL_MAX_TOTAL_BINDING_COUNT];
+  iree_string_view_t binding_specs[IREE_HAL_MAX_BINDING_COUNT];
+  char binding_cconv[IREE_HAL_MAX_BINDING_COUNT];
 } parsed_params = {
     .executable_constant_count = 0,
-    .push_constant_count = 0,
+    .constant_count = 0,
     .binding_count = 0,
 };
 
@@ -117,11 +105,10 @@
                    &parsed_params, executable_constant,
                    "Appends a uint32_t executable constant value.\n");
 
-static iree_status_t parse_push_constant(iree_string_view_t flag_name,
-                                         void* storage,
-                                         iree_string_view_t value) {
-  IREE_ASSERT_LE(parsed_params.push_constant_count + 1,
-                 IREE_ARRAYSIZE(parsed_params.push_constants),
+static iree_status_t parse_constant(iree_string_view_t flag_name, void* storage,
+                                    iree_string_view_t value) {
+  IREE_ASSERT_LE(parsed_params.constant_count + 1,
+                 IREE_ARRAYSIZE(parsed_params.constants),
                  "too many push constants");
   uint32_t value_ui32 = 0;
   if (!iree_string_view_atoi_uint32(value, &value_ui32)) {
@@ -130,27 +117,26 @@
         "invalid push constant value `%.*s`; expects uint32_t", (int)value.size,
         value.data);
   }
-  parsed_params.push_constants[parsed_params.push_constant_count++].ui32 =
-      value_ui32;
+  parsed_params.constants[parsed_params.constant_count++].ui32 = value_ui32;
   return iree_ok_status();
 }
-static void print_push_constant(iree_string_view_t flag_name, void* storage,
-                                FILE* file) {
-  if (parsed_params.push_constant_count == 0) {
+static void print_constant(iree_string_view_t flag_name, void* storage,
+                           FILE* file) {
+  if (parsed_params.constant_count == 0) {
     fprintf(file, "# --%.*s=[integer value]\n", (int)flag_name.size,
             flag_name.data);
     return;
   }
-  for (int32_t i = 0; i < parsed_params.push_constant_count; ++i) {
+  for (int32_t i = 0; i < parsed_params.constant_count; ++i) {
     fprintf(file, "--%.*s=%u", (int)flag_name.size, flag_name.data,
-            parsed_params.push_constants[i].ui32);
-    if (i < parsed_params.push_constant_count - 1) {
+            parsed_params.constants[i].ui32);
+    if (i < parsed_params.constant_count - 1) {
       fprintf(file, "\n");
     }
   }
 }
-IREE_FLAG_CALLBACK(parse_push_constant, print_push_constant, &parsed_params,
-                   push_constant, "Appends a uint32_t push constant value.\n");
+IREE_FLAG_CALLBACK(parse_constant, print_constant, &parsed_params, constant,
+                   "Appends a uint32_t constant value.\n");
 
 static iree_status_t parse_binding(iree_string_view_t flag_name, void* storage,
                                    iree_string_view_t value) {
@@ -160,12 +146,6 @@
   int32_t i = parsed_params.binding_count++;
   parsed_params.binding_specs[i] = value;
   parsed_params.binding_cconv[i] = 'r';
-  // TODO(benvanik): allow for a specification of type/immutability.
-  parsed_params.binding_layouts[i] = (iree_hal_descriptor_set_layout_binding_t){
-      .binding = (uint32_t)i,
-      .type = IREE_HAL_DESCRIPTOR_TYPE_STORAGE_BUFFER,
-      .flags = IREE_HAL_DESCRIPTOR_FLAG_NONE,
-  };
   return iree_ok_status();
 }
 static void print_binding(iree_string_view_t flag_name, void* storage,
@@ -205,7 +185,6 @@
 typedef struct iree_benchmark_executable_args_t {
   iree_hal_device_t* device;
   iree_hal_executable_t* executable;
-  iree_hal_pipeline_layout_t* pipeline_layout;
   const iree_hal_buffer_ref_t* bindings;
   uint32_t workgroup_count[3];
 } iree_benchmark_executable_args_t;
@@ -232,6 +211,33 @@
       .payload_values = &fence_value,
   };
 
+  // Record a command buffer with the dispatches.
+  // The same command buffer recording is reused on each benchmark step.
+  iree_hal_command_buffer_t* command_buffer = NULL;
+  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_create(
+      args->device, IREE_HAL_COMMAND_BUFFER_MODE_DEFAULT,
+      IREE_HAL_COMMAND_CATEGORY_DISPATCH, IREE_HAL_QUEUE_AFFINITY_ANY,
+      /*binding_capacity=*/0, &command_buffer));
+  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_begin(command_buffer));
+  iree_const_byte_span_t constants = iree_make_const_byte_span(
+      &parsed_params.constants[0].ui32,
+      parsed_params.constant_count * sizeof(parsed_params.constants[0]));
+  iree_hal_buffer_ref_list_t bindings = {
+      .count = parsed_params.binding_count,
+      .values = args->bindings,
+  };
+  for (int32_t i = 0; i < FLAG_batch_size; ++i) {
+    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_dispatch(
+        command_buffer, args->executable, FLAG_entry_point,
+        args->workgroup_count, constants, bindings,
+        IREE_HAL_DISPATCH_FLAG_NONE));
+    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_execution_barrier(
+        command_buffer, IREE_HAL_EXECUTION_STAGE_COMMAND_RETIRE,
+        IREE_HAL_EXECUTION_STAGE_COMMAND_ISSUE,
+        IREE_HAL_EXECUTION_BARRIER_FLAG_NONE, 0, NULL, 0, NULL));
+  }
+  IREE_RETURN_IF_ERROR(iree_hal_command_buffer_end(command_buffer));
+
   // Start profiling now - all subsequent device operations will be what the
   // user wants to measure.
   IREE_RETURN_IF_ERROR(iree_hal_begin_profiling_from_flags(args->device));
@@ -244,48 +250,6 @@
   // number of workgroups executed.
   int64_t dispatch_count = 0;
   while (iree_benchmark_keep_running(benchmark_state, FLAG_batch_size)) {
-    // TODO(benvanik): record a secondary command buffer and just replay it
-    // here. This should fix the overhead at just primary command buffer
-    // creation. Most backends don't support reusable command buffers, yet, and
-    // some only support inline execution so we are conservatively doing that.
-    // In the future we should have an option (possibly based on device query)
-    // as to which path to use.
-
-    // Record a command buffer with the dispatches.
-    // Note that today we are doing this inside of the benchmark loop so that
-    // we can use inline execution. This is a boost to devices that support it
-    // like CUDA streams and synchronous CPU executors but a pessimization to
-    // devices that benefit from reusable command buffers like CUDA graphs.
-    // In the future we can add a flag that switches the mode between
-    // reusable and one-shot.
-    iree_hal_command_buffer_t* command_buffer = NULL;
-    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_create(
-        args->device,
-        IREE_HAL_COMMAND_BUFFER_MODE_ONE_SHOT |
-            IREE_HAL_COMMAND_BUFFER_MODE_ALLOW_INLINE_EXECUTION,
-        IREE_HAL_COMMAND_CATEGORY_DISPATCH, IREE_HAL_QUEUE_AFFINITY_ANY,
-        /*binding_capacity=*/0, &command_buffer));
-    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_begin(command_buffer));
-    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_push_constants(
-        command_buffer, args->pipeline_layout, /*offset=*/0,
-        &parsed_params.push_constants[0].ui32,
-        parsed_params.push_constant_count *
-            sizeof(parsed_params.push_constants[0])));
-    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_push_descriptor_set(
-        command_buffer, args->pipeline_layout, /*set=*/0,
-        parsed_params.binding_count, args->bindings));
-    for (int32_t i = 0; i < FLAG_batch_size; ++i) {
-      IREE_RETURN_IF_ERROR(iree_hal_command_buffer_dispatch(
-          command_buffer, args->executable, FLAG_entry_point,
-          args->workgroup_count[0], args->workgroup_count[1],
-          args->workgroup_count[2], IREE_HAL_DISPATCH_FLAG_NONE));
-      IREE_RETURN_IF_ERROR(iree_hal_command_buffer_execution_barrier(
-          command_buffer, IREE_HAL_EXECUTION_STAGE_COMMAND_RETIRE,
-          IREE_HAL_EXECUTION_STAGE_COMMAND_ISSUE,
-          IREE_HAL_EXECUTION_BARRIER_FLAG_NONE, 0, NULL, 0, NULL));
-    }
-    IREE_RETURN_IF_ERROR(iree_hal_command_buffer_end(command_buffer));
-
     // Submit the command buffer; if the device could not start executing while
     // we were recording then this will kick off the execution.
     ++fence_value;
@@ -301,9 +265,6 @@
 
     iree_benchmark_pause_timing(benchmark_state);
 
-    // Don't count cleanup time in the benchmark.
-    iree_hal_command_buffer_release(command_buffer);
-
     // Accumulate the total number of dispatches executed.
     dispatch_count += FLAG_batch_size;
 
@@ -325,6 +286,7 @@
                               args->workgroup_count[2];
   iree_benchmark_set_items_processed(benchmark_state, total_invocations);
 
+  iree_hal_command_buffer_release(command_buffer);
   iree_hal_semaphore_release(fence_semaphore);
 
   return iree_ok_status();
@@ -389,7 +351,7 @@
       (iree_string_view_list_t){parsed_params.binding_count,
                                 parsed_params.binding_specs},
       device, device_allocator, host_allocator, &binding_list));
-  iree_hal_buffer_ref_t bindings[IREE_HAL_MAX_TOTAL_BINDING_COUNT];
+  iree_hal_buffer_ref_t bindings[IREE_HAL_MAX_BINDING_COUNT];
   for (iree_host_size_t i = 0; i < parsed_params.binding_count; ++i) {
     iree_vm_ref_t value = iree_vm_ref_null();
     IREE_RETURN_IF_ERROR(iree_vm_list_get_ref_assign(binding_list, i, &value));
@@ -406,7 +368,6 @@
           i);
     }
     bindings[i] = iree_hal_make_buffer_ref(buffer, 0, IREE_WHOLE_BUFFER);
-    bindings[i].ordinal = i;
   }
 
   // Setup the specification used to perform the executable load.
@@ -435,19 +396,6 @@
       iree_make_cstring_view(FLAG_executable_format);
   executable_params.executable_data = file_contents->const_buffer;
 
-  // Setup the layouts defining how each entry point is interpreted.
-  iree_hal_pipeline_layout_t* pipeline_layout = NULL;
-  iree_hal_descriptor_set_layout_t* descriptor_set_layout = NULL;
-  IREE_RETURN_IF_ERROR(iree_hal_descriptor_set_layout_create(
-      device, IREE_HAL_DESCRIPTOR_SET_LAYOUT_FLAG_NONE,
-      parsed_params.binding_count, parsed_params.binding_layouts,
-      &descriptor_set_layout));
-  IREE_RETURN_IF_ERROR(iree_hal_pipeline_layout_create(
-      device, parsed_params.push_constant_count,
-      /*set_layout_count=*/1, &descriptor_set_layout, &pipeline_layout));
-  executable_params.pipeline_layout_count = 1;
-  executable_params.pipeline_layouts = &pipeline_layout;
-
   // Executable-level constants allow us to perform some basic load-time value
   // propagation - usually dependent on device features or tuning parameters.
   executable_params.constant_count = parsed_params.executable_constant_count;
@@ -468,7 +416,6 @@
     args[i] = (iree_benchmark_executable_args_t){
         .device = device,
         .executable = executable,
-        .pipeline_layout = pipeline_layout,
         .bindings = bindings,
         .workgroup_count = {1, 1, 1},
     };
@@ -495,8 +442,6 @@
 
   iree_vm_list_release(binding_list);
   iree_hal_executable_release(executable);
-  iree_hal_descriptor_set_layout_release(descriptor_set_layout);
-  iree_hal_pipeline_layout_release(pipeline_layout);
   iree_file_contents_free(file_contents);
   iree_hal_executable_cache_release(executable_cache);
   iree_hal_device_release(device);
diff --git a/tools/test/compile_to_phase.mlir b/tools/test/compile_to_phase.mlir
index f1861a0..db22ffa 100644
--- a/tools/test/compile_to_phase.mlir
+++ b/tools/test/compile_to_phase.mlir
@@ -41,12 +41,12 @@
 
 // RUN: iree-compile --compile-to=vm --iree-hal-target-device=local --iree-hal-local-target-device-backends=vmvx %s | FileCheck %s --check-prefix=VM-PHASE
 // VM-PHASE: vm.rodata private @abs_dispatch_0
-// VM-PHASE: vm.call @hal.command_buffer.dispatch
+// VM-PHASE: vm.call.variadic @hal.command_buffer.dispatch
 
 // RUN: iree-compile --output-format=vm-asm --compile-to=end --iree-hal-target-device=local --iree-hal-local-target-device-backends=vmvx %s | FileCheck %s --check-prefix=END-PHASE
 // RUN: iree-compile --output-format=vm-asm --iree-hal-target-device=local --iree-hal-local-target-device-backends=vmvx %s | FileCheck %s --check-prefix=END-PHASE
 // END-PHASE: vm.rodata private @abs_dispatch_0
-// END-PHASE: vm.call @hal.command_buffer.dispatch
+// END-PHASE: vm.call.variadic @hal.command_buffer.dispatch
 
 func.func @abs(%input : tensor<f32>) -> (tensor<f32>) {
   %result = math.absf %input : tensor<f32>
diff --git a/tools/test/iree-benchmark-executable.mlir b/tools/test/iree-benchmark-executable.mlir
index 2950514..be8e093 100644
--- a/tools/test/iree-benchmark-executable.mlir
+++ b/tools/test/iree-benchmark-executable.mlir
@@ -34,12 +34,10 @@
 // CHECK: BM_dispatch_512x1x1
 
 // lhs * rhs => dst / s0b0 * s0b1 => s0b2
-#pipeline_layout = #hal.pipeline.layout<push_constants = 0, sets = [
-  #hal.descriptor_set.layout<0, bindings = [
-    #hal.descriptor_set.binding<0, storage_buffer>,
-    #hal.descriptor_set.binding<1, storage_buffer>,
-    #hal.descriptor_set.binding<2, storage_buffer>
-  ]>
+#pipeline_layout = #hal.pipeline.layout<bindings = [
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>,
+  #hal.pipeline.binding<storage_buffer>
 ]>
 hal.executable.source public @executable {
   hal.executable.export public @elementwise_mul ordinal(0) layout(#pipeline_layout) attributes {
@@ -52,9 +50,9 @@
   }
   builtin.module {
     func.func @elementwise_mul() {
-      %lhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %rhs = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
-      %dst = hal.interface.binding.subspan layout(#pipeline_layout) set(0) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
+      %lhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(0) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %rhs = hal.interface.binding.subspan layout(#pipeline_layout) binding(1) alignment(32) : !flow.dispatch.tensor<readonly:tensor<4xf32>>
+      %dst = hal.interface.binding.subspan layout(#pipeline_layout) binding(2) alignment(32) : !flow.dispatch.tensor<writeonly:tensor<4xf32>>
       // TODO(#16554): GPU/SPIR-V lowering doesn't handle workgroup size queries.
       // %workgroup_size_x = hal.interface.workgroup.size[0] : index
       %workgroup_size_x = arith.constant 1 : index
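
For reference, the flattened layout attribute shown in the hunks above replaces the nested `sets`/`#hal.descriptor_set.layout` structure with a single ordered binding list. A hypothetical sketch (the `constants` count and the `ReadOnly` flag are assumptions for illustration, not taken from this diff) of a layout combining push constants with mixed-access bindings:

```mlir
// One push constant plus three flat bindings; binding ordinals are
// implicit from list position, so use sites no longer need set(...).
#layout = #hal.pipeline.layout<constants = 1, bindings = [
  #hal.pipeline.binding<storage_buffer, ReadOnly>,
  #hal.pipeline.binding<storage_buffer, ReadOnly>,
  #hal.pipeline.binding<storage_buffer>
]>
```

Subspan ops then reference bindings by ordinal alone, e.g. `hal.interface.binding.subspan layout(#layout) binding(2) ...`, matching the simplified forms in the `iree-benchmark-executable.mlir` hunk above.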