Create a func to use dummy input to call exported functions. (#3624)

This is the last piece needed to wire up the benchmarking translation. The
benchmarking tool has an option to benchmark all exported functions
that do not take any arguments. The pass clones the exported functions
to take constants as input instead of user-provided input.
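
Roughly, the pass rewrites an exported function into a zero-argument clone that loads placeholder constants from private variables (simplified sketch based on the lit test added in this change; variable names are illustrative):

```mlir
// Generated placeholder input, outlined as a private, non-inlinable variable.
flow.variable @_benchmark_input_0 dense<0.0> : tensor<5x3xf32> attributes {noinline}

// Zero-argument clone of the exported @main; the original loses its
// iree.module.export attribute.
func @main_dummy_args() -> tensor<5x5xf32> attributes {iree.module.export} {
  %0 = flow.variable.load @_benchmark_input_0 : tensor<5x3xf32>
  // ... cloned body of @main, with %arg0 remapped to %0 ...
}
```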

Also rename the translation to `-iree-mlir-to-executable-benchmark-vm-module`.

Example usage:

```
$ build/iree/tools/iree-translate \
  -iree-mlir-to-executable-benchmark-vm-module \
  -iree-hal-target-backends=vmla \
  iree/test/e2e/models/fullyconnected.mlir \
  -o /tmp/fullyconnected.vmfb
$ build/iree/tools/iree-benchmark-module \
  --module_file=/tmp/fullyconnected.vmfb \
  --driver=vmla
```

The output looks like:

```
---------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations
---------------------------------------------------------------------------------------------
BM_main_ex_dispatch_0_entry/process_time/real_time      0.022 ms        0.028 ms        35167
BM_main_ex_dispatch_1_entry/process_time/real_time      0.023 ms        0.029 ms        30101
BM_main_ex_dispatch_2_entry/process_time/real_time      0.038 ms        0.047 ms        19453
BM_main_ex_dispatch_3_entry/process_time/real_time      0.030 ms        0.037 ms        26303
BM_main_ex_dispatch_4_entry/process_time/real_time      0.042 ms        0.051 ms        16264
BM_main_ex_dispatch_5_entry/process_time/real_time      0.032 ms        0.039 ms        20603
BM_main_ex_dispatch_6_entry/process_time/real_time      0.031 ms        0.037 ms        31923
BM_main_dummy_args/process_time/real_time               0.097 ms        0.105 ms         7139
```
diff --git a/docs/developing_iree/benchmarking.md b/docs/developing_iree/benchmarking.md
index 4a6ff63..82d8fec 100644
--- a/docs/developing_iree/benchmarking.md
+++ b/docs/developing_iree/benchmarking.md
@@ -99,11 +99,55 @@
 
 Remember to [restore CPU scaling](#cpu-configuration) when you're done.
 
-## Microbenchmarks
+## Executable Benchmarks
 
-We also benchmark the performance of individual parts (more of these coming
-soon) of the IREE system in isolation. These measurements provide more targeted
-metrics to direct development work.
+We also benchmark the performance of individual parts of the IREE system in
+isolation. IREE breaks a model down into dispatch functions. To benchmark all
+the dispatch functions, generate an IREE module with
+`-iree-mlir-to-executable-benchmark-vm-module` for the target backend:
+
+```shell
+$ build/iree/tools/iree-translate \
+  -iree-mlir-to-executable-benchmark-vm-module \
+  -iree-hal-target-backends=vmla \
+  iree/test/e2e/models/fullyconnected.mlir \
+  -o /tmp/fullyconnected.vmfb
+```
+
+and then benchmark all exported functions (including the generated dispatch
+entry functions) in that module:
+
+```shell
+$ build/iree/tools/iree-benchmark-module \
+  --module_file=/tmp/fullyconnected.vmfb \
+  --driver=vmla
+```
+
+If no `entry_function` is specified, `iree-benchmark-module` will register a
+benchmark for each exported function that takes no inputs.
+
+You will see output like:
+
+```shell
+Run on (72 X 3700 MHz CPU s)
+CPU Caches:
+  L1 Data 32 KiB (x36)
+  L1 Instruction 32 KiB (x36)
+  L2 Unified 1024 KiB (x36)
+  L3 Unified 25344 KiB (x2)
+Load Average: 4.39, 5.72, 6.76
+---------------------------------------------------------------------------------------------
+Benchmark                                                   Time             CPU   Iterations
+---------------------------------------------------------------------------------------------
+BM_main_ex_dispatch_0_entry/process_time/real_time      0.030 ms        0.037 ms        34065
+BM_main_ex_dispatch_1_entry/process_time/real_time      0.034 ms        0.042 ms        20567
+BM_main_ex_dispatch_2_entry/process_time/real_time      0.043 ms        0.051 ms        18576
+BM_main_ex_dispatch_3_entry/process_time/real_time      0.029 ms        0.036 ms        21345
+BM_main_ex_dispatch_4_entry/process_time/real_time      0.042 ms        0.051 ms        15880
+BM_main_ex_dispatch_5_entry/process_time/real_time      0.030 ms        0.037 ms        17854
+BM_main_ex_dispatch_6_entry/process_time/real_time      0.043 ms        0.052 ms        14919
+BM_main_dummy_args/process_time/real_time               0.099 ms        0.107 ms         5892
+```
 
 ### Bytecode Module Benchmarks
 
diff --git a/iree/compiler/Dialect/Flow/Transforms/BUILD b/iree/compiler/Dialect/Flow/Transforms/BUILD
index 9e937cd..b5abc33 100644
--- a/iree/compiler/Dialect/Flow/Transforms/BUILD
+++ b/iree/compiler/Dialect/Flow/Transforms/BUILD
@@ -21,7 +21,7 @@
 cc_library(
     name = "Transforms",
     srcs = [
-        "CreateFuncsToInvokeExecOps.cpp",
+        "CreateBenchmarkFuncs.cpp",
         "DispatchConfig.cpp",
         "DispatchabilityAnalysis.cpp",
         "FlattenTuplesInCFG.cpp",
diff --git a/iree/compiler/Dialect/Flow/Transforms/CMakeLists.txt b/iree/compiler/Dialect/Flow/Transforms/CMakeLists.txt
index 87c11fa..a67f8b0 100644
--- a/iree/compiler/Dialect/Flow/Transforms/CMakeLists.txt
+++ b/iree/compiler/Dialect/Flow/Transforms/CMakeLists.txt
@@ -21,7 +21,7 @@
     "DispatchConfig.h"
     "Passes.h"
   SRCS
-    "CreateFuncsToInvokeExecOps.cpp"
+    "CreateBenchmarkFuncs.cpp"
     "DispatchConfig.cpp"
     "DispatchabilityAnalysis.cpp"
     "FlattenTuplesInCFG.cpp"
diff --git a/iree/compiler/Dialect/Flow/Transforms/CreateBenchmarkFuncs.cpp b/iree/compiler/Dialect/Flow/Transforms/CreateBenchmarkFuncs.cpp
new file mode 100644
index 0000000..836c4f2
--- /dev/null
+++ b/iree/compiler/Dialect/Flow/Transforms/CreateBenchmarkFuncs.cpp
@@ -0,0 +1,139 @@
+// Copyright 2020 Google LLC
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      https://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include "iree/compiler/Dialect/Flow/IR/FlowOps.h"
+#include "mlir/Dialect/StandardOps/IR/Ops.h"
+#include "mlir/IR/BlockAndValueMapping.h"
+#include "mlir/IR/Builders.h"
+#include "mlir/IR/StandardTypes.h"
+#include "mlir/Pass/Pass.h"
+
+namespace mlir {
+namespace iree_compiler {
+namespace IREE {
+namespace Flow {
+
+// Creates two kinds of benchmark functions:
+//   - Exported functions that invoke each executable op.
+//   - Clones of each exported function (including those just created) that
+//     take placeholder constant inputs instead of arguments; the exported
+//     attribute is removed from the original functions.
+// The inputs are provided via flow.variable and flow.variable.load.
+class CreateBenchmarkFuncs
+    : public PassWrapper<CreateBenchmarkFuncs, OperationPass<ModuleOp>> {
+ public:
+  void runOnOperation() override {
+    ModuleOp moduleOp = getOperation();
+    auto builder = OpBuilder::atBlockBegin(moduleOp.getBody());
+    SymbolTable moduleSymbols(moduleOp);
+    for (auto execOp : moduleOp.getOps<IREE::Flow::ExecutableOp>()) {
+      for (auto& op : execOp.getBlock()) {
+        auto dispatchEntryOp = dyn_cast<IREE::Flow::DispatchEntryOp>(op);
+        if (!dispatchEntryOp) continue;
+        auto execFuncOp = execOp.getInnerModule().lookupSymbol<FuncOp>(
+            dispatchEntryOp.function_ref());
+        Location loc = execFuncOp.getLoc();
+
+        // Create a funcOp to invoke the dispatch function.
+        std::string funcName = std::string(execFuncOp.getName()) + "_entry";
+        auto funcType =
+            builder.getFunctionType({}, execFuncOp.getType().getResults());
+        auto funcOp = builder.create<FuncOp>(loc, funcName, funcType);
+        funcOp.setAttr("iree.module.export", UnitAttr::get(&getContext()));
+        Block* block = funcOp.addEntryBlock();
+
+        // Build the body of the FuncOp.
+        OpBuilder::InsertionGuard guard(builder);
+        builder.setInsertionPoint(funcOp);
+        auto blockBuilder = OpBuilder(block, block->begin());
+        SmallVector<Value, 4> args;
+        for (auto inputType : execFuncOp.getType().getInputs()) {
+          args.push_back(getDummyInput(builder, blockBuilder, loc, inputType,
+                                       moduleSymbols));
+        }
+
+        // TODO(hanchung): Use a real workload instead? We can probably
+        // calculate the workload from the results.
+        auto dummyWorkload = blockBuilder.create<ConstantIndexOp>(loc, 0);
+        auto dispatchOp = blockBuilder.create<DispatchOp>(
+            loc, dispatchEntryOp, dummyWorkload, funcType.getResults(), args);
+        blockBuilder.create<mlir::ReturnOp>(loc, dispatchOp.getResults());
+      }
+    }
+
+    // TODO(#3577): Move the logic below to a separate pass and use a CallOp
+    // instead of cloning the region. The CallOp is materialized at an earlier
+    // stage; we don't expect to see it at the flow level.
+    for (auto funcOp : moduleOp.getOps<FuncOp>()) {
+      if (!funcOp.getAttr("iree.module.export")) {
+        continue;
+      }
+      if (funcOp.getNumArguments() == 0) {
+        continue;
+      }
+
+      Location loc = funcOp.getLoc();
+      auto funcType =
+          builder.getFunctionType({}, funcOp.getType().getResults());
+      std::string funcName = std::string(funcOp.getName()) + "_dummy_args";
+      auto newFuncOp = builder.create<FuncOp>(loc, funcName, funcType);
+      newFuncOp.setAttr("iree.module.export", builder.getUnitAttr());
+      Block* block = newFuncOp.addEntryBlock();
+
+      OpBuilder::InsertionGuard guard(builder);
+      builder.setInsertionPoint(newFuncOp);
+      auto blockBuilder = OpBuilder::atBlockBegin(block);
+      BlockAndValueMapping mapping;
+      for (auto iter : llvm::enumerate(funcOp.getType().getInputs())) {
+        auto arg = getDummyInput(builder, blockBuilder, loc, iter.value(),
+                                 moduleSymbols);
+        mapping.map(funcOp.getArgument(iter.index()), arg);
+      }
+      for (auto& op : funcOp.getRegion().begin()->getOperations()) {
+        blockBuilder.clone(op, mapping);
+      }
+
+      funcOp.removeAttr("iree.module.export");
+    }
+  }
+
+ private:
+  Value getDummyInput(OpBuilder& moduleBuilder, OpBuilder& blockBuilder,
+                      Location loc, Type inputType,
+                      const SymbolTable& moduleSymbols) {
+    std::string baseName = "_benchmark_input_";
+    std::string name = baseName + std::to_string(uniqueId++);
+    auto attr = blockBuilder.getZeroAttr(inputType);
+    auto variableOp =
+        moduleBuilder.create<VariableOp>(loc, name,
+                                         /*isMutable=*/false, inputType, attr);
+    SymbolTable::setSymbolVisibility(variableOp,
+                                     SymbolTable::Visibility::Private);
+    variableOp.setAttr("noinline", UnitAttr::get(moduleBuilder.getContext()));
+    auto lookupOp = blockBuilder.create<IREE::Flow::VariableLoadOp>(
+        loc, inputType, variableOp.getName());
+    return lookupOp.getResult();
+  }
+
+  int uniqueId = 0;
+};
+
+std::unique_ptr<OperationPass<ModuleOp>> createCreateBenchmarkFuncs() {
+  return std::make_unique<CreateBenchmarkFuncs>();
+}
+
+}  // namespace Flow
+}  // namespace IREE
+}  // namespace iree_compiler
+}  // namespace mlir
diff --git a/iree/compiler/Dialect/Flow/Transforms/CreateFuncsToInvokeExecOps.cpp b/iree/compiler/Dialect/Flow/Transforms/CreateFuncsToInvokeExecOps.cpp
deleted file mode 100644
index 6e5c258..0000000
--- a/iree/compiler/Dialect/Flow/Transforms/CreateFuncsToInvokeExecOps.cpp
+++ /dev/null
@@ -1,81 +0,0 @@
-// Copyright 2020 Google LLC
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-//      https://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-#include "iree/compiler/Dialect/Flow/IR/FlowOps.h"
-#include "mlir/Dialect/StandardOps/IR/Ops.h"
-#include "mlir/IR/BlockAndValueMapping.h"
-#include "mlir/IR/Builders.h"
-#include "mlir/IR/StandardTypes.h"
-#include "mlir/Pass/Pass.h"
-
-namespace mlir {
-namespace iree_compiler {
-namespace IREE {
-namespace Flow {
-
-// Walks through all the execuatable ops and creates Funcs to invoke them. The
-// input are provided using constants.
-class CreateFuncsToInvokeExecOpsPass
-    : public PassWrapper<CreateFuncsToInvokeExecOpsPass,
-                         OperationPass<ModuleOp>> {
- public:
-  CreateFuncsToInvokeExecOpsPass() = default;
-
-  void runOnOperation() override {
-    ModuleOp moduleOp = getOperation();
-    auto builder = OpBuilder::atBlockBegin(moduleOp.getBody());
-    Location loc = moduleOp.getLoc();
-    auto execOps = moduleOp.getOps<IREE::Flow::ExecutableOp>();
-    for (auto execOp : execOps) {
-      for (auto& op : execOp.getBlock()) {
-        if (auto dispatchEntryOp = dyn_cast<IREE::Flow::DispatchEntryOp>(op)) {
-          auto execFuncOp = execOp.getInnerModule().lookupSymbol<FuncOp>(
-              dispatchEntryOp.function_ref());
-          std::string funcName = std::string(execFuncOp.getName()) + "_entry";
-          auto funcType =
-              builder.getFunctionType({}, execFuncOp.getType().getResults());
-          auto funcOp =
-              builder.create<FuncOp>(moduleOp.getLoc(), funcName, funcType);
-          funcOp.setAttr("iree.module.export", UnitAttr::get(&getContext()));
-          Block* block = funcOp.addEntryBlock();
-          auto blockBuilder = OpBuilder(block, block->begin());
-          SmallVector<Value, 4> args;
-          for (auto inputType : execFuncOp.getType().getInputs()) {
-            // TODO(hanchung): Use non-zero or random values as inputs.
-            auto attr = blockBuilder.getZeroAttr(inputType);
-            auto cst = blockBuilder.create<ConstantOp>(moduleOp.getLoc(),
-                                                       inputType, attr);
-            args.push_back(cst);
-          }
-          // TODO(hanchung): Use a real workload instead? We can probably
-          // calculate the workload from the results.
-          auto dummyWorkload = blockBuilder.create<ConstantIndexOp>(loc, 0);
-          auto dispatchOp = blockBuilder.create<DispatchOp>(
-              loc, dispatchEntryOp, dummyWorkload, funcType.getResults(), args);
-          blockBuilder.create<mlir::ReturnOp>(loc, dispatchOp.getResults());
-        }
-      }
-    }
-  }
-};
-
-std::unique_ptr<OperationPass<ModuleOp>>
-createCreateFuncsToInvokeExecOpsPass() {
-  return std::make_unique<CreateFuncsToInvokeExecOpsPass>();
-}
-
-}  // namespace Flow
-}  // namespace IREE
-}  // namespace iree_compiler
-}  // namespace mlir
diff --git a/iree/compiler/Dialect/Flow/Transforms/Passes.cpp b/iree/compiler/Dialect/Flow/Transforms/Passes.cpp
index 384f102..76431f9 100644
--- a/iree/compiler/Dialect/Flow/Transforms/Passes.cpp
+++ b/iree/compiler/Dialect/Flow/Transforms/Passes.cpp
@@ -215,10 +215,7 @@
 }
 
 void buildExportDispatchesTransformPassPipeline(OpPassManager &passManager) {
-  passManager.addPass(IREE::Flow::createCreateFuncsToInvokeExecOpsPass());
-  // Move all the constants to flow.variables.
-  passManager.addPass(createOutlineLargeConstantsPass(
-      /*minLargeConstantSize=*/0));
+  passManager.addPass(IREE::Flow::createCreateBenchmarkFuncs());
   passManager.addPass(IREE::Flow::createMaterializeExportedReflection());
   passManager.addPass(IREE::Flow::createMergeExportedReflection());
   passManager.addPass(IREE::Flow::createFormStreamsPass());
diff --git a/iree/compiler/Dialect/Flow/Transforms/Passes.h b/iree/compiler/Dialect/Flow/Transforms/Passes.h
index 2e66736..8c856f7 100644
--- a/iree/compiler/Dialect/Flow/Transforms/Passes.h
+++ b/iree/compiler/Dialect/Flow/Transforms/Passes.h
@@ -126,7 +126,7 @@
 std::unique_ptr<OperationPass<ModuleOp>> createOutlineDispatchRegionsPass();
 
 // Exports all the dispatch functions to the module.
-std::unique_ptr<OperationPass<ModuleOp>> createCreateFuncsToInvokeExecOpsPass();
+std::unique_ptr<OperationPass<ModuleOp>> createCreateBenchmarkFuncs();
 
 //===----------------------------------------------------------------------===//
 // Optimizations
@@ -192,7 +192,7 @@
   createRematerializeDispatchConstantsPass();
   createOutlineDispatchRegionsPass();
   createOutlineLargeConstantsPass();
-  createCreateFuncsToInvokeExecOpsPass();
+  createCreateBenchmarkFuncs();
   createFormStreamsPass();
   createHoistUnstreamableOpsPass();
   createStripAndSplatConstantVariablesPass();
diff --git a/iree/compiler/Dialect/Flow/Transforms/test/create_benchmark_funcs.mlir b/iree/compiler/Dialect/Flow/Transforms/test/create_benchmark_funcs.mlir
new file mode 100644
index 0000000..b742302
--- /dev/null
+++ b/iree/compiler/Dialect/Flow/Transforms/test/create_benchmark_funcs.mlir
@@ -0,0 +1,38 @@
+// RUN: iree-opt -iree-flow-transformation-pipeline -iree-flow-export-dispatches %s | IreeFileCheck %s
+
+module {
+  func @two_dispatch(%arg0: tensor<5x3xf32>, %arg1: tensor<3x5xf32>) -> (tensor<5x5xf32>, tensor<3x5xf32>) attributes { iree.module.export } {
+    %0 = "mhlo.dot"(%arg0, %arg1) : (tensor<5x3xf32>, tensor<3x5xf32>) -> tensor<5x5xf32>
+    %1 = "mhlo.dot"(%arg1, %0) : (tensor<3x5xf32>, tensor<5x5xf32>) -> tensor<3x5xf32>
+    return %0, %1 : tensor<5x5xf32>, tensor<3x5xf32>
+  }
+}
+// CHECK-DAG: flow.variable @[[IN0_0:.+]] dense<{{.*}}> : tensor<5x3xf32>
+// CHECK-DAG: flow.variable @[[IN0_1:.+]] dense<{{.*}}> : tensor<3x5xf32>
+//     CHECK: func @two_dispatch_ex_dispatch_0_entry
+//     CHECK: %{{.+}} = flow.variable.load @[[IN0_0]] : tensor<5x3xf32>
+//     CHECK: %{{.+}} = flow.variable.load @[[IN0_1]] : tensor<3x5xf32>
+//     CHECK: %[[RES:.+]] = flow.ex.stream.fragment({{.+}}) -> tensor<5x5xf32> {
+//     CHECK:   %[[DISPATCH_RES:.+]] = flow.dispatch @two_dispatch_ex_dispatch_0::@two_dispatch_ex_dispatch_0[%{{.+}} : index](%{{.+}}, %{{.+}}) : (tensor<5x3xf32>, tensor<3x5xf32>) -> tensor<5x5xf32>
+//     CHECK:   flow.return %[[DISPATCH_RES]] : tensor<5x5xf32>
+//     CHECK: return %[[RES]] : tensor<5x5xf32>
+//
+// CHECK-DAG: flow.variable @[[IN1_0:.+]] dense<{{.*}}> : tensor<3x5xf32>
+// CHECK-DAG: flow.variable @[[IN1_1:.+]] dense<{{.*}}> : tensor<5x5xf32>
+//     CHECK: func @two_dispatch_ex_dispatch_1_entry
+//     CHECK: %{{.+}} = flow.variable.load @[[IN1_0]] : tensor<3x5xf32>
+//     CHECK: %{{.+}} = flow.variable.load @[[IN1_1]] : tensor<5x5xf32>
+//     CHECK: %[[RES:.+]] = flow.ex.stream.fragment({{.+}}) -> tensor<3x5xf32>
+//     CHECK:   %[[DISPATCH_RES:.+]] = flow.dispatch @two_dispatch_ex_dispatch_1::@two_dispatch_ex_dispatch_1[%{{.+}} : index](%{{.+}}, %{{.+}}) : (tensor<3x5xf32>, tensor<5x5xf32>) -> tensor<3x5xf32>
+//     CHECK:   flow.return %[[DISPATCH_RES]] : tensor<3x5xf32>
+//     CHECK: return %[[RES]] : tensor<3x5xf32>
+//
+// CHECK-DAG: flow.variable @[[MAIN_IN_0:.+]] dense<{{.*}}> : tensor<5x3xf32>
+// CHECK-DAG: flow.variable @[[MAIN_IN_1:.+]] dense<{{.*}}> : tensor<3x5xf32>
+//     CHECK: func @two_dispatch_dummy_args()
+//     CHECK: %{{.+}} = flow.variable.load @[[MAIN_IN_0]] : tensor<5x3xf32>
+//     CHECK: %{{.+}} = flow.variable.load @[[MAIN_IN_1]] : tensor<3x5xf32>
+//     CHECK: flow.ex.stream.fragment({{.+}}) -> (tensor<5x5xf32>, tensor<3x5xf32>) {
+//     CHECK:   %[[DISPATCH_RES1:.+]] = flow.dispatch
+//     CHECK:   %[[DISPATCH_RES2:.+]] = flow.dispatch
+//     CHECK:   flow.return %[[DISPATCH_RES1]], %[[DISPATCH_RES2]] : tensor<5x5xf32>, tensor<3x5xf32>
diff --git a/iree/compiler/Dialect/Flow/Transforms/test/create_funcs_to_invoke_exec_ops.mlir b/iree/compiler/Dialect/Flow/Transforms/test/create_funcs_to_invoke_exec_ops.mlir
deleted file mode 100644
index 4f297c6..0000000
--- a/iree/compiler/Dialect/Flow/Transforms/test/create_funcs_to_invoke_exec_ops.mlir
+++ /dev/null
@@ -1,24 +0,0 @@
-// RUN: iree-opt -iree-flow-transformation-pipeline -iree-flow-export-dispatches %s | IreeFileCheck %s
-
-module {
-  func @two_dispatch(%arg0: tensor<5x3xf32>, %arg1: tensor<3x5xf32>) -> (tensor<5x5xf32>, tensor<3x5xf32>) attributes { iree.module.export } {
-    %0 = "mhlo.dot"(%arg0, %arg1) : (tensor<5x3xf32>, tensor<3x5xf32>) -> tensor<5x5xf32>
-    %1 = "mhlo.dot"(%arg1, %0) : (tensor<3x5xf32>, tensor<5x5xf32>) -> tensor<3x5xf32>
-    return %0, %1 : tensor<5x5xf32>, tensor<3x5xf32>
-  }
-}
-// CHECK: func @two_dispatch_ex_dispatch_0_entry
-// CHECK: %{{.+}} = flow.variable.load {{.*}} : tensor<5x3xf32>
-// CHECK: %{{.+}} = flow.variable.load {{.*}} : tensor<3x5xf32>
-// CHECK: %[[RES:.+]] = flow.ex.stream.fragment({{.+}}) -> tensor<5x5xf32> {
-// CHECK:   %[[DISPATCH_RES:.+]] = flow.dispatch @two_dispatch_ex_dispatch_0::@two_dispatch_ex_dispatch_0[%{{.+}} : index](%{{.+}}, %{{.+}}) : (tensor<5x3xf32>, tensor<3x5xf32>) -> tensor<5x5xf32>
-// CHECK:   flow.return %[[DISPATCH_RES]] : tensor<5x5xf32>
-// CHECK: return %[[RES]] : tensor<5x5xf32>
-//
-// CHECK: func @two_dispatch_ex_dispatch_1_entry
-// CHECK: %[[ARG0:.+]] = flow.variable.load {{.*}} : tensor<3x5xf32>
-// CHECK: %[[ARG1:.+]] = flow.variable.load {{.*}} : tensor<5x5xf32>
-// CHECK: %[[RES:.+]] = flow.ex.stream.fragment({{.+}}) -> tensor<3x5xf32>
-// CHECK:   %[[DISPATCH_RES:.+]] = flow.dispatch @two_dispatch_ex_dispatch_1::@two_dispatch_ex_dispatch_1[%{{.+}} : index](%{{.+}}, %{{.+}}) : (tensor<3x5xf32>, tensor<5x5xf32>) -> tensor<3x5xf32>
-// CHECK:   flow.return %[[DISPATCH_RES]] : tensor<3x5xf32>
-// CHECK: return %[[RES]] : tensor<3x5xf32>
diff --git a/iree/compiler/Translation/IREEVM.cpp b/iree/compiler/Translation/IREEVM.cpp
index 353e4ed..b3ad838 100644
--- a/iree/compiler/Translation/IREEVM.cpp
+++ b/iree/compiler/Translation/IREEVM.cpp
@@ -170,7 +170,7 @@
       translateFromMLIRToVMBytecodeModuleWithFlags);
 
   TranslateFromMLIRRegistration toBenchmarkVMBytecodeModuleWithFlags(
-      "iree-mlir-to-benchmark-vm-bytecode-module",
+      "iree-mlir-to-executable-benchmark-vm-module",
       translateFromMLIRToBenchmarkVMBytecodeModuleWithFlags);
 
 #ifdef IREE_HAVE_EMITC_DIALECT
diff --git a/iree/test/e2e/regression/BUILD b/iree/test/e2e/regression/BUILD
index fc3ef1f..6ec1eef 100644
--- a/iree/test/e2e/regression/BUILD
+++ b/iree/test/e2e/regression/BUILD
@@ -29,7 +29,9 @@
     srcs = glob(["*.mlir"]),
     data = [
         "//iree/tools:IreeFileCheck",
+        "//iree/tools:iree-benchmark-module",
         "//iree/tools:iree-run-mlir",
+        "//iree/tools:iree-translate",
     ],
     tags = ["hostonly"],
 )
diff --git a/iree/test/e2e/regression/CMakeLists.txt b/iree/test/e2e/regression/CMakeLists.txt
index e2b4157..09734e6 100644
--- a/iree/test/e2e/regression/CMakeLists.txt
+++ b/iree/test/e2e/regression/CMakeLists.txt
@@ -22,7 +22,9 @@
     "${_GLOB_X_MLIR}"
   DATA
     iree::tools::IreeFileCheck
+    iree::tools::iree-benchmark-module
     iree::tools::iree-run-mlir
+    iree::tools::iree-translate
   LABELS
     "hostonly"
 )
diff --git a/iree/test/e2e/regression/executable_benchmark.mlir b/iree/test/e2e/regression/executable_benchmark.mlir
new file mode 100644
index 0000000..1543fd5
--- /dev/null
+++ b/iree/test/e2e/regression/executable_benchmark.mlir
@@ -0,0 +1,13 @@
+// Only checks registered benchmarks.
+// RUN: iree-translate --iree-hal-target-backends=vmla -iree-mlir-to-executable-benchmark-vm-module %s -o ${TEST_TMPDIR?}/bc.module && iree-benchmark-module --driver=vmla -module_file=${TEST_TMPDIR?}/bc.module --benchmark_list_tests=true | IreeFileCheck %s
+
+func @two_dispatch() -> (tensor<5x5xf32>, tensor<3x5xf32>) attributes { iree.module.export } {
+  %0 = iree.unfoldable_constant dense<1.0> : tensor<5x3xf32>
+  %1 = iree.unfoldable_constant dense<0.4> : tensor<3x5xf32>
+  %2 = "mhlo.dot"(%0, %1) : (tensor<5x3xf32>, tensor<3x5xf32>) -> tensor<5x5xf32>
+  %3 = "mhlo.dot"(%1, %2) : (tensor<3x5xf32>, tensor<5x5xf32>) -> tensor<3x5xf32>
+  return %2, %3 : tensor<5x5xf32>, tensor<3x5xf32>
+}
+// CHECK: BM_two_dispatch_ex_dispatch_0_entry
+// CHECK: BM_two_dispatch_ex_dispatch_1_entry
+// CHECK: BM_two_dispatch