Adding multiple_modules sample (and fixing bugs). (#15653)

This demonstrates multiple VM modules calling each other in both synchronous and asynchronous modes. This is useful both when designing reusable components and during testing/development/benchmarking, as it shows how easily pipelines can be constructed for running models that may have state or non-trivial call sequences. Users can make as many calls as they want or have their own state or control flow in the pipelines to build arbitrarily complex sequences of work without needing to author python/C/yaml for iree-run-trace.

commit: dc6f0cd5f4a784531bde3dcac48f2effc28c4224
author: Ben Vanik <ben.vanik@gmail.com> Mon Nov 20 21:36:58 2023 -0800
committer: GitHub <noreply@github.com> Mon Nov 20 21:36:58 2023 -0800
tree: d1a1652f9092b4e93f44f0635904ef2a4fdc6cc1
parent: 8ff8a848379882ed4a32a46d4192c457c535e46c
diff --git a/samples/multiple_modules/pipeline_async.mlir b/samples/multiple_modules/pipeline_async.mlir
new file mode 100644
index 0000000..46ad83c
--- /dev/null
+++ b/samples/multiple_modules/pipeline_async.mlir
@@ -0,0 +1,66 @@
+// RUN: (iree-compile --iree-execution-model=async-external --iree-hal-target-backends=vmvx %p/module_a.mlir -o=%t.module_a.vmfb && \
+// RUN:  iree-compile --iree-execution-model=async-external --iree-hal-target-backends=vmvx %p/module_b.mlir -o=%t.module_b.vmfb && \
+// RUN:  iree-compile --iree-execution-model=async-external --iree-hal-target-backends=vmvx %s | \
+// RUN:  iree-run-module --device=local-task \
+// RUN:    --module=%t.module_a.vmfb \
+// RUN:    --module=%t.module_b.vmfb \
+// RUN:    --module=- --function=run \
+// RUN:    --input=4096xf32=-2.0 \
+// RUN:    --expected_output=4096xf32=4.0) | \
+// RUN:  FileCheck %s
+// CHECK: [SUCCESS]
+
+// Functions declared in external modules - note `module_name.func_name`.
+// `abs` will allocate transient memory to pass back the result.
+// `mul` will use the provided output memory to produce the result in-place.
+// Note that although the returned SSA tensor value shares its storage with
+// the `%output` arg, the returned value *must* be used to reference the
+// produced version of its contents.
+//
+// In this asynchronous example both functions follow the "coarse-fences" ABI
+// model where the compiler inserts a wait and signal fence pair on each call.
+// To enable this, the modules must be compiled with
+// `--iree-execution-model=async-external` and the external declarations must
+// be annotated with the `iree.abi.model` attribute so that the compiler knows
+// the calls have the fences. Note that it's possible to have any combination of
+// asynchronous and synchronous modules and calls in the same program.
+func.func private @module_a.abs(%input: tensor<4096xf32>) -> tensor<4096xf32> attributes {
+  iree.abi.model = "coarse-fences"
+}
+func.func private @module_b.mul(%lhs: tensor<4096xf32>, %rhs: tensor<4096xf32>, %output: tensor<4096xf32> {iree.abi.output = 0 : index}) -> tensor<4096xf32> attributes {
+  iree.abi.model = "coarse-fences"
+}
+
+// Top-level pipeline invoked by the command line tool.
+// Since this is compiled with `--iree-execution-model=async-external`, this
+// export will have a wait and signal fence pair that allows the hosting
+// application to execute the entire pipeline asynchronously.
+func.func @run(%input: tensor<4096xf32>) -> tensor<4096xf32> {
+  // Make a simple call that produces a transient result tensor.
+  // Since the call is asynchronous, the result is not ready upon return to
+  // this function and is passed along with its fence to the consumer call.
+  %input_abs = call @module_a.abs(%input) : (tensor<4096xf32>) -> tensor<4096xf32>
+
+  // Allocate output storage for the next call. This isn't needed here and is
+  // functionally equivalent to letting the callee allocate transient memory as
+  // `abs` does above, but it demonstrates how in-place operations can be
+  // performed across module boundaries. The allocation is asynchronous and is
+  // passed to the consumer call with a fence indicating when it's ready.
+  %result_storage = tensor.empty() : tensor<4096xf32>
+
+  // Make a call that produces its output in the given `%result_storage`.
+  // The inputs and result storage are passed with their respective fences,
+  // with no guarantee that they are available when the call is made. The
+  // `mul` implementation will chain its work on the fences and only signal
+  // its fence once all transitive dependencies and its own execution have
+  // completed.
+  %result = call @module_b.mul(%input_abs, %input_abs, %result_storage) : (tensor<4096xf32>, tensor<4096xf32>, tensor<4096xf32>) -> tensor<4096xf32>
+
+  // Return the final result value. Note that we pass back the result of the
+  // `mul` call, which aliases `%result_storage` but represents the computed
+  // value, rather than `%result_storage` itself. This is required because
+  // `%result` has an associated fence indicating when it is available for use;
+  // returning `%result_storage` would only wait for the storage to be
+  // allocated and not for its contents to have been populated by `mul`.
+  return %result : tensor<4096xf32>
+}
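The fence chaining described in the comments of `pipeline_async.mlir` can be illustrated with a toy host-side model. This is a sketch under stated assumptions, not IREE's runtime API: it assumes a fence behaves like a one-shot `threading.Event`, and `Fence`, `join`, `module_a_abs`, `module_b_mul`, and `run` are hypothetical Python stand-ins for the MLIR functions. Each call takes a single wait fence and a single signal fence (the "coarse-fences" shape), returns its result immediately, and only populates it after its wait fence fires.

```python
import threading

# Toy model only (assumption): a HAL fence is approximated by a one-shot
# threading.Event. None of these names are IREE APIs.


class Fence:
    """One-shot flag that producers signal and consumers wait on."""

    def __init__(self):
        self._event = threading.Event()

    def signal(self):
        self._event.set()

    def wait(self, timeout=None):
        # Returns True once signaled, False if the timeout expired first.
        return self._event.wait(timeout)

    def is_signaled(self):
        return self._event.is_set()


def _spawn(fn):
    # Each "queued submission" gets its own daemon thread so blocked waits
    # cannot starve a shared worker pool.
    threading.Thread(target=fn, daemon=True).start()


def join(*fences):
    """Returns a fence that signals once every input fence has signaled."""
    joined = Fence()

    def work():
        for fence in fences:
            fence.wait()
        joined.signal()

    _spawn(work)
    return joined


def module_a_abs(wait_fence, signal_fence, values):
    # Callee-allocated transient result, returned before it is populated.
    output = [0.0] * len(values)

    def work():
        wait_fence.wait()
        for i, x in enumerate(values):
            output[i] = abs(x)
        signal_fence.signal()

    _spawn(work)
    return output


def module_b_mul(wait_fence, signal_fence, lhs, rhs, output):
    # Writes in place into caller-provided storage; the return aliases `output`.
    def work():
        wait_fence.wait()
        for i in range(len(output)):
            output[i] = lhs[i] * rhs[i]
        signal_fence.signal()

    _spawn(work)
    return output


def run(wait_fence, signal_fence, values):
    # abs is chained on the fence the hosting application passed in.
    abs_ready = Fence()
    values_abs = module_a_abs(wait_fence, abs_ready, values)

    # tensor.empty(): modeled as an asynchronous allocation with its own fence.
    storage_ready = Fence()
    result_storage = [0.0] * len(values)
    _spawn(storage_ready.signal)

    # mul gets a single wait fence covering all of its operands and signals
    # the pipeline's outgoing fence when its work is done.
    operands_ready = join(abs_ready, storage_ready)
    result = module_b_mul(operands_ready, signal_fence,
                          values_abs, values_abs, result_storage)
    # Same storage as `result_storage`, but `signal_fence` tracks its contents.
    return result


host_wait, host_signal = Fence(), Fence()
result = run(host_wait, host_signal, [-2.0] * 4096)
assert not host_signal.is_signaled()  # abs cannot start until host_wait fires
host_wait.signal()
assert host_signal.wait(timeout=5.0)
assert all(v == 4.0 for v in result)
```

In this model `result` is the very list that `run` allocated as `result_storage`; the only thing that tells the host its contents are valid is `host_signal`, which mirrors why the MLIR returns `%result` rather than `%result_storage`.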