Add initial benchmarking features to WebAssembly experiments. (#9100)

### Background

I'm working towards identifying and quantifying key metrics for our web platform support. Metrics of interest thus far include:
* runtime binary size
* program binary size
* runtime startup time
* program load time
* total function call time (as seen from a JavaScript application)
* no-overhead function call time (`iree_runtime_call_invoke()`)

Once we have baselines for those, we can start deeper analysis and optimization work. Some optimization work may be purely in the IREE compiler (e.g. codegen, Flow/Stream/HAL dialects), while other work may be in the web port itself (e.g. Emscripten flags, use of web APIs, runtime JS bindings).

### Current Status

This is still all experimental, but I'd like to checkpoint what I've built so far and let other people try it out. Expect rough edges (e.g. some hardcoded Windows/Linux paths specific to my setup that should be edited before use).

Summary of changes:

* `sample_dynamic/index.html` now supports interactive benchmarking
  * a "benchmark iterations" form input drives a loop around `iree_runtime_call_invoke()` down in C/Wasm
  * timing information is rendered onto the page itself, instead of being logged to stdout / the console
  * sample screenshot: https://user-images.githubusercontent.com/4010439/167958736-228a1541-8ed6-4b2c-9af9-55ef0b10bf74.png
* `generate_web_metrics.sh` imports and compiles programs from our existing benchmark suite, preserving all sorts of artifacts for further manual inspection or automated use
* `run_native_benchmarks.sh` executes the compiled native programs from `generate_web_metrics.sh`
* `sample_dynamic/benchmarks.html` loads and runs each compiled Wasm program from `generate_web_metrics.sh`
  * Sample output: https://gist.github.com/ScottTodd/f2bb1f274c5895c8f979400abd6d2b67
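
The two timings shown on the page relate to each other roughly as sketched below. This is a minimal illustration, not the code in `index.html`: the helper names `summarizeTimings` and `timeCall` are hypothetical and the sample numbers are made up. Only the result object shape (`total_invoke_time_ms`, `outputs`) comes from this change.

```javascript
// Hypothetical helpers for illustration; only the result object shape
// ({total_invoke_time_ms, outputs}) is taken from this change.

// Splits one measured call into Wasm-only time and everything else.
function summarizeTimings(totalJsTimeMs, resultObject, iterations) {
  const totalWasmTimeMs = resultObject['total_invoke_time_ms'];
  return {
    // Mean time of the inner invoke loop, per iteration.
    meanWasmTimeMs: totalWasmTimeMs / iterations,
    // Worker messaging, input parsing, and output formatting land here.
    overheadMs: totalJsTimeMs - totalWasmTimeMs,
  };
}

// Times any promise-returning call (for example a call to ireeCallFunction).
async function timeCall(callFn) {
  const start = performance.now();
  const resultObject = await callFn();
  return {resultObject, totalJsTimeMs: performance.now() - start};
}

// Made-up numbers: 12.5ms seen from JS, 10ms inside Wasm over 4 iterations.
const summary = summarizeTimings(
    12.5, {'total_invoke_time_ms': 10, 'outputs': '1xf32=2'}, 4);
console.log(summary.meanWasmTimeMs.toFixed(3) + 'ms / iteration');
// prints "2.500ms / iteration"
```

Keeping the subtraction separate from the measurement makes it easy to see how much of the total time is spent outside the invoke loop as the iteration count grows.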

Also of note: I haven't really looked at the browser profiling tools yet. I expect those will help with detailed analysis, while the general scaffolding and `performance.now()` measurements will help with comparisons between frameworks and across browsers/devices.
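
For repeating the same measurement across browsers or devices, the interactive page can be preconfigured through its URL query parameters (`program`, `function`, `arguments`, `iterations`). The sketch below builds such a URL. The helper name `buildBenchmarkUrl` is hypothetical, and it assumes the page is served at `http://localhost:8000/` with the compiled `.vmfb` copied next to it.

```javascript
// Hypothetical helper; the parameter names match what index.html reads
// from window.location.search in this change.
function buildBenchmarkUrl(baseUrl, options) {
  const searchParams = new URLSearchParams();
  searchParams.set('program', options.program);
  searchParams.set('function', options.functionName);
  // The page splits the arguments field on newlines, one input per line.
  searchParams.set('arguments', options.inputs.join('\n'));
  searchParams.set('iterations', String(options.iterations));
  return baseUrl + '?' + searchParams.toString();
}

// Assumed local server address and program file name.
const url = buildBenchmarkUrl('http://localhost:8000/', {
  program: 'mobilenet_v2_1.0_224_wasm.vmfb',
  functionName: 'main',
  inputs: ['1x224x224x3xf32'],
  iterations: 10,
});
console.log(url);
```

Opening the same URL on each device keeps the program, inputs, and iteration count identical between runs.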
diff --git a/experimental/web/generate_web_metrics.sh b/experimental/web/generate_web_metrics.sh
new file mode 100644
index 0000000..0257df0
--- /dev/null
+++ b/experimental/web/generate_web_metrics.sh
@@ -0,0 +1,199 @@
+#!/bin/bash
+
+# Copyright 2022 The IREE Authors
+#
+# Licensed under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+# Generate metrics for a set of programs using IREE's WebAssembly backend.
+#
+# Intended for interactive developer use, so this generates lots of output
+# files for manual inspection outside of the script.
+# Some hardcoded paths should be edited before use, and you might want to
+# comment out specific portions (such as python package installs) depending
+# on your environment or workflow choices.
+#
+# Script steps:
+#   * Install IREE tools into a Python virtual environment (venv)
+#   * Download program source files (.tflite files from GCS)
+#   * Import programs into MLIR (.tflite -> .mlir)
+#   * Compile programs for WebAssembly (.mlir -> .vmfb, intermediates)
+#   * (TODO) Print statistics that can be produced by a script (i.e. no
+#     launching a webpage and waiting for benchmark results there)
+#
+# Sample usage 1:
+#   generate_web_metrics.sh /tmp/iree/web_metrics/
+#
+#   then look in that directory for any files you want to process further,
+#   and/or run run_native_benchmarks.sh to get native metrics for comparison
+#
+# Sample usage 2:
+#   sample_dynamic/build_sample.sh (optional install path)
+#   generate_web_metrics.sh /tmp/iree/web_metrics/
+#   sample_dynamic/serve_sample.sh
+#
+#   then open http://localhost:8000/benchmarks.html (for automated benchmarks)
+#   or open http://localhost:8000/ (for interactive benchmarks)
+
+set -eo pipefail
+
+TARGET_DIR="$1"
+if [[ -z "$TARGET_DIR" ]]; then
+  >&2 echo "ERROR: Expected target directory (e.g. /tmp/iree/web_metrics/)"
+  exit 1
+fi
+
+# If used in conjunction with sample_dynamic/build_sample.sh, this script can
+# copy compiled programs into that sample's build directory for benchmarking.
+ROOT_DIR="$(git rev-parse --show-toplevel)"
+BUILD_DIR="${ROOT_DIR?}"/build-emscripten
+SAMPLE_BINARY_DIR="${BUILD_DIR}"/experimental/web/sample_dynamic/
+
+echo "Working in directory '${TARGET_DIR}'"
+mkdir -p "${TARGET_DIR}"
+cd "${TARGET_DIR}"
+
+###############################################################################
+# Set up Python virtual environment                                           #
+###############################################################################
+
+python -m venv .venv
+source .venv/bin/activate
+trap "deactivate 2> /dev/null" EXIT
+
+# Comment this out to skip package installs. Freezing to a specific version
+# is useful when iterating on metrics, and fetching packages is slow.
+
+python -m pip install --upgrade \
+  --find-links https://github.com/google/iree/releases \
+  iree-compiler iree-tools-tflite iree-tools-xla
+
+###############################################################################
+# Download program source files                                               #
+###############################################################################
+
+wget -nc https://storage.googleapis.com/iree-model-artifacts/mobile_ssd_v2_float_coco.tflite
+wget -nc https://storage.googleapis.com/iree-model-artifacts/deeplabv3.tflite
+wget -nc https://storage.googleapis.com/iree-model-artifacts/posenet.tflite
+wget -nc https://storage.googleapis.com/iree-model-artifacts/mobilebert-baseline-tf2-float.tflite
+wget -nc https://storage.googleapis.com/iree-model-artifacts/mobilenet_v2_1.0_224.tflite
+wget -nc https://storage.googleapis.com/iree-model-artifacts/MobileNetV3SmallStaticBatch.tflite
+
+###############################################################################
+# Import programs into MLIR                                                   #
+###############################################################################
+
+# Note: you can also download imported programs from runs of the
+# https://buildkite.com/iree/iree-benchmark-android pipeline.
+
+IREE_IMPORT_TFLITE_PATH=iree-import-tflite
+
+# import_program helper
+#   Args: program_name, tflite_source_path
+#   Imports tflite_source_path to program_name.tflite.mlir
+function import_program {
+  OUTPUT_FILE=./$1.tflite.mlir
+  echo "Importing '$1' to '${OUTPUT_FILE}'..."
+  "${IREE_IMPORT_TFLITE_PATH?}" "$2" -o "${OUTPUT_FILE}"
+}
+
+import_program "deeplabv3" "deeplabv3.tflite"
+import_program "mobile_ssd_v2_float_coco" "mobile_ssd_v2_float_coco.tflite"
+import_program "posenet" "posenet.tflite"
+import_program "mobilebertsquad" "./mobilebert-baseline-tf2-float.tflite"
+import_program "mobilenet_v2_1.0_224" "./mobilenet_v2_1.0_224.tflite"
+import_program "MobileNetV3SmallStaticBatch" "MobileNetV3SmallStaticBatch.tflite"
+
+###############################################################################
+# Compile programs                                                            #
+###############################################################################
+
+# Either build from source (setting this path), or use from the python packages.
+# IREE_COMPILE_PATH=~/code/iree-build/iree/tools/iree-compile
+# IREE_COMPILE_PATH="D:\dev\projects\iree-build\iree\tools\iree-compile"
+IREE_COMPILE_PATH=iree-compile
+
+# compile_program_wasm helper
+#   Args: program_name
+#   Compiles program_name.tflite.mlir to program_name_wasm.vmfb, dumping
+#   statistics and intermediate files to disk.
+function compile_program_wasm {
+  INPUT_FILE=./"$1".tflite.mlir
+  OUTPUT_FILE=./"$1"_wasm.vmfb
+  echo "Compiling '${INPUT_FILE}' to '${OUTPUT_FILE}'..."
+
+  ARTIFACTS_DIR=./"$1"-wasm_artifacts/
+  mkdir -p "${ARTIFACTS_DIR}"
+
+  # Compile from .mlir to .vmfb, dumping all the intermediate files and
+  # compile-time statistics that we can.
+  "${IREE_COMPILE_PATH?}" "${INPUT_FILE}" \
+    --iree-input-type=tosa \
+    --iree-hal-target-backends=llvm \
+    --iree-llvm-target-triple=wasm32-unknown-emscripten \
+    --iree-hal-dump-executable-sources-to="${ARTIFACTS_DIR}" \
+    --iree-hal-dump-executable-binaries-to="${ARTIFACTS_DIR}" \
+    --iree-scheduling-dump-statistics-format=csv \
+    --iree-scheduling-dump-statistics-file="${ARTIFACTS_DIR}/${1}_statistics.csv" \
+    --o "${OUTPUT_FILE}"
+
+  # Compress the .vmfb file (ideally it would be compressed already, but we can
+  # expect some compression support from platforms like the web).
+  gzip -k -f "${OUTPUT_FILE}"
+
+  if [[ -d "$SAMPLE_BINARY_DIR" ]]; then
+    echo "Copying '${OUTPUT_FILE}' to '${SAMPLE_BINARY_DIR}' for benchmarking"
+    cp "${OUTPUT_FILE}" "${SAMPLE_BINARY_DIR}"
+  fi
+}
+
+# compile_program_native helper
+#   Args: program_name
+#   Compiles program_name.tflite.mlir to program_name_native.vmfb, dumping
+#   statistics and intermediate files to disk.
+function compile_program_native {
+  INPUT_FILE=./"$1".tflite.mlir
+  OUTPUT_FILE=./"$1"_native.vmfb
+  echo "Compiling '${INPUT_FILE}' to '${OUTPUT_FILE}'..."
+
+  ARTIFACTS_DIR=./"$1"-native_artifacts/
+  mkdir -p "${ARTIFACTS_DIR}"
+
+  # Compile from .mlir to .vmfb, dumping all the intermediate files and
+  # compile-time statistics that we can.
+  "${IREE_COMPILE_PATH?}" "${INPUT_FILE}" \
+    --iree-input-type=tosa \
+    --iree-hal-target-backends=llvm \
+    --iree-hal-dump-executable-sources-to="${ARTIFACTS_DIR}" \
+    --iree-hal-dump-executable-binaries-to="${ARTIFACTS_DIR}" \
+    --iree-scheduling-dump-statistics-format=csv \
+    --iree-scheduling-dump-statistics-file="${ARTIFACTS_DIR}/${1}_statistics.csv" \
+    --o "${OUTPUT_FILE}"
+
+  # Compress the .vmfb file (ideally it would be compressed already, but we can
+  # expect some compression support from platforms like the web).
+  gzip -k -f "${OUTPUT_FILE}"
+}
+
+# compile_program helper
+#   Args: program_name
+#   Wraps compile_program_wasm and compile_program_native.
+function compile_program {
+  compile_program_wasm "$1"
+  compile_program_native "$1"
+}
+
+compile_program "deeplabv3"
+compile_program "mobile_ssd_v2_float_coco"
+compile_program "posenet"
+compile_program "mobilebertsquad"
+compile_program "mobilenet_v2_1.0_224"
+compile_program "MobileNetV3SmallStaticBatch"
+
+###############################################################################
+# TODO: collect/summarize statistics (manual inspection or scripted)
+#   * .vmfb size
+#   * number of executables
+#   * size of each executable (data size)
+#   * size of constants
diff --git a/experimental/web/run_native_benchmarks.sh b/experimental/web/run_native_benchmarks.sh
new file mode 100644
index 0000000..b244bea
--- /dev/null
+++ b/experimental/web/run_native_benchmarks.sh
@@ -0,0 +1,93 @@
+#!/bin/bash
+
+# Copyright 2022 The IREE Authors
+#
+# Licensed under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+# Run native benchmarks to compare with WebAssembly.
+#
+# Sample usage (after generate_web_metrics.sh):
+#   run_native_benchmarks.sh /tmp/iree/web_metrics/
+
+set -euo pipefail
+
+TARGET_DIR="${1:-}"
+if [[ -z "$TARGET_DIR" ]]; then
+  >&2 echo "ERROR: Expected target directory (e.g. /tmp/iree/web_metrics/)"
+  exit 1
+fi
+echo "Working in directory '$TARGET_DIR'"
+mkdir -p "${TARGET_DIR}"
+cd "${TARGET_DIR}"
+
+###############################################################################
+# Run benchmarks                                                              #
+###############################################################################
+
+# Either build from source (setting this path), or use from the python packages.
+# IREE_BENCHMARK_MODULE_PATH=~/code/iree-build/iree/tools/iree-benchmark-module
+# IREE_BENCHMARK_MODULE_PATH="D:\dev\projects\iree-build\iree\tools\iree-benchmark-module"
+IREE_BENCHMARK_MODULE_PATH=iree-benchmark-module
+
+echo "Benchmarking DeepLabV3..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./deeplabv3_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x257x257x3xf32 \
+    --benchmark_min_time=3
+
+echo ""
+echo "Benchmarking MobileSSD..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./mobile_ssd_v2_float_coco_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x320x320x3xf32 \
+    --benchmark_min_time=3
+
+echo ""
+echo "Benchmarking PoseNet..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./posenet_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x353x257x3xf32 \
+    --benchmark_min_time=3
+
+echo ""
+echo "Benchmarking MobileBertSquad..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./mobilebertsquad_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x384xi32 \
+    --function_input=1x384xi32 \
+    --function_input=1x384xi32 \
+    --benchmark_min_time=10
+
+echo ""
+echo "Benchmarking MobileNetV2..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./mobilenet_v2_1.0_224_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x224x224x3xf32 \
+    --benchmark_min_time=3
+
+echo ""
+echo "Benchmarking MobileNetV3Small..."
+"${IREE_BENCHMARK_MODULE_PATH?}" \
+    --module_file=./MobileNetV3SmallStaticBatch_native.vmfb \
+    --driver=dylib \
+    --task_topology_group_count=1 \
+    --entry_function=main \
+    --function_input=1x224x224x3xf32 \
+    --benchmark_min_time=3
diff --git a/experimental/web/sample_dynamic/CMakeLists.txt b/experimental/web/sample_dynamic/CMakeLists.txt
index 97aaade..554d477 100644
--- a/experimental/web/sample_dynamic/CMakeLists.txt
+++ b/experimental/web/sample_dynamic/CMakeLists.txt
@@ -35,7 +35,7 @@
 
 target_link_options(${_NAME} PRIVATE
   # https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#interacting-with-code-ccall-cwrap
-  "-sEXPORTED_FUNCTIONS=['_setup_sample', '_cleanup_sample', '_load_program', '_unload_program', '_call_function']"
+  "-sEXPORTED_FUNCTIONS=['_setup_sample', '_cleanup_sample', '_load_program', '_inspect_program', '_unload_program', '_call_function']"
   "-sEXPORTED_RUNTIME_METHODS=['ccall','cwrap']"
   #
   "-sASSERTIONS=1"
diff --git a/experimental/web/sample_dynamic/benchmarks.html b/experimental/web/sample_dynamic/benchmarks.html
new file mode 100644
index 0000000..21956d1
--- /dev/null
+++ b/experimental/web/sample_dynamic/benchmarks.html
@@ -0,0 +1,110 @@
+<!DOCTYPE html>
+<html>
+
+<!--
+Copyright 2022 The IREE Authors
+
+Licensed under the Apache License v2.0 with LLVM Exceptions.
+See https://llvm.org/LICENSE.txt for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<head>
+  <meta charset="utf-8" />
+  <title>IREE Dynamic Web Benchmarks</title>
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <link rel="icon" href="/ghost.svg" type="image/svg+xml">
+
+  <style>
+    body {
+      padding: 16px;
+    }
+  </style>
+
+  <!-- https://getbootstrap.com/ for some webpage styling-->
+  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">
+  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js" integrity="sha384-ka7Sk0Gln4gmtz2MlQnikT1wXgYsOg+OMhuP+IlRH9sENBO0LRn5q+8nbTov4+1p" crossorigin="anonymous"></script>
+
+  <script src="./iree_api.js"></script>
+</head>
+
+<body>
+  <div class="container">
+    <h1>IREE Dynamic Web Benchmarks</h1>
+
+    <p>
+      <br><b>Note:</b> Some outputs are logged to the console.
+    </p>
+
+    <!-- TODO: button to run startup benchmarks -->
+    <!-- TODO: button to run inference benchmarks -->
+    <!-- TODO: button to run all benchmarks -->
+    <!-- TODO: UI to set number of iterations -->
+
+  </div>
+
+  <script>
+
+    async function runProgramBenchmarks(programPath, functionName, inputs) {
+      // Program loading.
+      // The IREE runtime has already been initialized, so this:
+      //   * issues an XMLHttpRequest for the program data
+      //   * passes the program data in via the WebAssembly heap
+      //   * creates an IREE "runtime session" and loads the program into it
+      console.log("Running benchmarks for program '" + programPath +
+                  "' and function '" + functionName + "'");
+      const startLoadTime = performance.now();
+      const program = await ireeLoadProgram(programPath);
+      const totalLoadTime = performance.now() - startLoadTime;
+      console.log("  Application load time (including overheads): " +
+                  totalLoadTime.toFixed(3) + "ms");
+
+      // Function calling.
+      // The runtime is initialized and the program is loaded, so this:
+      //   * passes the inputs in as a string from JS to Wasm
+      //   * parses the inputs string into buffer data
+      //   * invokes the function
+      //   * formats the function outputs into a string and passes that to JS
+      const startCallTime = performance.now();
+      const iterations = 1;
+      const resultObject =
+          await ireeCallFunction(program, functionName, inputs, iterations);
+      const totalCallTime = performance.now() - startCallTime;
+      console.log("  Application call time (including overheads): " +
+                  totalCallTime.toFixed(3) + "ms");
+    }
+
+    async function runAllBenchmarks() {
+      await runProgramBenchmarks("deeplabv3_wasm.vmfb", "main", "1x257x257x3xf32");
+      await runProgramBenchmarks("mobile_ssd_v2_float_coco_wasm.vmfb", "main", "1x320x320x3xf32");
+      await runProgramBenchmarks("posenet_wasm.vmfb", "main", "1x353x257x3xf32");
+      await runProgramBenchmarks("mobilebertsquad_wasm.vmfb", "main", ["1x384xi32", "1x384xi32", "1x384xi32"]);
+      await runProgramBenchmarks("mobilenet_v2_1.0_224_wasm.vmfb", "main", "1x224x224x3xf32");
+      await runProgramBenchmarks("MobileNetV3SmallStaticBatch_wasm.vmfb", "main", "1x224x224x3xf32");
+    }
+
+    async function main() {
+      // General runtime initialization.
+      // The IREE API script has already been loaded, so this:
+      //   * initializes a Web Worker
+      //   * loads the IREE runtime worker script
+      //   * loads the IREE runtime JavaScript bundle (compiled via Emscripten)
+      //   * instantiates the IREE runtime WebAssembly module
+      //   * initializes the IREE runtime (runtime context, HAL devices, etc.)
+      //   * (if threading is enabled) creates worker thread Web Workers
+      const startInitTime = performance.now();
+      await ireeInitializeWorker();
+      const totalInitTime = performance.now() - startInitTime;
+      console.log("IREE runtime initialized after " + totalInitTime.toFixed(3) + "ms");
+
+      await runAllBenchmarks();
+    }
+
+    console.log("=== Running benchmarks ===");
+    main().then(() => { console.log("=== Finished running benchmarks ==="); })
+          .catch((error) => { console.error("Error: '" + error + "'"); });
+
+  </script>
+</body>
+
+</html>
diff --git a/experimental/web/sample_dynamic/build_sample.sh b/experimental/web/sample_dynamic/build_sample.sh
index 534e4c3..1a027fb 100755
--- a/experimental/web/sample_dynamic/build_sample.sh
+++ b/experimental/web/sample_dynamic/build_sample.sh
@@ -87,5 +87,7 @@
 echo "=== Copying static files (.html, .js) to the build directory ==="
 
 cp ${SOURCE_DIR?}/index.html ${BINARY_DIR}
+cp ${SOURCE_DIR?}/benchmarks.html ${BINARY_DIR}
+cp ${ROOT_DIR?}/docs/website/overrides/ghost.svg ${BINARY_DIR}
 cp ${SOURCE_DIR?}/iree_api.js ${BINARY_DIR}
 cp ${SOURCE_DIR?}/iree_worker.js ${BINARY_DIR}
diff --git a/experimental/web/sample_dynamic/index.html b/experimental/web/sample_dynamic/index.html
index 9246d37..d26ffec 100644
--- a/experimental/web/sample_dynamic/index.html
+++ b/experimental/web/sample_dynamic/index.html
@@ -13,6 +13,7 @@
   <meta charset="utf-8" />
   <title>IREE Dynamic Web Sample</title>
   <meta name="viewport" content="width=device-width, initial-scale=1">
+  <link rel="icon" href="/ghost.svg" type="image/svg+xml">
 
   <style>
     body {
@@ -72,8 +73,8 @@
     <form>
       <p>
         <label for="function-name-input" class="form-label">Function name:</label>
-        <input type="text" id="function-name-input"
-              class="form-control" style="width:400px; font-family: monospace;" value="main"></input>
+        <input type="text" id="function-name-input" class="form-control"
+               style="width:400px; font-family: monospace;" value="main"></input>
       </p>
 
       <p>
@@ -83,6 +84,13 @@
                   style="min-width:400px; width:initial; min-height:100px; resize:both; font-family: monospace;"></textarea>
       </p>
 
+      <p>
+        <label for="benchmark-iterations-input" class="form-label">
+          Benchmark iterations (inner invoke call):</label>
+        <input type="number" id="benchmark-iterations-input" class="form-control"
+               style="width:400px; font-family: monospace;" value="1" min="1"></input>
+      </p>
+
       <button id="call-function" class="btn btn-primary" type="button"
               onclick="callFunctionWithFormInputs()" disabled>Call function</button>
       <button id="update-url" class="btn btn-secondary" type="button"
@@ -97,6 +105,11 @@
                 style="min-width:400px; width:initial; height:100px; resize:both; font-family: monospace;"></textarea>
     </p>
 
+    <p>Total time (including overheads):
+      <code id="benchmark-time-js-output" style="font-family: monospace;"></code></p>
+    <p>Mean inference time (Wasm only):
+      <code id="benchmark-time-wasm-output" style="font-family: monospace;"></code></p>
+
     <hr>
     <h2>Samples</h2>
 
@@ -162,7 +175,10 @@
     const callFunctionButton = document.getElementById("call-function");
     const functionNameInput = document.getElementById("function-name-input");
     const functionArgumentsInput = document.getElementById("function-arguments-input");
+    const benchmarkIterationsInput = document.getElementById("benchmark-iterations-input");
     const functionOutputsElement = document.getElementById("function-outputs");
+    const timeJsOutputElement = document.getElementById("benchmark-time-js-output");
+    const timeWasmOutputElement = document.getElementById("benchmark-time-wasm-output");
 
     async function finishLoadingProgram(newProgram, newProgramName) {
       if (loadedProgram !== null) {
@@ -171,6 +187,8 @@
         await ireeUnloadProgram(loadedProgram);
       }
 
+      await ireeInspectProgram(newProgram);
+
       loadedProgram = newProgram;
       programNameElement.innerText = newProgramName;
       callFunctionButton.disabled = false;
@@ -188,6 +206,10 @@
         functionArgumentsInput.value = searchParams.get("arguments");
       }
 
+      if (searchParams.has("iterations")) {
+        benchmarkIterationsInput.value = searchParams.get("iterations");
+      }
+
       if (searchParams.has("program")) {
         const programPath = searchParams.get("program");
 
@@ -200,6 +222,11 @@
     }
 
     async function tryLoadFromBuffer(programDataBuffer, programName) {
+      // Clear 'program' from the URL.
+      const searchParams = new URLSearchParams(window.location.search);
+      searchParams.delete("program");
+      replaceUrlWithSearchParams(searchParams);
+
       await initializePromise;
       const program = await ireeLoadProgram(programDataBuffer);
 
@@ -246,11 +273,23 @@
       }
 
       const functionName = functionNameInput.value;
-      const functionArguments = functionArgumentsInput.value.split("\n");
+      const inputs = functionArgumentsInput.value.split("\n");
+      const iterations = benchmarkIterationsInput.value;
+      const startJsTime = performance.now();
 
-      ireeCallFunction(loadedProgram, functionName, functionArguments)
-          .then((result) => {
-            functionOutputsElement.value = result.replace(";", "\n");
+      ireeCallFunction(loadedProgram, functionName, inputs, iterations)
+          .then((resultObject) => {
+            functionOutputsElement.value =
+                resultObject['outputs'].replace(/;/g, "\n");
+
+            const endJsTime = performance.now();
+            const totalJsTime = endJsTime - startJsTime;
+            timeJsOutputElement.textContent = totalJsTime.toFixed(3) + "ms";
+
+            const totalWasmTimeMs = resultObject['total_invoke_time_ms'];
+            const meanWasmTimeMs = totalWasmTimeMs / iterations;
+            timeWasmOutputElement.textContent = meanWasmTimeMs.toFixed(3) +
+                "ms / iteration over " + iterations + " iteration(s)";
           })
           .catch((error) => {
             console.error("Function call error: '" + error + "'");
@@ -269,6 +308,7 @@
       const searchParams = new URLSearchParams(window.location.search);
       searchParams.set("function", functionNameInput.value);
       searchParams.set("arguments", functionArgumentsInput.value);
+      searchParams.set("iterations", benchmarkIterationsInput.value);
       replaceUrlWithSearchParams(searchParams);
     }
 
@@ -277,6 +317,7 @@
       searchParams.delete("program");
       searchParams.delete("function");
       searchParams.delete("arguments");
+      searchParams.delete("iterations");
       replaceUrlWithSearchParams(searchParams);
     }
     // ------------------------------------------------------------------------
diff --git a/experimental/web/sample_dynamic/iree_api.js b/experimental/web/sample_dynamic/iree_api.js
index fb0574f..4714fd5 100644
--- a/experimental/web/sample_dynamic/iree_api.js
+++ b/experimental/web/sample_dynamic/iree_api.js
@@ -86,6 +86,14 @@
   });
 }
 
+// Inspects a program, asynchronously.
+function ireeInspectProgram(programState) {
+  return _callIntoWorker({
+    'messageType': 'inspectProgram',
+    'payload': programState,
+  });
+}
+
 // Unloads a program, asynchronously.
 function ireeUnloadProgram(programState) {
   return _callIntoWorker({
@@ -96,14 +104,19 @@
 
 // Calls a function on a loaded program, asynchronously.
 //
-// Returns a semicolon delimited list of formatted outputs on success.
-function ireeCallFunction(programState, functionName, inputs) {
+// Returns a parsed JSON object on success:
+// {
+//   "total_invoke_time_ms": [number],
+//   "outputs": [semicolon delimited list of formatted outputs]
+// }
+function ireeCallFunction(programState, functionName, inputs, iterations) {
   return _callIntoWorker({
     'messageType': 'callFunction',
     'payload': {
       'programState': programState,
       'functionName': functionName,
       'inputs': inputs,
+      'iterations': iterations !== undefined ? iterations : 1,
     },
   });
 }
diff --git a/experimental/web/sample_dynamic/iree_worker.js b/experimental/web/sample_dynamic/iree_worker.js
index 01cb9e0..0143b24 100644
--- a/experimental/web/sample_dynamic/iree_worker.js
+++ b/experimental/web/sample_dynamic/iree_worker.js
@@ -11,6 +11,7 @@
 let wasmSetupSampleFn;
 let wasmCleanupSampleFn;
 let wasmLoadProgramFn;
+let wasmInspectProgramFn;
 let wasmUnloadProgramFn;
 let wasmCallFunctionFn;
 
@@ -28,9 +29,10 @@
     wasmCleanupSampleFn = Module.cwrap('cleanup_sample', null, ['number']);
     wasmLoadProgramFn =
         Module.cwrap('load_program', 'number', ['number', 'number', 'number']);
+    wasmInspectProgramFn = Module.cwrap('inspect_program', null, ['number']);
     wasmUnloadProgramFn = Module.cwrap('unload_program', null, ['number']);
-    wasmCallFunctionFn =
-        Module.cwrap('call_function', 'string', ['number', 'string', 'string']);
+    wasmCallFunctionFn = Module.cwrap(
+        'call_function', 'string', ['number', 'string', 'string', 'number']);
 
     sampleState = wasmSetupSampleFn();
 
@@ -83,6 +85,15 @@
   fetchRequest.send();
 }
 
+function inspectProgram(id, programState) {
+  wasmInspectProgramFn(programState);
+
+  postMessage({
+    'messageType': 'callResult',
+    'id': id,
+  });
+}
+
 function unloadProgram(id, programState) {
   wasmUnloadProgramFn(programState);
 
@@ -93,7 +104,7 @@
 }
 
 function callFunction(id, functionParams) {
-  const {programState, functionName, inputs} = functionParams;
+  const {programState, functionName, inputs, iterations} = functionParams;
 
   let inputsJoined;
   if (Array.isArray(inputs)) {
@@ -110,7 +121,7 @@
   }
 
   const returnValue =
-      wasmCallFunctionFn(programState, functionName, inputsJoined);
+      wasmCallFunctionFn(programState, functionName, inputsJoined, iterations);
 
   if (returnValue === '') {
     postMessage({
@@ -122,7 +133,7 @@
     postMessage({
       'messageType': 'callResult',
       'id': id,
-      'payload': returnValue,
+      'payload': JSON.parse(returnValue),
     });
     // TODO(scotttodd): free char* buffer? Or does Emscripten handle that?
     // Could refactor to
@@ -137,6 +148,8 @@
 
   if (messageType == 'loadProgram') {
     loadProgram(id, payload);
+  } else if (messageType == 'inspectProgram') {
+    inspectProgram(id, payload);
   } else if (messageType == 'unloadProgram') {
     unloadProgram(id, payload);
   } else if (messageType == 'callFunction') {
diff --git a/experimental/web/sample_dynamic/main.c b/experimental/web/sample_dynamic/main.c
index dcd9843..c0a3efb 100644
--- a/experimental/web/sample_dynamic/main.c
+++ b/experimental/web/sample_dynamic/main.c
@@ -35,6 +35,9 @@
 iree_program_state_t* load_program(iree_sample_state_t* sample_state,
                                    uint8_t* vmfb_data, size_t length);
 
+// Inspects metadata about a loaded program, printing to stdout.
+void inspect_program(iree_program_state_t* program_state);
+
 // Unloads a program and frees its state.
 void unload_program(iree_program_state_t* program_state);
 
@@ -49,8 +52,10 @@
 //   described in iree/tools/utils/vm_util and used in IREE's CLI tools.
 //   For example, the CLI `--function_input=f32=1 --function_input=f32=2`
 //   should be passed here as `f32=1;f32=2`.
+// * |iterations| is the number of times to call the function, for benchmarking
 const char* call_function(iree_program_state_t* program_state,
-                          const char* function_name, const char* inputs);
+                          const char* function_name, const char* inputs,
+                          int iterations);
 
 //===----------------------------------------------------------------------===//
 // Implementation
@@ -69,46 +74,6 @@
 extern iree_status_t create_device_with_wasm_loader(
     iree_allocator_t host_allocator, iree_hal_device_t** out_device);
 
-static void inspect_module(iree_vm_module_t* module) {
-  fprintf(stdout, "=== module properties ===\n");
-
-  iree_string_view_t module_name = iree_vm_module_name(module);
-  fprintf(stdout, "  module name: '%.*s'\n", (int)module_name.size,
-          module_name.data);
-
-  iree_vm_module_signature_t module_signature =
-      iree_vm_module_signature(module);
-  fprintf(stdout, "  module signature:\n");
-  fprintf(stdout, "    %" PRIhsz " imported functions\n",
-          module_signature.import_function_count);
-  fprintf(stdout, "    %" PRIhsz " exported functions\n",
-          module_signature.export_function_count);
-  fprintf(stdout, "    %" PRIhsz " internal functions\n",
-          module_signature.internal_function_count);
-
-  fprintf(stdout, "  exported functions:\n");
-  for (iree_host_size_t i = 0; i < module_signature.export_function_count;
-       ++i) {
-    iree_vm_function_t function;
-    iree_status_t status = iree_vm_module_lookup_function_by_ordinal(
-        module, IREE_VM_FUNCTION_LINKAGE_EXPORT, i, &function);
-    if (!iree_status_is_ok(status)) {
-      iree_status_fprint(stderr, status);
-      iree_status_free(status);
-      continue;
-    }
-
-    iree_string_view_t function_name = iree_vm_function_name(&function);
-    iree_vm_function_signature_t function_signature =
-        iree_vm_function_signature(&function);
-    iree_string_view_t calling_convention =
-        function_signature.calling_convention;
-    fprintf(stdout, "    function name: '%.*s', calling convention: %.*s'\n",
-            (int)function_name.size, function_name.data,
-            (int)calling_convention.size, calling_convention.data);
-  }
-}
-
 iree_sample_state_t* setup_sample() {
   iree_sample_state_t* sample_state = NULL;
   iree_status_t status =
@@ -148,8 +113,6 @@
 
 iree_program_state_t* load_program(iree_sample_state_t* sample_state,
                                    uint8_t* vmfb_data, size_t length) {
-  fprintf(stdout, "load_program() received %zu bytes of data\n", length);
-
   iree_program_state_t* program_state = NULL;
   iree_status_t status = iree_allocator_malloc(iree_allocator_system(),
                                                sizeof(iree_program_state_t),
@@ -177,7 +140,6 @@
   }
 
   if (iree_status_is_ok(status)) {
-    inspect_module(program_state->module);
     status = iree_runtime_session_append_module(program_state->session,
                                                 program_state->module);
   }
@@ -192,6 +154,47 @@
   return program_state;
 }
 
+void inspect_program(iree_program_state_t* program_state) {
+  fprintf(stdout, "=== program properties ===\n");
+
+  iree_vm_module_t* module = program_state->module;
+  iree_string_view_t module_name = iree_vm_module_name(module);
+  fprintf(stdout, "  module name: '%.*s'\n", (int)module_name.size,
+          module_name.data);
+
+  iree_vm_module_signature_t module_signature =
+      iree_vm_module_signature(module);
+  fprintf(stdout, "  module signature:\n");
+  fprintf(stdout, "    %" PRIhsz " imported functions\n",
+          module_signature.import_function_count);
+  fprintf(stdout, "    %" PRIhsz " exported functions\n",
+          module_signature.export_function_count);
+  fprintf(stdout, "    %" PRIhsz " internal functions\n",
+          module_signature.internal_function_count);
+
+  fprintf(stdout, "  exported functions:\n");
+  for (iree_host_size_t i = 0; i < module_signature.export_function_count;
+       ++i) {
+    iree_vm_function_t function;
+    iree_status_t status = iree_vm_module_lookup_function_by_ordinal(
+        module, IREE_VM_FUNCTION_LINKAGE_EXPORT, i, &function);
+    if (!iree_status_is_ok(status)) {
+      iree_status_fprint(stderr, status);
+      iree_status_free(status);
+      continue;
+    }
+
+    iree_string_view_t function_name = iree_vm_function_name(&function);
+    iree_vm_function_signature_t function_signature =
+        iree_vm_function_signature(&function);
+    iree_string_view_t calling_convention =
+        function_signature.calling_convention;
+    fprintf(stdout, "    function name: '%.*s', calling convention: '%.*s'\n",
+            (int)function_name.size, function_name.data,
+            (int)calling_convention.size, calling_convention.data);
+  }
+}
+
 void unload_program(iree_program_state_t* program_state) {
   iree_vm_module_release(program_state->module);
   iree_runtime_session_release(program_state->session);
@@ -363,7 +366,8 @@
 }
 
 const char* call_function(iree_program_state_t* program_state,
-                          const char* function_name, const char* inputs) {
+                          const char* function_name, const char* inputs,
+                          int iterations) {
   iree_status_t status = iree_ok_status();
 
   // Fully qualify the function name. This sample only supports loading one
@@ -394,20 +398,34 @@
   // side-channel security threats.
   // https://developer.mozilla.org/en-US/docs/Web/API/Performance/now#reduced_time_precision
   iree_time_t start_time = iree_time_now();
-  if (iree_status_is_ok(status)) {
-    status = iree_runtime_call_invoke(&call, /*flags=*/0);
+  for (int i = 0; i < iterations; ++i) {
+    if (iree_status_is_ok(status)) {
+      status = iree_runtime_call_invoke(&call, /*flags=*/0);
+    }
   }
   iree_time_t end_time = iree_time_now();
   iree_time_t time_elapsed = end_time - start_time;
-  fprintf(stdout,
-          "(Approximate) time for calling '%s': %" PRId64 " nanoseconds\n",
-          function_name, time_elapsed);
 
   iree_string_builder_t outputs_builder;
   iree_string_builder_initialize(iree_allocator_system(), &outputs_builder);
+
+  // Output a JSON object as a string:
+  // {
+  //   "total_invoke_time_ms": [integer, truncated to whole milliseconds],
+  //   "outputs": [semicolon delimited list of formatted outputs]
+  // }
+  if (iree_status_is_ok(status)) {
+    status = iree_string_builder_append_format(
+        &outputs_builder,
+        "{ \"total_invoke_time_ms\": %" PRId64 ", \"outputs\": \"",
+        time_elapsed / 1000000);
+  }
   if (iree_status_is_ok(status)) {
     status = print_outputs_from_call(&call, &outputs_builder);
   }
+  if (iree_status_is_ok(status)) {
+    status = iree_string_builder_append_cstring(&outputs_builder, "\"}");
+  }
 
   if (!iree_status_is_ok(status)) {
     iree_string_builder_deinitialize(&outputs_builder);
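The result string built in `call_function()` above can be illustrated with a minimal standalone sketch. It uses only the C standard library, not the IREE string builder. The helper name `format_benchmark_result` and the sample output string `f32=3` are hypothetical. The sketch assumes the formatted outputs contain no quotes or backslashes, since nothing in this path escapes them for JSON.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not part of IREE): writes a JSON object with the same
// shape that call_function() returns, given an elapsed time in nanoseconds
// and an already formatted, semicolon delimited outputs string.
// Returns what snprintf returns: the number of characters that the full
// result needs (excluding the NUL terminator), or a negative value on error.
int format_benchmark_result(char* buffer, size_t buffer_size,
                            int64_t elapsed_ns, const char* outputs) {
  // Integer division truncates, so any total under 1ms is reported as 0.
  int64_t elapsed_ms = elapsed_ns / 1000000;
  return snprintf(
      buffer, buffer_size,
      "{ \"total_invoke_time_ms\": %" PRId64 ", \"outputs\": \"%s\"}",
      elapsed_ms, outputs);
}
```

For example, an elapsed time of 2500000 nanoseconds with outputs `f32=3` yields `{ "total_invoke_time_ms": 2, "outputs": "f32=3"}`. Because of the truncation, short-running functions likely need a larger iteration count before the reported total becomes meaningful.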
diff --git a/experimental/web/sample_dynamic/serve_sample.sh b/experimental/web/sample_dynamic/serve_sample.sh
index d83f873..e1b1a7e 100755
--- a/experimental/web/sample_dynamic/serve_sample.sh
+++ b/experimental/web/sample_dynamic/serve_sample.sh
@@ -10,5 +10,6 @@
 BINARY_DIR=${BUILD_DIR}/experimental/web/sample_dynamic
 
 echo "=== Running local webserver, open at http://localhost:8000/ ==="
+echo "    For benchmarks, open http://localhost:8000/benchmarks.html"
 
 python3 ${ROOT_DIR?}/build_tools/scripts/local_web_server.py --directory ${BINARY_DIR}
diff --git a/experimental/web/sample_static/build_sample.sh b/experimental/web/sample_static/build_sample.sh
index e6b4c0f..c708d1b 100755
--- a/experimental/web/sample_static/build_sample.sh
+++ b/experimental/web/sample_static/build_sample.sh
@@ -93,6 +93,7 @@
 echo "=== Copying static files to the build directory ==="
 
 cp ${SOURCE_DIR}/index.html ${BINARY_DIR}
+cp ${ROOT_DIR?}/docs/website/overrides/ghost.svg ${BINARY_DIR}
 cp ${SOURCE_DIR}/iree_api.js ${BINARY_DIR}
 cp ${SOURCE_DIR}/iree_worker.js ${BINARY_DIR}
 
diff --git a/experimental/web/sample_static/index.html b/experimental/web/sample_static/index.html
index 34d3992..dbf2afa 100644
--- a/experimental/web/sample_static/index.html
+++ b/experimental/web/sample_static/index.html
@@ -13,6 +13,7 @@
   <meta charset="utf-8" />
   <title>IREE Static Web Sample</title>
   <meta name="viewport" content="width=device-width, initial-scale=1">
+  <link rel="icon" href="/ghost.svg" type="image/svg+xml">
 
   <script src="./easeljs.min.js"></script>
   <script src="./iree_api.js"></script>