| # TensorFlow e2e tests |
| |
| <!-- TODO(meadowlark): Update this doc once the API is stable. --> |
| |
> Note:<br>
> The TensorFlow integrations are currently being refactored. The `bazel` build
> is deprecated. Refer to
> https://google.github.io/iree/building-from-source/python-bindings-and-importers/
> for a general overview of how to build and execute the e2e tests.
| |
| This is a collection of e2e tests that compile a TensorFlow model with IREE (and |
| potentially TFLite), run it on multiple backends, and crosscheck the results. |
| |
## Prerequisites
| |
You will need a TensorFlow 2.0+ nightly installed in your Python environment:
the Python binary in `$PYTHON_BIN` should be able to `import tensorflow`, and
that TensorFlow should be version 2.0+. This can be checked with
`tensorflow.version.VERSION`, as shown below.
| |
| See [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) for |
| instructions. |
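
A quick check (a minimal sketch; it only assumes a working TensorFlow install in
the active environment):

```python
import tensorflow as tf

# Should print a 2.x nightly version string.
print(tf.version.VERSION)
assert tf.version.VERSION.startswith("2"), "A TensorFlow 2.x nightly is required."
```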
| |
| ## Vulkan Setup |
| |
If you do not have your environment set up to use IREE with Vulkan (see
[this doc](https://google.github.io/iree/deployment-configurations/gpu-vulkan/)),
you can run the manual test targets with
`--target_backends=tf,iree_vmvx,iree_llvmaot` (that is, by omitting
`iree_vulkan` from the list of backends to run the tests on).
| |
| The test suites can be run excluding Vulkan by specifying |
| `--test_tag_filters="-driver=vulkan"` in the `bazel test` invocation, or by |
| adding `test --test_tag_filters="-driver=vulkan"` to your `user.bazelrc`. |
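
For example (a sketch using targets that are described later in this doc):

```shell
# Run a manual test target on every backend except Vulkan.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- \
    --target_backends=tf,iree_vmvx,iree_llvmaot

# Run the test suites while skipping all tests tagged with driver=vulkan.
bazel test //integrations/tensorflow/e2e:e2e_tests \
    --test_tag_filters="-driver=vulkan"
```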
| |
| ## Compiling `tf.Module`s |
| |
| Compatible TensorFlow modules can be compiled to specific IREE backends using |
| `IreeCompiledModule`. This also optionally saves compilation artifacts to a |
| specified directory. These artifacts include MLIR across various lowerings and |
| the compiled VM FlatBuffer. A basic example of creating and calling an |
`IreeCompiledModule` can be found in
[`module_utils_test.py`](https://github.com/google/iree/blob/main/integrations/tensorflow/bindings/python/iree/tf/support/module_utils_test.py).
| |
When using Keras models or `tf.Module`s with functions that IREE can't compile,
specify `exported_names` to restrict compilation to the supported functions. For
example:
| |
| ```python |
| from iree.tf.support import module_utils |
| vmvx_module = module_utils.IreeCompiledModule( |
| module_class=KerasTFModuleClass, |
| backend_info=module_utils.BackendInfo('iree_vmvx'), |
| exported_names=['predict']) |
| vmvx_module.predict(...) |
| ``` |
| |
| ## Running Tests |
| |
For running tests locally and iterating on backend development, `bazel run` is
preferred.
| |
| ```shell |
| # Run conv_test on all backends. |
| bazel run //integrations/tensorflow/e2e:conv_test_manual |
| |
| # Run conv_test comparing TensorFlow to itself (e.g. to debug randomization). |
| bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=tf |
| |
# Run conv_test comparing the iree_vmvx backend and TensorFlow.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=iree_vmvx

# Run conv_test comparing the iree_vmvx backend to itself multiple times.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- \
    --reference_backend=iree_vmvx --target_backends=iree_vmvx,iree_vmvx
| ``` |
| |
For reproducibility of the unit tests, `CompiledModule()` sets the random seeds
of `tf`, `numpy`, and `python` by calling `tf_utils.set_random_seed()` before
model creation.
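
If you need the same determinism outside of the test infra, you can call the
helper directly (a minimal sketch; the default seed value is an implementation
detail of `tf_utils`):

```python
from iree.tf.support import tf_utils

# Seeds `tf`, `numpy`, and Python's `random` module in one call, mirroring what
# `CompiledModule()` does before model creation.
tf_utils.set_random_seed()
```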
| |
| ## Writing Tests |
| |
There are two ways to write tests: via `tf_test_utils.tf_function_unit_test` and
via test methods on a child of `tf_test_utils.TracedModuleTestCase`.
| |
| ### Via `tf_test_utils.tf_function_unit_test` |
| |
This approach is preferred when:

1. Only a single call to the module needs to be tested at a time.
2. The inputs are simple to generate automatically or specify inline.
3. The functions that you want to test are generated automatically from a
   configuration (e.g. in `.../e2e/keras/layers/layers_test.py`).
| |
Tests are specified by writing modules that inherit from
`tf_test_utils.TestModule` (which is a thin wrapper around `tf.Module`) with
methods decorated with `@tf_test_utils.tf_function_unit_test` (which is a thin
wrapper around `tf.function`).
| |
| #### Basic example |
| |
| We use part of `.../e2e/conv_test.py` as an example. The first component is |
| the `TestModule` itself: |
| |
| ```python |
| class Conv2dModule(tf_test_utils.TestModule): |
| |
| # This decorator tells the testing infra to generate a unittest for this |
| # function. The 'input_signature' is required. If no other arguments are |
| # specified then uniform random data is generated from the input signature |
| # to numerically test the function. |
| @tf_test_utils.tf_function_unit_test(input_signature=[ |
| tf.TensorSpec([1, 4, 5, 1], tf.float32), |
| tf.TensorSpec([1, 1, 1, 1], tf.float32), |
| ]) |
| def conv2d_1451x1111_valid(self, img, kernel): |
| return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result") |
| |
| @tf_test_utils.tf_function_unit_test(input_signature=[ |
| tf.TensorSpec([2, 4, 5, 1], tf.float32), |
| tf.TensorSpec([1, 1, 1, 1], tf.float32), |
| ]) |
| def conv2d_2451x1111_valid(self, img, kernel): |
| return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result") |
| ``` |
| |
Second, you need to write a test case that inherits from
`tf_test_utils.TracedModuleTestCase`. This is essentially boilerplate that
tells `tf.test.main()` which `tf.Module` to test and allows us to generate
the unittests we specified above.
| |
| ```python |
| class ConvTest(tf_test_utils.TracedModuleTestCase): |
| |
| def __init__(self, *args, **kwargs): |
| super().__init__(*args, **kwargs) |
| self._modules = tf_test_utils.compile_tf_module(Conv2dModule) |
| ``` |
| |
| Finally, in the `main` function, you need to call |
| `.generate_unit_tests(module_class)` on your `TestCase` to actually generate |
| the unittests that we specified: |
| |
| ```python |
| def main(argv): |
| del argv # Unused |
| if hasattr(tf, 'enable_v2_behavior'): |
| tf.enable_v2_behavior() |
| # Generates unittests for all @tf_test_utils.tf_function_unit_test decorated |
| # functions on the module class. |
| # Note: if you are automatically generating functions to test they need to be |
| # specified via a `classmethod` prior to this call _as well_ as via `__init__` |
| # to properly handle stateful `tf.function`s. |
| ConvTest.generate_unit_tests(Conv2dModule) |
| tf.test.main() |
| |
| |
| if __name__ == '__main__': |
| app.run(main) |
| ``` |
| |
| This generates two unittests: `test_conv2d_1451x1111_valid` and |
| `test_conv2d_2451x1111_valid`. |
| |
| #### Configuring `@tf_test_utils.tf_function_unit_test` |
| |
| By default `@tf_test_utils.tf_function_unit_test` uses uniform random input data |
| to numerically test the function, but you can specify an `input_generator` or |
| `input_args` to test data-specific behaviors: |
| |
- `input_generator` can be `tf_utils.uniform`, `tf_utils.ndarange`, or any
  function which takes a `shape` and `dtype` as positional args and returns an
  `np.ndarray`.
| - `input_args` is a list of `np.ndarray`s to use as positional arguments. |
| |
The comparison tolerances `atol` and `rtol` can also be specified in the
decorator.
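
For example, a sketch extending the `Conv2dModule` above (assuming the usual
`np`, `tf`, `tf_utils`, and `tf_test_utils` imports; the tolerance values are
illustrative):

```python
class Conv2dModule(tf_test_utils.TestModule):

  # Use deterministic `ndarange` data instead of uniform random data.
  @tf_test_utils.tf_function_unit_test(
      input_signature=[
          tf.TensorSpec([1, 4, 5, 1], tf.float32),
          tf.TensorSpec([1, 1, 1, 1], tf.float32),
      ],
      input_generator=tf_utils.ndarange)
  def conv2d_1451x1111_valid_ndarange(self, img, kernel):
    return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result")

  # Provide explicit input data and loosen the comparison tolerances.
  @tf_test_utils.tf_function_unit_test(
      input_signature=[tf.TensorSpec([4], tf.float32)],
      input_args=[np.array([0., 1., 2., 3.], dtype=np.float32)],
      atol=1e-5,
      rtol=1e-5)
  def scale(self, x):
    return x * 2.0
```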
| |
| ### Via test methods |
| |
This approach is preferred when:

1. The `tf.function` that you want to test is already defined on the module
   (e.g. on a downloaded model, as in `mobile_bert_test.py`).
2. The inputs are difficult to specify inline and require multiple function
   calls or reshaping to create.
3. You want to test multiple consecutive calls to a `tf.function` (e.g. to test
   mutated state, as in `ring_buffer_test.py`).
| |
Our tests use the `TracedModule` class to capture and store all of the inputs
and outputs of a `CompiledModule` in a `Trace`. Each unittest on a `TestCase`
uses the `compare_backends` method, which runs the function it is passed with a
`TracedModule` once for each reference and target backend. The inputs and
outputs of these modules are then checked for correctness, using the reference
backend as the source of truth.
| |
| We use `simple_arithmetic_test.py` as an example: |
| |
| ```python |
| # Create a tf.Module with one or more `@tf.function` decorated methods to test. |
| class SimpleArithmeticModule(tf.Module): |
| |
| @tf.function(input_signature=[ |
| tf.TensorSpec([4], tf.float32), |
| tf.TensorSpec([4], tf.float32) |
| ]) |
| def simple_mul(self, a, b): |
| return a * b |
| |
| # Inherit from `TracedModuleTestCase`. |
| class SimpleArithmeticTest(tf_test_utils.TracedModuleTestCase): |
| |
| def __init__(self, *args, **kwargs): |
| super().__init__(*args, **kwargs) |
| # Compile a `tf.Module` named `SimpleArithmeticModule` into |
| # `CompiledModule`s for each reference and target backend. |
| self._modules = tf_test_utils.compile_tf_module(SimpleArithmeticModule) |
| |
| # Unit test. |
| def test_simple_mul(self): |
| |
| # Trace function. |
| def simple_mul(module): |
| # A random seed is automatically set before each call to `simple_mul`. |
| a = tf_utils.uniform([4]) |
| b = np.array([400., 5., 6., 7.], dtype=np.float32) |
| |
| # The inputs `a` and `b` are recorded along with the output `c` |
| c = module.simple_mul(a, b) |
| |
      # The inputs `a` and `c` are recorded along with the (unnamed) output
      # that `module.simple_mul` returns.
| module.simple_mul(a, c) |
| |
| # Calls `simple_mul` once for each backend, recording the inputs and outputs |
| # to `module` and then comparing them. |
| self.compare_backends(simple_mul, self._modules) |
| ``` |
| |
| ## Test Suites |
| |
Test targets are automatically generated for each test file and for each backend
to check numerical correctness against TensorFlow. Test targets that pass are
placed into the `e2e_tests` test suite. Tests that fail on particular backends
are recorded in lists in the `BUILD` files. For example, if
`experimental_new_test.py` fails on the `iree_llvmaot` and `iree_vulkan`
backends, then the following lines should be added to the `BUILD` file:
| |
| ```build |
| LLVM_FAILING = [ |
| ... |
| "experimental_new_test.py", |
| ... |
| ] |
| |
| VULKAN_FAILING = [ |
| ... |
| "experimental_new_test.py", |
| ... |
| ] |
| ``` |
| |
| Test targets for these backends are placed into the `e2e_tests_failing` test |
| suite. Test targets in these test suites can be run as follows: |
| |
| ```shell |
| # Run all e2e tests that are expected to pass. |
| bazel test //integrations/tensorflow/e2e:e2e_tests |
| |
| # Run all e2e tests that are expected to fail. |
| bazel test //integrations/tensorflow/e2e:e2e_tests_failing |
| |
| # Run a specific failing e2e test target. |
| # Note that generated test targets are prefixed with their test suite name. |
| # Also, if broadcasting_test starts working on iree_vulkan after the time |
| # of writing then this command will fail. |
| bazel test //integrations/tensorflow/e2e:e2e_tests_failing_broadcasting_test__tf__iree_vulkan |
| ``` |
| |
| ## Generated Artifacts |
| |
| By default, running an E2E test generates a number of compilation, debugging and |
| benchmarking artifacts. These artifacts will be saved |
| |
- in `/tmp/iree/modules/` when using `bazel run`, or `bazel test` with
  `--test_arg=--artifacts_dir=/tmp/iree/modules/` (see the example below).
| - in `bazel-testlogs/integrations/tensorflow/e2e/test_suite_target_name` when |
| using `bazel test` without specifying `--artifacts_dir`. |
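
For example, the following invocation (a sketch; any generated test target works
the same way) writes artifacts to `/tmp/iree/modules/`:

```shell
bazel test //integrations/tensorflow/e2e:e2e_tests \
    --test_arg=--artifacts_dir=/tmp/iree/modules/
```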
| |
| The generated directory structure for each module is as follows: |
| |
| ```shell |
| /tmp/iree/modules/ModuleName |
├── reproducer__backend.mlir
│   # If there is a compilation error, an MLIR file that reproduces the error
│   # for a specific backend is included.
├── tf_input.mlir
│   # MLIR for ModuleName in TF's input dialect.
├── iree_input.mlir
│   # tf_input.mlir translated to IREE MLIR.
├── iree_vmvx
│   # Or any other IREE backend.
│   ├── compiled.vmfb
│   │   # A FlatBuffer containing IREE's compiled code.
│   └── traces
│       # Directory with a trace for each unittest in the module's test file.
│       ├── trace_function_1
│       │   # Directory storing logs and serialization for a specific trace.
│       │   ├── flagfile
│       │   │   # An Abseil flagfile containing the arguments that
│       │   │   # iree-benchmark-module needs to benchmark this trace.
│       │   └── log.txt
│       │       # A more detailed version of the test logs.
│       ├── trace_function_2
│       └── ...
├── tflite  # If TFLite supports compiling ModuleName.
│   ├── method_1.tflite
│   │   # A method on ModuleName compiled to bytes with TFLite, which can be
│   │   # ingested by TFLite's benchmark_model binary.
│   ├── method_2.tflite
│   └── traces
│       └── ...
└── tf_ref  # Directory storing the TensorFlow reference traces.
    └── traces
        └── ...
| ``` |
| |
| Traces for a particular test can be loaded via the `Trace.load(trace_dir)` |
| method. For example: |
| |
```python
import numpy as np

# `Trace` is provided by IREE's TensorFlow test utilities.
ref_trace = Trace.load("/tmp/iree/modules/ModuleName/tf_ref/traces/predict/")
tar_trace = Trace.load("/tmp/iree/modules/ModuleName/iree_vmvx/traces/predict/")
abs_diff = np.abs(ref_trace.calls[0].outputs[0] - tar_trace.calls[0].outputs[0])
print(np.mean(abs_diff))
```
| |
| Traces are named after the trace functions defined in their unittests. So in the |
| `SimpleArithmeticModule` example above, the `trace_dir` would be |
| `/tmp/iree/modules/SimpleArithmeticModule/iree_vmvx/traces/simple_mul/`. |
| |
| ## Benchmarking E2E Modules |
| |
We use our end-to-end TensorFlow integration tests to generate tested
compilation and benchmarking artifacts. This allows us to validate that our
benchmarks behave as we expect them to, and to run them using valid inputs
| for each model. An overview of how to run benchmarks on IREE and TFLite can be |
| found in |
| [this doc](https://github.com/google/iree/blob/main/docs/developers/developing_iree/e2e_benchmarking.md). |
| |
| ## Debugging Tests |
| |
If the compiler fails to compile the program, it will create a crash reproducer
(see the
[MLIR documentation](https://mlir.llvm.org/docs/PassManagement/#crash-and-failure-reproduction)),
which allows the bug to be reproduced with an appropriate `opt` tool such as
`iree-opt`. Further debugging iteration can then happen in that tool.
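
For example, a sketch of replaying a reproducer (this assumes your `iree-opt`
build exposes upstream MLIR's `-run-reproducer` flag and that the reproducer was
written alongside the other artifacts):

```shell
iree-opt -run-reproducer /tmp/iree/modules/ModuleName/reproducer__iree_vulkan.mlir
```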
| |
| TODO(silvasean): debugging miscompiles |
| |
| ## Testing SignatureDef SavedModels |
| |
| TensorFlow 1.x SavedModels can be tested using |
| `tf_test_utils.compile_tf_signature_def_saved_model` instead of |
| `tf_test_utils.compile_tf_module`. See `mobile_bert_squad_test.py` for a |
concrete example. The compilation artifacts are saved under the directory name
you specify via `module_name`.
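
A sketch of what this looks like in a test's `__init__` (the argument values are
placeholders and the keyword names are assumptions based on
`mobile_bert_squad_test.py`; check that file for the authoritative usage):

```python
class MySignatureDefTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Compiles the 'serving_default' signature of a TF 1.x SavedModel for each
    # reference and target backend. Artifacts are saved under `module_name`.
    self._modules = tf_test_utils.compile_tf_signature_def_saved_model(
        saved_model_dir='/path/to/saved_model',
        saved_model_tags=set(['serve']),
        module_name='MySignatureDefModule',
        exported_name='serving_default',
        input_names=['input_a', 'input_b'],
        output_names=['output'])
```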