# TensorFlow e2e tests
This is a collection of e2e tests that save a TensorFlow model, compile it with
IREE, run it on multiple backends and crosscheck the results.
## Prerequisites
You will need a TensorFlow 2.0+ nightly build installed in your Python
environment: the Python binary in `$PYTHON_BIN` should be able to
`import tensorflow`, and that TensorFlow should be version 2.0+. This can be
checked with `tensorflow.version.VERSION`.
See [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) for
instructions.
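For example, the following minimal check confirms that the interpreter in your
environment sees a 2.x TensorFlow (any equivalent check works):
```python
# Confirm that TensorFlow is importable and is a 2.x release.
import tensorflow as tf

print(tf.version.VERSION)  # Expect a 2.x version string.
assert int(tf.version.VERSION.split('.')[0]) >= 2
```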
## Vulkan setup
If you do not have your environment set up to use IREE with Vulkan (see
[the doc](../../../docs/vulkan_and_spirv.md)), then you can run the manual test
targets with `--target_backends=tf,iree_vmla,iree_llvmjit` (that is, by omitting
`iree_vulkan` from the list of backends to run the tests on).
The test suites can be run excluding Vulkan by specifying
`--test_tag_filters="-driver=vulkan"` in the `bazel test` invocation, or by
adding `test --test_tag_filters="-driver=vulkan"` to your `user.bazelrc`.
## Compiling `tf.Module`s
Compatible TensorFlow modules can be compiled to specific IREE backends using
`IreeCompiledModule`. This also optionally saves compilation artifacts to a
specified directory. These artifacts include: MLIR across various lowerings, a
TensorFlow SavedModel, and the compiled VM FlatBuffer. A basic example of
creating and calling an `IreeCompiledModule` can be found in
[`tf_utils_test.py`](https://github.com/google/iree/blob/main/integrations/tensorflow/bindings/python/pyiree/tf/support/tf_utils_test.py).
When using Keras models or `tf.Module`s with functions that IREE can't compile,
`exported_names` should be specified. For example:
```python
from pyiree.tf.support import tf_utils
vmla_module = tf_utils.IreeCompiledModule(
    module_class=KerasTFModuleClass,
    backend_info=tf_utils.BackendInfo('iree_vmla'),
    exported_names=['predict'])
vmla_module.predict(...)
```
By default the TensorFlow SavedModels will not be kept. This can be overridden
via the `--keep_saved_model` flag.
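As a rough sketch of saving those compilation artifacts to a directory (the
`artifacts_dir` keyword and the toy module below are assumptions for
illustration; see `tf_utils_test.py`, linked above, for the exact signature):
```python
import tempfile

import numpy as np
import tensorflow as tf
from pyiree.tf.support import tf_utils

# A toy module defined purely for illustration.
class SimpleModule(tf.Module):

  @tf.function(input_signature=[tf.TensorSpec([4], tf.float32)])
  def double(self, x):
    return x * 2.

# `artifacts_dir` (assumed keyword) is where the MLIR lowerings, the TensorFlow
# SavedModel and the compiled VM FlatBuffer would be written.
vmla_module = tf_utils.IreeCompiledModule(
    module_class=SimpleModule,
    backend_info=tf_utils.BackendInfo('iree_vmla'),
    artifacts_dir=tempfile.mkdtemp())

vmla_module.double(np.array([1., 2., 3., 4.], dtype=np.float32))
```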
## Running tests
For locally running tests and iterating on backend development, `bazel run` is
preferred.
```shell
# Run math_test on all backends.
bazel run :math_test_manual

# Run math_test comparing TensorFlow to itself (e.g. to debug randomization).
bazel run :math_test_manual -- --target_backends=tf

# Run math_test comparing the VMLA backend and TensorFlow.
bazel run :math_test_manual -- --target_backends=iree_vmla

# Run math_test comparing the VMLA backend to itself multiple times.
bazel run :math_test_manual -- \
  --reference_backend=iree_vmla --target_backends=iree_vmla,iree_vmla

# Run math_test and output on failure.
bazel test :math_test_manual --test_output=errors

# Run an individual test interactively.
bazel test :math_test_manual --test_output=streamed
```
For reproducibility of the unit tests, `CompiledModule()` sets the random seeds
of `tf`, `numpy` and `python` by calling `tf_utils.set_random_seed()` before
model creation.
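A minimal sketch of what that seeding amounts to (the real logic lives in
`tf_utils.set_random_seed`; exact details may differ):
```python
import random

import numpy as np
import tensorflow as tf

def set_random_seed(seed: int = 0) -> None:
  """Seeds TensorFlow, NumPy and Python's `random` for reproducible tests."""
  tf.random.set_seed(seed)
  random.seed(seed)
  np.random.seed(seed)
```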
## Writing Tests
Our tests use a class `TracedModule` to capture and store all of the inputs and
outputs of a `CompiledModule` in a `Trace`. Each unit test on a `TestCase` uses
the `compare_backends` method, which runs the function it is given once for each
reference and target backend, passing it a `TracedModule`. The inputs and
outputs of these modules are then checked for correctness, using the reference
backend as a source of truth. For example:
```python
import numpy as np
from pyiree.tf.support import tf_test_utils
from pyiree.tf.support import tf_utils

# Compile a `tf.Module` named `SimpleArithmeticModule` into a `CompiledModule`.
@tf_test_utils.compile_module(SimpleArithmeticModule)
# Inherit from `TracedModuleTestCase`.
class SimpleArithmeticTest(tf_test_utils.TracedModuleTestCase):

  # Unit test.
  def test_simple_mul(self):

    # Trace function.
    def simple_mul(module):
      # A random seed is automatically set before each call to `simple_mul`.
      a = tf_utils.uniform([4])
      b = np.array([400., 5., 6., 7.], dtype=np.float32)

      # The inputs `a` and `b` are recorded along with the output `c`.
      c = module.simple_mul(a, b)

      # The inputs `a` and `c` are recorded along with the (unnamed) output
      # that `module.simple_mul` returns.
      module.simple_mul(a, c)

    # Calls `simple_mul` once for each backend, recording the inputs and
    # outputs to `module` and then comparing them.
    self.compare_backends(simple_mul)
```
## Test Suites
Test targets are automatically generated for each test file and for each backend
to check numerical correctness against TensorFlow. Test targets that pass are
placed into the `e2e_tests` test suite. Tests that fail on particular backends
are recorded in lists in the `BUILD` files. For example, if
`experimental_new_test.py` fails on the `iree_llvmjit` and `iree_vulkan`
backends then the following lines should be added to the `BUILD` file:
```build
LLVM_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]

VULKAN_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]
```
Test targets for these backends are placed into the `e2e_tests_failing` test
suite. Test targets in these test suites can be run as follows:
```shell
# Run all e2e tests that are expected to pass.
bazel test :e2e_tests

# Run all e2e tests that are expected to fail.
bazel test :e2e_tests_failing

# Run a specific failing e2e test target.
# Note that generated test targets are prefixed with their test suite name.
bazel test :e2e_tests_failing_broadcasting_test__tf__iree_vulkan
```
## Debugging tests
If the compiler fails to compile the program, then it will create a crash
reproducer (see [MLIR documentation](https://mlir.llvm.org/docs/WritingAPass/)),
which then allows reproducing the bug with an appropriate `opt` tool. Further
debugging iteration can happen in `opt`.
TODO(silvasean): debugging miscompiles
## Test harnesses
### Simple function tests
See `simple_arithmetic_test.py` for some basic examples.