# TensorFlow e2e tests
This is a collection of e2e tests that save a TensorFlow model, compile it with
IREE, run it on multiple backends and crosscheck the results.
## Prerequisites
You will need a TensorFlow 2.0+ nightly build installed in your Python
environment: the Python binary in `$PYTHON_BIN` should be able to
`import tensorflow`, and that TensorFlow should be version 2.0+. This can be
checked with `tensorflow.version.VERSION`.
See [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) for
instructions.
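For example, the following minimal check confirms that the interpreter in your
environment sees a 2.x TensorFlow (any equivalent check works):
```python
# Confirm that TensorFlow is importable and is a 2.x release.
import tensorflow as tf

print(tf.version.VERSION)  # Expect a 2.x version string.
assert int(tf.version.VERSION.split('.')[0]) >= 2
```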
## Vulkan setup
If you do not have your environment set up to use IREE with Vulkan (see
[the doc](../../../docs/vulkan_and_spirv.md)), then you can run the manual test
targets with `--target_backends=tf,iree_vmla,iree_llvmjit` (that is, by omitting
`iree_vulkan` from the list of backends to run the tests on).
The test suites can be run excluding Vulkan by specifying
`--test_tag_filters="-driver=vulkan"` in the `bazel test` invocation, or by
adding `test --test_tag_filters="-driver=vulkan"` to your `user.bazelrc`.
## Compiling `tf.Module`s
Compatible TensorFlow modules can be compiled to specific IREE backends using
`IreeCompiledModule`. This also optionally saves compilation artifacts to a
specified directory. These artifacts include: MLIR across various lowerings, a
TensorFlow SavedModel, and the compiled VM FlatBuffer. A basic example of
creating and calling an `IreeCompiledModule` can be found in
[`tf_utils_test.py`](https://github.com/google/iree/blob/main/integrations/tensorflow/bindings/python/pyiree/tf/support/tf_utils_test.py).
When using Keras models or `tf.Module`s with functions that IREE can't compile,
`exported_names` should be specified. For example:
```python
from pyiree.tf.support import tf_utils
vmla_module = tf_utils.IreeCompiledModule(
    module_class=KerasTFModuleClass,
    backend_info=tf_utils.BackendInfo('iree_vmla'),
    exported_names=['predict'])
vmla_module.predict(...)
```
By default the TensorFlow SavedModels will not be kept. This can be overridden
via the `--keep_saved_model` flag.
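As a rough sketch of saving those compilation artifacts to a directory (the
`artifacts_dir` keyword and the toy module below are assumptions for
illustration; see `tf_utils_test.py`, linked above, for the exact signature):
```python
import tempfile

import numpy as np
import tensorflow as tf
from pyiree.tf.support import tf_utils

# A toy module defined purely for illustration.
class SimpleModule(tf.Module):

  @tf.function(input_signature=[tf.TensorSpec([4], tf.float32)])
  def double(self, x):
    return x * 2.

# `artifacts_dir` (assumed keyword) is where the MLIR lowerings, the TensorFlow
# SavedModel and the compiled VM FlatBuffer would be written.
vmla_module = tf_utils.IreeCompiledModule(
    module_class=SimpleModule,
    backend_info=tf_utils.BackendInfo('iree_vmla'),
    artifacts_dir=tempfile.mkdtemp())

vmla_module.double(np.array([1., 2., 3., 4.], dtype=np.float32))
```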
## Running tests
For locally running tests and iterating on backend development, `bazel run` is
preferred.
```shell
# Run math_test on all backends.
bazel run :math_test_manual

# Run math_test comparing TensorFlow to itself (e.g. to debug randomization).
bazel run :math_test_manual -- --target_backends=tf

# Run math_test comparing the VMLA backend and TensorFlow.
bazel run :math_test_manual -- --target_backends=iree_vmla

# Run math_test comparing the VMLA backend to itself multiple times.
bazel run :math_test_manual -- \
  --reference_backend=iree_vmla --target_backends=iree_vmla,iree_vmla

# Run math_test and output on failure.
bazel test :math_test_manual --test_output=errors

# Run an individual test interactively.
bazel test :math_test_manual --test_output=streamed
```
For reproducibility of the unit tests, `CompiledModule()` sets the random seeds
of `tf`, `numpy` and `python` by calling `tf_utils.set_random_seed()` before
model creation.
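A minimal sketch of what that seeding amounts to (the real logic lives in
`tf_utils.set_random_seed`; exact details may differ):
```python
import random

import numpy as np
import tensorflow as tf

def set_random_seed(seed: int = 0) -> None:
  """Seeds TensorFlow, NumPy and Python's `random` for reproducible tests."""
  tf.random.set_seed(seed)
  random.seed(seed)
  np.random.seed(seed)
```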
## Writing Tests
Our tests use a class `TracedModule` to capture and store all of the inputs and
outputs of a `CompiledModule` in a `Trace`. Each unit test on a `TestCase` uses
the `compare_backends` method, which runs the function it is given once for each
reference and target backend, passing it a `TracedModule`. The inputs and
outputs of these modules are then checked for correctness, using the reference
backend as a source of truth. For example:
```python
import numpy as np
from pyiree.tf.support import tf_test_utils
from pyiree.tf.support import tf_utils

# Compile a `tf.Module` named `SimpleArithmeticModule` into a `CompiledModule`.
@tf_test_utils.compile_module(SimpleArithmeticModule)
# Inherit from `TracedModuleTestCase`.
class SimpleArithmeticTest(tf_test_utils.TracedModuleTestCase):

  # Unit test.
  def test_simple_mul(self):

    # Trace function.
    def simple_mul(module):
      # A random seed is automatically set before each call to `simple_mul`.
      a = tf_utils.uniform([4])
      b = np.array([400., 5., 6., 7.], dtype=np.float32)

      # The inputs `a` and `b` are recorded along with the output `c`.
      c = module.simple_mul(a, b)

      # The inputs `a` and `c` are recorded along with the (unnamed) output
      # that `module.simple_mul` returns.
      module.simple_mul(a, c)

    # Calls `simple_mul` once for each backend, recording the inputs and
    # outputs to `module` and then comparing them.
    self.compare_backends(simple_mul)
```
## Test Suites
Test targets are automatically generated for each test file and for each backend
to check numerical correctness against TensorFlow. Test targets that pass are
placed into the `e2e_tests` test suite. Tests that fail on particular backends
are recorded in lists in the `BUILD` files. For example, if
`experimental_new_test.py` fails on the `iree_llvmjit` and `iree_vulkan`
backends then the following lines should be added to the `BUILD` file:
```build
LLVM_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]

VULKAN_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]
```
Test targets for these backends are placed into the `e2e_tests_failing` test
suite. Test targets in these test suites can be run as follows:
```shell
# Run all e2e tests that are expected to pass.
bazel test :e2e_tests

# Run all e2e tests that are expected to fail.
bazel test :e2e_tests_failing

# Run a specific failing e2e test target.
# Note that generated test targets are prefixed with their test suite name.
bazel test :e2e_tests_failing_broadcasting_test__tf__iree_vulkan
```
## Debugging tests
If the compiler fails to compile the program, then it will create a crash
reproducer (see [MLIR documentation](https://mlir.llvm.org/docs/WritingAPass/)),
which then allows reproducing the bug with an appropriate `opt` tool. Further
debugging iteration can happen in `opt`.
TODO(silvasean): debugging miscompiles
## Test harnesses
### Simple function tests
See `simple_arithmetic_test.py` for some basic examples.