# TensorFlow e2e tests

<!-- TODO(meadowlark): Update this doc once the API is stable. -->

> Note:<br>
> The TensorFlow integrations are currently being refactored. The `bazel` build
> is deprecated. Refer to
> https://google.github.io/iree/building-from-source/optional-features/ for a
> general overview of how to build and execute the e2e tests.

This is a collection of e2e tests that compile a TensorFlow model with IREE (and
potentially TFLite), run it on multiple backends, and crosscheck the results.

## Pre-Requisites

You will need a TensorFlow 2.0+ nightly installed in your Python environment:
the Python binary in `$PYTHON_BIN` should be able to `import tensorflow` and
that TensorFlow should be version 2.0+. This can be checked with
`tensorflow.version.VERSION`.

See [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) for
instructions.
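
Once installed, a quick sanity check looks like the following (a minimal
sketch; `tf.version.VERSION` is the standard TensorFlow version string):

```python
import tensorflow as tf

# Nightly 2.x builds report versions like '2.5.0-dev20210101'.
print(tf.version.VERSION)
assert tf.version.VERSION.startswith('2'), 'TensorFlow 2.0+ is required'
```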

## Vulkan Setup

If you do not have your environment set up to use IREE with Vulkan (see
[this doc](https://google.github.io/iree/deployment-configurations/gpu-vulkan/)),
then you can run the manual test targets with
`--target_backends=tf,iree_vmvx,iree_llvmaot` (that is, by omitting
`iree_vulkan` from the list of backends to run the tests on).

The test suites can be run excluding Vulkan by specifying
`--test_tag_filters="-driver=vulkan"` in the `bazel test` invocation, or by
adding `test --test_tag_filters="-driver=vulkan"` to your `user.bazelrc`.

## Compiling `tf.Module`s

Compatible TensorFlow modules can be compiled to specific IREE backends using
`IreeCompiledModule`. This also optionally saves compilation artifacts to a
specified directory. These artifacts include MLIR across various lowerings and
the compiled VM FlatBuffer. A basic example of creating and calling an
`IreeCompiledModule` can be found in
[`module_utils_test.py`](https://github.com/google/iree/blob/main/integrations/tensorflow/bindings/python/iree/tf/support/module_utils_test.py).

When using Keras models or `tf.Module`s with functions that IREE can't compile,
`exported_names` should be specified. For example:

```python
from iree.tf.support import module_utils

vmvx_module = module_utils.IreeCompiledModule(
    module_class=KerasTFModuleClass,
    backend_info=module_utils.BackendInfo('iree_vmvx'),
    exported_names=['predict'])
vmvx_module.predict(...)
```
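
The same constructor can also be pointed at a directory in which to save the
compilation artifacts described above. The sketch below is illustrative rather
than authoritative: the `artifacts_dir` keyword argument is an assumption based
on the description above, so check `module_utils.py` for the exact signature.

```python
llvmaot_module = module_utils.IreeCompiledModule(
    module_class=KerasTFModuleClass,
    backend_info=module_utils.BackendInfo('iree_llvmaot'),
    exported_names=['predict'],
    # Assumed keyword: where to write the MLIR lowerings and compiled .vmfb.
    artifacts_dir='/tmp/iree/modules/KerasTFModuleClass')
llvmaot_module.predict(...)
```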

## Running Tests

For locally running tests and iterating on backend development, `bazel run` is
preferred.

```shell
# Run conv_test on all backends.
bazel run //integrations/tensorflow/e2e:conv_test_manual

# Run conv_test comparing TensorFlow to itself (e.g. to debug randomization).
bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=tf

# Run conv_test comparing the VMVX backend and TensorFlow.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=iree_vmvx

# Run conv_test comparing the VMVX backend to itself multiple times.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- \
  --reference_backend=iree_vmvx --target_backends=iree_vmvx,iree_vmvx
```

For reproducibility of the unit tests, `CompiledModule()` sets the random seeds
of `tf`, `numpy` and `python` by calling `tf_utils.set_random_seed()` before
model creation.
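
The same helper can be called directly if you need deterministic inputs outside
of the test infrastructure; a minimal sketch, calling it with its default seed
as the tests do:

```python
from iree.tf.support import tf_utils

# Seeds Python's `random`, NumPy's and TensorFlow's random number generators.
tf_utils.set_random_seed()
```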

## Writing Tests

There are two ways to write tests – via `tf_test_utils.tf_function_unit_test` and
via test methods on a child of `tf_test_utils.TracedModuleTestCase`.

### Via `tf_test_utils.tf_function_unit_test`

This is preferred in the cases where

1. Only a single call to the module needs to be tested at a time.
2. The inputs are simple to automatically generate or specify inline.
3. The functions that you want to test are generated automatically from a
   configuration (e.g. in `.../e2e/keras/layers/layers_test.py`).

Tests are specified by writing modules that inherit from
`tf_test_utils.TestModule` (which is a thin wrapper around `tf.Module`) with
methods decorated with `@tf_test_utils.tf_function_unit_test` (which is a thin
wrapper around `tf.function`).

#### Basic example

We use part of `.../e2e/conv_test.py` as an example. The first component is
the `TestModule` itself:

```python
class Conv2dModule(tf_test_utils.TestModule):

  # This decorator tells the testing infra to generate a unittest for this
  # function. The 'input_signature' is required. If no other arguments are
  # specified then uniform random data is generated from the input signature
  # to numerically test the function.
  @tf_test_utils.tf_function_unit_test(input_signature=[
      tf.TensorSpec([1, 4, 5, 1], tf.float32),
      tf.TensorSpec([1, 1, 1, 1], tf.float32),
  ])
  def conv2d_1451x1111_valid(self, img, kernel):
    return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result")

  @tf_test_utils.tf_function_unit_test(input_signature=[
      tf.TensorSpec([2, 4, 5, 1], tf.float32),
      tf.TensorSpec([1, 1, 1, 1], tf.float32),
  ])
  def conv2d_2451x1111_valid(self, img, kernel):
    return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result")
```

Second, you need to write a test case that inherits from
`tf_test_utils.TracedModuleTestCase`. This is essentially boilerplate that
tells `tf.test.main()` what `tf.Module` to test and allows us to generate
the unittests we specified above.

```python
class ConvTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self._modules = tf_test_utils.compile_tf_module(Conv2dModule)
```

Finally, in the `main` function, you need to call
`.generate_unit_tests(module_class)` on your `TestCase` to actually generate
the unittests that we specified:

```python
def main(argv):
  del argv  # Unused
  if hasattr(tf, 'enable_v2_behavior'):
    tf.enable_v2_behavior()
  # Generates unittests for all @tf_test_utils.tf_function_unit_test decorated
  # functions on the module class.
  # Note: if you are automatically generating functions to test they need to be
  # specified via a `classmethod` prior to this call _as well_ as via `__init__`
  # to properly handle stateful `tf.function`s.
  ConvTest.generate_unit_tests(Conv2dModule)
  tf.test.main()


if __name__ == '__main__':
  app.run(main)
```

This generates two unittests: `test_conv2d_1451x1111_valid` and
`test_conv2d_2451x1111_valid`.

#### Configuring `@tf_test_utils.tf_function_unit_test`

By default `@tf_test_utils.tf_function_unit_test` uses uniform random input data
to numerically test the function, but you can specify an `input_generator` or
`input_args` to test data-specific behaviors:

- `input_generator` can be `tf_utils.uniform`, `tf_utils.ndarange`, or any
  function which takes a `shape` and `dtype` as positional args and returns an
  `np.ndarray`.
- `input_args` is a list of `np.ndarray`s to use as positional arguments.

The comparison `atol` and `rtol` can also be specified in the decorator.
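
For example, the following sketch (a hypothetical module, using only the
decorator arguments described above) tests a function on `tf_utils.ndarange`
inputs with loosened tolerances:

```python
class ConfiguredModule(tf_test_utils.TestModule):

  @tf_test_utils.tf_function_unit_test(
      input_signature=[tf.TensorSpec([8], tf.float32)],
      # Alternatively: input_args=[np.arange(8, dtype=np.float32)]
      input_generator=tf_utils.ndarange,
      atol=1e-4,
      rtol=1e-5)
  def cumulative_mean(self, x):
    return tf.reduce_mean(tf.cumsum(x))
```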

### Via test methods

This is preferred in the cases where

1. The `tf.function` that you want to test is already defined on the module
   (e.g. on a downloaded model like in `mobile_bert_test.py`).
2. The inputs are difficult to specify inline and require multiple function
   calls / reshaping to create.
3. You want to test multiple consecutive calls to a `tf.function` (e.g. to test
   mutated state in `ring_buffer_test.py`).

Our tests use a class `TracedModule` to capture and store all of the inputs and
outputs of a `CompiledModule` in a `Trace`. Each unittest on a `TestCase` uses
the `compare_backends` method. This method runs the function it is passed with a
`TracedModule` once for each reference and target backend. The inputs and
outputs to these modules are then checked for correctness, using the reference
backend as a source of truth.

We use `simple_arithmetic_test.py` as an example:

```python
# Create a tf.Module with one or more `@tf.function` decorated methods to test.
class SimpleArithmeticModule(tf.Module):

  @tf.function(input_signature=[
      tf.TensorSpec([4], tf.float32),
      tf.TensorSpec([4], tf.float32)
  ])
  def simple_mul(self, a, b):
    return a * b


# Inherit from `TracedModuleTestCase`.
class SimpleArithmeticTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Compile a `tf.Module` named `SimpleArithmeticModule` into
    # `CompiledModule`s for each reference and target backend.
    self._modules = tf_test_utils.compile_tf_module(SimpleArithmeticModule)

  # Unit test.
  def test_simple_mul(self):

    # Trace function.
    def simple_mul(module):
      # A random seed is automatically set before each call to `simple_mul`.
      a = tf_utils.uniform([4])
      b = np.array([400., 5., 6., 7.], dtype=np.float32)

      # The inputs `a` and `b` are recorded along with the output `c`.
      c = module.simple_mul(a, b)

      # The inputs `a` and `c` are recorded along with the (unnamed) output
      # that module.simple_mul returns.
      module.simple_mul(a, c)

    # Calls `simple_mul` once for each backend, recording the inputs and outputs
    # to `module` and then comparing them.
    self.compare_backends(simple_mul, self._modules)
```

## Test Suites

Test targets are automatically generated for each test file and for each backend
to check numerical correctness against TensorFlow. Test targets that pass are
placed into the `e2e_tests` test suite. Tests that fail on particular backends
are recorded in lists in the `BUILD` files. For example, if
`experimental_new_test.py` fails on the `iree_llvmaot` and `iree_vulkan`
backends then the following lines should be added to the `BUILD` file:

```build
LLVM_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]

VULKAN_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]
```

Test targets for these backends are placed into the `e2e_tests_failing` test
suite. Test targets in these test suites can be run as follows:

```shell
# Run all e2e tests that are expected to pass.
bazel test //integrations/tensorflow/e2e:e2e_tests

# Run all e2e tests that are expected to fail.
bazel test //integrations/tensorflow/e2e:e2e_tests_failing

# Run a specific failing e2e test target.
# Note that generated test targets are prefixed with their test suite name.
# Also, if broadcasting_test starts working on iree_vulkan after the time
# of writing then this command will fail.
bazel test //integrations/tensorflow/e2e:e2e_tests_failing_broadcasting_test__tf__iree_vulkan
```

## Generated Artifacts

By default, running an E2E test generates a number of compilation, debugging and
benchmarking artifacts. These artifacts will be saved

- in `/tmp/iree/modules/` when using `bazel run` or `bazel test` with
  `--test_arg=--artifacts_dir=/tmp/iree/modules/`.
- in `bazel-testlogs/integrations/tensorflow/e2e/test_suite_target_name` when
  using `bazel test` without specifying `--artifacts_dir`.

The generated directory structure for each module is as follows:

```shell
/tmp/iree/modules/ModuleName
├── reproducer__backend.mlir
│   # If there is a compilation error, an MLIR file that reproduces the error
│   # for a specific backend is included.
├── tf_input.mlir
│   # MLIR for ModuleName in TF's input dialect.
├── iree_input.mlir
│   # tf_input.mlir translated to IREE MLIR.
├── iree_vmvx
│   # Or any other IREE backend.
│   ├── compiled.vmfb
│   │   # A flatbuffer containing IREE's compiled code.
│   └── traces
│       # Directory with a trace for each unittest in the test file.
│       ├── trace_function_1
│       │   # Directory storing logs and serialization for a specific trace.
│       │   ├── flagfile
│       │   │   # An Abseil flagfile containing arguments that
│       │   │   # iree-benchmark-module needs to benchmark this trace.
│       │   └── log.txt
│       │       # A more detailed version of the test logs.
│       ├── trace_function_2
│       └── ...
├── tflite  # If TFLite supports compiling ModuleName.
│   ├── method_1.tflite
│   │   # A method on ModuleName compiled to bytes with TFLite, which can
│   │   # be ingested by TFLite's benchmark_model binary.
│   ├── method_2.tflite
│   └── traces
│       └── ...
└── tf_ref  # Directory storing the TensorFlow reference traces.
    └── traces
        └── ...
```

Traces for a particular test can be loaded via the `Trace.load(trace_dir)`
method. For example:

```python
ref_trace = Trace.load("/tmp/iree/modules/ModuleName/tf_ref/traces/predict/")
tar_trace = Trace.load("/tmp/iree/modules/ModuleName/iree_vmvx/traces/predict/")
abs_diff = np.abs(ref_trace.calls[0].outputs[0] - tar_trace.calls[0].outputs[0])
print(np.mean(abs_diff))
```

Traces are named after the trace functions defined in their unittests. So in the
`SimpleArithmeticModule` example above, the `trace_dir` would be
`/tmp/iree/modules/SimpleArithmeticModule/iree_vmvx/traces/simple_mul/`.

## Benchmarking E2E Modules

We use our end-to-end TensorFlow integration tests to generate tested
compilation and benchmarking artifacts. This allows us to validate that our
benchmarks are behaving as we expect them to, and to run them using valid inputs
for each model. An overview of how to run benchmarks on IREE and TFLite can be
found in
[this doc](https://github.com/google/iree/blob/main/docs/developers/developing_iree/e2e_benchmarking.md).

## Debugging Tests

If the compiler fails to compile the program, then it will create a crash
reproducer (see
[MLIR documentation](https://mlir.llvm.org/docs/PassManagement/#crash-and-failure-reproduction)),
which then allows reproducing the bug with an appropriate "opt" tool. Further
debugging iteration can happen in opt.

TODO(silvasean): debugging miscompiles

## Testing SignatureDef SavedModels

TensorFlow 1.x SavedModels can be tested using
`tf_test_utils.compile_tf_signature_def_saved_model` instead of
`tf_test_utils.compile_tf_module`. See `mobile_bert_squad_test.py` for a
concrete example. The compilation artifacts will be saved under the name you
specify for `module_name`.
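
As a rough sketch only, the call looks something like the following inside a
`TracedModuleTestCase.__init__`. Every argument name and value below is an
assumption for illustration; refer to `mobile_bert_squad_test.py` and
`tf_test_utils.py` for the exact signature:

```python
self._modules = tf_test_utils.compile_tf_signature_def_saved_model(
    # Hypothetical values: a TF 1.x SavedModel and its serving signature.
    saved_model_dir='/path/to/saved_model',
    saved_model_tags=set(['serve']),
    module_name='MobileBertSquad',
    exported_name='serving_default',
    input_names=['input_ids', 'input_mask', 'segment_ids'],
    output_names=['start_logits', 'end_logits'])
```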