Note: The TensorFlow integrations are currently being refactored. The Bazel build is deprecated. Refer to https://google.github.io/iree/building-from-source/optional-features/ for a general overview of how to build and execute the e2e tests.
This is a collection of e2e tests that compile a TensorFlow model with IREE (and potentially TFLite), run it on multiple backends, and crosscheck the results.
You will need a TensorFlow 2.0+ nightly installed in your python environment: the python binary in `$PYTHON_BIN` should be able to `import tensorflow`, and that TensorFlow should be version 2.0+. This can be checked with `tensorflow.version`.

See [Install TensorFlow with pip](https://www.tensorflow.org/install/pip) for instructions.
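For example, one quick way to check this from the shell (assuming `$PYTHON_BIN` points at the python binary you will use):

```shell
# Should print a 2.x version string without raising an ImportError.
$PYTHON_BIN -c "import tensorflow as tf; print(tf.version.VERSION)"
```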
If you do not have your environment setup to use IREE with Vulkan (see this doc), then you can run the manual test targets with `--target_backends=tf,iree_vmvx,iree_llvmaot` (that is, by omitting `iree_vulkan` from the list of backends to run the tests on).

The test suites can be run excluding Vulkan by specifying `--test_tag_filters="-driver=vulkan"` in the `bazel test` invocation, or by adding `test --test_tag_filters="-driver=vulkan"` to your `user.bazelrc`.
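For example, using the `conv_test_manual` and `e2e_tests` targets described below:

```shell
# Run a manual test target without the Vulkan backend.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- \
  --target_backends=tf,iree_vmvx,iree_llvmaot

# Run the generated test suite, skipping tests tagged with driver=vulkan.
bazel test //integrations/tensorflow/e2e:e2e_tests --test_tag_filters="-driver=vulkan"
```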
Compatible TensorFlow modules (`tf.Module`s) can be compiled to specific IREE backends using `IreeCompiledModule`. This also optionally saves compilation artifacts to a specified directory. These artifacts include MLIR across various lowerings and the compiled VM FlatBuffer. A basic example of creating and calling an `IreeCompiledModule` can be found in `module_utils_test.py`.
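For instance, a minimal sketch of compiling a module and saving its artifacts (the `artifacts_dir` keyword is an assumption here; check `module_utils.py` for the exact parameter name):

```python
import numpy as np
import tensorflow as tf
from iree.tf.support import module_utils


class SimpleModule(tf.Module):

  @tf.function(input_signature=[tf.TensorSpec([4], tf.float32)])
  def double(self, x):
    return x * 2.0


# Compile `SimpleModule` for the iree_vmvx backend. If `artifacts_dir` is
# accepted (assumption), the intermediate MLIR and the compiled VM FlatBuffer
# are written under that directory.
compiled = module_utils.IreeCompiledModule(
    module_class=SimpleModule,
    backend_info=module_utils.BackendInfo('iree_vmvx'),
    artifacts_dir='/tmp/iree/modules/SimpleModule')
print(compiled.double(np.array([1., 2., 3., 4.], dtype=np.float32)))
```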
When using Keras models or `tf.Module`s with functions that IREE can't compile, `exported_names` should be specified. For example:

```python
from iree.tf.support import module_utils

vmvx_module = module_utils.IreeCompiledModule(
    module_class=KerasTFModuleClass,
    backend_info=module_utils.BackendInfo('iree_vmvx'),
    exported_names=['predict'])
vmvx_module.predict(...)
```
For locally running tests and iterating on backend development, `bazel run` is preferred.

```shell
# Run conv_test on all backends.
bazel run //integrations/tensorflow/e2e:conv_test_manual

# Run conv_test comparing TensorFlow to itself (e.g. to debug randomization).
bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=tf

# Run conv_test comparing the VMVX backend and TensorFlow.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- --target_backends=iree_vmvx

# Run conv_test comparing the VMVX backend to itself multiple times.
bazel run //integrations/tensorflow/e2e:conv_test_manual -- \
  --reference_backend=iree_vmvx --target_backends=iree_vmvx,iree_vmvx
```
For reproducibility of the unit tests, `CompiledModule()` sets the random seeds of `tf`, `numpy` and `python` by calling `tf_utils.set_random_seed()` before model creation.
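If you want the same determinism in an ad-hoc script, a minimal sketch (assuming `set_random_seed` defaults to a fixed seed, as its use here implies):

```python
import numpy as np
import tensorflow as tf
from iree.tf.support import tf_utils

# Seed python's `random`, numpy and TensorFlow in one call so that randomly
# generated test inputs are reproducible across runs.
tf_utils.set_random_seed()
print(np.random.uniform(size=[2]))     # Identical on every run.
print(tf.random.uniform([2]).numpy())  # Identical on every run.
```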
There are two ways to write tests: via `tf_test_utils.tf_function_unit_test` and via test methods on a child of `tf_test_utils.TracedModuleTestCase`.
The first approach, via `tf_test_utils.tf_function_unit_test`, is preferred in the cases where each function under test only needs to be called once and its inputs are simple to generate automatically or to specify inline (e.g. the tests in `.../e2e/keras/layers/layers_test.py`).

Tests are specified by writing modules that inherit from `tf_test_utils.TestModule` (which is a thin wrapper around `tf.Module`) with methods decorated with `@tf_test_utils.tf_function_unit_test` (which is a thin wrapper around `tf.function`).
We use part of `.../e2e/conv_test.py` as an example. The first component is the `TestModule` itself:

```python
class Conv2dModule(tf_test_utils.TestModule):

  # This decorator tells the testing infra to generate a unittest for this
  # function. The 'input_signature' is required. If no other arguments are
  # specified then uniform random data is generated from the input signature
  # to numerically test the function.
  @tf_test_utils.tf_function_unit_test(input_signature=[
      tf.TensorSpec([1, 4, 5, 1], tf.float32),
      tf.TensorSpec([1, 1, 1, 1], tf.float32),
  ])
  def conv2d_1451x1111_valid(self, img, kernel):
    return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result")

  @tf_test_utils.tf_function_unit_test(input_signature=[
      tf.TensorSpec([2, 4, 5, 1], tf.float32),
      tf.TensorSpec([1, 1, 1, 1], tf.float32),
  ])
  def conv2d_2451x1111_valid(self, img, kernel):
    return tf.nn.conv2d(img, kernel, [1, 1, 1, 1], "VALID", name="result")
```
Second, you need to write a test case that inherits from `tf_test_utils.TracedModuleTestCase`. This is essentially boilerplate that tells `tf.test.main()` what `tf.Module` to test and allows us to generate the unittests we specified above.

```python
class ConvTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self._modules = tf_test_utils.compile_tf_module(Conv2dModule)
```
Finally, in the `main` function, you need to call `.generate_unit_tests(module_class)` on your `TestCase` to actually generate the unittests that we specified:

```python
def main(argv):
  del argv  # Unused.
  if hasattr(tf, 'enable_v2_behavior'):
    tf.enable_v2_behavior()

  # Generates unittests for all @tf_test_utils.tf_function_unit_test decorated
  # functions on the module class.
  # Note: if you are automatically generating functions to test they need to be
  # specified via a `classmethod` prior to this call _as well_ as via `__init__`
  # to properly handle stateful `tf.function`s.
  ConvTest.generate_unit_tests(Conv2dModule)
  tf.test.main()


if __name__ == '__main__':
  app.run(main)
```
This generates two unittests: `test_conv2d_1451x1111_valid` and `test_conv2d_2451x1111_valid`.
By default, `@tf_test_utils.tf_function_unit_test` uses uniform random input data to numerically test the function, but you can specify an `input_generator` or `input_args` to test data-specific behaviors:

- `input_generator` can be `tf_utils.uniform`, `tf_utils.ndarange`, or any function which takes a `shape` and `dtype` as positional args and returns an `np.ndarray`.
- `input_args` is a list of `np.ndarray`s to use as positional arguments.

The comparison `atol` and `rtol` can also be specified in the decorator.
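For example, a sketch of a method using these options (the module and method below are illustrative, not taken from an existing test):

```python
class ExampleModule(tf_test_utils.TestModule):

  # Use deterministic `ndarange` input data instead of uniform random data,
  # and loosen the numerical comparison tolerances for this function.
  @tf_test_utils.tf_function_unit_test(
      input_signature=[tf.TensorSpec([2, 3], tf.float32)],
      input_generator=tf_utils.ndarange,
      atol=1e-5,
      rtol=1e-5)
  def double(self, x):
    return x * 2.0
```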
The second approach, via test methods on a child of `tf_test_utils.TracedModuleTestCase`, is preferred in the cases where the `tf.function` that you want to test is already defined on the module (e.g. on a downloaded model like in `mobile_bert_test.py`), or where a test needs to make several interacting calls to a `tf.function` (e.g. to test mutated state in `ring_buffer_test.py`).

Our tests use a class `TracedModule` to capture and store all of the inputs and outputs of a `CompiledModule` in a `Trace`. Each unittest on a `TestCase` uses the `compare_backends` method. This method runs the function it is passed with a `TracedModule` once for each reference and target backend. The inputs and outputs to these modules are then checked for correctness, using the reference backend as a source of truth.
We use `simple_arithmetic_test.py` as an example:

```python
# Create a tf.Module with one or more `@tf.function` decorated methods to test.
class SimpleArithmeticModule(tf.Module):

  @tf.function(input_signature=[
      tf.TensorSpec([4], tf.float32),
      tf.TensorSpec([4], tf.float32)
  ])
  def simple_mul(self, a, b):
    return a * b


# Inherit from `TracedModuleTestCase`.
class SimpleArithmeticTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # Compile a `tf.Module` named `SimpleArithmeticModule` into
    # `CompiledModule`s for each reference and target backend.
    self._modules = tf_test_utils.compile_tf_module(SimpleArithmeticModule)

  # Unit test.
  def test_simple_mul(self):

    # Trace function.
    def simple_mul(module):
      # A random seed is automatically set before each call to `simple_mul`.
      a = tf_utils.uniform([4])
      b = np.array([400., 5., 6., 7.], dtype=np.float32)

      # The inputs `a` and `b` are recorded along with the output `c`.
      c = module.simple_mul(a, b)

      # The inputs `a` and `c` are recorded along with the (unnamed) output
      # module.simple_mul returns.
      module.simple_mul(a, c)

    # Calls `simple_mul` once for each backend, recording the inputs and
    # outputs to `module` and then comparing them.
    self.compare_backends(simple_mul, self._modules)
```
Test targets are automatically generated for each test file and for each backend to check numerical correctness against TensorFlow. Test targets that pass are placed into the `e2e_tests` test suite. Tests that fail on particular backends are recorded in lists in the `BUILD` files. For example, if `experimental_new_test.py` fails on the `iree_llvmaot` and `iree_vulkan` backends then the following lines should be added to the `BUILD` file:
```python
LLVM_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]

VULKAN_FAILING = [
    ...
    "experimental_new_test.py",
    ...
]
```
Test targets for these backends are placed into the `e2e_tests_failing` test suite. Test targets in these test suites can be run as follows:

```shell
# Run all e2e tests that are expected to pass.
bazel test //integrations/tensorflow/e2e:e2e_tests

# Run all e2e tests that are expected to fail.
bazel test //integrations/tensorflow/e2e:e2e_tests_failing

# Run a specific failing e2e test target.
# Note that generated test targets are prefixed with their test suite name.
# Also, if broadcasting_test starts working on iree_vulkan after the time
# of writing then this command will fail.
bazel test //integrations/tensorflow/e2e:e2e_tests_failing_broadcasting_test__tf__iree_vulkan
```
By default, running an E2E test generates a number of compilation, debugging and benchmarking artifacts. These artifacts will be saved

- in `/tmp/iree/modules/` when using `bazel run`, or `bazel test` with `--test_arg=--artifacts_dir=/tmp/iree/modules/`, and
- in `bazel-testlogs/integrations/tensorflow/e2e/test_suite_target_name` when using `bazel test` without specifying `--artifacts_dir`.

The generated directory structure for each module is as follows:
```
/tmp/iree/modules/ModuleName
├── reproducer__backend.mlir
│   # If there is a compilation error, an MLIR file that reproduces the error
│   # for a specific backend is included.
├── tf_input.mlir
│   # MLIR for ModuleName in TF's input dialect.
├── iree_input.mlir
│   # tf_input.mlir translated to IREE MLIR.
├── iree_vmvx
│   # Or any other IREE backend.
│   ├── compiled.vmfb
│   │   # A flatbuffer containing IREE's compiled code.
│   └── traces
│       # Directory with a trace for each unittest in vision_model_test.py.
│       ├── trace_function_1
│       │   # Directory storing logs and serialization for a specific trace.
│       │   ├── flagfile
│       │   │   # An Abseil flagfile containing arguments
│       │   │   # iree-benchmark-module needs to benchmark this trace.
│       │   └── log.txt
│       │       # A more detailed version of the test logs.
│       ├── trace_function_2
│       └── ...
├── tflite
│   # If TFLite supports compiling ModuleName.
│   ├── method_1.tflite
│   │   # A method on ModuleName compiled to bytes with TFLite, which can
│   │   # be ingested by TFLite's benchmark_model binary.
│   ├── method_2.tflite
│   └── traces
│       └── ...
└── tf_ref
    # Directory storing the tensorflow reference traces.
    └── traces
        └── ...
```
Traces for a particular test can be loaded via the `Trace.load(trace_dir)` method. For example:
```python
ref_trace = Trace.load("/tmp/iree/modules/ModuleName/tf_ref/traces/predict/")
tar_trace = Trace.load("/tmp/iree/modules/ModuleName/iree_vmvx/traces/predict/")
abs_diff = np.abs(ref_trace.calls[0].outputs[0] - tar_trace.calls[0].outputs[0])
print(np.mean(abs_diff))
```
Traces are named after the trace functions defined in their unittests. So in the `SimpleArithmeticModule` example above, the `trace_dir` would be `/tmp/iree/modules/SimpleArithmeticModule/iree_vmvx/traces/simple_mul/`.
We use our end-to-end TensorFlow integration tests to generate tested compilation and benchmarking artifacts. This allows us to validate that our benchmarks are behaving as we expect them to, and to run them using valid inputs for each model. An overview of how to run benchmarks on IREE and TFLite can be found in this doc.
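For example, the Abseil flagfile saved with each trace (see the directory structure above) is intended to be handed to `iree-benchmark-module`; the path below is illustrative and `--flagfile` support is assumed:

```shell
# Benchmark a compiled module using the flags recorded for one of its traces.
iree-benchmark-module \
  --flagfile=/tmp/iree/modules/SimpleArithmeticModule/iree_vmvx/traces/simple_mul/flagfile
```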
If the compiler fails to compile the program, then it will create a crash reproducer (see MLIR documentation), which then allows reproducing the bug with an appropriate “opt” tool. Further debugging iteration can happen in opt.
TODO(silvasean): debugging miscompiles
TensorFlow 1.x SavedModels can be tested using `tf_test_utils.compile_tf_signature_def_saved_model` instead of `tf_test_utils.compile_tf_module`. See `mobile_bert_squad_test.py` for a concrete example. The compilation artifacts will be saved under whatever name you specify for `module_name`.
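A rough sketch of what such a test's setup might look like (the keyword arguments below are assumptions for illustration; see `mobile_bert_squad_test.py` for the real signature and values):

```python
class MobileBertSquadTest(tf_test_utils.TracedModuleTestCase):

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    # All argument names and values below are hypothetical placeholders.
    self._modules = tf_test_utils.compile_tf_signature_def_saved_model(
        saved_model_dir='/path/to/saved_model',
        saved_model_tags=set(['serve']),
        module_name='MobileBertSquad',
        exported_name='serving_default',
        input_names=['input_ids', 'input_mask', 'segment_ids'],
        output_names=['start_logits', 'end_logits'])
```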