blob: 065cd19c243c3e01b8f77867a1d65b260d2ca847 [file] [log] [blame] [view] [edit]
This directory contains examples to get developers started
with using IREE's HalModuleDebugSink feature with callbacks
to bisect layers of multi-dispatch model, to locate numerical
divergences. It contains 2 examples:
## Simple callback example: `sync_callback_log.py`
A simple multi-dispatch example showing how to log statistics of
all dispatch inputs and outputs. When a model is compiled with
`--iree-flow-trace-dispatch-tensors`, IREE inserts trace points
before and after each dispatch function.
When iree-run-module executes a model with trace points, the
tensor values of dispatch inputs and outputs are printed to standard
output.
For models with small tensors, this is useful for debugging
numerical issues: By comparing tensor values before and after a
numerical regression, one can often find the dispatch that
introduced the numerical error. However, for large models, it is
infeasible to print all tensor values to the terminal. Even saving
tensor values to disk may be impractical due to size constraints.
In such cases, it is often sufficient to compare simple summary
statistics, such as the mean. To enable this level of flexibility,
one can use HalModuleDebugSink with arbitrary callback functions.
This example presents a simple callback that logs mean values.
## Two model callback example: `sync_callback_log_async.py`
This example extends Example 1. It shows how to run two versions of
a model in parallel. The versions differ only in one dispatch, and
the goal is automatically identify the 'bad' dispatch.
In practice, the two models are usually compilation artifacts of
the same source model, built before and after a numerical
regression. This technique is especially useful when numerical
divergence appears early in execution.