This doc outlines how to use the TFLite Micro profiler to measure per-op invoke duration and to identify bottlenecks within operator kernels and other TFLite Micro routines.
The MicroInterpreter class constructor takes an optional profiler argument. This profiler must be an instance of the tflite::Profiler class and must implement the BeginEvent and EndEvent methods. The default implementation in tensorflow/lite/micro/micro_profiler.cc is suitable for most purposes.
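To make the BeginEvent/EndEvent contract concrete, here is a minimal, self-contained sketch of a profiler that records events in a fixed-size buffer. It does not use the TFLite Micro headers: `ProfilerSketch`, `FixedBufferProfiler`, `FakeTicks`, and the handle-returning method signatures are assumptions made for illustration, not the library's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a profiler interface with BeginEvent/EndEvent.
// The signatures are assumed for this sketch, not copied from TFLite Micro.
class ProfilerSketch {
 public:
  virtual ~ProfilerSketch() = default;
  // Marks the start of an event and returns a handle that identifies it.
  virtual uint32_t BeginEvent(const char* tag) = 0;
  // Marks the end of the event identified by `event_handle`.
  virtual void EndEvent(uint32_t event_handle) = 0;
};

// Records events in fixed-size arrays, so no heap allocation is needed.
class FixedBufferProfiler : public ProfilerSketch {
 public:
  static constexpr uint32_t kMaxEvents = 16;

  // `now` is a caller-supplied tick source (hypothetical; on a real target
  // this would read a hardware timer).
  explicit FixedBufferProfiler(uint32_t (*now)()) : now_(now) {}

  uint32_t BeginEvent(const char* tag) override {
    assert(num_events_ < kMaxEvents);
    tags_[num_events_] = tag;
    start_ticks_[num_events_] = now_();
    end_ticks_[num_events_] = start_ticks_[num_events_];
    return num_events_++;
  }

  void EndEvent(uint32_t event_handle) override {
    assert(event_handle < num_events_);
    end_ticks_[event_handle] = now_();
  }

  uint32_t num_events() const { return num_events_; }
  const char* tag(uint32_t i) const { return tags_[i]; }
  uint32_t ElapsedTicks(uint32_t i) const {
    return end_ticks_[i] - start_ticks_[i];
  }

 private:
  uint32_t (*now_)();
  const char* tags_[kMaxEvents] = {};
  uint32_t start_ticks_[kMaxEvents] = {};
  uint32_t end_ticks_[kMaxEvents] = {};
  uint32_t num_events_ = 0;
};

// Deterministic fake clock for demonstration: advances 5 ticks per read.
inline uint32_t FakeTicks() {
  static uint32_t ticks = 0;
  ticks += 5;
  return ticks;
}
```

A caller would construct `FixedBufferProfiler profiler(FakeTicks);`, bracket the code of interest with `BeginEvent` and `EndEvent`, and then read the durations back with `ElapsedTicks`.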
The best practice for profiling across multiple invocations is to reset the profiler or call ClearEvents() between invocations.
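The sketch below shows why clearing matters when the event buffer is small. It is a self-contained illustration under stated assumptions: `CountingProfiler`, `FakeInvoke`, and `RunInvocations` are hypothetical names, and only the `ClearEvents()` method name comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal profiler with a deliberately small event buffer.
class CountingProfiler {
 public:
  static constexpr uint32_t kMaxEvents = 4;

  uint32_t BeginEvent(const char* tag) {
    // Without clearing between invocations this buffer would fill up.
    assert(num_events_ < kMaxEvents);
    tags_[num_events_] = tag;
    return num_events_++;
  }
  void EndEvent(uint32_t /*event_handle*/) {}
  void ClearEvents() { num_events_ = 0; }

  uint32_t num_events() const { return num_events_; }
  const char* tag(uint32_t i) const { return tags_[i]; }

 private:
  const char* tags_[kMaxEvents] = {};
  uint32_t num_events_ = 0;
};

// Stand-in for one interpreter invocation that profiles three ops.
inline void FakeInvoke(CountingProfiler& profiler) {
  const char* ops[] = {"CONV_2D", "RELU", "SOFTMAX"};
  for (const char* op : ops) {
    profiler.EndEvent(profiler.BeginEvent(op));
  }
}

// Runs several invocations, clearing the profiler after each one.
// Returns the largest number of events seen after any single invocation.
inline uint32_t RunInvocations(CountingProfiler& profiler, int invocations) {
  uint32_t max_events = 0;
  for (int i = 0; i < invocations; ++i) {
    FakeInvoke(profiler);
    if (profiler.num_events() > max_events) {
      max_events = profiler.num_events();
    }
    // Read or log the results here, then reset before the next invocation.
    profiler.ClearEvents();
  }
  return max_events;
}
```

Because each invocation starts from an empty buffer, ten invocations of three events each fit in a buffer of four entries, and the results of one invocation are never mixed with those of the next.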
The MicroInterpreter supports per-op profiling. To enable it, pass a MicroProfiler to the MicroInterpreter's constructor and build in a non-release configuration, so that NDEBUG is not defined and the ScopedOperatorProfile that it guards within the MicroInterpreter is compiled in.
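The effect of NDEBUG on guarded profiling code can be sketched in isolation. This is an illustration of the general preprocessor pattern, not the MicroInterpreter's source: `SKETCH_PROFILE_OP`, `OpCounter`, `InvokeOps`, and `kProfilingCompiledIn` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counter standing in for a real profiler.
struct OpCounter {
  uint32_t profiled_ops = 0;
};

// The profiling hook exists only when NDEBUG is not defined, which is the
// same switch that turns the standard assert() macro on and off.
#ifndef NDEBUG
#define SKETCH_PROFILE_OP(counter) ((counter).profiled_ops++)
constexpr bool kProfilingCompiledIn = true;
#else
#define SKETCH_PROFILE_OP(counter) ((void)0)
constexpr bool kProfilingCompiledIn = false;
#endif

// Stand-in for an invoke loop over `num_ops` operators.
inline uint32_t InvokeOps(OpCounter& counter, int num_ops) {
  assert(num_ops >= 0);
  for (int op = 0; op < num_ops; ++op) {
    SKETCH_PROFILE_OP(counter);
    // ... the operator kernel would run here ...
  }
  return counter.profiled_ops;
}
```

In a build without NDEBUG, `InvokeOps(counter, 3)` records three ops. In a build with NDEBUG (typical for release configurations), the hook expands to nothing and the count stays at zero.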
To dig further into the performance of specific routines, the MicroProfiler can be used directly from the TfLiteContext, or a new MicroProfiler can be created where the TfLiteContext is not available. The MicroProfiler's BeginEvent and EndEvent can be called directly or wrapped using a ScopedProfile.
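The scoped wrapper mentioned above follows the usual RAII pattern: begin the event in the constructor and end it in the destructor, so that every exit path closes the event. The sketch below is self-contained and does not use the TFLite Micro classes: `TinyProfiler`, `ScopedEvent`, and `SumPositive` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal profiler that only counts begun and ended events.
class TinyProfiler {
 public:
  uint32_t BeginEvent(const char* tag) {
    (void)tag;  // A real profiler would store the tag.
    return begun_++;
  }
  void EndEvent(uint32_t event_handle) {
    assert(event_handle < begun_);
    ++ended_;
  }
  uint32_t begun() const { return begun_; }
  uint32_t ended() const { return ended_; }

 private:
  uint32_t begun_ = 0;
  uint32_t ended_ = 0;
};

// RAII wrapper: begins an event on construction and ends it on destruction.
// A null profiler is tolerated so that profiling stays optional.
class ScopedEvent {
 public:
  ScopedEvent(const char* tag, TinyProfiler* profiler) : profiler_(profiler) {
    if (profiler_ != nullptr) handle_ = profiler_->BeginEvent(tag);
  }
  ~ScopedEvent() {
    if (profiler_ != nullptr) profiler_->EndEvent(handle_);
  }
  ScopedEvent(const ScopedEvent&) = delete;
  ScopedEvent& operator=(const ScopedEvent&) = delete;

 private:
  TinyProfiler* profiler_;
  uint32_t handle_ = 0;
};

// Hypothetical routine under investigation, with an early return.
inline int SumPositive(const int* data, int size, TinyProfiler* profiler) {
  ScopedEvent event("SumPositive", profiler);
  if (data == nullptr || size <= 0) return 0;  // The event still ends here.
  int sum = 0;
  for (int i = 0; i < size; ++i) {
    if (data[i] > 0) sum += data[i];
  }
  return sum;
}
```

Compared with calling BeginEvent and EndEvent by hand, the wrapper removes the risk of forgetting EndEvent on an early return, at the cost of tying the measured region to a C++ scope.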