.. _module-pw_snapshot-setup:

==============================
Setting up a Snapshot Pipeline
==============================

.. contents:: Table of Contents

-------------------
Crash Handler Setup
-------------------
The Snapshot proto was designed first and foremost as a crash reporting format.
This section covers how to set up a crash handler to capture Snapshots.

.. image:: images/generic_crash_flow.svg
   :width: 600
   :alt: Generic crash handler flow

A typical crash handler has two entry points:

1. A software entry path through developer-written ``ASSERT()`` or ``CHECK()``
   calls that indicate a device should go down for a crash if a condition is not
   met.
2. A hardware-triggered exception handler path that is initiated when a CPU
   encounters a fault signal (invalid memory access, bad instruction, etc.).

Before deferring to a common crash handler, these entry paths should disable
interrupts to force the system into a single-threaded execution mode. This
prevents other threads from operating on potentially bad data or clobbering
system state that could be useful for debugging.

The first step in a crash handler should always be a check for nested crashes to
prevent infinitely recursive crashes. Once it's deemed safe to continue, the
crash handler can re-initialize logging, initialize storage for crash report
capture, and then build a snapshot to later be retrieved from the device. Once
the crash report collection process is complete, some post-crash callbacks can
be run on a best-effort basis to clean up the system before rebooting. For
devices with debug port access, it's helpful to optionally hold the device in
an infinite loop rather than resetting, so developers can access the device
via a hardware debugger.
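
The ordering described above can be sketched as a single function. This is a
host-compilable illustration under stated assumptions, not a Pigweed API:
``CrashSteps``, ``RunCrashSequence()``, and ``kMaxCrashDepth`` are hypothetical
names, each step is a project-provided function pointer, and the function
returns only so it can be exercised on a host. On a device, the final reset or
debugger-hold step would never return.

.. code-block:: cpp

   #include <cstddef>

   // Hypothetical, project-provided crash steps. Plain function pointers keep
   // this independent of C++ static constructors having run.
   struct CrashSteps {
     void (*reinit_logging)();      // Optional; may be nullptr.
     void (*init_storage)();        // Prepare storage for the snapshot.
     void (*capture_snapshot)();    // Build and write the snapshot.
     void (*post_crash_cleanup)();  // Best-effort cleanup callbacks.
     void (*reboot)();              // On a device: reset, never returns.
     void (*hold_for_debugger)();   // On a device: spin forever.
     bool prefer_debugger_hold;     // True for devices with debug port access.
   };

   // Hypothetical limit: allow one nested crash before giving up.
   constexpr std::size_t kMaxCrashDepth = 1;

   namespace {

   // Final step: wait for a hardware debugger or reset the device.
   void FinishCrash(const CrashSteps& steps) {
     if (steps.prefer_debugger_hold) {
       steps.hold_for_debugger();
     } else {
       steps.reboot();
     }
   }

   }  // namespace

   // Runs the crash steps in order. Returns the nesting depth at entry only so
   // the sketch can be exercised on a host; a real handler is [[noreturn]].
   std::size_t RunCrashSequence(const CrashSteps& steps) {
     // Zero-initialized static: valid even before static constructors run.
     static std::size_t crash_depth = 0;
     const std::size_t depth_at_entry = crash_depth++;

     // 1. Check for nested crashes before touching any other state.
     if (depth_at_entry > kMaxCrashDepth) {
       FinishCrash(steps);
       return depth_at_entry;
     }

     // 2. Re-initialize only what the crash path needs.
     if (steps.reinit_logging != nullptr) {
       steps.reinit_logging();
     }
     steps.init_storage();

     // 3. Build the snapshot for later retrieval.
     steps.capture_snapshot();

     // 4. Best-effort cleanup; a crash here re-enters with a higher depth.
     steps.post_crash_cleanup();

     // 5. Hold for a debugger or reset.
     FinishCrash(steps);
     return depth_at_entry;
   }

Passing the steps as plain function pointers, rather than registering them
through C++ objects, keeps the sequence usable even if static constructors never
ran.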

Assert Handler Setup
====================
:ref:`pw_assert <module-pw_assert>` is Pigweed's entry point for software
crashes. Route any existing assert functions through pw_assert to centralize the
software crash path. You'll need to create a :ref:`pw_assert backend
<module-pw_assert-backend_api>` or a custom :ref:`pw_assert_basic handler
<module-pw_assert_basic-custom_handler>` to pass collected information to a more
sophisticated crash handler. One way to do this is to collect the data into a
statically allocated struct that is passed to a common crash handler. It's
important to immediately disable interrupts to prevent the system from doing
other things while in a compromised state.

.. code-block:: cpp

   #include <cstdarg>

   #include "pw_preprocessor/compiler.h"  // For PW_UNREACHABLE.

   // CrashData and HandleCrash() are defined by the common crash handler
   // described below. This can be directly accessed by a crash handler.
   static CrashData crash_data;

   extern "C" void pw_assert_basic_HandleFailure(
       const char* file_name, int line_number, const char* format, ...) {
     // Always disable interrupts first! How this is done depends on your
     // platform; __disable_irq() is the CMSIS intrinsic for Arm Cortex-M.
     __disable_irq();

     va_list args;
     va_start(args, format);
     crash_data.file_name = file_name;
     crash_data.line_number = line_number;
     crash_data.reason_fmt = format;
     // Safe to store: HandleCrash() never returns, so `args` stays alive.
     crash_data.reason_args = &args;
     crash_data.cpu_state = nullptr;

     HandleCrash(crash_data);
     PW_UNREACHABLE;
   }

Exception Handler Setup
=======================
:ref:`pw_cpu_exception <module-pw_cpu_exception>` is Pigweed's recommended entry
point for CPU-triggered faults (divide by zero, invalid memory access, etc.).
You will need to provide a definition for ``pw_cpu_exception_DefaultHandler()``
that passes the exception state produced by pw_cpu_exception to your common
crash handler.

.. code-block:: cpp

   #include <cstdarg>

   #include "pw_cpu_exception/state.h"
   #include "pw_preprocessor/compiler.h"  // For PW_UNREACHABLE.

   static CrashData crash_data;

   // This helper turns a format string into a va_list that can be used by the
   // common crash handling path.
   [[noreturn]] void HandleExceptionWithString(
       pw_cpu_exception_State& state, const char* fmt, ...) {
     va_list args;
     va_start(args, fmt);
     crash_data.cpu_state = &state;
     crash_data.file_name = nullptr;
     crash_data.line_number = 0;
     crash_data.reason_fmt = fmt;
     crash_data.reason_args = &args;

     HandleCrash(crash_data);
     PW_UNREACHABLE;
   }

   extern "C" void pw_cpu_exception_DefaultHandler(
       pw_cpu_exception_State* state) {
     // Always disable interrupts first! How this is done depends
     // on your platform.
     __disable_irq();

     // The CFSR is an extremely useful register for understanding ARMv7-M and
     // ARMv8-M CPU faults. Other architectures should put something else here.
     HandleExceptionWithString(
         *state,
         "Exception encountered, cfsr=0x%08x",
         static_cast<unsigned int>(state->extended.cfsr));
   }

Common Crash Handler Setup
==========================
To minimize duplication of crash handling logic, it's good practice to route the
pw_assert and pw_cpu_exception handlers to a common crash handling codepath.
Ensure you can pass both pw_cpu_exception's CPU state and pw_assert's assert
information to the shared handler.

.. code-block:: cpp

   #include <cstdarg>

   #include "pw_cpu_exception/state.h"

   struct CrashData {
     pw_cpu_exception_State* cpu_state;  // nullptr for software crashes.
     const char* reason_fmt;
     va_list* reason_args;  // Non-const so it can be passed to vsnprintf().
     const char* file_name;  // nullptr for CPU exceptions.
     int line_number;
   };

   // This function assumes interrupts are properly disabled BEFORE it is called.
   // It must never return: it ends by resetting the device or by spinning while
   // waiting for a hardware debugger.
   [[noreturn]] void HandleCrash(CrashData& crash_info) {
     // Handle crash, then reset (or hold for a debugger).
   }

In the crash handler, your project can re-initialize the minimal subset of the
system needed to safely capture a snapshot before rebooting the device. The
remainder of this section focuses on ways you can improve the reliability and
usability of your project's crash handler.

Check for Nested Crashes
------------------------
It's important to include crash handler checks that prevent infinitely recursive
nesting of crashes. Maintain a static variable that tracks the crash nesting
depth. After one or two nested crashes, abort crash handling entirely and reset
the device or sit in an infinite loop to wait for a hardware debugger to attach.
It's simpler to put this logic at the beginning of the shared crash handler, but
if your assert/exception handlers are complex it might be safer to inject the
checks earlier in both codepaths.

.. code-block:: cpp

   #include <cstddef>

   // Example limit: allows one nested crash. Abort() is project-defined and
   // should reset the device (or spin for a debugger) without any further
   // crash handling.
   constexpr size_t kMaxCrashDepth = 1;

   [[noreturn]] void HandleCrash(CrashData& crash_info) {
     // A zero-initialized static doesn't depend on C++ static constructors.
     static size_t crash_depth = 0;
     if (crash_depth > kMaxCrashDepth) {
       Abort(/*run_callbacks=*/false);
     }
     crash_depth++;
     ...
   }

Re-initialize Logging (Optional)
--------------------------------
Logging can be helpful for debugging your crash handler, but depending on your
device and system design, it may be challenging to support safely at crash time.
To re-initialize logging, you'll need to re-construct C++ objects and
re-initialize any systems or hardware in the logging codepath. You may even need
an entirely separate logging pipeline that is single-threaded and
interrupt-safe.
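
As a sketch of what a separate, single-threaded crash-time logging path might
look like, the hypothetical ``CrashLogBuffer`` below formats messages into a
fixed, statically allocated buffer with no heap, locks, or threads. It assumes
interrupts are already disabled, and it leaves draining the buffer (for example,
to a UART or into the snapshot) to the project.

.. code-block:: cpp

   #include <cstdarg>
   #include <cstddef>
   #include <cstdio>

   // Hypothetical crash-time log sink. It has no user-provided constructor, so
   // a static instance is zero-initialized and usable before static
   // constructors run. Only safe while the crash path runs single-threaded
   // with interrupts disabled.
   class CrashLogBuffer {
    public:
     // Explicit (re-)initialization at crash time.
     void Init() {
       used_ = 0;
       buffer_[0] = '\0';
     }

     // Appends a formatted message, truncating once the buffer is full.
     void Log(const char* format, ...) {
       if (used_ >= kCapacity - 1) {
         return;  // Full: drop the message.
       }
       va_list args;
       va_start(args, format);
       const int written =
           std::vsnprintf(buffer_ + used_, kCapacity - used_, format, args);
       va_end(args);
       if (written < 0) {
         buffer_[used_] = '\0';  // Encoding error: keep previous contents.
         return;
       }
       // vsnprintf() returns the untruncated length; count only what fit.
       const std::size_t room = kCapacity - used_ - 1;
       const std::size_t length = static_cast<std::size_t>(written);
       used_ += length < room ? length : room;
     }

     const char* contents() const { return buffer_; }
     std::size_t size() const { return used_; }

    private:
     static constexpr std::size_t kCapacity = 256;
     char buffer_[kCapacity];
     std::size_t used_;
   };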

Re-initialize Dependencies
--------------------------
It's good practice to design a crash handler that can run before C++ static
constructors have run. This means any initialization (whether manual or through
constructors) that your crash handler depends on should be manually invoked at
crash time. If an initialization step might not be safe, evaluate whether it's
possible to omit the dependency.
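
One way to avoid depending on static constructors is to keep raw,
zero-initialized storage for a dependency and construct the object in place at
crash time. The ``UartDriver`` type and ``InitCrashUart()`` function below are
hypothetical stand-ins for whatever your crash handler actually needs; this is a
sketch of the technique, not a Pigweed API.

.. code-block:: cpp

   #include <new>
   #include <type_traits>

   // Hypothetical dependency of the crash handler. Its constructor does real
   // work, so a global instance would rely on static constructors having run.
   class UartDriver {
    public:
     explicit UartDriver(unsigned int baud_rate)
         : baud_rate_(baud_rate), ready_(true) {}
     bool ready() const { return ready_; }
     unsigned int baud_rate() const { return baud_rate_; }

    private:
     unsigned int baud_rate_;
     bool ready_;
   };

   // Overwriting the object without destroying it first is only OK because it
   // is trivially destructible.
   static_assert(std::is_trivially_destructible<UartDriver>::value,
                 "Crash-time re-construction assumes no destructor");

   // Raw storage: zero-initialized, needs no constructor to have run.
   alignas(UartDriver) static unsigned char
       crash_uart_storage[sizeof(UartDriver)];

   // Constructs the driver in place at crash time, whether or not normal
   // startup ever ran. Calling it again simply re-constructs the object.
   UartDriver& InitCrashUart() {
     return *new (crash_uart_storage) UartDriver(115200);
   }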

System Cleanup
--------------
After collecting a snapshot, some parts of your system may benefit from cleanup
before the device is explicitly reset. This might include flushing buffers or
safely shutting down attached hardware. The order of shutdown should be
deterministic, keeping in mind that any of these steps could cause a nested
crash that skips the remaining handlers and forces the device to immediately
reset.
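
A constant table of cleanup callbacks is one way to keep the shutdown order
deterministic. In this sketch, ``CleanupFn``, ``RunPostCrashCleanup()``, and
``current_cleanup_step`` are hypothetical. Recording which step is running lets
a nested crash report the cleanup callback that failed before the device resets.

.. code-block:: cpp

   #include <cstddef>

   using CleanupFn = void (*)();

   // Sentinel meaning "no cleanup step is running".
   constexpr std::size_t kNoCleanupStep = static_cast<std::size_t>(-1);

   // Constant-initialized, so it is valid before static constructors run. A
   // nested crash can read it to see which cleanup step was in progress.
   static std::size_t current_cleanup_step = kNoCleanupStep;

   // Runs each step exactly once, in table order. Using a compile-time array
   // instead of runtime registration means the order never depends on which
   // static constructors happened to run. Example (hypothetical functions):
   //   constexpr CleanupFn kSteps[] = {FlushLogs, StopMotors, PowerDownRadio};
   //   RunPostCrashCleanup(kSteps);
   template <std::size_t N>
   void RunPostCrashCleanup(const CleanupFn (&steps)[N]) {
     for (std::size_t i = 0; i < N; ++i) {
       current_cleanup_step = i;
       steps[i]();
     }
     current_cleanup_step = kNoCleanupStep;
   }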

----------------------
Snapshot Storage Setup
----------------------
Use a storage class with a ``pw::stream::Writer`` interface to simplify
capturing a pw_snapshot proto. This can be a :ref:`pw::BlobStore
<module-pw_blob_store>`, an in-memory buffer that is flushed to flash, or a
:ref:`pw::PersistentBuffer <module-pw_persistent_ram-persistent_buffer>` that
lives in persistent memory. It's good practice to use lazy initialization for
storage objects used by your Snapshot capture codepath.

.. code-block:: cpp

   #include "pw_persistent_ram/persistent_buffer.h"
   #include "pw_preprocessor/compiler.h"

   // Persistent RAM objects are highly available. They don't rely on
   // their constructor being run, and require no initialization.
   PW_KEEP_IN_SECTION(".noinit")
   pw::persistent_ram::PersistentBuffer<2048> persistent_snapshot;

   void CaptureSnapshot(CrashData& crash_info) {
     ...
     persistent_snapshot.clear();
     pw::persistent_ram::PersistentBufferWriter& writer =
         persistent_snapshot.GetWriter();
     ...
   }
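
For storage that does need setup, such as an in-memory buffer that is later
flushed to flash, lazy initialization might look like the sketch below.
``SnapshotRamBuffer`` and ``GetSnapshotStorage()`` are hypothetical and do not
implement ``pw::stream::Writer``. Storage is prepared on first use by the
capture path, using an explicit flag rather than a function-local static, whose
initialization guard may not be safe to enter from a crash handler.

.. code-block:: cpp

   #include <cstddef>
   #include <cstring>

   // Hypothetical in-memory snapshot storage that a project would later flush
   // to flash. It does not implement pw::stream::Writer.
   class SnapshotRamBuffer {
    public:
     void Clear() { used_ = 0; }

     // Appends `size` bytes. Returns false, writing nothing, if they don't fit.
     bool Write(const void* data, std::size_t size) {
       if (size > sizeof(buffer_) - used_) {
         return false;
       }
       std::memcpy(buffer_ + used_, data, size);
       used_ += size;
       return true;
     }

     std::size_t size() const { return used_; }

    private:
     unsigned char buffer_[512];
     std::size_t used_;
   };

   // Both are zero-initialized, so neither needs a static constructor.
   static SnapshotRamBuffer snapshot_storage;
   static bool snapshot_storage_ready = false;

   // Prepares the storage the first time the capture path asks for it.
   SnapshotRamBuffer& GetSnapshotStorage() {
     if (!snapshot_storage_ready) {
       snapshot_storage.Clear();
       snapshot_storage_ready = true;
     }
     return snapshot_storage;
   }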

----------------------
Snapshot Capture Setup
----------------------

.. note::

   These instructions do not yet use the ``pw::protobuf::StreamingEncoder``.

Capturing a snapshot is as simple as encoding any other proto message. Some
modules provide helper functions that populate parts of a Snapshot, which
reduces the amount of custom work each project must set up on its own.

Capture Reason
==============
A snapshot's "reason" should be considered the single most important field in a
captured snapshot. If a snapshot capture was triggered by a crash, this should
be the assert string. Other entry paths should use it to describe why the
snapshot was captured ("Host communication buffer full!", "Exception encountered
at 0x00000004", etc.).

.. code-block:: cpp

   #include <algorithm>
   #include <cstddef>
   #include <cstdio>
   #include <span>

   Status CaptureSnapshot(CrashData& crash_info) {
     // Temporary buffer to format the "reason" string into.
     static char temp_buffer[500];
     // Temporary buffer to encode serialized proto to before dumping to the
     // final ``pw::stream::Writer``.
     static std::byte proto_encode_buffer[512];
     ...
     pw::protobuf::NestedEncoder<kMaxDepth> proto_encoder(proto_encode_buffer);
     pw::snapshot::Snapshot::Encoder snapshot_encoder(&proto_encoder);
     const int result = vsnprintf(temp_buffer,
                                  sizeof(temp_buffer),
                                  crash_info.reason_fmt,
                                  *crash_info.reason_args);
     // vsnprintf() returns the untruncated length (negative on error), so
     // clamp it to the number of characters actually written.
     const size_t length =
         result < 0 ? 0
                    : std::min(static_cast<size_t>(result),
                               sizeof(temp_buffer) - 1);
     snapshot_encoder.WriteReason(
         std::as_bytes(std::span(temp_buffer, length)));

     // Final encode and write.
     Result<ConstByteSpan> encoded_proto = proto_encoder.Encode();
     PW_TRY(encoded_proto.status());
     PW_TRY(writer.Write(encoded_proto.value()));
     ...
   }

Capture CPU State
=================
When using pw_cpu_exception, exceptions automatically collect CPU state that can
be directly dumped into a snapshot. Because a CPU exception isn't always easy to
describe in a single "reason" string, capturing the full CPU state preserves the
information needed to automatically generate a more detailed description at
analysis time, once the snapshot is retrieved from the device.

.. code-block:: cpp

   Status CaptureSnapshot(CrashData& crash_info) {
     ...

     // Reuse the encoder for the next part of the snapshot.
     proto_encoder.clear();

     // Write CPU state.
     if (crash_info.cpu_state) {
       PW_TRY(DumpCpuStateProto(snapshot_encoder.GetArmv7mCpuStateEncoder(),
                                *crash_info.cpu_state));

       // Final encode and write.
       Result<ConstByteSpan> encoded_proto = proto_encoder.Encode();
       PW_TRY(encoded_proto.status());
       PW_TRY(writer.Write(encoded_proto.value()));
     }
     return OkStatus();
   }
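
To illustrate how a descriptive reason can be derived later from captured CPU
state, such as the ``cfsr`` value used in the exception handler example, the
sketch below decodes a handful of ARMv7-M Configurable Fault Status Register
(CFSR) bits into text. ``DescribeCfsr()`` is a hypothetical helper, not part of
pw_cpu_exception, and it covers only a subset of the fault bits defined in the
ARMv7-M Architecture Reference Manual.

.. code-block:: cpp

   #include <cstddef>
   #include <cstdint>

   // A subset of ARMv7-M CFSR fault bits. MemManage faults occupy bits 0-7,
   // BusFaults bits 8-15, and UsageFaults bits 16-31.
   struct CfsrBit {
     std::uint32_t mask;
     const char* description;
   };

   constexpr CfsrBit kCfsrBits[] = {
       {1u << 0, "MemManage: instruction access violation (IACCVIOL)"},
       {1u << 1, "MemManage: data access violation (DACCVIOL)"},
       {1u << 8, "BusFault: instruction bus error (IBUSERR)"},
       {1u << 9, "BusFault: precise data bus error (PRECISERR)"},
       {1u << 10, "BusFault: imprecise data bus error (IMPRECISERR)"},
       {1u << 16, "UsageFault: undefined instruction (UNDEFINSTR)"},
       {1u << 17, "UsageFault: invalid EPSR state (INVSTATE)"},
       {1u << 24, "UsageFault: unaligned access (UNALIGNED)"},
       {1u << 25, "UsageFault: divide by zero (DIVBYZERO)"},
   };

   // Stores descriptions of the set bits (lowest bit first) into `out`,
   // writing at most `max_out` entries. Returns the total number of recognized
   // bits that are set, which may be larger than `max_out`.
   std::size_t DescribeCfsr(std::uint32_t cfsr,
                            const char** out,
                            std::size_t max_out) {
     std::size_t found = 0;
     for (const CfsrBit& bit : kCfsrBits) {
       if ((cfsr & bit.mask) == 0) {
         continue;
       }
       if (found < max_out) {
         out[found] = bit.description;
       }
       ++found;
     }
     return found;
   }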

-----------------------
Snapshot Transfer Setup
-----------------------
Pigweed's pw_rpc system is well suited for retrieving a snapshot from a device.
Pigweed does not yet provide a generalized transfer service for moving files
to/from a device. When this feature is added to Pigweed, this section will be
updated to include guidance for connecting a storage system to a transfer
service.

----------------------
Snapshot Tooling Setup
----------------------
Pigweed will provide Python tooling to dump snapshot protos as human-readable
text dumps. This section will be updated as this functionality is introduced.