
 {{% lowrisc-doc-hdr Alert Handler Technical Specification }}
 {{% regfile alert_handler.hjson }}


 {{% section1 Overview }}

 This document specifies the functionality of the alert handler mechanism. The
 alert handler is a module that is a peripheral on the chip interconnect bus,
 and thus follows the [Comportability Specification](../../../../doc/rm/comportability_specification.md).
 It gathers alerts - defined as interrupt-type signals from other peripherals
 that are designated as a potential security threat - from throughout the design,
 and converts them to interrupts that the processor can handle. If the processor
 does not handle them, the alert handler mechanism will provide hardware
 responses to handle the threat.


 {{% toc 4 }}


 {{% section2 Features }}

 - Differentially-signaled, asynchronous alert inputs from NAlerts peripheral
 sources, where NAlerts is a function of the requirements of the peripherals

 - Ping testing of alert sources: responder module requests periodic alert
 response from each source to ensure proper wiring

 - Register locking on all configuration registers
     - Once locked, cannot be modified by software

 - Register-based assignment of alert to alert-class
     - Four classes, can be individually disabled.
     - Each class generates one interrupt.
     - Disambiguation history for software to determine which alert caused the
     class interrupt.
     - Each class has configurable response time for escalation.
     - Disable allows for ignoring alerts, and should only be used in cases when
     alerts are faulty. Undesirable access is prevented by locking the register
     state after initial configuration.

 - Register-based escalation controls
     - Number of alerts in class before escalation
     - Timeout for unhandled alert IRQs can also trigger escalation
     - Configurable escalation enables for 4 escalation signals
         - Could map to NMI, wipe secrets signal, lower permission, chip reset,
          etc.
         - Escalation signals differentially-signaled with heartbeat, should
          trigger response if differential or heartbeat failure at destination
     - Configurable time in cycles between each escalation level

 - Two locally sourced hardware alerts
     - Differential signaling from a source has failed
     - Ping response from a source has failed


 {{% section2 Description }}

 The alert handler module manages incoming alerts from throughout the system,
 classifies them, sends interrupts, and escalates interrupts to hardware
 responses if the processor does not respond to any interrupts. The intention is
 for this module to be a stand-in for security responses in the case where the
 processor cannot handle the security alerts.

 It is first notable that all security alerts are rare events. Module and top
 level designers should only designate events as alerts if they are expected to
 never happen, and if they have potential security consequences. Examples are
 parity errors (which might indicate an attack), illegal actions on cryptography
 or security modules, physical sensors of environmental modification
 (e.g. voltage, temperature), etc. Alerts will be routed through this module and
 initially converted to interrupts for the processor to handle. The expectation
 is that the secure operating system has a protocol for handling any such alert
 interrupt in software. The operating system should respond, then clear the
 interrupt. Since these are possible security attacks, the response is not always
 obvious, but the response is beyond the scope of this document.

 This module is designed to help the full chip respond to security threats in the
 case where the processor is not trusted: either it has been attacked, or is not
 responding. It does this by escalating alerts beyond a processor interrupt. It
 provides four such escalation signals that can be routed to chip functions for
 attack responses. This could include such functions as wiping secret chip
 material, power down, reset, etc. It is beyond the scope of this document to
 specify what those escalation responses are at the chip level.

 To ease the management of alerts by software, classification is provided
 whereby each alert can be classified into one of four classes. How the
 classification is done by software is beyond the scope of this document, but it
 is suggested that alerts of a similar profile (risk of occurring, level of
 security concern, frequency of false trigger, etc.) are classed together. For
 each class a counter of alerts is kept, clearable by software. If that counter
 exceeds a programmable maximum value, then the escalation protocol for that
 class begins.

 The details for alert signaling, classification, and escalation are all given in
 the details section that follows.


 {{% section1 Theory of Operations }}


 {{% section2 Parameters }}

 The following table lists the main parameters used throughout the alert handler
 design. Note that the alert handler is generated based on the system
 configuration, and hence these parameters are placed into a package as
 "localparams". The parameterization rules are explained in more detail in the
 architectural description.

 Localparam     | Default (Max)         | Description
 ---------------|-----------------------|---------------
 `NAlerts`      | 8 (248)               | Number of alert instances. Maximum number bounded by LFSR implementation that generates ping timing.
 `EscCntWidth`  | 32 (32)               | Width of the escalation counters in bits.
 `AccuCntWidth` | 16 (32)               | Width of the alert accumulation counters in bits.
 `AsyncOn`      | '0 (2^`NAlerts`-1)    | This is a bit array specifying whether a certain alert sender / receiver pair goes across an asynchronous boundary or not.
 `LfsrSeed`     | '1 (2^31-1)           | Seed for the LFSR timer, must be nonzero.

 The next table lists free parameters in the `prim_alert_sender` and
 `prim_alert_receiver` submodules.

 Parameter      | Default (Max)    | Description
 ---------------|------------------|---------------
 `AsyncOn`      | `1'b0` (`1'b1`)  | 0: Synchronous, 1: Asynchronous, determines whether additional synchronizer flops and logic need to be instantiated.

 {{% section2 Block Diagram }}

 The figure below shows a block diagram of the alert handler module, as well as
 a few examples of alert senders in other peripheral modules. In this diagram,
 there are seven sources of alerts: three sources from external modules (two from
 `periph0` and one from `periph1`), and four local sources (`alert_ping_fail`,
 `alert_sig_int`, `esc_ping_fail`, `esc_sig_int`). The local sources represent
 alerts that are created by this module itself. See the later section on special
 local alerts.

 ![Alert Handler Block Diagram](alert_handler_block_diagram.svg)

 Also shown are internal modules for classification, interrupt generation,
 accumulation, escalation and ping generation. These are described later in the
 document. Note that the differential alert sender and receiver blocks used for
 alert signaling support both _asynchronous_ and _synchronous_ clocking schemes,
 and hence peripherals able to raise alerts may be placed in clock domains
 different from that of the alert handler (Jittered clock domains are also
 supported in the asynchronous clocking scheme). Proper care must however be
 taken when formulating the timing constraints for the diff pairs, and when
 determining clock-dependent parameters (such as the ping timeout) of the design.
 On the escalation sender / receiver side, the differential signaling blocks
 employ a fully synchronous clocking scheme throughout.

 {{% section2 Signals }}

 The table below lists the alert handler module signals. The number of alert
 instances is parametric and hence alert and ping diff pairs are grouped together
 in packed arrays. The diff pair signals are indexed with the corresponding alert
 instance `<number>`.

 Signal                  | Direction        | Type           | Description
 ------------------------|------------------|----------------|---------------
 `tl_i`                  | `input`          | `tl_h2d_t`     | TileLink-UL input for control register access.
 `tl_o`                  | `output`         | `tl_d2h_t`     | TileLink-UL output for control register access.
 `irq_o[<class>]`        | `output`         | packed `logic` | Interrupt outputs, level sensitive, active high. Indices 0-3 correspond to classes A-D.
 `crashdump_o`           | `output`         | packed `struct`| This is a collection of alert handler state registers that can be latched by hardware debugging circuitry, if needed.
 `entropy_i`             | `input`          | `logic`        | Entropy input bit for the LFSR timer (can be connected to TRNG, otherwise tie off to `1'b0` if unused).
 `alert_pi/ni[<number>]` | `input`          | packed `logic` | Incoming alert or ping response(s), differentially encoded. Index range: `[NAlerts-1:0]`
 `ack_po/no[<number>]`   | `output`         | packed `logic` | Outgoing alert acknowledgment, differentially encoded. Index range: `[NAlerts-1:0]`
 `ping_po/no[<number>]`  | `output`         | packed `logic` | Ping request to alert sender, differentially encoded. Index range: `[NAlerts-1:0]`
 `esc_po/no[<sev>]`      | `output`         | packed `logic` | Escalation or ping request, differentially encoded. Index corresponds to severity level, and ranges from 0 to 3.
 `resp_pi/ni[<sev>]`     | `input`          | packed `logic` | Escalation ping response, differentially encoded. Index corresponds to severity level, and ranges from 0 to 3.


 For each alert, there is one pair of input signals and two pairs of output
 signals. These signals are connected to a differential sender module within
 the source, and a
 differential receiver module within the alert handler. Both of these modules are
 described in more detail in the following section. These signal pairs carry
 differentially encoded messages that enable two types of signaling: a native
 alert and a ping/response test of the alert mechanism. The latter is to ensure
 that all alert senders are always active and have not been the target of an
 attack. Note that low power states are not considered at this time, but could
 affect the signaling and testing of alerts.

 The `crashdump_o` struct outputs a collection of CSRs and alert handler state
 bits that can be latched by hardware debugging circuitry. This can be useful for
 extracting more information about possible failures or bugs without having to
 use the TileLink bus interface (which may become unresponsive under certain
 circumstances). It is recommended for the top level to store this information in
 an always-on location.

 {{% section2 Design Details }}

 This section gives the full design details of the alert handler module and its
 submodules.


 ### Alert Definition

 Alerts are defined as events that have security implications, and should be
 handled by the main processor, or escalated to other hardware modules to take
 action. Each peripheral has the option to define one or more alert signals.
 Those peripherals should instantiate one module (`prim_alert_sender`) to convert
 the event associated with that alert into a signal to the alert handler module.
 The alert handler instantiates one receiver module (`prim_alert_receiver`) per
 alert, then handles the classification, accumulation, and escalation of the
 received signal. The differential signaling submodules may either use a
 synchronous or asynchronous clocking scheme, since the message type to be
 transferred is a single discrete event.


 ### Differential Alert Signaling

 Each alert sender is connected to the corresponding alert receiver via the 3
 differential pairs `alert_p/n`, `ack_p/n` and `ping_p/n`, as illustrated below:

 ![Alert Handler Alert RXTX](alert_handler_alert_rxtx.svg)

 Alerts are encoded differentially and signaled using a full handshake on the
 `alert_p/n` and `ack_p/n` wires. The use of a full handshake protocol makes it
 possible to use this mechanism with an asynchronous clocking strategy, where
 peripherals may
 reside in a different clock domain than the alert handler. The full handshake
 guarantees that alert messages are correctly back-pressured and no alert is
 "lost" at the asynchronous boundary due to (possibly variable) clock ratios
 greater or less than 1.0. The "native alert message" will be repeated on the
 output wires as long as the alert event is still true within the peripheral.

 The wave pattern below illustrates the differential full handshake mechanism.

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p...............' },
     { name: 'alert_i',      wave: '01.|..|..|...|..' },
     { name: 'alert_p',      wave: '01.|..|0.|..1|..' , node: '.a.....c....e'},
     { name: 'alert_n',      wave: '10.|..|1.|..0|..' },
     { name: 'ack_p',        wave: '0..|1.|..|0..|1.' , node: '....b.....d..'},
     { name: 'ack_n',        wave: '1..|0.|..|1..|0.' },
     { name: 'alert_o',      wave: '0..|10|..|...|10' },
   ],
   edge: [
    'a~>b Phase 0->1'
    'b~>c Phase 1->2'
    'c~>d Phase 2->3'
    'd~>e 2 Pause Cycles'
   ],
   head: {
     text: 'Alert signaling and repeat pattern',
   },
   foot: {
     text: 'Native alert at time 1 with 4-phase handshake; repeated alert at time 12;',
     tick: 0,
   }
 }
 ```

 The repeat pattern for this alert is such that as long as the alert is true,
 the handshake is re-initiated after pausing for 2 cycles on the sender
 side.

 Note that the alert is immediately propagated to the `alert_o` output once the
 initial level change on `alert_p/n` has been received and synchronized to the
 local clock. This ensures that the first occurrence of an alert is always
 propagated - even if the handshake lines have been manipulated to emulate
 backpressure. (In such a scenario, all subsequent alerts would be back-pressured
 and eventually the ping testing mechanism described in the next subsection would
 detect that the wires have been tampered with.)

 The alert sender and receiver modules can be used either synchronously or
 asynchronously. The signaling protocol remains the same in both cases, but in
 the synchronous case the additional synchronizer flops at the diff pair inputs
 may be omitted, which results in lower signaling latency.
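
As a rough illustration of the handshake and repeat pattern described in this subsection, the following Python sketch models the sender side in the fully synchronous case. It is a hypothetical behavioral model, not the `prim_alert_sender` RTL: the state names, the `AlertSenderModel` class and the `simulate` helper are invented for this example, and the receiver is reduced to a single register that echoes `alert_p` back as `ack_p` one cycle later. Ping handling and signal integrity checks are left out.

```python
from enum import Enum, auto


class SenderState(Enum):
    IDLE = auto()      # alert_p low, no handshake in flight
    ALERT_HI = auto()  # alert_p raised, waiting for ack_p to rise
    ALERT_LO = auto()  # alert_p lowered, waiting for ack_p to fall
    PAUSE = auto()     # pause cycles before the next handshake may start


PAUSE_CYCLES = 2  # sender-side pause between handshakes, as stated in the text


class AlertSenderModel:
    """Hypothetical behavioral model of the sender side (not the RTL)."""

    def __init__(self):
        self.state = SenderState.IDLE
        self.alert_p = 0
        self.pause_left = 0

    @property
    def alert_n(self):
        # The negative wire always carries the complement of alert_p.
        return 1 - self.alert_p

    def step(self, alert_i, ack_p):
        """Advance the model by one clock cycle."""
        if self.state is SenderState.IDLE:
            if alert_i:
                self.alert_p = 1
                self.state = SenderState.ALERT_HI
        elif self.state is SenderState.ALERT_HI:
            if ack_p:
                self.alert_p = 0
                self.state = SenderState.ALERT_LO
        elif self.state is SenderState.ALERT_LO:
            if not ack_p:
                self.pause_left = PAUSE_CYCLES
                self.state = SenderState.PAUSE
        else:  # SenderState.PAUSE
            self.pause_left -= 1
            if self.pause_left == 0:
                self.state = SenderState.IDLE


def simulate(alert_i_trace):
    """Return the alert_p level after each cycle for a given alert_i trace."""
    sender = AlertSenderModel()
    ack_p = 0
    alert_p_trace = []
    for alert_i in alert_i_trace:
        next_ack_p = sender.alert_p  # simplified receiver: ack follows alert
        sender.step(alert_i, ack_p)
        ack_p = next_ack_p
        alert_p_trace.append(sender.alert_p)
    return alert_p_trace
```

With `alert_i` held high, this model re-initiates the handshake after each completed one; with a single-cycle event it performs exactly one handshake. The absolute cycle counts depend on the simplified receiver and should not be read as timing guarantees of the actual design.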

 ### Ping Testing

 In order to ensure that the event sending modules have not been compromised,
 the alert receiver module `prim_alert_receiver` will "ping" or line-test the
 senders periodically every few microseconds. Ping timing is randomized so that
 the appearance of a ping cannot be predicted.


 The ping timing is generated by a central LFSR-based timer within the alert
 handler, that randomly asserts the `ping_en_i` signal of a particular
 `prim_alert_receiver` module. Once the ping enable signal is asserted, the
 receiver module encodes the ping message as a level change on the differential
 `ping_p/n` output, and waits until the sender responds with a full handshake on
 the `alert_p/n` and `ack_p/n` lines. Once that handshake is complete, the
 `ping_ok_o` signal is asserted. The LFSR timer has a programmable ping timeout,
 after which it will automatically assert a "pingfail" alert. That timeout is a
 function of the clock ratios present in the system, and has to be programmed
 accordingly at system startup (as explained later in the LFSR timer subsection).

 The following wave diagram illustrates a correct ping sequence:

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p..............' },
     { name: 'ping_en_i',    wave: '01.|..|..|..|.0' },
     { name: 'ping_ok_o',    wave: '0..|..|..|..|10' , node: '.............e'},
     { name: 'ping_p',       wave: '01.|..|..|..|..' , node: '.a'},
     { name: 'ping_n',       wave: '10.|..|..|..|..' , node: '.b'},
     { name: 'alert_p',      wave: '0..|1.|..|0.|..' , node: '....c'},
     { name: 'alert_n',      wave: '1..|0.|..|1.|..' },
     { name: 'ack_p',        wave: '0..|..|1.|..|0.' , node: '.............d'},
     { name: 'ack_n',        wave: '1..|..|0.|..|1.' },
   ],
   edge: [
    'a-b'
    'b-~>c ping response'
    'd->e response complete'
   ],
   head: {
     text: 'Ping testing',
   },
   foot: {
     text: 'Level change at time 1 triggers a full handshake (ping response) at time 4',
     tick: 0,
   }
 }
 ```

 In the unlikely case that a ping request collides with a native alert at the
 sender side, the native alert is held back until the ping handshake has been
 completed. This slightly delays the transmission of a native alert, but the
 alert will eventually be signaled. Further, if an alert is sent out right
 before a ping request comes in at the sender side, the receiver will treat the
 alert as a ping response. However, the "true" ping response will be returned
 right after the alert handshake has completed, and thus the alert will
 eventually be signaled with a slight delay.

 Note that in both collision cases mentioned, the delay will be in the order of
 the handshake length, plus the constant amount of pause cycles between
 handshakes (2 sender cycles).


 ### Monitoring of Signal Integrity Issues

 All differential pairs are monitored for signal integrity issues, and if an
 encoding failure is detected, the receiver module asserts a signal integrity
 alert via `integ_fail_o`. In particular, this covers the following failure
 cases:

 1. The `alert_p/n` pair is not correctly encoded. This can be detected directly
 at the receiver side.

 2. The `ping_p/n` or the `ack_p/n` pairs are not correctly encoded. This is
 signaled to the receiver by setting the `alert_p/n` wires to the same value,
 and that value will be continuously toggled. This implicitly triggers a signal
 integrity alert on the receiver side.

 Some of these failure patterns are illustrated in the wave diagram below:

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p..............' },
     { name: 'alert_p',      wave: '0.1...|0..10101' , node: '..a.......d'},
     { name: 'alert_n',      wave: '1.....|....0101' },
     { name: 'ack_p',        wave: '0.....|.1......' , node: '........c'},
     { name: 'ack_n',        wave: '1.....|........' },
     { name: 'integ_fail_o', wave: '0...1.|0....1..' , node: '....b.......e'},
   ],
   edge: [
    'a~>b sigint issue detected'
    'c~>d'
    'd~>e indirect sigint issue detected'
   ],
   head: {
     text: 'Detection of Signal Integrity Issues',
   },
   foot: {
     text: 'signal integrity issues occur at times 2 and 8; synchronizer latency is 2 cycles.',
     tick: 0,
   }
 }
 ```
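
The direct check on a differential pair amounts to flagging every sample where both wires carry the same level. The sketch below applies that check to the `alert_p` / `alert_n` traces from the diagram above. The helpers `expand_wave` and `integ_fail` are hypothetical names for this example, the `|` gap marker is simply treated as "hold the previous level", and the synchronizer latency mentioned in the diagram footer is not modeled, so the flags appear earlier than `integ_fail_o` does in the diagram.

```python
def expand_wave(wave):
    """Expand a WaveDrom-style string; '.' and '|' hold the previous level."""
    levels = []
    for ch in wave:
        levels.append(levels[-1] if ch in ".|" else int(ch))
    return levels


def integ_fail(p_levels, n_levels):
    """Flag every sample where the pair is not complementary (p == n)."""
    return [int(p == n) for p, n in zip(p_levels, n_levels)]


# Traces copied from the wave diagram above.
alert_p = expand_wave("0.1...|0..10101")
alert_n = expand_wave("1.....|....0101")
raw_fail = integ_fail(alert_p, alert_n)
```

In this sketch the first group of flagged samples starts at time 2 (the directly detected encoding error), and the second group starts at time 10, where the sender drives both wires to the same toggling value.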

 Note that if signal integrity failures occur during ping or alert handshaking,
 it is possible that the protocol state-machines lock up and the alert sender and
 receiver modules become unresponsive. However, the above mechanisms ensure that
 this will always trigger either a signal integrity alert or eventually a
 "pingfail" alert.

 ### Skew on Asynchronous Differential Pairs

 Note that there is likely a (small) skew present within each differential pair
 of the signaling mechanism above. Since these pairs cross clock domain
 boundaries, it may thus happen that a level change appears in staggered manner
 after resynchronization, as illustrated below:

 ```wavejson
 {
   signal: [
     { name: 'clk_i',   wave: 'p...........' },
     { name: 'diff_p',  wave: '0.1.|.0.|..1' , node: '......a....d' },
     { name: 'diff_n',  wave: '1.0.|..1|.0.' , node: '.......b..c.' },
   ],
   edge: [
    'a-~>b skew',
    'c-~>d skew'
   ],
   head: {
     text: 'Skewed diff pair',
   },
   foot: {
     text: 'Correctly sampled diff pair at time 2; staggered samples at time 6-7 and 10-11',
     tick: 0,
   }
 }
 ```

 This behavior is permissible, but needs to be accounted for in the protocol
 logic. Further, the skew within the differential pair should be constrained to
 be smaller than the shortest clock period in the system. This ensures that the
 staggered level changes appear at most 1 cycle apart from each other.


 ### LFSR Timer

 The ping_en inputs of all signaling modules
 (`prim_alert_receiver`, `prim_esc_sender`) instantiated within the alert handler
 are connected to a central ping timer that determines the random amount of wait
 cycles between ping requests. Further, this ping timer also randomly selects a
 particular line to be pinged.  That should make it more difficult to predict the
 next ping occurrence based on past observations.

 The ping timer is implemented using an
 [LFSR-based PRNG of Galois type](../../prim/doc/prim_lfsr.md). In order
 to increase the entropy of the pseudo random sequence, 1 random bit from the
 TRNG is XOR'ed into the LFSR state every time a new random number is drawn
 (which happens every few 10k cycles). The LFSR is 32 bits wide, but only 24 bits
 of its state are actually being used to generate the random timer count and
 select the alert/escalation line to be pinged. I.e., the 32 bits first go through
 a fixed permutation function, and then bits `[23:16]` are used to determine which
 peripheral to ping. The random cycle count is created by splitting bits `[15:0]`
 and concatenating them as follows:

 ```
 cycle_cnt = {permuted[15:2], 8'b01, permuted[1:0]}
 ```

 This constant DC offset introduces a minimum ping spacing of 4 cycles (1 cycle +
 margin) to ensure that the handshake protocols of the sender / receiver pairs work.
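
The bit selection above can be checked with a few lines of Python. This is only a sketch of the arithmetic: `ping_cycle_count` and `ping_target` are hypothetical helper names, and `permuted` stands for the LFSR state after the fixed permutation, which is not modeled here.

```python
def ping_cycle_count(permuted):
    """Build the 24-bit count {permuted[15:2], 8'b01, permuted[1:0]}."""
    upper = (permuted >> 2) & 0x3FFF  # permuted[15:2], 14 bits
    lower = permuted & 0x3            # permuted[1:0], 2 bits
    return (upper << 10) | (0x01 << 2) | lower


def ping_target(permuted):
    """Bits [23:16] select the alert/escalation line to be pinged."""
    return (permuted >> 16) & 0xFF
```

For example, `ping_cycle_count(0)` evaluates to 4, which is the minimum ping spacing introduced by the constant offset, and `ping_cycle_count(0xFFFF)` evaluates to `0xFFFC07`, the largest count this encoding can produce.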

 After selecting one of the peripherals to ping, the LFSR timer waits until
 either the corresponding `*_ping_ok[<number>]`  signal is asserted, or until the
 programmable ping timeout value is reached. In both cases, the LFSR timer
 proceeds with the next ping, but in the second case it will additionally raise a
 "pingfail" alert. The ping enable signal remains asserted during the time where
 the LFSR counter waits.

 The timeout value is a function of the ratios between the alert handler clock
 and peripheral clocks present in the system, and can be programmed at startup
 time via the register !!PING_TIMEOUT_CYC. Note that this register is locked in
 together with the alert enable and disable configuration.

 The ping timer starts as soon as the initial configuration phase is over and the
 registers have been locked in.

 Note that the ping timer directly flags a "pingfail" alert if a spurious "ping
 ok" message comes in that has not been requested.
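
The wait-or-timeout decision described above can be sketched as follows. The function names and the list-based trace are assumptions made for this example; the real timer is a hardware counter, and the exact cycle in which the "pingfail" alert is raised is not specified here.

```python
def run_ping(ping_ok_trace, timeout_cyc):
    """Wait for ping_ok or for the timeout, whichever comes first.

    ping_ok_trace holds one sample of the selected ping_ok signal per cycle.
    Returns (cycles_waited, ping_fail).
    """
    for cycle, ping_ok in enumerate(ping_ok_trace[:timeout_cyc], start=1):
        if ping_ok:
            return cycle, False  # response arrived within the window
    return timeout_cyc, True     # no response in time: raise "pingfail"


def spurious_ping_ok(ping_en, ping_ok):
    """A 'ping ok' without an outstanding request is also flagged."""
    return bool(ping_ok and not ping_en)
```

In both outcomes the timer would move on to the next ping; only the second one additionally raises the alert.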


 ### Alert Receiving

 The alert handler module contains one alert receiver module
 (`prim_alert_receiver`) per sending module. This receiver module has three
 outputs based upon the signaling of the input alert. Primary is the signal of a
 received native alert, shown in the top-level diagram as
 `alert_triggered[<number>]`. Also generated are two other outputs, one that
 signals a differential encoding error (`alert_integ_fail[<number>]`), and one
 that signals the receipt of a ping response (`alert_ping_ok[<number>]`). Each
 "triggered" alert received is sent into the classification block for individual
 configuration. All of the `integ_fail` signals are OR'ed together to create one
 alert for classification. The ping responses are fed to the LFSR timer, which
 determines whether a ping has correctly completed within the timeout window or
 not.


 ### Alert Classification and Interrupts

 Each of the incoming and local alert signals can be classified generically to
 one of four classes, or disabled for no classification at all. These are the
 classes A, B, C, and D. There is no pre-determined definition of a class; that
 is left to software. But for guidance, software can consider that some alert
 types are similar to others; some alert types are more "noisy" than others
 (i.e. when triggered they stay on for long periods of time); some are more
 critical than others, etc.

 For each alert class (A-D), an interrupt is generally sent. Like all other
 peripheral interrupts, there is a triad of registers: enable, status, test. Thus
 like all other interrupts, software should handle the source of the interrupt
 (in this case, the original alert), then clear the state. Since the interrupt
 class is disassociated with the original alert (due to the classification
 process), software can access cause registers to determine which alerts have
 fired since the last clearing. Since alerts are expected to be rare (if ever)
 events, the complexity of dealing with multiple interrupts per class firing
 during the same time period should not be of concern. See the programming
 section on interrupt clearing.

 Each of the four interrupts can optionally trigger a timeout counter that
 triggers escalation if the interrupt is not handled and cleared within a certain
 time frame. This feature is explained in more detail in the next subsection
 about escalation mechanisms.

 Note that an interrupt always fires once an alert has been registered in the
 corresponding class. Interrupts are not dependent on escalation mechanisms like
 alert accumulation or timeout as described in the next subsection.
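
A minimal sketch of the classification step, assuming one class letter and one enable flag per alert (the argument names are hypothetical and do not correspond to specific registers): every enabled alert that fires in a cycle marks its class as triggered, and a triggered class raises its interrupt regardless of the escalation settings.

```python
CLASSES = "ABCD"


def classify(alert_triggered, alert_class, alert_en):
    """Return a dict mapping each class to whether it was triggered this cycle."""
    fired = {cls: False for cls in CLASSES}
    for triggered, cls, enabled in zip(alert_triggered, alert_class, alert_en):
        if triggered and enabled:
            fired[cls] = True
    return fired


# Example: alert 0 fires in class A, alert 2 fires but is disabled.
irq = classify([1, 0, 1], ["A", "B", "C"], [1, 1, 0])
```

Here `irq["A"]` is true and all other classes stay false, since the only other firing alert is disabled.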


 ### Escalation Mechanisms

 There are two mechanisms per class that can trigger the corresponding escalation
 protocol:

 1. The first consists of an accumulation counter that counts the amount of alert
    occurrences within a particular class. An alert classified to class A
    indicates that on every received alert trigger, the accumulation counter for
    class A is incremented. Note: since alerts are expected to be rare or never
    occur, the module does not attempt to count every alert per cycle, but rather
    all triggers per class are OR'ed before sending to the accumulation counter as
    an increment signal. Once the threshold has been reached, the next occurrence
    triggers the escalation protocol for this particular class. The
    counter is a saturation counter, meaning that it will not wrap around once it
    hits the maximum representable count. This mechanism has two associated CSRs:

     - Accumulation max value. This is the total number (sum of all alerts
       classified in this group) of alerts required to enter escalation phase
       (see below). Example register is !!CLASSA_ACCUM_THRESH.
     - Current accumulation register. This clearable register indicates how many
       alerts have been accumulated to date. Software should clear before it
       reaches the accumulation setting to avoid escalation. Example register is
       !!CLASSA_ACCUM_CNT.

 2. The second way is an interrupt timeout counter which triggers escalation
    if an alert interrupt is not handled within the programmable timeout
    window. Once the counter hits the timeout threshold, the escalation
    protocol is triggered. The corresponding CSRs are:

     - Interrupt timeout value in cycles !!CLASSA_TIMEOUT_CYC. The
       interrupt timeout is disabled if this is set to 0 (default).
     - The current interrupt timeout value can be read via !!CLASSA_ESC_CNT
       if !!CLASSA_STATE is in the `Timeout` state. Software should clear the
       corresponding interrupt state bit !!INTR_STATE.CLASSA before the
       timeout expires to avoid escalation.

 Technically, the interrupt timeout feature (2. above) is implemented using the
 same counter used to time the escalation phases. This is possible since
 escalation phases or interrupt timeout periods are non-overlapping (escalation
 always takes precedence should it be triggered).
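
The two trigger mechanisms can be modeled behaviorally as shown below. This is a sketch under stated assumptions rather than a description of the RTL: the class names are invented, the default counter width is taken from the `AccuCntWidth` default, and the timeout counter is assumed to reset whenever the interrupt is not pending.

```python
class AccumCounter:
    """Saturating per-class alert counter with an escalation threshold."""

    def __init__(self, thresh, width=16):
        self.thresh = thresh
        self.max_cnt = (1 << width) - 1
        self.cnt = 0

    def step(self, class_triggers):
        """Process one cycle; return True if escalation is triggered."""
        if not any(class_triggers):  # triggers of a class are OR'ed
            return False
        escalate = self.cnt >= self.thresh  # threshold already reached
        if self.cnt < self.max_cnt:         # saturate instead of wrapping
            self.cnt += 1
        return escalate

    def clear(self):
        self.cnt = 0


class IrqTimeout:
    """Interrupt timeout counter; a timeout of 0 disables the feature."""

    def __init__(self, timeout_cyc):
        self.timeout_cyc = timeout_cyc
        self.cnt = 0

    def step(self, irq_pending):
        """Process one cycle; return True once the timeout is reached."""
        if self.timeout_cyc == 0 or not irq_pending:
            self.cnt = 0
            return False
        self.cnt += 1
        return self.cnt >= self.timeout_cyc
```

With a threshold of 15, `AccumCounter.step` first returns `True` on the 16th triggering cycle; with a threshold of 0 it does so on the first. Two alerts of the same class firing in the same cycle increment the counter only once.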


 ### Programmable Escalation Protocol

 There are four output escalation signals, 0, 1, 2, and 3. There is no
 predetermined definition of an escalation signal; that is left to the
 top-level integration. Examples could be processor Non Maskable Interrupt (NMI),
 permission lowering, secret wiping, chip reset, etc. Typically the assumption
 is that escalation level 0 is the first to trigger, followed by 1, 2, and then
 3, emulating a "fuse" that is lit that can't be stopped once the first triggers
 (this is however not a requirement). See register section for discussion of
 counter clearing and register locking to determine the finality of accumulation
 triggers.

 Each class can be programmed with its own escalation protocol. If one of the two
 mechanisms described above fires, a timer for that particular class is started.
 The timer can be programmed with up to 4 delays (e.g., !!CLASSA_PHASE0_CYC),
 each representing a distinct escalation phase (0 - 3). Each of the four
 escalation severity outputs (0 - 3) is by default configured to be asserted
 during the corresponding phase, e.g., severity 0 in phase 0, severity 1 in
 phase 1, etc. However, this mapping can be freely reassigned by modifying the
 corresponding enable/phase mappings (e.g., !!CLASSA_CTRL.E0_MAP for enable bit
 0 of class A). This mapping will be locked in together with the alert enable
 configuration after initial configuration.

 SW can stop a triggered escalation protocol by clearing the corresponding
 escalation counter (e.g., !!CLASSA_ESC_CNT). Protection of this clearing is up
 to software, see the register control section that follows for
 !!CLASSA_CTRL.LOCK.

 It should be noted that each of the escalation phases has a duration of at
 least 1 clock cycle, even if the cycle count of a particular phase has been
 set to 0.
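
The phase sequencing and the default severity-to-phase mapping can be sketched as follows. The helper names are hypothetical, a programmed count of N is assumed to last N cycles (with the minimum of one cycle noted above), and the per-signal enable bits are not modeled.

```python
def escalation_schedule(phase_cyc):
    """Return the active phase for each cycle of a triggered escalation."""
    schedule = []
    for phase, cyc in enumerate(phase_cyc):
        schedule.extend([phase] * max(1, cyc))  # each phase lasts >= 1 cycle
    return schedule


def esc_outputs(phase, phase_map=(0, 1, 2, 3)):
    """Return the four escalation outputs for the given phase.

    phase_map[sev] is the phase in which severity output `sev` is asserted;
    the default maps severity N to phase N.
    """
    return [int(mapped == phase) for mapped in phase_map]
```

For instance, `escalation_schedule([2, 0, 3, 1])` yields `[0, 0, 1, 2, 2, 2, 3]`: phase 1 still occupies one cycle although its count is 0. Passing a different `phase_map` models a reassigned mapping.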

 The next waveform shows the gathering of alerts of one class until eventually
 the escalation protocol is engaged. In this diagram, two different alerts are
 shown for class A, and the gathering and escalation configuration values are
 shown.

 ```wavejson
 {
   signal: [
     { name: 'clk_i',                wave: 'p...................' },
     { name: 'CLASSA_ACCUM_THRESH',  wave: '2...................', data: ['15'] },
     { name: 'CLASSA_PHASE0_CYC',    wave: '2...................', data: ['1e3 cycles'] },
     { name: 'CLASSA_PHASE1_CYC',    wave: '2...................', data: ['1e4 cycles'] },
     { name: 'CLASSA_PHASE2_CYC',    wave: '2...................', data: ['1e5 cycles'] },
     { name: 'CLASSA_PHASE3_CYC',    wave: '2...................', data: ['1e6 cycles'] },
     { name: 'alert_triggered[0]',   wave: '010|.10.............' },
     { name: 'alert_triggered[1]',   wave: '0..|10..............' },
     { name: 'CLASSA_ACCUM_CNT',     wave: '33.|33..............', data: ['0', '1','15','16'] },
     { name: 'irq_o[0]',             wave: '01.|................' },
     { name: 'CLASSA_STATE',         wave: '3..|.3|3.|3..|3..|3.', data: ['Idle', '   Phase0','Phase1','Phase2','Phase3','Terminal'] },
     { name: 'CLASSA_ESC_CNT',       wave: '3..|.3|33|333|333|3.', data: ['0','1','1','2','1','2','3','1','2','3','0'] },
     { name: 'esc_po[0]',            wave: '0..|.1|0............', node: '.....a.b' },
     { name: 'esc_no[0]',            wave: '1..|.0|1............' },
     { name: 'esc_po[1]',            wave: '0..|..|1.|0.........', node: '.......c..d' },
     { name: 'esc_no[1]',            wave: '1..|..|0.|1.........' },
     { name: 'esc_po[2]',            wave: '0..|.....|1..|0.....', node: '..........e...f' },
     { name: 'esc_no[2]',            wave: '1..|.....|0..|1.....' },
     { name: 'esc_po[3]',            wave: '0..|.........|1..|0.', node: '..............g...h' },
     { name: 'esc_no[3]',            wave: '1..|.........|0..|1.' },
   ],
   edge: [
    'a->b 1e3 cycles',
    'c->d 1e4 cycles',
    'e->f 1e5 cycles',
    'g->h 1e6 cycles',
   ],
   head: {
     text: 'Alert class gathering and escalation triggers (fully synchronous case)',
   },
   foot: {
     text: 'alert class A gathers 16 alerts, triggers first escalation, followed by three more',
     tick: 0,
     }
 }
 ```

 In this diagram, the first alert triggers an interrupt to class A. The
 assumption is that the processor is wedged or taken over, in which case it
 does not handle the interrupt. Once enough alerts accumulate (16 in this case),
 the first escalation phase is entered, followed by three more (each phase has
 its own programmable length). Note that the accumulator threshold is set to 15
 in order to trigger on the 16th occurrence. If escalation shall be triggered on
 the first occurrence within an alert class, the accumulation threshold shall be
 set to 0. Also note that it takes one cycle to activate escalation and enter
 phase 0.
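
The accumulation rule above can be sketched as a small behavioral model. This is an illustrative Python sketch rather than the RTL: the class `AlertClassModel` and the helper `alerts_needed` are hypothetical names, and the comparison `accum_cnt > accum_thresh` is an assumption inferred from the statement that a threshold of 15 triggers on the 16th alert.

```python
class AlertClassModel:
    """Behavioral sketch of one alert class accumulator (hypothetical model)."""

    def __init__(self, accum_thresh):
        self.accum_thresh = accum_thresh
        self.accum_cnt = 0
        self.escalating = False

    def on_alert(self):
        """Count one alert; start escalation once the count exceeds the threshold."""
        self.accum_cnt += 1
        if self.accum_cnt > self.accum_thresh:  # assumed comparison
            self.escalating = True
        return self.escalating


def alerts_needed(accum_thresh):
    """Number of alerts required until the model starts escalating."""
    model = AlertClassModel(accum_thresh)
    count = 0
    while not model.escalating:
        model.on_alert()
        count += 1
    return count
```

With this model, a threshold of 15 yields escalation on the 16th alert, and a threshold of 0 yields escalation on the first alert, matching the text above.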


 The next wave shows a case where an interrupt remains unhandled and hence the
 interrupt timeout counter triggers escalation.

 ```wavejson
 {
   signal: [
     { name: 'clk_i',                   wave: 'p.....................' },
     { name: 'CLASSA_TIMEOUT_CYC',      wave: '2.....................', data: ['1e4 cycles'] },
     { name: 'alert_triggered[0]',      wave: '010.|.................' },
     { name: 'irq_o[0]',                wave: '01..|.................', node: '.a..|.b' },
     { name: 'CLASSA_STATE',            wave: '33..|.3|3.|3..|3...|3.', data: ['Idle', 'Timeout','   Phase0','Phase1','Phase2','Phase3','Terminal'] },
     { name: 'CLASSA_ESC_CNT',          wave: '3333|33|33|333|3333|3.', data: ['0', '1','2','3','1e4','1','1','2','1','2','3','1','2','3','4','0'] },
     { name: 'esc_po[0]',               wave: '0...|.1|0.............' },
     { name: 'esc_no[0]',               wave: '1...|.0|1.............' },
     { name: 'esc_po[1]',               wave: '0...|..|1.|0..........' },
     { name: 'esc_no[1]',               wave: '1...|..|0.|1..........' },
     { name: 'esc_po[2]',               wave: '0...|.....|1..|0......' },
     { name: 'esc_no[2]',               wave: '1...|.....|0..|1......' },
     { name: 'esc_po[3]',               wave: '0...|.........|1...|0.' },
     { name: 'esc_no[3]',               wave: '1...|.........|0...|1.' },
   ],
   edge: [
    'a->b 1e4 cycles',
   ],
   head: {
     text: 'Escalation due to an interrupt timeout (fully synchronous case)',
   },
   foot: {
     text: 'alert class A triggers an interrupt and the timeout counter, which eventually triggers escalation after 1e4 cycles.',
     tick: 0,
     }
 }
 ```
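
The timeout-driven sequence can be summarized as a schedule of states relative to the cycle in which the interrupt is asserted. The following Python sketch is illustrative only: `escalation_schedule` is a hypothetical helper, the one-cycle activation latency is ignored, and the phase lengths in the example are borrowed from the first diagram rather than taken from this one.

```python
def escalation_schedule(timeout_cyc, phase_cyc):
    """Return (state, start_cycle, end_cycle) tuples, counted from IRQ assertion.

    A timeout of 0 is treated as "infinite", i.e. the timeout mechanism is
    disabled and no timeout-driven escalation is scheduled.
    """
    if timeout_cyc == 0:
        return []
    schedule = [("Timeout", 0, timeout_cyc)]
    t = timeout_cyc
    for i, cycles in enumerate(phase_cyc):
        schedule.append((f"Phase{i}", t, t + cycles))
        t += cycles
    schedule.append(("Terminal", t, None))  # terminal state has no end
    return schedule


# Example values: 1e4-cycle timeout, phases of 1e3, 1e4, 1e5 and 1e6 cycles.
example = escalation_schedule(10_000, [1_000, 10_000, 100_000, 1_000_000])
```

In this example, phase 0 starts 10,000 cycles after the interrupt is raised, and the terminal state is reached after 1,121,000 cycles if software never clears the interrupt.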

 ### Escalation Signaling

 For each of the four escalation severities, the alert handler instantiates a
 `prim_esc_sender` module, and each of the four escalation countermeasures
 instantiates a `prim_esc_receiver` module. The signaling mechanism is similar
 to the alert signaling mechanism, but it is a fully synchronous protocol.
 Hence, it must be ensured at the top-level that all escalation sender and
 receiver modules use the same clock and reset signals.

 As illustrated in the following block diagram, a sender-receiver pair is
 connected with two differential lines, one going from sender to receiver and
 the other going from receiver to sender.

 ![Alert Handler Escalation RXTX](alert_handler_escalation_rxtx.svg)

 Upon receiving an escalation enable pulse of width N > 0 at the `esc_en_i`
 input, the escalation sender encodes that signal as a differential pulse of
 width N+1 on `esc_po/no`. The receiver decodes that message and asserts the
 `esc_en_o` output after one cycle of delay. Further, it acknowledges the
 receipt of that message by continuously toggling the `resp_po/no` signals as
 long as the escalation signal is asserted. Any failure to respond correctly
 will trigger an `integ_fail_o` alert, as illustrated below:

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p..................' },
     { name: 'ping_en_i',    wave: '0........|.........' },
     { name: 'ping_ok_o',    wave: '0........|.........' },
     { name: 'integ_fail_o', wave: '0........|..1010...' , node: '............b.d' },
     { name: 'ping_fail_o',  wave: '0........|.........' },
     { name: 'esc_en_i',     wave: '01....0..|.1....0..' },
     { name: 'resp_p',       wave: '0.101010.|.........',  node: '............a.c' },
     { name: 'resp_n',       wave: '1.010101.|.........' },
     { name: 'esc_p',        wave: '01.....0.|.1.....0.' },
     { name: 'esc_n',        wave: '10.....1.|.0.....1.' },
     { name: 'esc_en_o',     wave: '0.1....0.|..?....0.'},
   ],
   edge: [
    'a~>b missing response',
    'c~>d',
   ],
   head: {
     text: 'Escalation signaling and response',
   },
   foot: {
     text: 'escalation enable pulse shown at input sender at time 1 and 11; missing response and repeated integfail at time 12 and 14',
     tick: 0,
   }
 }
 ```
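
The pulse-width encoding can be illustrated with a cycle-by-cycle model of the positive lines only. This Python sketch is an assumption-laden illustration, not the `prim_esc_sender` / `prim_esc_receiver` implementation: the functions `sender_encode` and `receiver_decode` are hypothetical, and the chosen logic (the sender ORs the enable with its one-cycle-delayed copy, the receiver requires two consecutive high cycles) is inferred from the waveform above.

```python
def sender_encode(esc_en_i):
    """Stretch an enable pulse of width N into a pulse of width N+1 on esc_p."""
    esc_p = []
    prev = 0
    for en in esc_en_i:
        esc_p.append(en | prev)  # high while enable is, or was last cycle, high
        prev = en
    return esc_p


def receiver_decode(esc_p):
    """Assert esc_en_o only if esc_p is high in two consecutive cycles."""
    esc_en_o = []
    prev = 0
    for p in esc_p:
        esc_en_o.append(p & prev)  # one cycle of delay, filters 1-cycle pulses
        prev = p
    return esc_en_o


# First pulse of the waveform above: esc_en_i is high in cycles 1..5.
esc_en_i = [0, 1, 1, 1, 1, 1, 0, 0, 0]
esc_p = sender_encode(esc_en_i)
esc_en_o = receiver_decode(esc_p)
```

Under these assumptions, `esc_p` is high in cycles 1 to 6 and `esc_en_o` in cycles 2 to 6, which agrees with the diagram. A single-cycle pulse on `esc_p` does not assert `esc_en_o` in this model.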

 Further, any differential signal mismatch on either the `esc_p/n` or the
 `resp_p/n` lines will trigger an `integ_fail_o` alert. Mismatches on `resp_p/n`
 can be directly detected at the sender. Mismatches on the `esc_p/n` lines are
 signaled back to the sender by setting both the positive and negative response
 wires to the same value, which is toggled each cycle. This implicitly triggers
 a signal integrity alert on the sender side.

 This back-signaling mechanism can be leveraged to fast-track escalation and use
 another countermeasure in case it is detected that a particular escalation
 signaling path has been tampered with.

 Some signal integrity failure cases are illustrated in the wave diagram below:

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p...........' },
     { name: 'ping_en_i',    wave: '0....|......' },
     { name: 'ping_ok_o',    wave: '0....|......' },
     { name: 'integ_fail_o', wave: '0.1.0|.1....' , node: '..b....e' },
     { name: 'esc_en_i',     wave: '0....|......' },
     { name: 'resp_p',       wave: '0.1.0|..1010',  node: '..a..' },
     { name: 'resp_n',       wave: '1....|.01010',  node: '.......d' },
     { name: 'esc_p',        wave: '0....|1.....',  node: '......c..' },
     { name: 'esc_n',        wave: '1....|......' },
     { name: 'esc_en_o',     wave: '0....|......'},
   ],
   edge: [
    'a~>b',
    'c->d',
    'd->e',
   ],
   head: {
     text: 'possible signal integrity failure cases',
   },
   foot: {
     text: 'direct signal integrity failure at time 2; indirect failure at time 6',
     tick: 0,
   }
 }
 ```
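
Both failure cases can be reproduced with a small model of the differential checks. The Python sketch below is illustrative and makes several assumptions that are not stated in the text: the function names are hypothetical, the sender check is modelled as combinational, the idle response is `resp_p = 0` / `resp_n = 1`, and the receiver reacts to an `esc_p/n` mismatch with one cycle of latency, starting the toggling at 0.

```python
def sender_integ_fail(resp_p, resp_n):
    """Direct check at the sender: equal levels on resp_p/n are an error."""
    return [int(p == n) for p, n in zip(resp_p, resp_n)]


def receiver_backsignal(esc_p, esc_n):
    """Indirect signaling: on an esc_p/n mismatch, drive resp_p == resp_n.

    The common level toggles each cycle, starting one cycle after the
    mismatch was observed (assumed latency).
    """
    resp_p, resp_n = [], []
    err_prev = False
    level = 0
    for p, n in zip(esc_p, esc_n):
        if err_prev:
            resp_p.append(level)
            resp_n.append(level)
            level ^= 1
        else:
            resp_p.append(0)  # assumed idle encoding
            resp_n.append(1)
            level = 0
        err_prev = (p == n)
    return resp_p, resp_n


# esc_p is forced high from cycle 1 onwards while esc_n stays high.
resp_p, resp_n = receiver_backsignal([0, 1, 1, 1, 1], [1, 1, 1, 1, 1])
fail = sender_integ_fail(resp_p, resp_n)
```

In this model the corrupted `esc_p` level in cycle 1 leads to identical response levels from cycle 2 onwards, so the sender flags an integrity failure one cycle after the fault appears.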


 ### Ping Testing of the Escalation Signals

 Similar to the alert signaling scheme, the escalation signaling lines can
 be pinged / line tested in order to test whether the escalation receiver has
 been tampered with. This is achieved by asserting `ping_en_i` at the escalation
 sender module. A ping request is encoded as a single cycle pulse on the
 `esc_po/no` outputs. Since an escalation enable message is always at least two
 cycles wide on these lines, the receiver module will not decode this single
 cycle pulse as an escalation enable message, but it will respond to it with a
 "1010" pattern on the `resp_p/n` lines. The escalation sender module will assert
 `ping_ok_o` if that pattern is received correctly after one cycle of latency.
 Otherwise the LFSR timer will raise a "pingfail" alert after the programmable
 timeout is reached.

 ```wavejson
 {
   signal: [
     { name: 'clk_i',        wave: 'p..............' },
     { name: 'ping_en_i',    wave: '01....0|.1.0...' ,  node: '.a'},
     { name: 'ping_ok_o',    wave: '0....10|.......' ,  node: '.....e....g'},
     { name: 'integ_fail_o', wave: '0......|.......' },
     { name: 'esc_en_i',     wave: '0......|.......' },
     { name: 'resp_p',       wave: '0.1010.|.......' ,  node: '..c..d....f'},
     { name: 'resp_n',       wave: '1.0101.|.......' },
     { name: 'esc_p',        wave: '010....|.10....' ,  node: '.b'},
     { name: 'esc_n',        wave: '101....|.01....' },
     { name: 'esc_en_o',     wave: '0......|.......' },
   ],
   edge: [
   'a->b',
   'b->c',
   'd->e correct response',
   'f->g missing response',
   ],
   head: {
     text: 'ping testing of escalation lines',
   },
   foot: {
     text: 'ping trig at sender input at time 1 and 9; correct response at time 5; missing response at time 10',
     tick: 0,
   }
 }
 ```

 Note that the escalation signal always takes precedence, and the `ping_en_i`
 will just be acknowledged with `ping_ok_o` in case `esc_en_i` is already
 asserted. An ongoing ping sequence will be aborted immediately.
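
The response check performed for a ping can be sketched as follows. This is a hedged Python illustration, not the sender implementation: `ping_ok` and `PING_PATTERN` are hypothetical names, and the caller is assumed to know the cycle in which the response is expected to start (one cycle after the ping pulse in the waveform above).

```python
PING_PATTERN = [1, 0, 1, 0]  # expected levels on resp_p, per the text


def ping_ok(resp_p, resp_n, start):
    """Return True if a valid differential "1010" response starts at `start`."""
    window_p = resp_p[start:start + len(PING_PATTERN)]
    window_n = resp_n[start:start + len(PING_PATTERN)]
    if len(window_p) < len(PING_PATTERN):
        return False  # response window not fully captured
    differential = all(p != n for p, n in zip(window_p, window_n))
    return differential and window_p == PING_PATTERN


# First ping of the waveform above: pulse in cycle 1, response in cycles 2..5.
good_p = [0, 0, 1, 0, 1, 0, 0]
good_n = [1, 1, 0, 1, 0, 1, 1]
# Second ping: the receiver never responds.
silent_p = [0, 0, 0, 0, 0, 0, 0]
silent_n = [1, 1, 1, 1, 1, 1, 1]
```

Here `ping_ok(good_p, good_n, 2)` holds, while the silent response does not pass the check and would eventually lead to the "pingfail" timeout described above.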


 {{% section1 Programmer's Guide }}


 {{% section2 Power-up and Reset Considerations }}

 False alerts during power-up and reset are not possible since the alerts are
 disabled by default, and need to be configured and locked in by the firmware.

 The ping timer won't start until initial configuration is over and the registers
 are locked in.


 {{% section2 Initialization }}

 To initialize the block, software running at a high permission level (early in
 the security settings process) should do the following:

 1. For each alert and each local alert:

     - Determine if alert is enabled (should only be false if alert is known to
       be faulty). Set !!ALERT_EN.EN_A0 and !!LOC_ALERT_EN.EN_LA0 accordingly.

     - Determine which class (A..D) the alert is associated with. Set
       !!ALERT_CLASS.CLASS_A and !!LOC_ALERT_CLASS.CLASS_LA accordingly.

 2. Set the ping timeout value !!PING_TIMEOUT_CYC. This value is dependent on
    the clock ratios present in the system.

 3. For each class (A..D):

     - Determine whether to enable escalation mechanisms (accumulation /
       interrupt timeout) for this particular class. Set !!CLASSA_CTRL.EN
       accordingly.

     - Determine if this class of alerts allows clearing of
       escalation once it has begun. Set !!CLASSA_CTRL.LOCK to true if clearing
       should be disabled. If true, the escalation protocol cannot be stopped
       once it has begun. The assumption is that escalation ends in a chip
       reset, since the chip would otherwise remain unusable from then on.

     - Determine the number of alerts required to be accumulated before
       escalation protocol kicks in. Set !!CLASSA_ACCUM_THRESH accordingly.

     - Determine whether the interrupt associated with that class needs a
       timeout. If yes, set !!CLASSA_TIMEOUT_CYC to an appropriate value greater
       than zero (zero corresponds to an infinite timeout and disables the
       mechanism).

     - For each escalation phase (0..3):
         - Determine length of each escalation phase by setting
           !!CLASSA_PHASE0_CYC accordingly

     - For each escalation signal (0..3):
         - Determine whether to enable the escalation signal, and set the
           !!CLASSA_CTRL.E0_EN bit accordingly (default is enabled).
         - Determine the phase -> escalation mapping of this class and
           program it via the !!CLASSA_CTRL.E0_MAP values if it needs to be
           changed from the default mapping (0->0, 1->1, 2->2, 3->3).

 4. After initial configuration at startup, lock the alert enable and escalation
 config registers by writing 1 to !!REGEN. This protects the registers from being
 altered later on, and activates the ping mechanism for the enabled alerts and
 escalation signals.
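
The initialization order above can be sketched against a toy register model. Everything in this Python sketch is hypothetical: `ToyAlertHandlerRegs` is not a real driver, registers and fields are addressed by plain strings instead of addresses and bit offsets, the `CLASSB..D` names are assumed to mirror the `CLASSA` ones, and local alerts as well as the escalation signal enables and mappings are omitted for brevity.

```python
class ToyAlertHandlerRegs:
    """Toy model of the configuration registers (not the real register file)."""

    def __init__(self):
        self.values = {}
        self.locked = False

    def write(self, name, value):
        """Write a config register; returns False if the write was blocked."""
        if self.locked:
            return False  # configuration is locked, write has no effect
        if name == "REGEN" and value == 1:
            self.locked = True  # per the text: writing 1 to REGEN locks
            return True
        self.values[name] = value
        return True


def init_alert_handler(regs, alerts, ping_timeout_cyc, classes):
    # Step 1: per-alert enable and class assignment.
    for i, (enabled, cls) in enumerate(alerts):
        regs.write(f"ALERT_EN.EN_A{i}", int(enabled))
        regs.write(f"ALERT_CLASS.CLASS_A{i}", cls)
    # Step 2: ping timeout.
    regs.write("PING_TIMEOUT_CYC", ping_timeout_cyc)
    # Step 3: per-class escalation settings.
    for cls, cfg in classes.items():
        prefix = f"CLASS{cls}"
        regs.write(f"{prefix}_CTRL.EN", int(cfg["en"]))
        regs.write(f"{prefix}_CTRL.LOCK", int(cfg["lock"]))
        regs.write(f"{prefix}_ACCUM_THRESH", cfg["accum_thresh"])
        regs.write(f"{prefix}_TIMEOUT_CYC", cfg["timeout_cyc"])
        for phase, cycles in enumerate(cfg["phase_cyc"]):
            regs.write(f"{prefix}_PHASE{phase}_CYC", cycles)
    # Step 4: lock the configuration (this also activates ping testing).
    regs.write("REGEN", 1)


regs = ToyAlertHandlerRegs()
init_alert_handler(
    regs,
    alerts=[(True, "A"), (True, "A"), (False, "B"), (True, "B")],
    ping_timeout_cyc=32,
    classes={
        "A": {"en": True, "lock": True, "accum_thresh": 15,
              "timeout_cyc": 10_000, "phase_cyc": [1_000, 10_000, 100_000, 1_000_000]},
    },
)
```

After `init_alert_handler` returns, further writes to the modelled configuration registers are rejected, which mirrors the locking behavior described in step 4.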


 {{% section2 Interrupt Handling }}

 For every alert that is enabled, an interrupt will be triggered on class A, B,
 C, or D. To handle an interrupt of a particular class, software should execute
 the following steps:

 1. If needed, check the escalation state of this class by reading
    !!CLASSA_STATE. This reveals whether escalation protocol has been triggered
    and in which escalation phase the class is. In case interrupt timeouts are
    enabled the class will be in timeout state unless escalation has already been
    triggered. The current interrupt or escalation cycle counter can be read via
    !!CLASSA_ESC_CNT.

 2. Since the interrupt does not indicate which alert triggered, SW must read the
    cause registers !!LOC_ALERT_CAUSE and !!ALERT_CAUSE etc. The cause bits of
    all alerts are concatenated and chunked into 32-bit words. Hence the register
    file contains as many cause words as needed to cover all alerts present in
    the system. Each cause register contains one sticky bit per alert that is
    set by the incoming alert, and is clearable with a write by software. A
    cause bit should only be cleared after software has cleared the event
    trigger, if applicable. It is possible that the event requires no clearing
    (e.g. a parity error), or can't be cleared (e.g. a breach in the metal mesh
    protecting the chip).

    Note that in the rare case when multiple events are triggered at or about the
    same time, all events should be cleared before proceeding.

 3. After the event is cleared (if needed or possible), software should handle
 the interrupt as follows:

     - Reset the accumulation register for the class by writing !!CLASSA_CLR.
       This also aborts the escalation protocol if it has been triggered. If for
       some reason it is desired to never allow the accumulator or escalation to
       be cleared, software can initialize the !!CLASSA_CLREN register to zero.
       If !!CLASSA_CLREN is already false when an alert interrupt is detected
       (either due to software control or hardware trigger via
       !!CLASSA_CTRL.LOCK), then the accumulation counter cannot be cleared and
       this step has no effect.

     - After the accumulation counter is reset (if applicable), software should
       clear the class A interrupt state bit !!INTR_STATE.CLASSA. Clearing the
       class A interrupt state bit also clears and stops the interrupt timeout
       counter (if enabled).

 Note that testing interrupts by writing to the interrupt test registers does
 also trigger the internal interrupt timeout (if enabled), since the interrupt
 state is used as enable signal for the timer. However, alert accumulation will
 not be triggered by this testing mechanism.
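
The handling steps above can be sketched with a dictionary standing in for the registers of class A. This Python sketch is illustrative only: the function names and the dictionary keys are hypothetical stand-ins for real register accesses, clearing the actual event source is left out, and the bit ordering (alert `i` maps to bit `i % 32` of cause word `i // 32`) is an assumption about how the cause bits are chunked.

```python
def decode_cause_words(cause_words):
    """Return the indices of all alerts whose sticky cause bit is set."""
    alerts = []
    for word_idx, word in enumerate(cause_words):
        for bit in range(32):
            if (word >> bit) & 1:
                alerts.append(word_idx * 32 + bit)
    return alerts


def handle_class_a_irq(regs):
    """Toy version of the class A interrupt handling sequence."""
    # Step 1: check the escalation state of the class.
    state = regs["CLASSA_STATE"]
    # Step 2: find out which alerts fired, then clear their cause bits.
    causes = decode_cause_words(regs["ALERT_CAUSE"])
    regs["ALERT_CAUSE"] = [0] * len(regs["ALERT_CAUSE"])
    # Step 3a: clear accumulator / abort escalation, if clearing is enabled.
    if regs["CLASSA_CLREN"]:
        regs["CLASSA_ACCUM_CNT"] = 0
        regs["CLASSA_STATE"] = "Idle"
    # Step 3b: clear the interrupt state bit (also stops the timeout counter).
    regs["INTR_STATE.CLASSA"] = 0
    return state, causes


regs = {
    "CLASSA_STATE": "Timeout",
    "ALERT_CAUSE": [0x5, 0x1],  # alerts 0, 2 and 32 have fired
    "CLASSA_CLREN": 1,
    "CLASSA_ACCUM_CNT": 3,
    "INTR_STATE.CLASSA": 1,
}
state, causes = handle_class_a_irq(regs)
```

If `CLASSA_CLREN` were 0 in this model, step 3a would leave the accumulator and the escalation state untouched, as described in the text.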


 {{% section1 Additional Notes}}


 ### Timing Constraints

 The skew within all differential signal pairs has to be constrained to be
 strictly smaller than the shortest clock period in the system. The maximum
 propagation delay of each differential pair should also be constrained
 (although it may be longer than the clock periods involved).


 ### Fast-track Alerts

 Note that it is possible to program a certain class to provide a fast-track
 response for critical alerts by configuring its accumulator to trigger on the
 first alert (i.e., an accumulation threshold of 0),
 and configuring the escalation protocol such that the appropriate escalation
 measure is triggered within escalation phase 0. This results in a minimal
 escalation latency of 4 clock cycles from alert sender input to escalation
 receiver output in the case where all involved signaling modules are
 completely synchronous with the alert handler. In case the alert sender is
 asynchronous w.r.t. the alert handler, the actual latency depends on the
 clock periods involved. Assuming both clocks have the same frequency, alert
 propagation takes at least 6-8 alert handler clock cycles.

 For alerts that mandate an asynchronous response (i.e. without requiring a
 clock to be active), it is highly recommended to build a separate network at
 the top-level. That network should OR the critical sources together and
 route the asynchronous alert signal directly to the highest severity
 countermeasure device. Examples of alert conditions of this sort would be
 attacks on the secure clock.


 ### Future Improvements and Enhancements

 The list below sketches ideas for features and improvements which might be
 added in the future.

 - Ping pause support for IPs with low-power modes. This feature is a function
   of dedicated (and optional) low-power state signals exposed by the peripheral
   IPs. The software has only read access to these control signals. I.e., the
   idea is to have (possibly differentially encoded) power status bits coming
   from the peripheral IPs which can pause ping testing IFF this low-power mode
   has been enabled during initial configuration.

 - Support for LFSR cross checking. This feature adds a second redundant LFSR to
   enable continuous comparison of the timer states. If they are not consistent
   (e.g. due to a glitch attack), a local alert will be raised. One idea would be
   to operate a Galois and Fibonacci implementation in parallel, since under
   certain circumstances they are guaranteed to produce the same sequence - yet
   with a differently wired logic and hence different response to glitching
   attacks. This needs some further theoretical work in order to validate whether
   this can work (there may be restrictions on the polynomials that can be used,
   and the initial states of the two LFSRs need to be correctly computed).

 - Monitoring of the pings in order to make sure that every peripheral has been
   pinged during a certain time window. Since a full histogram approach would be
   expensive and is not really needed, the idea is to add a bank of set regs (one
   for each peripheral) which are set whenever a peripheral has successfully been
   pinged. Every now and then, these registers need to be checked. They could be
   exposed to SW in a similar manner as the cause / interrupt state bits such
   that SW can do the checking. HW could check these registers after a certain
   amount of correctly executed pings, and raise a local alert if not all
   peripherals have been exercised. Note however that since we have an entropy
   input in the LFSR timer, there is no way to guarantee a safe #pings threshold
   in this case. I.e., there is always a small probability of triggering false
   positives which have to be handled in SW.


 {{% section1 Register Table }}

 The register description below can be generated with the
 `reg_alert_handler.py` script. The reason for having yet another script for
 register generation is that the alert handler is configurable for the number
 of alert sources (similar to the rv_plic design).

 The register description below may not match with the actual instance in Top
 Earlgrey, since the alert handler is generated by the topgen tool so that the
 number of alert sources matches.

 In order to generate the register file below, from `hw/ip/alert_handler/doc`:

 ```console
 $ ./reg_alert_handler.py alert_handler.hjson.tpl -n 4 > alert_handler.hjson
 ```

 This creates the register file for 4 alert sources.

 Note that you should also update the regfile wrapper after updating the
 regfile, using:

 ```console
 $ ./alert_handler_reg_wrap.sv.tpl -n 4 > ../rtl/alert_handler_reg_wrap.sv
 ```


 {{% registers x }}