| # Secure Hardware Design Guidelines |
| |
| ## Overview |
| |
Silicon designs for security devices require special guidelines to protect
them against myriad attacks. For OpenTitan, the universe of potential attacks
| is described in our threat model. In order to have the most robust defensive |
| posture, a general approach to secure hardware design should rely on the |
| concepts of (1) defense in depth, (2) consideration of recovery methods |
| post-breach, and (3) thinking with an attacker mindset. |
| |
In all cases, as designers, we need to think in terms of equalizing the
difficulty of the attacks available to the adversary. If a design has a
distribution of attack vectors (sometimes called the "attack surface" or
"attack surface area"), it is not the strongest defenses that are of
particular interest but rather the weakest, since these are the most likely
to be exploited by the adversary. For example, it is unlikely that an
attacker will try to brute-force a system based on AES-128 encryption: the
difficulty of such an attack is high, and our confidence in the estimate of
that difficulty is also high. But if the security of the AES-128 key depends
on a global secret, more mundane attacks like theft or bribery become more
likely avenues for the adversary to exploit.
| |
| Defense in depth means having multiple layers of defenses/controls acting |
| independently. Classically in information security, these are grouped into three |
main categories: physical, technical, and administrative[^1]. We map these into
| slightly different elements when considering secure hardware design: |
| |
| * Physical security typically maps to sensors and shields, but also separation |
| of critical information into different locations on the die. |
| * Technical security includes techniques like encrypting data-at-rest, |
| scrambling buses for data-in-motion, and integrity checking for all kinds of |
| data. |
| * Administrative security encompasses architectural elements like permissions, |
| lifecycle states, and key splits (potentially also linked to physical |
| security). |
| |
| Consideration of recovery methods means assuming that some or all of the |
| defenses will fail, with an eye to limiting the extent of the resulting system |
| failure/compromise. If an adversary gains control over a sub-block, but cannot |
| use this to escalate to full-chip control, we have succeeded. If control over a |
| sub-block is detected, but an alert is generated that ultimately causes a device |
| reset or other de-escalation sequence, we have created a recovery strategy. If |
| the software is compromised but access to keys/secrets is prevented by hardware |
| controls, we have succeeded. If compromise of secrets from a single device |
| cannot be leveraged into attacks on other (or all) devices, again we have |
| succeeded. If compromised devices can be identified and quarantined when |
| enrolled into a larger system, then we have a successful recovery strategy. |
| |
Thinking with an attacker mindset means "breaking the rules" or violating
assumptions: what if two linked state machines are no longer "in sync" - how
will they operate, and how can they recover? What happens if the adversary
manipulates an internal value (fault injection)? What happens if the adversary
can learn some or all of a secret value (side-channel leakage)? This document
primarily gives generic guidance for defending against the latter two attacks
(fault injection and side-channel information leakage). It also discusses ways
to prevent attacks, mitigate them, or raise alerts when they occur. Other
attack vectors (especially software compromises or operational security
failures) are out of scope for this document or will be addressed at a later
stage.
| |
| In general, when thinking of protecting against fault injection attacks, the |
| designer should consider the consequences of any particular net/node being |
inverted or forced by an adversary. State-of-the-art fault attacks can stimulate
| two nodes in close succession; robustness to this type of attack depends on the |
| declared threat model. Designers need to be well aware of the power of an attack |
| like SIFA [[15](#ref-15)], which can bypass "conventional" fault countermeasures (e.g. |
| redundancy/detectors) and requires only modest numbers of traces. |
| |
For increased resistance against side-channel leakage (typically: power,
electromagnetic radiation [[8](#ref-8)], or timing), designs in general should
ensure that *secret material* is created and transmitted in a way that never
operates on "small" subsets of its bits. Attacks like DPA [[7](#ref-7)] are
very powerful because they are able to "divide and conquer" an AES operation
(regardless of key size) into its 8-bit S-Boxes and enumerate all 256 possible
values to evaluate hypotheses. Evaluating/processing information in 32-bit
quanta (or larger) raises the cost of such an enumeration from 2^8 to 2^32
hypotheses and makes these attacks much more difficult; operating on a single
bit at a time makes them almost trivial.
| |
| Below we will go deeper into these recommendations for general design practices. |
| Individual module guidance for particular IP (processor, AES, SHA, etc) will be |
| handled in addenda to this document. |
| |
| ## General Module Level Design Guidance |
| |
| These guidelines are for sub-block / module level design. System architecture, |
| identity management, and protocol design are outside the scope of this document, |
| but may create some dependencies here. For general reading, the slides of [[10](#ref-10)] |
| are considered a useful companion to these guidelines. |
| |
| ### **Recommendation 1**: Identify sensitive/privileged operations |
| |
Identify any sensitive/privileged operations performed by the module
(non-exhaustive list of examples: working with secret keys, writing to OTP,
potentially writing to flash, enabling debug functionality, lifting/releasing
access restrictions, and changing lifecycle state).
| |
| 1. Having these operations documented helps to analyze the potential issues of |
| any attack discussed below. |
2. Subsequent design/verification reviews can use these sensitive operations as
   focus areas or even coverage points, as sketched below.
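
For example, documented sensitive operations can be turned directly into
verification collateral. The sketch below shows one coverage point and one
assertion of this kind; the signal names (`otp_wr_req_i`, `debug_en_i`,
`lc_state_valid_i`) are hypothetical stand-ins for a module's documented
operations.

```systemverilog
// Illustrative only: the signals below are hypothetical stand-ins for a
// module's documented sensitive operations.
module sensitive_op_checks (
  input logic clk_i,
  input logic rst_ni,
  input logic otp_wr_req_i,     // request to program OTP
  input logic debug_en_i,       // debug functionality currently enabled
  input logic lc_state_valid_i  // lifecycle state permits the operation
);

  // Coverage point: verification must exercise the sensitive overlap.
  cover property (@(posedge clk_i) disable iff (!rst_ni)
      otp_wr_req_i && debug_en_i);

  // Focus-area assertion: the privileged operation may only fire in an
  // allowed lifecycle state.
  assert property (@(posedge clk_i) disable iff (!rst_ni)
      otp_wr_req_i |-> lc_state_valid_i);

endmodule
```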
| |
| ### **Recommendation 2**: Side-channel leakage considerations |
| |
Consider side-channel leakage of any secret information (side channels include
timing, power, EM radiation, caches, and micro-architectural state, among
others).
| |
| 1. Process secret information in at least a 32-bit wide datapath |
| 2. Use fixed/constant time operations when handling secrets (see [[6](#ref-6)] and [[11](#ref-11)]) |
| 3. Don't branch/perform conditional operations based on secret values |
| 4. Incorporate temporal randomness (example: add delay cycles based on LFSR |
| around critical operations, see [[9](#ref-9)]) |
| 5. Cryptographic operations should incorporate entropy (via masking/blinding, |
| see [[9](#ref-9)]), especially if the key is long-lived, or a global/class-wide value. |
| Short-lived keys may not require this, but careful study of the information |
| leakage rate is necessary |
| 6. Noise generation - run other "chaff" switching actions in parallel with |
| sensitive calculations, if power budget permits (see [[9](#ref-9)]) |
| 7. Secrets should not be stored in a processor cache (see [[3](#ref-3)]) |
| 8. Speculative execution in a processor can lead to leakage of secrets via |
| micro-architectural state (see [[4](#ref-4)]/[[5](#ref-5)]) |
9. When clearing secrets, use an LFSR to wipe values, preventing the Hamming
   weight leakage that would occur if clearing to zero. For secrets stored in
   multiple shares, use different permutations (or separate LFSRs) to perform
   the clearing of the shares (a sketch follows this list).
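
A minimal sketch of the LFSR-wipe idea, assuming a single 32-bit secret
register: all names, the seed, and the polynomial are illustrative (a real
OpenTitan design would use the `prim_lfsr` primitive), and a multi-share
secret would use one differently seeded/permuted instance per share.

```systemverilog
module secret_reg_wipe (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        load_i,         // load a new secret
  input  logic [31:0] secret_next_i,
  input  logic        wipe_i,         // clear request
  output logic [31:0] secret_q
);

  // Free-running Fibonacci LFSR (x^32 + x^22 + x^2 + x + 1); a non-zero
  // reset seed keeps it out of the all-zero stuck state.
  logic [31:0] lfsr_q;
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      lfsr_q <= 32'h8EED_5EED;  // illustrative seed; diversify per instance
    end else begin
      lfsr_q <= {lfsr_q[30:0], lfsr_q[31] ^ lfsr_q[21] ^ lfsr_q[1] ^ lfsr_q[0]};
    end
  end

  // Wiping overwrites with pseudo-random data instead of all-zero, so the
  // Hamming weight/distance of the clearing transition does not depend on
  // the previous secret value.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)     secret_q <= '0;
    else if (wipe_i) secret_q <= lfsr_q;
    else if (load_i) secret_q <= secret_next_i;
  end

endmodule
```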
| |
| ### **Recommendation 3**: Fault injection countermeasures |
| |
Consider defenses to fault injection / glitching attacks (for a survey and
overview of attacks, see [[12](#ref-12)] and [[13](#ref-13)]; for mitigation
patterns, see [[14](#ref-14)]).
| |
| 1. Initially assume that the adversary can glitch any node arbitrarily, and |
| determine the resulting worst-case scenario. This is a very conservative |
| approach and might lead to over-pessimism, but serves to highlight potential |
| issues. Then, to ease implementation burden, assume the adversary can glitch |
| to all-1's or all-0's (since these are considered "easier" to reach), and |
| that reset can be asserted semi-arbitrarily. |
2. Use parity/ECC on memories and data paths (note here that ECC is not true
   integrity protection; a hash must be used to prevent forgery, see
   [[1](#ref-1)]). For memories, ECC is helpful to protect instruction streams
   or values that could redirect execution flow if corrupted. Parity is
   potentially helpful if detection of corruption is adequate (though
   double-glitch fault injection can fool parity, so Hsiao or other
   detect-2-error codes can be used, even without correction circuitry
   implemented). When committing to an irrevocable action (e.g. burning into
   OTP, unlocking part of the device/increasing permissions), ECC is probably
   more appropriate.
| 1. When selecting a specific ECC implementation, the error detection |
| properties are likely more important than error correction (assuming |
| memory lifetime retention/wear are not considered). For a good example |
| of how to consider the effectiveness of error correction, see |
| [this PR comment](https://github.com/lowRISC/opentitan/pull/3899#issuecomment-716799810). |
3. State machines (a sketch combining the following points appears after this
   list):
    1. Have a minimum Hamming distance for state machine transitions, to make
       single-bit faults non-effective
    2. Use a
       [sparsely populated state encoding](https://github.com/lowRISC/opentitan/blob/master/util/design/sparse-fsm-encode.py),
       with all other encodings marked invalid - see 11.1 about optimization
       concerns when doing this though
    3. All states could have the same Hamming weight, so that this property
       can be constantly checked (or ECC-type coding can be used on the state
       variable and checked)
    4. If waiting for a counter to expire to transition to the next state, it
       is better if the terminal count that causes the transition is not
       all-0/all-1. One could use an LFSR instead of a binary counter, but
       this can make debugging painful
4. Maintain value-and-its-complement throughout the datapath (sometimes called
   "dual rail" logic), especially if unlocking/enabling something sensitive,
   and continually check for validity/consistency of the representation
| 5. Incorporate temporal randomness where possible (example: add delay cycles |
| based on LFSR around sensitive operations) |
| 6. Run-it-twice and compare results for sensitive calculations |
| 7. Redundancy - keep/store multiple copies of sensitive checks/data |
| 8. For maximum sensitivity, compare combinational and sequential paths with |
| hair-trigger/one-shot latch of miscompare |
| 9. Empty detection for OTP/flash (if needed, but especially for lifecycle |
| determination) |
| 10. Avoid local resets / prefer larger reset domains, since a glitch on this |
| larger reset keeps more of the design "in sync." But, consider the |
| implications of any block with more than one reset domain (see also 9.1). |
| 11. Similar to the "mix in" idea of 4.4, in any case where multiple contributing |
| "votes" are going to an enable/unlock decision, consider mixing them into |
| some cryptographic structure over time that will be diverted from its path |
| by attempts to glitch each vote. (Note: if the final outcome of this is |
| simply a wide-compare that produces a single-bit final unlock/enable vote |
| then this is only marginally helpful - since that vote is now the glitch |
| target. Finding a way to bind the final cryptographic result to the vote is |
| preferred, but potentially very difficult / impossible, depending on the |
| situation.) |
12. When checking/creating a signal to *permit* some sensitive operation,
    prefer that the checking logic is maximally volatile (e.g. performs a lot
    of the calculation in a single cycle after a register), such that a glitch
    prevents the operation. Whereas, when checking to *deny* a sensitive
    operation, prefer that the checking logic is minimally volatile (directly
    follows a register with minimal combinational logic), such that a glitch
    will be recovered on the next clock and the denial will be
    continued/preserved.
13. CFI (control flow integrity) hardware can help protect a processor /
    programmable peripheral from some types of glitch attacks. This topic is
    very involved and beyond the scope of these guidelines; consult
    [[2](#ref-2)] for an introduction to previous techniques.
| 14. Analog sensors (under/over-voltage, laser light, mesh breach, among others) |
| can be used to generate SoC-level alerts and/or inhibit sensitive |
| operations. Many of these sensors require calibration/trimming, or require |
| hysteresis circuits to prevent false-positives, so they may not be usable in |
| fast-reacting situations. |
15. Running an operation (e.g. AES or KMAC) to completion, even with a
    detected fault, is sometimes useful since it suppresses information about
    the success/failure of the attempted fault and minimizes any timing side
    channel. However, for some operations (e.g. ECDSA sign), computing on
    faulty inputs can have catastrophic consequences. These guidelines cannot
    recommend a default-safe posture, but each decision about handling
    detected faults should be carefully considered.
16. For request-acknowledge interfaces, monitor the acknowledge line for
    spurious pulses at all times (not only when a request is pending) and use
    this as a glitch/fault detector to escalate locally and/or generate
    alerts.
| 17. When arbitrating between two or more transaction sources with different |
| privilege/access levels, consider how to protect a request from one source |
| being glitched/forged to masquerade as being sourced from another |
| higher-privilege source (for example, to return side-loaded |
| hardware-visible-only data via a software read path). At a minimum, |
| redundant arbitration and multiple-bit encoding of the arbitration "winner" |
| can help to mitigate this type of attack. |
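
As an illustration of the state-machine guidance in item 3, the sketch below
uses a hand-picked sparse encoding (minimum Hamming distance 4 between valid
states) and traps every invalid encoding. In a real design the encoding would
come from the `sparse-fsm-encode.py` utility linked above, the state register
would be preserved through synthesis per 11.1, and all names here are
illustrative.

```systemverilog
module unlock_fsm (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic token_ok_i,
  input  logic cnt_done_i,
  output logic unlock_o,
  output logic alert_o
);

  // 3 valid states out of 256 encodings; corrupting up to 3 bits of a valid
  // state always lands in the default branch below.
  typedef enum logic [7:0] {
    StIdle   = 8'b1010_0110,
    StCheck  = 8'b0111_1010,
    StUnlock = 8'b1100_0011
  } state_e;

  state_e state_q, state_d;

  always_comb begin
    state_d  = state_q;
    unlock_o = 1'b0;
    alert_o  = 1'b0;
    unique case (state_q)
      StIdle:   if (token_ok_i) state_d = StCheck;
      StCheck:  if (cnt_done_i) state_d = StUnlock;
      StUnlock: unlock_o = 1'b1;
      // Any invalid (glitched) encoding is trapped and raises an alert.
      default:  alert_o = 1'b1;
    endcase
  end

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) state_q <= StIdle;
    else         state_q <= state_d;
  end

endmodule
```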
| |
| ### **Recommendation 4**: Handling of secrets |
| |
| 1. Diversify types/sources of secrets (e.g. use combination of RTL constants + |
| OTP + flash) to prevent a single compromise from being effective |
| 2. Rather than "check an unlock value directly" - use a hash function with a |
| user-supplied input, and check the output of the hash matches. This way the |
| unlock value is not contained in the netlist. |
| 3. Qualify operations with allowed lifecycle state (even if redundant with |
| other checks) |
4. Where possible, mix operating modes into the calculation of derived secrets
   to create parallel, non-substitutable operating/keying domains (e.g. mixing
   in devmode or lifecycle state).
    1. If defenses can be bypassed for debugging/recovery, consider mixing in
       the activation/bypass bits of those defenses as well; think of this as
       small-scale attestation of device state
| 5. Encrypt (or at least scramble) any secrets stored at-rest in flash/OTP, to |
| reduce risks of static/offline inspection. |
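
A sketch of the hashed-unlock idea in item 2, assuming an external hash
engine: only the digest of the token is stored in the netlist, and the raw
unlock token never appears in the design. The digest constant, the lifecycle
encoding, and all names are illustrative.

```systemverilog
module unlock_token_check (
  input  logic         clk_i,
  input  logic         rst_ni,
  input  logic         digest_valid_i,  // hash engine has produced a digest
  input  logic [255:0] digest_i,        // H(user-supplied token)
  input  logic [3:0]   lc_state_i,      // current lifecycle state
  output logic         unlock_o
);

  // Placeholder for the digest of the real token, computed at design time.
  localparam logic [255:0] ExpDigest = {8{32'h5AA5_C33C}};
  // Illustrative lifecycle state in which unlock is permitted at all.
  localparam logic [3:0]   LcTestUnlocked = 4'b0101;

  // The wide compare is qualified by lifecycle state (redundantly with any
  // other checks) and registered, so the unlock is a clean single-bit flop
  // output rather than a long glitchable combinational path.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      unlock_o <= 1'b0;
    end else begin
      unlock_o <= digest_valid_i && (digest_i == ExpDigest)
                                 && (lc_state_i == LcTestUnlocked);
    end
  end

endmodule
```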
| |
| ### **Recommendation 5**: Alerts |
| |
1. Generate alerts on any detected anomaly (the priority/severity to be
   assigned still needs to be defined)
2. Where possible, prefer to take a local action (clearing/randomizing state,
   ceasing processing) in addition to generating the alert, as sketched below
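
A minimal sketch of pairing an alert with an immediate local action; all
names are illustrative, and the alert output here is a plain sticky bit
rather than a full alert-sender handshake.

```systemverilog
module local_escalation (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        anomaly_i,   // e.g. FSM default trap, parity error
  input  logic [31:0] lfsr_i,      // local pseudo-random data
  input  logic [31:0] state_d,     // normal next-state value
  output logic [31:0] state_q,
  output logic        alert_o      // sticky until reset
);

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      alert_o <= 1'b0;
      state_q <= '0;
    end else begin
      // Report upward...
      alert_o <= alert_o | anomaly_i;
      // ...and simultaneously act locally: randomize the sensitive state
      // rather than waiting for a system-level response.
      state_q <= anomaly_i ? lfsr_i : state_d;
    end
  end

endmodule
```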
| |
| ### **Recommendation 6**: Safe default values |
| |
1. All case statements, if statements, and ternaries should consider what the
   safest default value is (see the sketch after this list). Having an
   "invalid" state/value is nice to have for this purpose, but isn't always
   possible.
2. Operate under a general policy of starting with the lowest allowed
   privilege and augmenting it via approvals/unlocks.
3. Enforce inputs on CSRs: qualify/force the data being written, based on
   lifecycle state, peripheral state, or other values. The designer must
   determine the safest remapping, e.g. write --> read, read --> nop,
   write --> nop, and so forth. Blanket implementation of input enforcement
   complicates verification, so this style of design should be chosen only
   where the inputs are particularly sensitive (requests to unlock, privilege
   increase requests, debug mode enables, etc.).
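
A sketch of the safe-default pattern: every branch not explicitly allowed
resolves to the least-privileged outcome. All names and encodings are
illustrative.

```systemverilog
module access_decode (
  input  logic [2:0] req_i,
  input  logic       lc_debug_en_i,  // lifecycle permits debug
  output logic       grant_o
);

  localparam logic [2:0] ReqRead  = 3'b011;
  localparam logic [2:0] ReqWrite = 3'b101;
  localparam logic [2:0] ReqDebug = 3'b110;

  always_comb begin
    // Start from the safest value; branches may only escalate deliberately.
    grant_o = 1'b0;
    unique case (req_i)
      ReqRead:  grant_o = 1'b1;
      ReqWrite: grant_o = 1'b1;
      ReqDebug: grant_o = lc_debug_en_i;  // qualified by lifecycle state
      default:  grant_o = 1'b0;           // invalid encodings deny
    endcase
  end

endmodule
```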
| |
| ### **Recommendation 7**: DFT issues |
| |
1. Entry into and exit from scan mode should cause a reset, to prevent
   insertion or exfiltration of sensitive values (a sketch follows this list)
2. Ensure that in production (i.e. not lab debug) environments, scan chains
   are disabled
| 3. Processor debug paths (via JTAG) may need to be disabled in production modes |
4. Beware of self-repair or redundant-row/column schemes for memories (SRAM
   and OTP), as they can be exploited to misdirect reads to
   adversary-controlled locations
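
A sketch of item 1: any change of the scan-mode signal generates a reset
request, so state can neither be loaded in nor shifted out across the
transition. Names are illustrative, and a real design would also synchronize
and filter `scanmode_i`.

```systemverilog
module scan_edge_reset (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic scanmode_i,
  output logic rst_req_o
);

  logic scanmode_q;
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) scanmode_q <= 1'b0;
    else         scanmode_q <= scanmode_i;
  end

  // Entry or exit from scan mode triggers a reset request.
  assign rst_req_o = scanmode_q ^ scanmode_i;

endmodule
```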
| |
| ### **Recommendation 8**: Power management issues |
| |
| 1. If module is not in an always-on power domain, consider that a sleep/wake |
| sequence can be used to force a re-derivation of secrets needed in the |
| module, as many times as desired by the adversary |
2. Fine-grained clock gating should never be used for any module that
   processes secret data; only coarse-grained (module-level) gating, as
   sketched below, is acceptable. (Fine-grained gates essentially compute
   `clock_gate = D ^ Q`, which often acts as an SCA "amplifier".)
| |
| ### **Recommendation 9**: "Synchronization" (consistency) issues |
| |
| 1. If a module interacts with other modules in a stateful way (think of two |
| data-transfer counters moving in ~lockstep, but the counts are not sent back |
| and forth for performance optimization), what happens if: |
| 1. One side is reset and the other is not |
| 2. One side is clock-gated and the other is not |
| 3. One side is power-gated and the other is not |
| 4. The counter on one side is glitched |
2. Generally, these kinds of blind lockstep situations should be avoided
   where possible; current module/interface status should be exchanged in
   both directions and constantly checked for validity/consistency, as
   sketched below
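
A sketch of explicit status exchange, assuming each side periodically sends
its transfer count to the other: a reset, clock gate, power gate, or glitch
on either side shows up as a divergence. Names and widths are illustrative,
and the same module would be instantiated on both sides.

```systemverilog
module lockstep_check #(
  parameter int unsigned Width = 16
) (
  input  logic             clk_i,
  input  logic             rst_ni,
  input  logic             incr_i,        // local transfer event
  input  logic             peer_valid_i,  // peer status beat
  input  logic [Width-1:0] peer_cnt_i,    // peer's reported count
  output logic [Width-1:0] local_cnt_o,   // sent to the peer in return
  output logic             alert_o
);

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)     local_cnt_o <= '0;
    else if (incr_i) local_cnt_o <= local_cnt_o + 1'b1;
  end

  // Any divergence between the two sides raises an alert.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) alert_o <= 1'b0;
    else         alert_o <= peer_valid_i && (peer_cnt_i != local_cnt_o);
  end

endmodule
```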
| |
| ### **Recommendation 10**: Recovery mechanism considerations |
| |
| 1. What happens if a security mechanism fails? (Classic problem of this variety |
| is on-die sensors being too sensitive and resetting the chip) Traditionally, |
| fuses can disable some mechanisms if they are faulty. |
| 2. Could an adversary exploit a recovery mechanism? (If a sensor can be |
| fuse-disabled, wouldn't the adversary just do that? See 4.4 above.) |
| |
| ### **Recommendation 11**: Optimization concerns |
| |
1. Sometimes synthesis will optimize away redundant (but necessary for
   security) logic - `dont_touch` or `size_only` attributes may sometimes be
   needed, or even more aggressive preservation strategies (see the sketch
   after this list). Example: when using the sparse FSM encoding, use the
   `prim_flop` component for the state vector register.
| 2. Value-and-complement strategies can also be optimized away, or partially |
| disconnected such that only half of the datapath is contributing to the |
| logic, or a single register with both Q & Qbar outputs becomes the source of |
| both values to save area. |
| 3. Retiming around pipeline registers can create DPA issues, due to inadvertent |
| combination of shares, or intra-cycle glitchy evaluation. For DPA-resistant |
| logic, explicitly declare functions and registers using `prim_*` components, |
| and make sure that pipeline retiming is not enabled in synthesis. |
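
A sketch illustrating items 1 and 2: value-and-complement storage anchored
with preservation attributes. The attribute spelling is tool-dependent and
shown for illustration only; OpenTitan instead instantiates components such
as `prim_flop`/`prim_buf` to anchor logic like this.

```systemverilog
module value_and_complement (
  input  logic       clk_i,
  input  logic       rst_ni,
  input  logic [7:0] state_d,
  output logic       alert_o
);

  // Without preservation, synthesis may legally merge these into a single
  // register (using Q and Qbar) and delete the comparison below.
  (* dont_touch = "true" *) logic [7:0] state_raw_q;
  (* dont_touch = "true" *) logic [7:0] state_inv_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      state_raw_q <= 8'h00;
      state_inv_q <= 8'hFF;
    end else begin
      state_raw_q <= state_d;
      state_inv_q <= ~state_d;
    end
  end

  // Continuous consistency check of the redundant representation.
  assign alert_o = (state_raw_q != ~state_inv_q);

endmodule
```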
| |
| ### **Recommendation 12**: Entropy concerns |
| |
| 1. Verify that all nonces are truly only used once |
2. If entropy is broadcast, verify the list of consumers and the arbitration
   scheme to prevent reuse / duplicate use of entropy in sensitive
   calculations
3. Seeds for local LFSRs need to be unique/diversified, as sketched below
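
A sketch of enforcing seed diversity, assuming a simple parameterized LFSR
(the repository's `prim_lfsr` plays this role in practice); the polynomial,
seeds, and names are illustrative.

```systemverilog
module lfsr32 #(
  parameter logic [31:0] Seed = 32'h0  // deliberately invalid default
) (
  input  logic        clk_i,
  input  logic        rst_ni,
  output logic [31:0] state_o
);

  // Elaboration-time check: the all-zero seed is both a stuck state and a
  // sign that the instance was never diversified.
  if (Seed == 32'h0) begin : gen_seed_check
    $error("lfsr32: every instance must set a unique non-zero Seed");
  end

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      state_o <= Seed;
    end else begin
      state_o <= {state_o[30:0],
                  state_o[31] ^ state_o[21] ^ state_o[1] ^ state_o[0]};
    end
  end

endmodule

// Two consumers, two different seeds, so the streams are not correlated:
//   lfsr32 #(.Seed(32'h1BAD_B002)) u_delay_lfsr (.clk_i, .rst_ni, .state_o(delay_rand));
//   lfsr32 #(.Seed(32'h600D_CAFE)) u_wipe_lfsr  (.clk_i, .rst_ni, .state_o(wipe_rand));
```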
| |
| ### **Recommendation 13**: Global secrets |
| |
| 1. Avoid if at all possible |
| 2. If not possible, have a process to generate/re-generate them; make sure this |
| process is used/tested many times before final netlist; process must be |
| repeatable/deterministic given some set of inputs |
| 3. If architecturally feasible, install a device-specific secret to override |
| the global secret once boot-strapped (and disable the global secret) |
| |
| ### **Recommendation 14**: Sensors |
| |
1. Sensors need to be adjusted/tweaked so that they actually fire. It is
   challenging to set sensor thresholds at levels that detect "interesting"
   glitches/environmental effects but don't fire constantly or cause yield
   issues. The security team should work with the silicon supplier to
   determine the best course of action here.
| 2. Sensor configuration / calibration data should be integrity-protected. |
| |
| ## References and further reading |
| |
| [<span id="ref-1">1</span>]: Overview of checksums and hashes - |
| https://cybergibbons.com/reverse-engineering-2/checksums-hashes-and-security/ |
| |
| [<span id="ref-2">2</span>]: A Survey of hardware-based Control Flow Integrity - |
| https://arxiv.org/pdf/1706.07257.pdf |
| |
| [<span id="ref-3">3</span>]: Cache-timing attacks on AES - |
| https://cr.yp.to/antiforgery/cachetiming-20050414.pdf |
| |
| [<span id="ref-4">4</span>]: Meltdown: Reading Kernel Memory from User Space - |
| https://meltdownattack.com/meltdown.pdf |
| |
| [<span id="ref-5">5</span>]: Spectre Attacks: Exploiting Speculative Execution - |
| https://spectreattack.com/spectre.pdf |
| |
| [<span id="ref-6">6</span>]: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other |
| Systems - https://www.rambus.com/wp-content/uploads/2015/08/TimingAttacks.pdf |
| |
| [<span id="ref-7">7</span>]: Differential Power Analysis - |
| https://paulkocher.com/doc/DifferentialPowerAnalysis.pdf |
| |
| [<span id="ref-8">8</span>]: SoC it to EM: electromagnetic side-channel attacks on a complex |
| system-on-chip - https://www.iacr.org/archive/ches2015/92930599/92930599.pdf |
| |
| [<span id="ref-9">9</span>]: Introduction To differential power analysis - |
| https://link.springer.com/content/pdf/10.1007/s13389-011-0006-y.pdf |
| |
| [<span id="ref-10">10</span>]: Principles of Secure Processor Architecture Design - |
| https://caslab.csl.yale.edu/tutorials/hpca2019/ and |
| https://caslab.csl.yale.edu/tutorials/hpca2019/tutorial_principles_sec_arch_20190217.pdf |
| |
| [<span id="ref-11">11</span>]: Time Protection - https://ts.data61.csiro.au/projects/TS/timeprotection/ |
| |
| [<span id="ref-12">12</span>]: Fault Attacks on Secure Embedded Software: Threats, Design and Evaluation - |
| https://arxiv.org/pdf/2003.10513.pdf |
| |
| [<span id="ref-13">13</span>]: The Sorcerer's Apprentice Guide to Fault Attacks - |
| https://eprint.iacr.org/2004/100.pdf |
| |
| [<span id="ref-14">14</span>]: Fault Mitigation Patterns - |
| https://www.riscure.com/uploads/2020/05/Riscure_Whitepaper_Fault_Mitigation_Patterns_final.pdf |
| |
| [<span id="ref-15">15</span>]: SIFA: Exploiting Ineffective Fault Inductions on Symmetric Cryptography - |
| https://eprint.iacr.org/2018/071.pdf |
| |
| ## Notes |
| |
[^1]: In other OpenTitan documents, the combination of technical and
    administrative defenses is often referred to as "logical security".