# Secure Hardware Design Guidelines

## Overview

Silicon designs for security devices require special guidelines to protect the
designs against myriad attacks. For OpenTitan, the universe of potential attacks
is described in our threat model. In order to have the most robust defensive
posture, a general approach to secure hardware design should rely on the
concepts of (1) defense in depth, (2) consideration of recovery methods
post-breach, and (3) thinking with an attacker mindset.

In all cases, as designers, we need to think of equalizing the difficulty of any
particular attack for the adversary. If a design has a distribution of attack
vectors (sometimes called the "attack surface" or "attack surface area"), it is
not the strength of the strongest defenses that is of particular interest but
rather that of the weakest, since these will be the most likely to be exploited
by the adversary. For example, it is unlikely that an attacker will try to
brute-force a system based on AES-128 encryption, as the difficulty level of
such an attack is high, and our confidence in the estimate of that difficulty is
also high. But if the security of the AES-128 depends on a global secret, more
mundane attacks like theft or bribery become more likely avenues for the
adversary to exploit.

Defense in depth means having multiple layers of defenses/controls acting
independently. Classically in information security, these are grouped into
three main categories: physical, technical, and administrative[^1]. We map
these into slightly different elements when considering secure hardware design:

*   Physical security typically maps to sensors and shields, but also the
    separation of critical information into different locations on the die.
*   Technical security includes techniques like encrypting data-at-rest,
    scrambling buses for data-in-motion, and integrity checking for all kinds
    of data.
*   Administrative security encompasses architectural elements like
    permissions, lifecycle states, and key splits (potentially also linked to
    physical security).

Consideration of recovery methods means assuming that some or all of the
defenses will fail, with an eye to limiting the extent of the resulting system
failure/compromise. If an adversary gains control over a sub-block, but cannot
use this to escalate to full-chip control, we have succeeded. If control over a
sub-block is detected, and an alert is generated that ultimately causes a device
reset or other de-escalation sequence, we have created a recovery strategy. If
the software is compromised but access to keys/secrets is prevented by hardware
controls, we have succeeded. If compromise of secrets from a single device
cannot be leveraged into attacks on other (or all) devices, again we have
succeeded. If compromised devices can be identified and quarantined when
enrolled into a larger system, then we have a successful recovery strategy.

Thinking with an attacker mindset means "breaking the rules" or violating
assumptions: what if two linked state machines are no longer "in sync" - how
will they operate, and how can they recover? What happens if the adversary
manipulates an internal value (fault injection)? What happens if the adversary
can learn some or all of a secret value (side-channel leakage)? This document
primarily gives generic guidance for defending against the latter two attack
classes (fault injection and side-channel information leakage). It also
discusses ways to prevent attacks, mitigate them, or alert to their occurrence.
Other attack vectors (especially software compromises or operational security
failures) are out of the scope of this document, or will be addressed at a
later stage.

In general, when thinking of protecting against fault injection attacks, the
designer should consider the consequences of any particular net/node being
inverted or forced by an adversary. State-of-the-art fault attacks can
stimulate two nodes in close succession; robustness to this type of attack
depends on the declared threat model. Designers need to be well aware of the
power of an attack like SIFA [[15](#ref-15)], which can bypass "conventional"
fault countermeasures (e.g. redundancy/detectors) and requires only a modest
number of traces.

For increased resistance against side-channel leakage (typically: power,
electromagnetic radiation, or timing), designs in general should ensure that
the creation or transmission of *secret material* is handled in such a way as
to not work with "small" subsets of bits of sensitive information. Attacks like
DPA [[7](#ref-7)] are very powerful because they are able to "divide and
conquer" an AES operation (regardless of key size) into its 8-bit S-Boxes and
enumerate all 256 possible values to evaluate hypotheses. Evaluating/processing
information in 32-bit quanta (or larger) makes these kinds of enumerations much
more difficult; operating on a single bit at a time makes them almost trivial.

Below we will go deeper into these recommendations for general design
practices. Individual module guidance for particular IP (processor, AES, SHA,
etc.) will be handled in addenda to this document.

## General Module Level Design Guidance

These guidelines are for sub-block / module level design. System architecture,
identity management, and protocol design are outside the scope of this
document, but may create some dependencies here. For general reading, the
slides of [[10](#ref-10)] are considered a useful companion to these
guidelines.

### **Recommendation 1**: Identify sensitive/privileged operations

Identify any sensitive/privileged operations performed by the module. A
non-exhaustive list of examples: working with secret keys, writing to OTP,
potentially writing to flash, enabling debug functionality, lifting/releasing
access restrictions, and changing lifecycle state.

1.  Having these operations documented helps to analyze the potential impact
    of any attack discussed below.
2.  Subsequent design/verification reviews can use these sensitive operations
    as focus areas or even coverage points.

### **Recommendation 2**: Side-channel leakage considerations

Consider side-channel leakage of any secret information. Side channels include
timing, power, EM radiation (see [[8](#ref-8)]), caches, and
micro-architectural state, among others.

1.  Process secret information in a datapath that is at least 32 bits wide.
2.  Use fixed/constant-time operations when handling secrets (see [[6](#ref-6)]
    and [[11](#ref-11)]).
3.  Don't branch/perform conditional operations based on secret values.
4.  Incorporate temporal randomness (example: add delay cycles based on an
    LFSR around critical operations, see [[9](#ref-9)]).
5.  Cryptographic operations should incorporate entropy (via masking/blinding,
    see [[9](#ref-9)]), especially if the key is long-lived, or a
    global/class-wide value. Short-lived keys may not require this, but
    careful study of the information leakage rate is necessary.
6.  Noise generation - run other "chaff" switching actions in parallel with
    sensitive calculations, if the power budget permits (see [[9](#ref-9)]).
7.  Secrets should not be stored in a processor cache (see [[3](#ref-3)]).
8.  Speculative execution in a processor can lead to leakage of secrets via
    micro-architectural state (see [[4](#ref-4)]/[[5](#ref-5)]).
9.  When clearing secrets, use an LFSR to wipe values to prevent the Hamming
    weight leakage that would occur if clearing to zero. For secrets stored in
    multiple shares, use different permutations (or separate LFSRs) to clear
    the shares; a sketch follows this list.
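
A minimal sketch of item 9, assuming a two-share secret and a free-running
LFSR (the module/signal names and the LFSR polynomial here are illustrative,
not an existing OpenTitan interface):

```systemverilog
// Wipe a two-share secret with LFSR data rather than zeros, using a
// different permutation per share so the shares are not wiped to
// identical values.
module secret_wipe #(
  parameter int unsigned Width = 32  // must be <= 32 for this sketch
) (
  input  logic             clk_i,
  input  logic             rst_ni,
  input  logic             wipe_i,    // request to clear the shares
  input  logic [Width-1:0] share0_d,  // normal-operation next values
  input  logic [Width-1:0] share1_d,
  output logic [Width-1:0] share0_q,
  output logic [Width-1:0] share1_q
);
  // Free-running 32-bit LFSR (polynomial x^32 + x^22 + x^2 + x + 1).
  // A production design would seed this from the entropy source.
  logic [31:0] lfsr_q;
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      lfsr_q <= 32'hDEAD_BEEF;  // placeholder nonzero seed
    end else begin
      lfsr_q <= {lfsr_q[30:0],
                 lfsr_q[31] ^ lfsr_q[21] ^ lfsr_q[1] ^ lfsr_q[0]};
    end
  end

  // Different permutations of the LFSR output for the two shares: the
  // second share uses the bit-reversed value.
  logic [Width-1:0] wipe_val0, wipe_val1;
  assign wipe_val0 = lfsr_q[Width-1:0];
  assign wipe_val1 = {<<{lfsr_q[Width-1:0]}};

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      share0_q <= '0;
      share1_q <= '0;
    end else if (wipe_i) begin
      share0_q <= wipe_val0;  // random data, not zeros
      share1_q <= wipe_val1;
    end else begin
      share0_q <= share0_d;
      share1_q <= share1_d;
    end
  end
endmodule
```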

### **Recommendation 3**: Fault injection countermeasures

Consider defenses to fault injection / glitching attacks (for a survey and
overview of attacks, see [[12](#ref-12)] and [[13](#ref-13)]; for mitigation
patterns, see [[14](#ref-14)]).

1.  Initially assume that the adversary can glitch any node arbitrarily, and
    determine the resulting worst-case scenario. This is a very conservative
    approach and might lead to over-pessimism, but serves to highlight
    potential issues. Then, to ease the implementation burden, assume the
    adversary can glitch to all-1's or all-0's (since these are considered
    "easier" to reach), and that reset can be asserted semi-arbitrarily.
2.  Use parity/ECC on memories and data paths (note that ECC is not true
    integrity; a hash must be used to prevent forgery, see [[1](#ref-1)]). For
    memories, ECC is helpful to protect instruction streams or values that can
    cause "branching control flows" that redirect execution flow. Parity is
    potentially helpful if detection of corruption is adequate (though
    double-glitch fault injection can fool parity, so Hsiao or other
    detect-2-error codes can be used, even without correction circuitry
    implemented). When committing to an irrevocable action (e.g. burning into
    OTP, unlocking part of the device/increasing permissions), ECC is probably
    more appropriate.
    1.  When selecting a specific ECC implementation, the error detection
        properties are likely more important than error correction (assuming
        memory lifetime retention/wear are not considered). For a good example
        of how to consider the effectiveness of error correction, see
        [this PR comment](https://github.com/lowRISC/opentitan/pull/3899#issuecomment-716799810).
3.  State machines (a sketch combining the first three points follows this
    list):
    1.  Enforce a minimum Hamming distance for state machine transitions, to
        make single-bit faults non-effective.
    2.  Use a
        [sparsely populated state encoding](https://github.com/lowRISC/opentitan/blob/master/util/design/sparse-fsm-encode.py),
        with all other encodings marked invalid - see 11.1 about optimization
        concerns when doing this, though.
    3.  Give all states the same Hamming weight, so that this property can be
        checked continuously (or use ECC-type coding on the state variable and
        check that).
    4.  If waiting for a counter to expire before transitioning to the next
        state, it is better if the terminal count that causes the transition
        is not all-0/all-1. One could use an LFSR instead of a binary counter,
        but debugging then becomes a bit painful.
4.  Maintain value-and-its-complement throughout the datapath (sometimes
    called "dual rail" logic), especially if unlocking/enabling something
    sensitive, and continually check for validity/consistency of the
    representation.
5.  Incorporate temporal randomness where possible (example: add delay cycles
    based on an LFSR around sensitive operations).
6.  Run-it-twice and compare results for sensitive calculations.
7.  Redundancy - keep/store multiple copies of sensitive checks/data.
8.  For maximum sensitivity, compare combinational and sequential paths with a
    hair-trigger/one-shot latch of any miscompare.
9.  Implement empty detection for OTP/flash (if needed, but especially for
    lifecycle determination).
10. Avoid local resets / prefer larger reset domains, since a glitch on this
    larger reset keeps more of the design "in sync." But consider the
    implications of any block with more than one reset domain (see also 9.1).
11. Similar to the "mix in" idea of 4.4, in any case where multiple
    contributing "votes" are going into an enable/unlock decision, consider
    mixing them into some cryptographic structure over time that will be
    diverted from its path by attempts to glitch each vote. (Note: if the
    final outcome of this is simply a wide compare that produces a single-bit
    final unlock/enable vote, then this is only marginally helpful - since
    that vote is now the glitch target. Finding a way to bind the final
    cryptographic result to the vote is preferred, but potentially very
    difficult / impossible, depending on the situation.)
12. When checking/creating a signal to **permit** some sensitive operation,
    prefer that the checking logic is maximally volatile (e.g. performs a lot
    of the calculation in a single cycle after a register), such that a glitch
    prevents the operation. Whereas, when checking to **deny** a sensitive
    operation, prefer that the checking logic is minimally volatile (directly
    follows a register with minimal combinational logic), such that a glitch
    is recovered on the next clock and the denial is continued/preserved.
13. CFI (control flow integrity) hardware can help protect a processor /
    programmable peripheral from some types of glitch attacks. This topic is
    very involved and beyond the scope of these guidelines; consult
    [[2](#ref-2)] for an introduction to previous techniques.
14. Analog sensors (under/over-voltage, laser light, mesh breach, among
    others) can be used to generate SoC-level alerts and/or inhibit sensitive
    operations. Many of these sensors require calibration/trimming, or require
    hysteresis circuits to prevent false positives, so they may not be usable
    in fast-reacting situations.
15. Running an operation (e.g. AES or KMAC) to completion, even with a
    detected fault, is sometimes useful since it denies the adversary
    information about the success/failure of the attempted fault, and
    minimizes any timing side channel. However, for some operations (e.g.
    ECDSA sign), operating on faulty inputs can have catastrophic
    consequences. These guidelines cannot recommend a default-safe posture,
    but each decision about handling detected faults should be carefully
    considered.
16. For request-acknowledge interfaces, monitor the acknowledge line for
    spurious pulses at all times (not only when a request is pending) and use
    this as a glitch/fault detector to escalate locally and/or generate
    alerts.
17. When arbitrating between two or more transaction sources with different
    privilege/access levels, consider how to protect a request from one source
    being glitched/forged to masquerade as being sourced from another,
    higher-privilege source (for example, to return side-loaded
    hardware-visible-only data via a software read path). At a minimum,
    redundant arbitration and multiple-bit encoding of the arbitration
    "winner" can help to mitigate this type of attack.
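
As a minimal sketch of items 3.1-3.3 (hypothetical module and state names; the
[sparse-fsm-encode.py script](https://github.com/lowRISC/opentitan/blob/master/util/design/sparse-fsm-encode.py)
can generate encodings like these), the four 6-bit states below have pairwise
Hamming distance 4 and identical Hamming weight, and any other encoding is
trapped as an error:

```systemverilog
module sparse_fsm_example (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic start_i,
  input  logic done_i,
  output logic busy_o,
  output logic alert_o
);
  // Sparse encodings: pairwise Hamming distance 4, constant Hamming
  // weight 3, and neither all-0 nor all-1 appears as a valid state.
  typedef enum logic [5:0] {
    StIdle  = 6'b100110,
    StBusy  = 6'b010101,
    StDone  = 6'b001011,
    StError = 6'b111000
  } state_e;

  state_e state_d, state_q;

  always_comb begin
    state_d = state_q;
    unique case (state_q)
      StIdle:  if (start_i) state_d = StBusy;
      StBusy:  if (done_i)  state_d = StDone;
      StDone:  state_d = StIdle;
      StError: state_d = StError;  // terminal; only reset leaves it
      default: state_d = StError;  // any glitched encoding is trapped
    endcase
  end

  // The state register should be implemented with a primitive that synthesis
  // cannot re-encode (see Recommendation 11); a plain flop is shown here.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) state_q <= StIdle;
    else         state_q <= state_d;
  end

  assign busy_o  = (state_q == StBusy);
  assign alert_o = (state_q == StError);
endmodule
```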

### **Recommendation 4**: Handling of secrets

1.  Diversify the types/sources of secrets (e.g. use a combination of RTL
    constants + OTP + flash) to prevent a single compromise from being
    effective.
2.  Rather than checking an unlock value directly, use a hash function with a
    user-supplied input and check that the output of the hash matches. This
    way the unlock value is not contained in the netlist; a sketch follows
    this list.
3.  Qualify operations with the allowed lifecycle state (even if redundant
    with other checks).
4.  Where possible, mix operating modes into the calculation of derived
    secrets to create parallel/non-substitutable operating/keying domains
    (i.e. mixing in devmode, lifecycle state).
    1.  If defenses can be bypassed for debugging/recovery, consider mixing in
        the activation vector/bypass bits of the defenses as well; consider
        this like small-scale attestation of device state.
5.  Encrypt (or at least scramble) any secrets stored at rest in flash/OTP, to
    reduce the risk of static/offline inspection.
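
A minimal sketch of item 2, assuming a hash core with a simple request/done
handshake (the `hash_core` interface and all names here are hypothetical, not
an actual OpenTitan IP):

```systemverilog
// Check an unlock token by comparing its hash against a provisioned digest.
// The netlist then contains only the digest, not the unlock value itself.
module unlock_check (
  input  logic         clk_i,
  input  logic         rst_ni,
  input  logic         req_i,           // unlock attempt
  input  logic [127:0] unlock_value_i,  // user-supplied token
  input  logic [255:0] exp_digest_i,    // provisioned digest (e.g. in OTP)
  output logic         unlock_o
);
  logic         hash_done;
  logic [255:0] digest;

  // Assumed hash core: raises done_o some cycles after req_i.
  hash_core u_hash (
    .clk_i    (clk_i),
    .rst_ni   (rst_ni),
    .req_i    (req_i),
    .data_i   (unlock_value_i),
    .done_o   (hash_done),
    .digest_o (digest)
  );

  // Wide compare. A production design would encode the unlock decision
  // redundantly rather than collapse it into one glitchable bit (see 3.11).
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)        unlock_o <= 1'b0;
    else if (hash_done) unlock_o <= (digest == exp_digest_i);
  end
endmodule
```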

### **Recommendation 5**: Alerts

1.  Generate alerts on any detected anomaly (the design needs to define what
    priority/severity should be assigned).
2.  Where possible, prefer to take a local action (clearing/randomizing state,
    ceasing processing) in addition to generating the alert; a sketch follows
    this list.
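
A minimal sketch of item 2 (all names hypothetical): the anomaly both raises
an alert for the system-level handler and wipes local state immediately, so
the defense does not depend on the alert path alone:

```systemverilog
module anomaly_react #(
  parameter int unsigned Width = 32
) (
  input  logic             clk_i,
  input  logic             rst_ni,
  input  logic             anomaly_i,   // output of a local consistency check
  input  logic [Width-1:0] state_d,     // normal next-state value
  input  logic [Width-1:0] wipe_val_i,  // e.g. LFSR data, see Recommendation 2
  output logic [Width-1:0] state_q,
  output logic             alert_o
);
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      alert_o <= 1'b0;
      state_q <= '0;
    end else if (anomaly_i) begin
      alert_o <= 1'b1;        // sticky until reset; escalate via alert path
      state_q <= wipe_val_i;  // local action: wipe state without waiting
    end else begin
      state_q <= state_d;
    end
  end
endmodule
```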

### **Recommendation 6**: Safe default values

1.  All case statements, if statements, and ternaries should consider what the
    safest default value is. Having an "invalid" state/value is nice to have
    for this purpose, but isn't always possible. (A sketch follows this list.)
2.  Operate under a general policy/philosophy of starting with the lowest
    allowed privilege and augmenting it via approvals/unlocks.
3.  Implement enforcement of inputs on CSRs - qualify/force the data being
    written based on lifecycle state, peripheral state, or other values. The
    designer must determine the safest remapping, e.g. write --> read, read
    --> nop, write --> nop, and so forth. Blanket implementation of input
    enforcement complicates verification, so this style of design should be
    chosen only where the inputs are particularly sensitive (requests to
    unlock, privilege increase requests, debug mode enables, etc.).
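
A minimal sketch of item 1 (hypothetical mode encodings): the decode starts
from the least-permissive outcome, and any unexpected selector value falls
into a denying default:

```systemverilog
module perm_decode (
  input  logic [1:0] mode_i,  // 2'b01 = user, 2'b10 = supervisor
  input  logic       user_req_ok_i,
  input  logic       super_req_ok_i,
  output logic       access_allowed_o
);
  always_comb begin
    // Safe default first: deny. Branches below can only widen access.
    access_allowed_o = 1'b0;
    unique case (mode_i)
      2'b01:   access_allowed_o = user_req_ok_i;
      2'b10:   access_allowed_o = super_req_ok_i;
      default: access_allowed_o = 1'b0;  // 2'b00 / 2'b11 invalid: deny
    endcase
  end
endmodule
```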

### **Recommendation 7**: DFT issues

1.  Entry to and exit from scan mode should cause a reset, to prevent the
    insertion or exfiltration of sensitive values.
2.  Ensure that in production environments (i.e. not in lab debug), scan
    chains are disabled.
3.  Processor debug paths (via JTAG) may need to be disabled in production
    modes.
4.  Beware of self-repair or redundant-row/column schemes for memories (SRAM
    and OTP), as they can be exploited to misdirect reads to
    adversary-controlled locations.

### **Recommendation 8**: Power management issues

1.  If the module is not in an always-on power domain, consider that a
    sleep/wake sequence can be used to force a re-derivation of the secrets
    needed in the module, as many times as desired by the adversary.
2.  Fine-grained clock gating should never be used for any module that
    processes secret data; only coarse-grained (module-level) gating is
    acceptable. (Fine-grained gates essentially compute
    `clock_gate = D ^ Q`, which often acts as an SCA "amplifier".) A sketch of
    module-level gating follows this list.
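
A minimal sketch of item 2, using a single module-level clock gate for the
entire secret-processing datapath (the `prim_clock_gating` port names follow
OpenTitan's primitive but are assumed here; the rest is hypothetical):

```systemverilog
module gated_datapath (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        module_active_i,  // coarse, module-level enable
  input  logic        scanmode_i,       // keep the clock running for DFT
  input  logic [31:0] data_d,
  output logic [31:0] data_q
);
  logic clk_gated;

  // One gate for the whole module; no per-register enables derived from
  // secret data, which would otherwise compute D ^ Q per flop.
  prim_clock_gating u_cg (
    .clk_i     (clk_i),
    .en_i      (module_active_i),
    .test_en_i (scanmode_i),
    .clk_o     (clk_gated)
  );

  always_ff @(posedge clk_gated or negedge rst_ni) begin
    if (!rst_ni) data_q <= '0;
    else         data_q <= data_d;
  end
endmodule
```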

### **Recommendation 9**: "Synchronization" (consistency) issues

1.  If a module interacts with other modules in a stateful way (think of two
    data-transfer counters moving in ~lockstep, where the counts are not sent
    back and forth as a performance optimization), what happens if:
    1.  One side is reset and the other is not?
    2.  One side is clock-gated and the other is not?
    3.  One side is power-gated and the other is not?
    4.  The counter on one side is glitched?
2.  Generally, these kinds of blind lockstep situations should be avoided
    where possible; the current module/interface status should be exchanged in
    both directions and constantly checked for validity/consistency.

### **Recommendation 10**: Recovery mechanism considerations

1.  What happens if a security mechanism fails? (The classic problem of this
    variety is on-die sensors being too sensitive and resetting the chip.)
    Traditionally, fuses can disable some mechanisms if they are faulty.
2.  Could an adversary exploit a recovery mechanism? (If a sensor can be
    fuse-disabled, wouldn't the adversary just do that? See 4.4 above.)

### **Recommendation 11**: Optimization concerns

1.  Sometimes synthesis will optimize away redundant (but necessary for
    security) logic - `dont_touch` or `size_only` attributes may sometimes be
    needed, or even more aggressive preservation strategies. Example: when
    using the sparse FSM encoding, use the `prim_flop` component for the state
    vector register.
2.  Value-and-complement strategies can also be optimized away, or partially
    disconnected such that only half of the datapath contributes to the logic,
    or a single register with both Q & Qbar outputs becomes the source of both
    values to save area (a sketch of this hazard follows this list).
3.  Retiming around pipeline registers can create DPA issues, due to the
    inadvertent combination of shares, or intra-cycle glitchy evaluation. For
    DPA-resistant logic, explicitly declare functions and registers using
    `prim_*` components, and make sure that pipeline retiming is not enabled
    in synthesis.
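
A minimal sketch of item 2's hazard and one way to guard against it
(attribute spelling is tool-dependent; `(* dont_touch = "true" *)` is
understood by some tools, while others need `set_dont_touch` constraints or
`size_only` instead - names here are hypothetical):

```systemverilog
module redundant_val (
  input  logic       clk_i,
  input  logic       rst_ni,
  input  logic [7:0] val_d,
  output logic [7:0] val_q,
  output logic       err_o
);
  // Complement register. Without preservation, synthesis may merge this
  // with val_q into a single flop using its Q/QN outputs, silently
  // removing the redundancy (and the fault detection with it).
  (* dont_touch = "true" *) logic [7:0] val_n_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      val_q   <= '0;
      val_n_q <= '1;
    end else begin
      val_q   <= val_d;
      val_n_q <= ~val_d;
    end
  end

  // Continuous consistency check: any mismatch indicates a fault.
  assign err_o = (val_q != ~val_n_q);
endmodule
```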

### **Recommendation 12**: Entropy concerns

1.  Verify that all nonces are truly used only once.
2.  If entropy is broadcast, verify the list of consumers and the arbitration
    scheme to prevent reuse / duplicate use of entropy in sensitive
    calculations.
3.  Seeds for local LFSRs need to be unique/diversified; a sketch follows this
    list.
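
A minimal sketch of item 3 (hypothetical names; the polynomial matches the
earlier wipe sketch): each instance must be given its own nonzero seed, and an
elaboration-time check rejects the forgotten-default case:

```systemverilog
module diversified_lfsr #(
  parameter logic [31:0] Seed = '0  // no usable default: force an override
) (
  input  logic        clk_i,
  input  logic        rst_ni,
  output logic [31:0] state_o
);
  // An all-zero seed locks up the LFSR and usually means the instantiation
  // forgot to pass a unique seed; fail at elaboration time.
  if (Seed == '0) begin : gen_seed_check
    $error("diversified_lfsr requires a unique, nonzero Seed parameter");
  end

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) state_o <= Seed;
    else         state_o <= {state_o[30:0],
                             state_o[31] ^ state_o[21] ^
                             state_o[1]  ^ state_o[0]};
  end
endmodule

// Each instantiation site passes a distinct seed:
//   diversified_lfsr #(.Seed(32'h1B2A_3C4D)) u_lfsr_delay (...);
//   diversified_lfsr #(.Seed(32'h7E5F_0916)) u_lfsr_wipe  (...);
```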

### **Recommendation 13**: Global secrets

1.  Avoid these if at all possible.
2.  If not possible, have a process to generate/re-generate them; make sure
    this process is used/tested many times before the final netlist; the
    process must be repeatable/deterministic given some set of inputs.
3.  If architecturally feasible, install a device-specific secret to override
    the global secret once boot-strapped, and disable the global secret (a
    sketch follows this list).
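
A minimal sketch of item 3 (all names and the placeholder constant are
hypothetical): a sticky flag selects the device-unique secret once it has been
provisioned, disabling the netlist-global one:

```systemverilog
module secret_select (
  input  logic         clk_i,
  input  logic         rst_ni,
  input  logic         dev_secret_valid_i,  // set once provisioning completes
  input  logic [255:0] dev_secret_i,        // device-unique value, e.g. OTP
  output logic [255:0] secret_o
);
  // Netlist-global constant; placeholder value for illustration only.
  localparam logic [255:0] GlobalSecret = {8{32'hA5A5_5A5A}};

  // Sticky until reset, so the global secret cannot be re-selected by
  // de-asserting the valid signal later.
  logic dev_valid_q;
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)                 dev_valid_q <= 1'b0;
    else if (dev_secret_valid_i) dev_valid_q <= 1'b1;
  end

  // Note: a production design would encode this select redundantly rather
  // than as a single glitchable bit (see Recommendation 3).
  assign secret_o = dev_valid_q ? dev_secret_i : GlobalSecret;
endmodule
```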

### **Recommendation 14**: Sensors

1.  Sensors need to be adjusted/tweaked so that they actually fire. It is
    challenging to set sensors at levels that detect "interesting"
    glitches/environmental effects, but that don't fire constantly or cause
    yield issues. The security team should work with the silicon supplier to
    determine the best course of action here.
2.  Sensor configuration / calibration data should be integrity-protected.

## References and further reading

[<span id="ref-1">1</span>]: Overview of checksums and hashes -
https://cybergibbons.com/reverse-engineering-2/checksums-hashes-and-security/

[<span id="ref-2">2</span>]: A Survey of hardware-based Control Flow Integrity -
https://arxiv.org/pdf/1706.07257.pdf

[<span id="ref-3">3</span>]: Cache-timing attacks on AES -
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf

[<span id="ref-4">4</span>]: Meltdown: Reading Kernel Memory from User Space -
https://meltdownattack.com/meltdown.pdf

[<span id="ref-5">5</span>]: Spectre Attacks: Exploiting Speculative Execution -
https://spectreattack.com/spectre.pdf

[<span id="ref-6">6</span>]: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other
Systems - https://www.rambus.com/wp-content/uploads/2015/08/TimingAttacks.pdf

[<span id="ref-7">7</span>]: Differential Power Analysis -
https://paulkocher.com/doc/DifferentialPowerAnalysis.pdf

[<span id="ref-8">8</span>]: SoC it to EM: electromagnetic side-channel attacks on a complex
system-on-chip - https://www.iacr.org/archive/ches2015/92930599/92930599.pdf

[<span id="ref-9">9</span>]: Introduction to Differential Power Analysis -
https://link.springer.com/content/pdf/10.1007/s13389-011-0006-y.pdf

[<span id="ref-10">10</span>]: Principles of Secure Processor Architecture Design -
https://caslab.csl.yale.edu/tutorials/hpca2019/ and
https://caslab.csl.yale.edu/tutorials/hpca2019/tutorial_principles_sec_arch_20190217.pdf

[<span id="ref-11">11</span>]: Time Protection - https://ts.data61.csiro.au/projects/TS/timeprotection/

[<span id="ref-12">12</span>]: Fault Attacks on Secure Embedded Software: Threats, Design and Evaluation -
https://arxiv.org/pdf/2003.10513.pdf

[<span id="ref-13">13</span>]: The Sorcerer's Apprentice Guide to Fault Attacks -
https://eprint.iacr.org/2004/100.pdf

[<span id="ref-14">14</span>]: Fault Mitigation Patterns -
https://www.riscure.com/uploads/2020/05/Riscure_Whitepaper_Fault_Mitigation_Patterns_final.pdf

[<span id="ref-15">15</span>]: SIFA: Exploiting Ineffective Fault Inductions on Symmetric Cryptography -
https://eprint.iacr.org/2018/071.pdf

<!-- Footnotes themselves at the bottom. -->

## Notes

[^1]: In other OpenTitan documents, the combination of technical and
    administrative defenses is often referred to as "logical security".