# Secure Hardware Design Guidelines
## Overview
Silicon designs for security devices require special guidelines to protect the
designs against myriad attacks. For OpenTitan, the universe of potential attacks
is described in our threat model. In order to have the most robust defensive
posture, a general approach to secure hardware design should rely on the
concepts of (1) defense in depth, (2) consideration of recovery methods
post-breach, and (3) thinking with an attacker mindset.
In all cases, as designers, we need to think in terms of equalizing the
difficulty of any particular attack for the adversary. If a design has a
distribution of attack vectors (sometimes called the "attack surface" or
"attack surface area"), it is not the strongest defenses that are of
particular interest but rather the weakest, since these are the most likely to
be exploited by the adversary. For example, it's unlikely that an attacker
will try to brute-force a
system based on AES-128 encryption, as the difficulty level of such an attack is
high, and our confidence in the estimate of the difficulty is also high. But, if
the security of the AES-128 depends on a global secret, more mundane attacks
like theft or bribery become more likely avenues for the adversary to exploit.
Defense in depth means having multiple layers of defenses/controls acting
independently. Classically in information security, these are grouped into three
main categories: physical, technical and administrative[^1]. We map these into
slightly different elements when considering secure hardware design:
* Physical security typically maps to sensors and shields, but also to the
  separation of critical information into different locations on the die.
* Technical security includes techniques like encrypting data-at-rest,
scrambling buses for data-in-motion, and integrity checking for all kinds of
data.
* Administrative security encompasses architectural elements like permissions,
lifecycle states, and key splits (potentially also linked to physical
security).
Consideration of recovery methods means assuming that some or all of the
defenses will fail, with an eye to limiting the extent of the resulting system
failure/compromise. If an adversary gains control over a sub-block, but cannot
use this to escalate to full-chip control, we have succeeded. If control over a
sub-block is detected, but an alert is generated that ultimately causes a device
reset or other de-escalation sequence, we have created a recovery strategy. If
the software is compromised but access to keys/secrets is prevented by hardware
controls, we have succeeded. If compromise of secrets from a single device
cannot be leveraged into attacks on other (or all) devices, again we have
succeeded. If compromised devices can be identified and quarantined when
enrolled into a larger system, then we have a successful recovery strategy.
Thinking with an attacker mindset means "breaking the rules" or violating
assumptions: what if two linked state machines are no longer "in sync" - how
will they operate, and how can they recover? What happens if the adversary
manipulates an internal value (fault injection)? What happens if the adversary
can learn some or all of a secret value (side channel leakage)? This document
primarily gives generic guidance for defending against the latter two attacks
(fault injection and side channel information leakage). It also discusses ways
to prevent attacks, mitigate them, or raise alerts when they occur. Other
attack vectors (especially software compromises or operational security
failures) are out of scope for this document, or will be addressed at a later
stage.
In general, when thinking of protecting against fault injection attacks, the
designer should consider the consequences of any particular net/node being
inverted or forced by an adversary. State-of-the-art fault attacks can
stimulate two nodes in close succession; robustness to this type of attack
depends on the declared threat model. Designers need to be well aware of the
power of an attack like SIFA [[15](#ref-15)], which can bypass "conventional"
fault countermeasures (e.g. redundancy/detectors) and requires only a modest
number of traces.
For increased resistance against side channel leakage (typically: power,
electromagnetic radiation, or timing), designs should generally ensure that
the creation or transmission of *secret material* never operates on "small"
subsets of the sensitive bits at a time. Attacks like DPA are
very powerful because they are able to "divide and conquer" an AES operation
(regardless of key size) into its 8-bit S-Boxes and enumerate all 256 possible
values to evaluate hypotheses. Evaluating/processing information in 32-bit
quanta (or larger) will make these kinds of enumerations much more difficult;
operating on a single bit at a time makes them almost trivial.
Below we will go deeper into these recommendations for general design practices.
Individual module guidance for particular IP (processor, AES, SHA, etc.) will be
handled in addenda to this document.
## General Module Level Design Guidance
These guidelines are for sub-block / module level design. System architecture,
identity management, and protocol design are outside the scope of this document,
but may create some dependencies here. For general reading, the slides of [[10](#ref-10)]
are considered a useful companion to these guidelines.
### **Recommendation 1**: Identify sensitive/privileged operations
Identify any sensitive/privileged operations performed by the module
(non-exhaustive list of examples: working with secret keys, writing to OTP,
potentially writing to flash, enabling debug functionality, lifting/releasing
access restrictions, changing lifecycle state).
1. Having these operations documented helps to analyze the potential issues of
any attack discussed below.
2. Subsequent design/verification reviews can use these sensitive operations as
focus areas or even coverage points.
### **Recommendation 2**: Side-channel leakage considerations
Consider side-channel leakage of any secret information (side channels include
timing, power, EM radiation, caches, and micro-architectural state, among
others).
1. Process secret information in at least a 32-bit wide datapath
2. Use fixed/constant time operations when handling secrets (see [[6](#ref-6)] and [[11](#ref-11)])
3. Don't branch/perform conditional operations based on secret values
4. Incorporate temporal randomness (example: add delay cycles based on LFSR
around critical operations, see [[9](#ref-9)])
5. Cryptographic operations should incorporate entropy (via masking/blinding,
see [[9](#ref-9)]), especially if the key is long-lived, or a global/class-wide value.
Short-lived keys may not require this, but careful study of the information
leakage rate is necessary
6. Noise generation - run other "chaff" switching actions in parallel with
sensitive calculations, if power budget permits (see [[9](#ref-9)])
7. Secrets should not be stored in a processor cache (see [[3](#ref-3)])
8. Speculative execution in a processor can lead to leakage of secrets via
micro-architectural state (see [[4](#ref-4)]/[[5](#ref-5)])
9. When clearing secrets, wipe them with pseudo-random data (e.g. from an
    LFSR) rather than with zeros, to avoid leaking the Hamming weight of the
    old value as clearing to zero would. For secrets stored in multiple
    shares, use different permutations (or separate LFSRs) to clear the
    shares (see the sketch below).
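
To make item 9 concrete, below is a minimal SystemVerilog sketch of an
LFSR-backed wipe. The module and signal names, the polynomial, and the reset
seed are all illustrative assumptions; a production design would use a shared,
reseedable LFSR primitive fed from a true entropy source.

```systemverilog
// Illustrative sketch only: wipe a 32-bit secret with pseudo-random data so
// the power signature of the clear does not reveal the Hamming weight of the
// old value. Names, polynomial, and seed are hypothetical.
module key_reg_wipe (
  input  logic        clk_i,
  input  logic        rst_ni,
  input  logic        key_we_i,  // normal key load strobe
  input  logic        wipe_i,    // request to clear the secret
  input  logic [31:0] key_i,
  output logic [31:0] key_q
);
  // Free-running 32-bit Galois LFSR (x^32 + x^22 + x^2 + x + 1). The reset
  // seed is a placeholder; reseed from a real entropy source in practice.
  logic [31:0] lfsr_q;
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) lfsr_q <= 32'hDEAD_BEEF;
    else         lfsr_q <= (lfsr_q >> 1) ^ (lfsr_q[0] ? 32'h8020_0003 : 32'h0);
  end

  // Key register: a wipe loads LFSR output, never all-zeros. For secrets
  // held in multiple shares, clear each share from a differently permuted
  // (or separate) LFSR, as noted in item 9.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)       key_q <= '0;
    else if (wipe_i)   key_q <= lfsr_q;
    else if (key_we_i) key_q <= key_i;
  end
endmodule
```
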
### **Recommendation 3**: Fault injection countermeasures
Consider defenses to fault injection / glitching attacks (survey and overview of
attacks, see [[12](#ref-12)] and [[13](#ref-13)])
1. Initially assume that the adversary can glitch any node arbitrarily, and
determine the resulting worst-case scenario. This is a very conservative
approach and might lead to over-pessimism, but serves to highlight potential
issues. Then, to ease implementation burden, assume the adversary can glitch
to all-1's or all-0's (since these are considered "easier" to reach), and
that reset can be asserted semi-arbitrarily.
2. Use parity/ECC on memories and data paths (note that ECC does not provide
    true integrity; a hash must be used to prevent forgery, see [[1](#ref-1)]). For memories, ECC
is helpful to protect instruction streams or values that can cause
"branching control flows" that redirect execution flow. Parity is
potentially helpful if detection of corruption is adequate (though
double-glitch fault injection can fool parity, so Hsiao or other
detect-2-error codes can be used, even without correction circuitry
implemented). When committing to an irrevocable action (e.g. burning into
OTP, unlocking part of the device/increasing permissions), ECC is probably
more appropriate.
1. When selecting a specific ECC implementation, the error detection
properties are likely more important than error correction (assuming
memory lifetime retention/wear are not considered). For a good example
of how to consider the effectiveness of error correction, see
[this PR comment](https://github.com/lowRISC/opentitan/pull/3899#issuecomment-716799810).
3. State machines:
    1. Have a minimum Hamming distance for state machine transitions, to
       make single-bit faults ineffective
    2. Use a
       [sparsely populated state encoding](https://github.com/lowRISC/opentitan/blob/master/util/design/sparse-fsm-encode.py),
       with all other encodings marked invalid (see the FSM sketch after
       this list) - though see Recommendation 11.1 about optimization
       concerns when doing this
    3. All states could have the same Hamming weight, which can then be
       checked continuously (or use an ECC-type coding on the state
       variable and check that)
    4. If waiting for a counter to expire to transition to the next state,
       it is better if the terminal count that causes the transition is not
       all-0s/all-1s. One could use an LFSR instead of a binary counter,
       though debugging then becomes more painful
4. Maintain value-and-its-complement throughout the datapath (sometimes called
    "dual rail" logic), especially if unlocking/enabling something sensitive,
    and continually check for validity/consistency of the representation (see
    also the complement-pair sketch under Recommendation 11)
5. Incorporate temporal randomness where possible (example: add delay cycles
based on LFSR around sensitive operations)
6. Run-it-twice and compare results for sensitive calculations
7. Redundancy - keep/store multiple copies of sensitive checks/data
8. For maximum sensitivity, compare combinational and sequential paths with
hair-trigger/one-shot latch of miscompare
9. Empty detection for OTP/flash (if needed, but especially for lifecycle
determination)
10. Avoid local resets / prefer larger reset domains, since a glitch on this
larger reset keeps more of the design "in sync." But, consider the
implications of any block with more than one reset domain (see also
Recommendation 9.1).
11. Similar to the "mix in" idea of Recommendation 4.4, in any case where multiple contributing
"votes" are going to an enable/unlock decision, consider mixing them into
some cryptographic structure over time that will be diverted from its path
by attempts to glitch each vote. (Note: if the final outcome of this is
simply a wide-compare that produces a single-bit final unlock/enable vote
then this is only marginally helpful - since that vote is now the glitch
target. Finding a way to bind the final cryptographic result to the vote is
preferred, but potentially very difficult / impossible, depending on the
situation.)
12. When checking/creating a signal to *permit* some sensitive operation,
    prefer that the checking logic is maximally volatile (e.g. performs a lot
    of the calculation in a single cycle after a register), such that a glitch
    prevents the operation. Whereas, when checking to *deny* a sensitive
    operation, prefer that the checking logic is minimally volatile (is
    directly following a register with minimal combinational logic), such that
    a glitch will be recovered on the next clock and the denial will be
    continued/preserved.
13. CFI (control flow integrity) hardware can help protect a processor /
programmable peripheral from some types of glitch attacks. This topic is
very involved and beyond the scope of these guidelines, consult [[2](#ref-2)]
for an introduction to previous techniques.
14. Analog sensors (under/over-voltage, laser light, mesh breach, among others)
can be used to generate SoC-level alerts and/or inhibit sensitive
operations. Many of these sensors require calibration/trimming, or require
hysteresis circuits to prevent false-positives, so they may not be usable in
fast-reacting situations.
15. Running an operation (e.g. AES or KMAC) to completion, even with a detected
fault, is sometimes useful since it suppresses information for the adversary
about the success/failure of the attempted fault, and minimizes any timing
side channel. However, for some operations (e.g. ECDSA sign), operations on
faulty inputs can have catastrophic consequences. These guidelines cannot
recommend a default-safe posture, but each decision about handling detected
faults should be carefully considered.
16. For request-acknowledge interfaces, monitor the acknowledge line for
    spurious pulses at all times (not only when a request is pending) and use
    this as a glitch/fault detector to escalate locally and/or generate alerts
    (see the monitor sketch after this list).
17. When arbitrating between two or more transaction sources with different
privilege/access levels, consider how to protect a request from one source
being glitched/forged to masquerade as being sourced from another
higher-privilege source (for example, to return side-loaded
hardware-visible-only data via a software read path). At a minimum,
redundant arbitration and multiple-bit encoding of the arbitration "winner"
can help to mitigate this type of attack.
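
To make the state machine guidance of item 3 concrete, below is a minimal
SystemVerilog sketch of a sparsely encoded FSM. The encodings are illustrative
placeholders with pairwise Hamming distance of at least 3 (OpenTitan generates
real encodings with the sparse-fsm-encode.py script linked above); any
undefined encoding traps in a terminal error state that raises an alert. Per
Recommendation 11.1, the state register would also need protection from
synthesis optimization (e.g. using `prim_flop`).

```systemverilog
// Illustrative sparse FSM sketch; module/signal names are hypothetical.
module sparse_fsm_example (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic start_i,
  input  logic done_i,
  output logic alert_o
);
  // Placeholder encodings, pairwise Hamming distance >= 3.
  typedef enum logic [5:0] {
    StIdle  = 6'b000111,
    StBusy  = 6'b101010,
    StDone  = 6'b110101,
    StError = 6'b011100
  } state_e;

  state_e state_q, state_d;

  always_comb begin
    state_d = state_q;
    alert_o = 1'b0;
    unique case (state_q)
      StIdle:  if (start_i) state_d = StBusy;
      StBusy:  if (done_i)  state_d = StDone;
      StDone:  state_d = StIdle;
      // Terminal error state: stay here and keep alerting.
      StError: alert_o = 1'b1;
      // Any glitched/undefined encoding falls through to the error state.
      default: state_d = StError;
    endcase
  end

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) state_q <= StIdle;
    else         state_q <= state_d;
  end
endmodule
```
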
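Similarly, a sketch for the acknowledge monitoring of item 16; names are
illustrative, and the sticky fault flag would typically feed the block's alert
sender.

```systemverilog
// Illustrative sketch: the ack line is checked on every cycle, not only
// while a request is outstanding, so a glitched/spurious ack is detected.
module ack_monitor (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic req_i,
  input  logic ack_i,
  output logic fault_o  // sticky until reset
);
  logic req_pending_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      req_pending_q <= 1'b0;
      fault_o       <= 1'b0;
    end else begin
      // Track the outstanding request (set on req, cleared on ack).
      if (req_i)      req_pending_q <= 1'b1;
      else if (ack_i) req_pending_q <= 1'b0;
      // An ack with no pending or incoming request indicates a fault.
      if (ack_i && !req_pending_q && !req_i) fault_o <= 1'b1;
    end
  end
endmodule
```
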
### **Recommendation 4**: Handling of secrets
1. Diversify types/sources of secrets (e.g. use combination of RTL constants +
OTP + flash) to prevent a single compromise from being effective
2. Rather than checking an unlock value directly, pass a user-supplied input
    through a hash function and check that the output matches the expected
    digest; this way the unlock value itself is not contained in the netlist
    (see the sketch after this list).
3. Qualify operations with allowed lifecycle state (even if redundant with
other checks)
4. Where possible, mix operating modes into the calculation of derived
    secrets to create parallel, non-substitutable operating/keying domains
    (e.g. mixing in devmode or lifecycle state)
    1. If defenses can be bypassed for debugging/recovery, consider mixing
       in the activation/bypass bits of those defenses as well; think of
       this as small-scale attestation of device state
5. Encrypt (or at least scramble) any secrets stored at-rest in flash/OTP, to
reduce risks of static/offline inspection.
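
A minimal sketch of the hash-based unlock check from item 2. The digest width,
the placeholder constant, and the assumption that an external hash core (e.g.
KMAC) has already absorbed the user-supplied token are all illustrative.

```systemverilog
// Illustrative sketch: only the digest of the unlock token is stored in the
// netlist; the token itself never appears as an RTL constant.
module unlock_check (
  input  logic         clk_i,
  input  logic         rst_ni,
  input  logic         digest_valid_i,  // hash core result is valid
  input  logic [255:0] digest_i,        // hash of the user-supplied token
  output logic         unlock_o
);
  // Placeholder value; the real digest is computed offline from the token.
  localparam logic [255:0] ExpDigest = {8{32'h0123_4567}};

  // A wide, single-cycle compare directly feeding a register also follows
  // Recommendation 3.12: a glitch in this "permit" path fails safe.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)             unlock_o <= 1'b0;
    else if (digest_valid_i) unlock_o <= (digest_i == ExpDigest);
  end
endmodule
```
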
### **Recommendation 5**: Alerts
1. Generate alerts on any detected anomaly (the priority/severity to be
    assigned still needs to be defined)
2. Where possible, prefer to take a local action (clearing/randomizing state,
    ceasing processing) in addition to generating the alert
### **Recommendation 6**: Safe default values
1. All case statements, if statements, and ternaries should consider what the
    safest default value is; an explicit "invalid" state/value is useful for
    this purpose, but isn't always possible (see the sketch after this list).
2. Operate in a general policy/philosophy of starting with lowest allowed
privilege and augmenting by approvals/unlocks.
3. Implement enforcement of inputs on CSRs - qualify/force data attempted to be
written based on lifecycle state, peripheral state, or other values. The
designer must determine the safest remapping, e.g. write --> read, read -->
nop, write --> nop and so forth. Blanket implementation of input enforcement
complicates verification, so this style of design should be chosen only
where the inputs are particularly sensitive (requests to unlock, privilege
increase requests, debug mode enables, etc.).
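
A minimal sketch of the safe-default pattern from item 1: every decode path
resolves to the least-privileged outcome unless a known-good encoding is seen.
The encodings and names are illustrative.

```systemverilog
// Illustrative sketch: unknown or glitched request encodings decode to the
// least-privileged access level.
typedef enum logic [1:0] {
  AccessNone = 2'b00,
  AccessRo   = 2'b01,
  AccessRw   = 2'b10
} access_e;

module priv_decode (
  input  logic [3:0] req_i,
  output access_e    access_o
);
  always_comb begin
    // Safe default first; overridden only by expected encodings.
    access_o = AccessNone;
    unique case (req_i)
      4'b0101: access_o = AccessRo;
      4'b1010: access_o = AccessRw;
      default: access_o = AccessNone;  // includes all invalid codes
    endcase
  end
endmodule
```
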
### **Recommendation 7**: DFT issues
1. Entry and exit from scan mode should cause a reset, to prevent insertion or
    exfiltration of sensitive values (see the sketch after this list)
2. Ensure that scan chains are disabled in production environments (i.e.
    outside of lab debug)
3. Processor debug paths (via JTAG) may need to be disabled in production modes
4. Beware of self-repair or redundant-row/columns schemes for memories (SRAM
and OTP), as they can be exploited to misdirect reads to
adversary-controlled locations
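
A minimal sketch of the scan entry/exit reset from item 1. Signal names are
illustrative; a real integration would route the request through the SoC's
reset manager.

```systemverilog
// Illustrative sketch: any toggle of the scan-enable input (entry or exit)
// raises a reset request, so state cannot be carried across the boundary.
module scan_mode_reset (
  input  logic clk_i,
  input  logic rst_ni,
  input  logic scanmode_i,
  output logic scan_rst_req_o
);
  logic scanmode_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) scanmode_q <= 1'b0;
    else         scanmode_q <= scanmode_i;
  end

  assign scan_rst_req_o = scanmode_q ^ scanmode_i;
endmodule
```
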
### **Recommendation 8**: Power management issues
1. If module is not in an always-on power domain, consider that a sleep/wake
sequence can be used to force a re-derivation of secrets needed in the
module, as many times as desired by the adversary
2. Fine-grained clock gating should never be used for any module that processes
    secret data; only coarse-grained (module-level) gating is acceptable.
    (Fine-grained gates essentially compute `clock_gate = D ^ Q`, which often
    acts as an SCA "amplifier"; see the sketch below.)
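
A conceptual sketch of why per-flop gating leaks. Some synthesis flows derive
the per-register enable automatically (XOR self-gating); the derived term is
equivalent to the anti-pattern below, so the clock-gate cell toggles exactly
when the stored secret bit flips.

```systemverilog
// Conceptual anti-pattern, for illustration only: the enable is the XOR of
// the register's input and output, i.e. it switches precisely when the
// stored (possibly secret) bit changes - a Hamming-distance probe.
module fine_grained_gate_antipattern (
  input  logic clk_i,
  input  logic d_i,
  output logic q_o
);
  logic fine_en;
  assign fine_en = d_i ^ q_o;  // data-dependent enable: leaks activity

  always_ff @(posedge clk_i) begin
    if (fine_en) q_o <= d_i;  // a tool would map this to a gated clock
  end
endmodule
```

Coarse, module-level gating instead derives a single enable from control state
(e.g. an idle/active flag) that is independent of the data values being
processed.
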
### **Recommendation 9**: "Synchronization" (consistency) issues
1. If a module interacts with other modules in a stateful way (think of two
data-transfer counters moving in ~lockstep, but the counts are not sent back
and forth for performance optimization), what happens if:
1. One side is reset and the other is not
2. One side is clock-gated and the other is not
3. One side is power-gated and the other is not
4. The counter on one side is glitched
2. Generally these kinds of blind lockstep situations should be avoided where
    possible; current module/interface status should be exchanged in both
    directions and constantly checked for validity/consistency
### **Recommendation 10**: Recovery mechanism considerations
1. What happens if a security mechanism fails? (Classic problem of this variety
is on-die sensors being too sensitive and resetting the chip) Traditionally,
fuses can disable some mechanisms if they are faulty.
2. Could an adversary exploit a recovery mechanism? (If a sensor can be
fuse-disabled, wouldn't the adversary just do that? See Recommendation 4.4
above.)
### **Recommendation 11**: Optimization concerns
1. Sometimes synthesis will optimize away redundant (but necessary for
security) logic - `dont_touch` or `size_only` attributes may sometimes be
needed, or even more aggressive preservation strategies. Example: when using
the sparse FSM encoding, use the `prim_flop` component for the state vector
register.
2. Value-and-complement strategies can also be optimized away, or partially
    disconnected such that only half of the datapath contributes to the logic,
    or collapsed into a single register whose Q and Qbar outputs source both
    values to save area (see the sketch after this list).
3. Retiming around pipeline registers can create DPA issues, due to inadvertent
combination of shares, or intra-cycle glitchy evaluation. For DPA-resistant
logic, explicitly declare functions and registers using `prim_*` components,
and make sure that pipeline retiming is not enabled in synthesis.
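
A sketch of items 1 and 2 combined (and of the value-and-complement check from
Recommendation 3.4): a stored value and its complement, continuously checked,
with preservation attributes so synthesis does not merge the redundant
registers. The attribute spelling is tool-dependent ("dont_touch", "preserve",
or a `set_dont_touch` constraint in the synthesis scripts); check your flow's
documentation.

```systemverilog
// Illustrative sketch; names are hypothetical and attributes tool-dependent.
module complement_pair (
  input  logic       clk_i,
  input  logic       rst_ni,
  input  logic [7:0] unlock_d,
  output logic       consistent_o
);
  (* dont_touch = "true" *) logic [7:0] unlock_q;
  (* dont_touch = "true" *) logic [7:0] unlock_n_q;  // stored complement

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      unlock_q   <= '0;
      unlock_n_q <= '1;
    end else begin
      unlock_q   <= unlock_d;
      unlock_n_q <= ~unlock_d;
    end
  end

  // Continuous consistency check; without the attributes, synthesis may
  // merge the two registers and silently remove the redundancy.
  assign consistent_o = (unlock_q == ~unlock_n_q);
endmodule
```
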
### **Recommendation 12**: Entropy concerns
1. Verify that all nonces are truly only used once
2. If entropy is broadcast, verify list of consumers and arbitration scheme to
prevent reuse / duplicate use of entropy in sensitive calculations
3. Seeds for local LFSRs need to be unique/diversified (see the sketch below)
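
A minimal sketch of item 3: mix a distinct per-instance constant into the
shared entropy input so no two local LFSRs start from the same seed, and guard
against the degenerate all-zero state. Names and the default constant are
illustrative.

```systemverilog
// Illustrative sketch: per-instance seed diversification for a local LFSR.
module lfsr_seed_diversify #(
  // Each instantiation must override this with a unique constant.
  parameter logic [31:0] InstanceId = 32'h0000_0001
) (
  input  logic [31:0] entropy_i,
  output logic [31:0] seed_o
);
  logic [31:0] mixed;
  assign mixed  = entropy_i ^ InstanceId;
  // An all-zero seed would lock up a standard LFSR; fall back to a nonzero
  // constant in that (rare) case.
  assign seed_o = (mixed == '0) ? InstanceId : mixed;
endmodule
```
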
### **Recommendation 13**: Global secrets
1. Avoid if at all possible
2. If not possible, have a process to generate/re-generate them; make sure this
process is used/tested many times before final netlist; process must be
repeatable/deterministic given some set of inputs
3. If architecturally feasible, install a device-specific secret to override
the global secret once boot-strapped (and disable the global secret)
### **Recommendation 14**: Sensors
1. Sensors need to be adjusted/tweaked so that they actually fire. It is
challenging to set the sensors at levels that detect "interesting"
glitches/environmental effects, but don't fire constantly or cause yield
issues. The security team should work with the silicon supplier to determine
the best course of action here.
2. Sensor configuration / calibration data should be integrity-protected.
## References and further reading
[<span id="ref-1">1</span>]: Overview of checksums and hashes -
https://cybergibbons.com/reverse-engineering-2/checksums-hashes-and-security/
[<span id="ref-2">2</span>]: A Survey of hardware-based Control Flow Integrity -
https://arxiv.org/pdf/1706.07257.pdf
[<span id="ref-3">3</span>]: Cache-timing attacks on AES -
https://cr.yp.to/antiforgery/cachetiming-20050414.pdf
[<span id="ref-4">4</span>]: Meltdown: Reading Kernel Memory from User Space -
https://meltdownattack.com/meltdown.pdf
[<span id="ref-5">5</span>]: Spectre Attacks: Exploiting Speculative Execution -
https://spectreattack.com/spectre.pdf
[<span id="ref-6">6</span>]: Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other
Systems - https://www.rambus.com/wp-content/uploads/2015/08/TimingAttacks.pdf
[<span id="ref-7">7</span>]: Differential Power Analysis -
https://paulkocher.com/doc/DifferentialPowerAnalysis.pdf
[<span id="ref-8">8</span>]: SoC it to EM: electromagnetic side-channel attacks on a complex
system-on-chip - https://www.iacr.org/archive/ches2015/92930599/92930599.pdf
[<span id="ref-9">9</span>]: Introduction To differential power analysis -
https://link.springer.com/content/pdf/10.1007/s13389-011-0006-y.pdf
[<span id="ref-10">10</span>]: Principles of Secure Processor Architecture Design -
https://caslab.csl.yale.edu/tutorials/hpca2019/ and
https://caslab.csl.yale.edu/tutorials/hpca2019/tutorial_principles_sec_arch_20190217.pdf
[<span id="ref-11">11</span>]: Time Protection - https://ts.data61.csiro.au/projects/TS/timeprotection/
[<span id="ref-12">12</span>]: Fault Attacks on Secure Embedded Software: Threats, Design and Evaluation -
https://arxiv.org/pdf/2003.10513.pdf
[<span id="ref-13">13</span>]: The Sorcerer's Apprentice Guide to Fault Attacks -
https://eprint.iacr.org/2004/100.pdf
[<span id="ref-14">14</span>]: Fault Mitigation Patterns -
https://www.riscure.com/uploads/2020/05/Riscure_Whitepaper_Fault_Mitigation_Patterns_final.pdf
[<span id="ref-15">15</span>]: SIFA: Exploiting Ineffective Fault Inductions on Symmetric Cryptography -
https://eprint.iacr.org/2018/071.pdf
## Notes
[^1]: In other OpenTitan documents, the combination of technical and
    administrative defenses is often referred to as "logical security".