# AES HWIP Technical Specification


# Overview

This document specifies the AES hardware IP functionality.
[Advanced Encryption Standard (AES)](https://www.nist.gov/publications/advanced-encryption-standard-aes) is the primary symmetric encryption and decryption mechanism used in OpenTitan protocols.
The AES unit is a cryptographic accelerator that accepts requests from the processor to encrypt or decrypt 16 byte blocks of data.
It is attached to the chip interconnect bus as a peripheral module and conforms to the [Comportable guideline for peripheral functionality.](../../../doc/contributing/hw/comportability/README.md)


## Features

The AES unit supports the following features:

- Encryption/Decryption using AES-128/192/256 in the following cipher block modes:
  - Electronic Codebook (ECB) mode,
  - Cipher Block Chaining (CBC) mode,
  - Cipher Feedback (CFB) mode (fixed data segment size of 128 bits, i.e., CFB-128),
  - Output Feedback (OFB) mode, and
  - Counter (CTR) mode.
- Support for AES-192 can be removed to save area, and is enabled/disabled using a compile-time Verilog parameter.
- First-order masking of the cipher core using domain-oriented masking (DOM) to impede side-channel analysis (SCA). Masking can optionally be disabled using compile-time Verilog parameters (for more details see [Security Hardening below](#side-channel-analysis)).
- Latency per 16 byte data block of 12/14/16 clock cycles (unmasked implementation) and 56/66/72 clock cycles (DOM) in AES-128/192/256 mode.
- Automatic as well as software-initiated reseeding of internal pseudo-random number generators (PRNGs) with a configurable reseeding rate, resulting in maximum entropy consumption rates ranging from 286 Mbit/s to 0.035 Mbit/s (at 100 MHz).
- Countermeasures to impede fault injection (FI) on the control path (for more details see [Security Hardening below](#fault-injection)).
- Register-based data and control interface.
- System key-manager interface for optional key sideload, so that key material is not exposed to the processor and other hosts attached to the system bus interconnect.
- On-the-fly round-key generation in parallel to the actual encryption/decryption from a single initial 128/192/256-bit key provided through the register interface (for more details see [Theory of Operations below](#theory-of-operations)).

This AES unit targets medium performance (16 parallel S-Boxes, \~1 cycle per round for the unmasked implementation, \~5 cycles per round for the DOM implementation).
High-speed, single-cycle operation for high-bandwidth data streaming is not required.

Cipher modes other than ECB, CBC, CFB, OFB and CTR are beyond the scope of this version of the AES unit but might be supported in future versions.


## Description

The AES unit is a cryptographic accelerator that accepts requests from the processor to encrypt or decrypt 16B blocks of data.
It supports AES-128/192/256 in Electronic Codebook (ECB) mode, Cipher Block Chaining (CBC) mode, Cipher Feedback (CFB) mode (fixed data segment size of 128 bits, i.e., CFB-128), Output Feedback (OFB) mode and Counter (CTR) mode.
For more information on these cipher modes, refer to [Recommendation for Block Cipher Modes of Operation](https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf).
Other cipher modes might be added in future versions.

The AES unit is attached to the chip interconnect bus as a peripheral module.
Communication with the processor happens through a set of control and status registers (CSRs).
This includes input/output data and key, as well as status and control information.
Future versions of the AES unit might include a separate interface through which a possible system key manager can provide the key without exposing it to the processor or other hosts attached to the system bus interconnect.


# Theory of Operations

The AES unit supports both encryption and decryption for AES-128/192/256 in ECB, CBC, CFB, OFB and CTR modes using a single, shared data path.
That is, it can either do encryption or decryption but not both at the same time.

The AES unit features a key expanding mechanism to generate the required round keys on-the-fly from a single initial key provided through the register interface.
This means the processor needs to provide just the initial encryption key to the AES unit via register interface.
The AES unit then uses this key to generate all round keys as they are needed in parallel to the actual encryption/decryption.
The benefits of this design compared to passing all round keys via register interface include:

- Reduced storage requirements and smaller circuit area: Instead of storing 15 128-bit round keys, only 3 256-bit key registers are required for AES-256:
  - one set of registers to which the processor writes the initial key, i.e., the start key for encryption,
  - one set of registers to hold the current full key, and
  - one set of registers to hold the full key of the last encryption round, i.e., the start key for decryption.
- Faster re-configuration and key switching: The core just needs to perform 8 write operations instead of 60 write operations for AES-256.

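The figures in the list above follow from simple arithmetic, sketched here for AES-256.
The 32-bit register write width is an assumption, chosen because it matches the 8 writes quoted for a 256-bit key; the variable names are illustrative only.

```python
# Back-of-the-envelope comparison for AES-256.
# Assumption: the register interface is written 32 bits at a time.
REG_WIDTH = 32
ROUND_KEY_BITS = 128
NUM_ROUND_KEYS = 15   # 14 rounds plus the initial round key addition
FULL_KEY_BITS = 256

# Option 1: software passes every round key through the register interface.
writes_all_round_keys = NUM_ROUND_KEYS * ROUND_KEY_BITS // REG_WIDTH
storage_all_round_keys = NUM_ROUND_KEYS * ROUND_KEY_BITS

# Option 2: on-the-fly generation with three full-key registers
# (initial key, current full key, start key for decryption).
writes_initial_key = FULL_KEY_BITS // REG_WIDTH
storage_on_the_fly = 3 * FULL_KEY_BITS

print(writes_all_round_keys, writes_initial_key)    # 60 8
print(storage_all_round_keys, storage_on_the_fly)   # 1920 768
```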
On-the-fly round-key generation, however, comes at the price of an initial delay: whenever the processor changes the key, the AES unit needs extra time before it can perform ECB/CBC **decryption** using the new key.
During this phase, the key expanding mechanism iteratively computes the start key for the decryption.
The duration of this delay phase corresponds to the latency required for encrypting one 16B block (i.e., 12/14/16 cycles for AES-128/192/256).
Once the start key for decryption has been computed, it is stored in a dedicated internal register for later use.
The AES unit can then switch between decryption and encryption without additional overhead.

For encryption or if the mode is set to CFB, OFB or CTR, there is no such initial delay upon changing the key.
If the next operation after a key switch is ECB or CBC **decryption**, the AES unit automatically initiates a key expansion using the key schedule first.
This generates the start key for decryption; the actual data path remains idle during that phase.

The AES unit uses a status register to indicate to the processor when it is ready to receive the next input data block via the register interface.
While the AES unit is performing encryption/decryption of a data block, it is safe for the processor to provide the next input data block.
The AES unit automatically starts the encryption/decryption of the next data block once the previous encryption/decryption is finished and new input data is available.
The order in which the input registers are written does not matter.
Every input register must be written at least once for the AES unit to automatically start encryption/decryption.
This is the default behavior.
It can be disabled by setting the MANUAL_OPERATION bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) to `1`.
In this case, the AES unit only starts the encryption/decryption once the START bit in [`TRIGGER`](data/aes.hjson#trigger) is set to `1` (automatically cleared to `0` once the next encryption/decryption is started).

Similarly, the AES unit indicates via a status register when it has new output data available to be read by the processor.
Also, there is a back-pressure mechanism for the output data.
If the AES unit wants to finish the encryption/decryption of a data block but the previous output data has not yet been read by the processor, the AES unit is stalled.
It hangs and does not drop data.
It only continues once the previous output data has been read and the corresponding registers can be safely overwritten.
The order in which the output registers are read does not matter.
Every output register must be read at least once for the AES unit to continue.
This is the default behavior.
It can be disabled by setting the MANUAL_OPERATION bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) to `1`.
In this case, the AES unit never stalls and just overwrites previous output data, independent of whether it has been read or not.

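The default (automatic) handshake described above can be illustrated with a small behavioral model.
This is only a sketch and not the RTL: the class and method names are hypothetical, and it assumes four input and four output registers per 128-bit block.

```python
class AesHandshakeModel:
    """Toy model of the automatic-mode input/output handshake (hypothetical names)."""

    NUM_REGS = 4  # assumption: four 32-bit registers per 128-bit block

    def __init__(self):
        self.in_written = set()   # input registers written since the last start
        self.out_unread = set()   # output registers the processor still has to read
        self.busy = False

    def write_data_in(self, idx):
        self.in_written.add(idx)

    def read_data_out(self, idx):
        self.out_unread.discard(idx)

    def try_start(self):
        """Start a block once every input register was written at least once."""
        if not self.busy and len(self.in_written) == self.NUM_REGS:
            self.in_written.clear()
            self.busy = True
        return self.busy

    def try_finish(self):
        """Finish the block; stall while the previous output is still unread."""
        if self.busy and not self.out_unread:
            self.out_unread = set(range(self.NUM_REGS))
            self.busy = False
            return True
        return False  # stalled or idle


m = AesHandshakeModel()
m.write_data_in(3)
print(m.try_start())            # False: only one input register written
for i in (0, 2, 1):             # write order does not matter
    m.write_data_in(i)
print(m.try_start(), m.try_finish())   # True True: first block completes

for i in range(4):
    m.write_data_in(i)
print(m.try_start(), m.try_finish())   # True False: stalled, output unread
for i in range(4):
    m.read_data_out(i)
print(m.try_finish())           # True: output registers can be overwritten
```

In manual mode the stall branch would not apply, since the unit overwrites previous output data regardless of whether it has been read.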

## Block Diagram

This AES unit targets medium performance (\~1 cycle per round for the unmasked implementation).
High-speed, single-cycle operation for high-bandwidth data streaming is not required.

Therefore, the AES unit uses an iterative cipher core architecture with a 128-bit wide data path as shown in the figure below.
Note that for the sake of simplicity, the figure shows the unmasked implementation.
For details on the masked implementation of the cipher core refer to [Security Hardening below](#security-hardening).
Using an iterative architecture allows for a smaller circuit area at the cost of throughput.
Employing a 128-bit wide data path makes it possible to meet the latency requirements of 12/14/16 clock cycles per 16B data block in AES-128/192/256 mode in the unmasked implementation, respectively.

![AES unit block diagram (unmasked implementation) with shared data paths for encryption and decryption (using the Equivalent Inverse Cipher).](./doc/aes_block_diagram.svg)

Inside the cipher core, both the data paths for the actual cipher (left) and the round key generation (right) are shared between encryption and decryption.
Consequently, the blocks shown in the diagram always implement the forward and backward (inverse) version of the corresponding operation.
For example, SubBytes implements both SubBytes and InvSubBytes.

Besides the actual AES cipher core, the AES unit features a set of control and status registers (CSRs) accessible by the processor via TL-UL bus interface, and a counter module (used in CTR mode only).
This counter module implements the Standard Incrementing Function according to [Recommendation for Block Cipher Modes of Operation (Appendix B.1)](https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf) with a fixed parameter m = 128.
Note that for AES, parameter b = 128 and the counter increment is big-endian.
CFB mode is supported with a fixed parameter s = 128 (CFB-128).
Support for data segment sizes other than 128 bits would require a substantial amount of additional muxing resources and is thus not provided.
The initialization vector (IV) register and the register to hold the previous input data are used in CBC, CFB, OFB and CTR modes only.

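As an illustration of the Standard Incrementing Function with m = 128, the following sketch increments a 16-byte counter block as one big-endian integer modulo 2^128.
The second function computes the same result slice by slice with a narrow adder; the default slice width of 16 bits is taken from the datapath description later in this document, and both function names are hypothetical.

```python
def inc128(iv: bytes) -> bytes:
    """Add 1 to a 16-byte big-endian counter block, modulo 2**128."""
    assert len(iv) == 16
    value = (int.from_bytes(iv, "big") + 1) % (1 << 128)
    return value.to_bytes(16, "big")


def inc128_sliced(iv: bytes, slice_bits: int = 16) -> bytes:
    """Same result, one slice per step, starting at the least significant slice."""
    n = slice_bits // 8
    out = bytearray(iv)
    carry = 1
    for hi in range(16, 0, -n):   # the last bytes are the least significant ones
        part = int.from_bytes(out[hi - n:hi], "big") + carry
        carry = part >> slice_bits
        out[hi - n:hi] = (part & ((1 << slice_bits) - 1)).to_bytes(n, "big")
    return bytes(out)             # a carry out of the top slice is dropped


ctr = bytes(14) + b"\x00\xff"
print(inc128(ctr).hex())          # 00000000000000000000000000000100
print(inc128_sliced(ctr) == inc128(ctr))   # True
```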

## Hardware Interfaces

* [Interface Tables](data/aes.hjson#interfaces)

The table below lists other signals of the AES unit.

Signal             | Direction        | Type                       | Description
-------------------|------------------|----------------------------|---------------
`idle_o`           | `output`         | `logic`                    | Idle indication signal for clock manager.
`lc_escalate_en_i` | `input`          | `lc_ctrl_pkg::lc_tx_t`     | Life cycle escalation enable coming from [life cycle controller](../lc_ctrl/README.md). This signal moves the main controller FSM within the AES unit into the terminal error state. The AES unit needs to be reset.
`edn_o`            | `output`         | `edn_pkg::edn_req_t`       | Entropy request to [entropy distribution network (EDN)](../edn/README.md) for reseeding internal pseudo-random number generators (PRNGs) used for register clearing and masking.
`edn_i`            | `input`          | `edn_pkg::edn_rsp_t`       | [EDN](../edn/README.md) acknowledgment and entropy input for reseeding internal PRNGs.
`keymgr_key_i`     | `input`          | `keymgr_pkg::hw_key_req_t` | Key sideload request coming from [key manager](../keymgr/README.md).

Note that the `edn_o` and `edn_i` signals used to interface [EDN](../edn/README.md) follow a REQ/ACK protocol.
The entropy distributed by EDN is obtained from the [cryptographically secure random number generator (CSRNG)](../csrng/README.md).

## Design Details

This section discusses different design details of the AES module.


### Datapath Architecture and Operation

The AES unit implements the Equivalent Inverse Cipher described in the [AES specification](https://csrc.nist.gov/csrc/media/publications/fips/197/final/documents/fips-197.pdf).
This allows for more efficient cipher data path sharing between encryption/decryption as the operations are applied in the same order (fewer muxes, simpler control), but requires the round key during decryption to be transformed using an inverse MixColumns in all rounds except for the first and the last one.

This architectural choice targets efficient cipher data path sharing and a low area footprint.
Depending on the application scenario, other architectures might offer a more suitable area/performance tradeoff.
For example, if only CFB, OFB or CTR modes are ever used, the inverse cipher is not used at all.
Moreover, if the key is changed extremely rarely (as for example in the case of bulk decryption), it may pay off to store all round keys instead of generating them on the fly.
Future versions of the AES unit might offer compile-time parameters to selectively instantiate the forward/inverse cipher part only to allow for dedicated encryption/decryption-only units.

All submodules in the data path are purely combinational.
The only sequential logic in the cipher and round key generation are the State, Full Key and Decryption Key registers.

The following description explains how the AES unit operates, i.e., how the operation of the AES cipher is mapped to the datapath architecture of the AES unit.
Phrases in italics apply to peculiarities of different block cipher modes.
For a general introduction to these cipher modes, refer to [Recommendation for Block Cipher Modes of Operation](https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf).

1. The configuration and initial key are provided to the AES unit via a set of control and status registers (CSRs) accessible by the processor via TL-UL bus interface.
   The processor must first provide the configuration to the [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) register.
   Then follows the initial key.
   Each key register must be written at least once.
   The order in which the registers are written does not matter.
1. _The processor provides the initialization vector (IV) or initial counter value to the four IV registers via TL-UL bus interface in CBC, CFB and OFB modes, or CTR mode, respectively.
   Each IV register must be written at least once.
   The order in which the registers are written does not matter.
   Note that while operating, the AES unit automatically updates the IV registers after having consumed the current IV value.
   Whenever a new message is started, the processor must provide the corresponding IV value via TL-UL bus interface.
   In ECB mode, no IV needs to be provided.
   The content of the IV registers is ignored in ECB mode._
1. The input data is provided to the AES unit via four CSRs.
   Each input register must be written at least once.
   The order in which the registers are written does not matter.
1. If new input data is available, the AES unit automatically starts encryption/decryption by performing the following actions.
   1. The AES unit loads initial state into the State register inside the cipher core.

      _Depending on the cipher mode, the initial state is a combination of input data as well as IV._
      _Note, if CBC decryption is performed, or if running in CFB, OFB or CTR mode, the input data is also registered (Data In Prev in the block diagram)._
   2. The initial key is loaded into the Full Key register inside the cipher core.

      _Note, if the ECB/CBC decryption is performed, the Full Key register is loaded with the value stored in the Decryption Key register._

      _Note, for the AES unit to automatically start in CBC, CFB, OFB or CTR mode, also the IV must be ready.
      The IV is ready if -- since the last IV update (either done by the processor or the AES unit itself) -- all IV registers have been written at least once or none of them.
      The AES unit will not automatically start the next encryption/decryption with a partially updated IV._

   By setting the MANUAL_OPERATION bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) to `1`, the AES unit can be operated in manual mode.
   In manual mode, the AES unit starts encryption/decryption whenever the START bit in [`TRIGGER`](data/aes.hjson#trigger) is set to `1`, irrespective of the status of the IV and input data registers.

1. Once the State and Full Key registers have been loaded, the AES cipher core starts the encryption/decryption by adding the first round key to the initial state (all blocks in both data paths are bypassed).
   The result is stored back in the State register.
1. Then, the AES cipher core performs 9/11/13 rounds of encryption/decryption when using a 128/192/256-bit key, respectively.
   In every round, the cipher data path performs the four following transformations.
   For more details, refer to the [AES specification](https://csrc.nist.gov/csrc/media/publications/fips/197/final/documents/fips-197.pdf).
   1. SubBytes Transformation: A non-linear byte substitution that operates independently on each byte of the state using a substitution table (S-Box).
   2. ShiftRows Transformation: The bytes of the last three rows of the state are cyclically shifted over different offsets.
   3. MixColumns Transformation: Each of the four columns of the state is considered as a polynomial over GF(2^8) and individually multiplied with another fixed polynomial.
   4. AddRoundKey Transformation: The round key is XORed with the output of the MixColumns operation and stored back into the State register.
      The 128-bit round key itself is extracted from the current value in the Full Key register.

   In parallel, the full key used for the next round is computed on the fly using the key expand module.

   _If running in CTR mode, the counter module iteratively updates the IV in parallel to the cipher core performing encryption/decryption.
   Internally, the counter module uses one 16-bit counter, meaning it requires 8 clock cycles to increment the 128-bit counter value stored in the IV register.
   Since the counter value is used in the first round only, and since the encryption/decryption of a single block takes 12/14/16 cycles, the iterative counter implementation does not affect the throughput of the AES unit._
1. Finally, the AES cipher core performs the final encryption/decryption round in which the MixColumns operation is skipped.
   The output is forwarded to the output register in the CSRs but not stored back into the State register.
   The internal State register is cleared with pseudo-random data.

   _Depending on the cipher mode, the output of the final round is potentially XORed with either the value in the IV registers (CBC decryption) or the value stored in the previous input data register (CFB, OFB, CTR modes), before being forwarded to the output register in the CSRs.
   If running in CBC mode, the IV registers are updated with the output data (encryption) or the value stored in the previous input data register (decryption).
   If running in CFB or OFB mode, the IV registers are updated with the output data or the output of the final cipher round (before XORing with the previous input data), respectively._

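The mode-dependent XOR and IV update rules in the steps above can be summarized in a short sketch.
All names are hypothetical, the cipher core is replaced by a toy 16-byte permutation that is explicitly not AES, and the IV update for CFB decryption (the ciphertext, i.e., the input data) follows NIST SP 800-38A rather than anything stated in this document.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def process_block(mode, decrypt, data_in, iv, cipher_fwd, cipher_inv):
    """Process one 16-byte block and return (data_out, next_iv).

    cipher_fwd / cipher_inv stand in for the forward / inverse cipher core.
    """
    if mode == "ECB":
        return (cipher_inv if decrypt else cipher_fwd)(data_in), iv
    if mode == "CBC":
        if decrypt:
            # Final-round output XOR IV; the IV becomes the previous input data.
            return xor(cipher_inv(data_in), iv), data_in
        out = cipher_fwd(xor(data_in, iv))
        return out, out                    # IV updated with the output data
    keystream = cipher_fwd(iv)             # CFB, OFB and CTR only use the forward cipher
    out = xor(keystream, data_in)          # XOR with the previous input data
    if mode == "CFB":
        return out, (data_in if decrypt else out)   # next IV is the ciphertext
    if mode == "OFB":
        return out, keystream              # final-round output before the XOR
    if mode == "CTR":
        ctr = (int.from_bytes(iv, "big") + 1) % (1 << 128)
        return out, ctr.to_bytes(16, "big")
    raise ValueError(mode)


# Toy invertible 16-byte permutation used in place of the cipher core. NOT AES.
TOY_KEY = bytes(range(16))

def toy_fwd(block: bytes) -> bytes:
    x = xor(block, TOY_KEY)
    return x[1:] + x[:1]

def toy_inv(block: bytes) -> bytes:
    return xor(block[-1:] + block[:-1], TOY_KEY)


def run(mode, decrypt, blocks, iv):
    out = []
    for blk in blocks:
        data_out, iv = process_block(mode, decrypt, blk, iv, toy_fwd, toy_inv)
        out.append(data_out)
    return out


msg = [bytes([i]) * 16 for i in (1, 2, 3)]
iv0 = bytes(16)
for mode in ("ECB", "CBC", "CFB", "OFB", "CTR"):
    # Decrypting the ciphertext with the same start IV recovers the message.
    assert run(mode, True, run(mode, False, msg, iv0), iv0) == msg
```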
Having separate registers for input, output and internal state prevents the extraction of intermediate state via TL-UL bus interface and allows reconfiguration to overlap with operation.
While the AES unit is performing encryption/decryption, the processor can safely write the next input data block into the CSRs or read the previous output data block from the CSRs.
The State register is internal to the AES unit and not exposed via the TL-UL bus interface.
If the AES unit wants to finish the encryption/decryption of an output data block but the previous one has not yet been read by the processor, the AES unit is stalled.
It hangs and does not drop data.
It only continues once the previous output data has been read and the corresponding registers can be safely overwritten.
The order in which the output registers are read does not matter.
Every output register must be read at least once for the AES unit to continue.
In contrast, the initial key and the control register can only be updated if the AES unit is idle, which eases design verification (DV).
Similarly, the initialization vector (IV) register can only be updated by the processor if the AES unit is idle.
If the AES unit is busy and running in CBC or CTR mode, the AES unit itself updates the IV register.

The cipher core architecture of the AES unit is derived from the architecture proposed by Satoh et al.: ["A compact Rijndael Hardware Architecture with S-Box Optimization"](https://link.springer.com/chapter/10.1007%2F3-540-45682-1_15).
The expected circuit area in a 110nm CMOS technology is in the order of 12 - 22 kGE (unmasked implementation, AES-128 only).
The expected circuit area of the entire AES unit with masking enabled is around 110 kGE.

For a description of the various sub modules, see the following sections.


### SubBytes / S-Box

The SubBytes operation is a non-linear byte substitution that operates independently on each byte of the state using a substitution table (S-Box).
It is used for both the cipher data path and the key expand data path.
In total, the AES unit instantiates 20 S-Boxes in parallel (16 for SubBytes, 4 for KeyExpand), each having 8-bit input and output.
In combination with the 128-bit wide data path, this allows one AES round to be performed per iteration.

The design of this S-Box and its inverse can have a big impact on circuit area, timing critical path, robustness and power leakage, and is a research topic in its own right.

The S-Boxes are decoupled from the rest of the AES unit with a handshake protocol, allowing them to be easily replaced by different implementations if required.
The AES unit comes with the following S-Box implementations that can be selected by a compile-time Verilog parameter:
- Domain-oriented masking (DOM) S-Box: default, see [Gross et al.: "Domain-Oriented Masking: Compact Masked Hardware Implementations with Arbitrary Protection Order"](https://eprint.iacr.org/2016/486.pdf)
- Masked Canright S-Box: provided for reference, usage discouraged, a version w/ and w/o mask re-use is provided, see [Canright and Batina: "A very compact "perfectly masked" S-Box for AES (corrected)"](https://eprint.iacr.org/2009/011.pdf)
- Canright S-Box: only use when disabling masking, recommended when targeting ASIC implementation, see [Canright: "A very compact Rijndael S-Box"](https://hdl.handle.net/10945/25608)
- LUT-based S-Box: only use when disabling masking, recommended when targeting FPGA implementation

The DOM S-Box has a latency of 5 clock cycles.
All other implementations are fully combinational (one S-Box evaluation every clock cycle).
See also [Security Hardening below.](#1st-order-masking-of-the-cipher-core)

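For reference, the function computed by every S-Box implementation listed above is the one defined in the AES specification: inversion in GF(2^8) followed by an affine transformation.
The sketch below is a plain software model of that definition (function names are illustrative); it says nothing about how the DOM, Canright or LUT-based variants are structured internally.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse computed as a^254; 0 maps to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def sbox(x: int) -> int:
    """Forward S-Box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    rotl = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]

# The inverse S-Box (InvSubBytes) is the inverse permutation.
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

print(hex(SBOX[0x00]), hex(SBOX[0x53]))   # 0x63 0xed
```

A LUT-based implementation corresponds to storing `SBOX` and `INV_SBOX` directly, whereas compact implementations evaluate the field inversion with logic.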
### ShiftRows

The ShiftRows operation simply performs a cyclic shift of Rows 1, 2 and 3 of the state matrix.
Consequently, it can be implemented using 3\*4 32-bit 2-input muxes (encryption/decryption).

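A minimal software sketch of ShiftRows and its inverse is given below.
It assumes the column-major byte ordering of the AES specification (byte index = row + 4 \* column); the function name and the `inverse` flag are illustrative and mirror the encryption/decryption selection performed by the muxes.

```python
def shift_rows(state: bytes, inverse: bool = False) -> bytes:
    """Cyclically shift row r of the 4x4 state by r positions.

    Forward shifts rows to the left, inverse shifts them to the right.
    Bytes are in AES column-major order: index = row + 4 * column.
    """
    out = bytearray(16)
    for r in range(4):
        for c in range(4):
            src_c = (c - r) % 4 if inverse else (c + r) % 4
            out[r + 4 * c] = state[r + 4 * src_c]
    return bytes(out)


state = bytes(range(16))
shifted = shift_rows(state)
print(list(shifted))   # [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
print(shift_rows(shifted, inverse=True) == state)   # True
```

Row 0 is never shifted, which is why only Rows 1, 2 and 3 need muxes.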

### MixColumns

Each of the four columns of the state is considered as a polynomial over GF(2^8) and individually multiplied with another fixed polynomial.
The whole operation can be implemented using 36 2-input XORs and 16 4-input XORs (all 8-bit), 8 2-input muxes (8-bit), as well as 78 2-input and 24 3-input XOR gates.

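The column multiplication can be sketched in software as follows.
The coefficients are those of the AES specification (`02 03 01 01` for MixColumns, `0e 0b 0d 09` for InvMixColumns); the function names are illustrative, and the sketch does not reflect the XOR/mux structure quoted above.

```python
def xtime(a: int) -> int:
    """Multiply by x (i.e., by 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def mix_column(col, inverse: bool = False):
    """Multiply one 4-byte column with the fixed (inverse) MixColumns polynomial."""
    k = (0x0E, 0x0B, 0x0D, 0x09) if inverse else (0x02, 0x03, 0x01, 0x01)
    return [
        gf_mul(col[i], k[0])
        ^ gf_mul(col[(i + 1) % 4], k[1])
        ^ gf_mul(col[(i + 2) % 4], k[2])
        ^ gf_mul(col[(i + 3) % 4], k[3])
        for i in range(4)
    ]


col = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(col)
print([hex(b) for b in mixed])                  # ['0x8e', '0x4d', '0xa1', '0xbc']
print(mix_column(mixed, inverse=True) == col)   # True
```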

### KeyExpand

The key expand module (KEM) integrated in the AES unit is responsible for generating the various round keys from the initial key for both encryption and decryption.
The KEM generates the next 128/192/256-bit full key in parallel to the actual encryption/decryption based on the current full key or the initial key (for the first encryption round).
The actual 128-bit round key is then extracted from this full key.

Generating the keys on-the-fly allows for lower storage requirements and smaller circuit area but comes at the price of an initial delay before doing ECB/CBC **decryption** whenever the key is changed.
During this phase, the KEM cycles through all full keys to obtain the start key for decryption (equals the key for final round of encryption).
The duration of this delay phase corresponds to the latency required for encrypting one 16B block.
During this initial phase, the cipher data path is kept idle.

The timing diagram below visualizes this process.

```wavejson
{
  signal: [
    { name: 'clk', wave: 'p........|.......'},
    ['TL-UL IF',
      { name: 'write', wave: '01...0...|.......'},
      { name: 'addr', wave: 'x2345xxxx|xxxxxxx', data: 'K0 K1 K2 K3'},
      { name: 'wdata', wave: 'x2345xxxx|xxxxxxx', data: 'K0 K1 K2 K3'},
    ],
    {},
    ['AES Unit',
      { name: 'Config op', wave: 'x4...............', data: 'DECRYPT'},
      { name: 'AES op', wave: '2........|.4.....', data: 'IDLE DECRYPT'},
      { name: 'KEM op', wave: '2....3...|.4.....', data: 'IDLE ENCRYPT DECRYPT'},
      { name: 'round', wave: 'xxxxx2.22|22.2222', data: '0 1 2 9 0 1 2 3 4'},
      { name: 'key_init', wave: 'xxxx5....|.......', data: 'K0-3'},
      { name: 'key_full', wave: 'xxxxx5222|4.22222', data: 'K0-3 f(K) f(K) f(K) K0-3\' f(K) f(K) f(K) f(K) f(K)'},
      { name: 'key_dec', wave: 'xxxxxxxxx|4......', data: 'K0-3\''},
    ]
  ]
}
```

The AES unit is configured to do decryption (`Config op` = DECRYPT).
Once the new key has been provided via the control and status registers (top), this new key is loaded into the Full Key register (`key_full` = K0-3) and the KEM starts performing encryption (`KEM op` = ENCRYPT).
The cipher data path remains idle (`AES op` = IDLE).
In every round, the value in `key_full` is updated.
After 10 encryption rounds, the value in `key_full` equals the start key for decryption.
This value is stored into the Decryption Key register (`key_dec` = K0-3' at the very bottom).
Now the AES unit can switch between encryption/decryption without overhead as both the start key for encryption (`key_init`) and decryption (`key_dec`) can be loaded into `key_full`.

For details on the KeyExpand operation refer to the [AES specification, Section 5.2](https://csrc.nist.gov/csrc/media/publications/fips/197/final/documents/fips-197.pdf).
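
To make the on-the-fly scheme more concrete, the following C sketch models it in software for AES-128 only: the full key is advanced in place round by round, and after 10 steps it holds the start key for decryption.
This is a behavioral illustration under simplifying assumptions, not the RTL; the function names are made up for this example, and the S-Box is computed arithmetically (inversion in GF(2^8) followed by the affine transformation from FIPS 197) to keep the listing short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
static uint8_t gf_mul(uint8_t a, uint8_t b) {
  uint8_t p = 0;
  for (int i = 0; i < 8; i++) {
    if (b & 1) p ^= a;
    const uint8_t hi = a & 0x80;
    a = (uint8_t)(a << 1);
    if (hi) a ^= 0x1b;
    b >>= 1;
  }
  return p;
}

static uint8_t rotl8(uint8_t v, int n) {
  return (uint8_t)((v << n) | (v >> (8 - n)));
}

// AES S-Box: x^254 (the inverse of x, with 0 mapped to 0), then the affine map.
static uint8_t sbox(uint8_t x) {
  uint8_t inv = 1;
  for (int i = 0; i < 254; i++) inv = gf_mul(inv, x);
  return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^
                   rotl8(inv, 4) ^ 0x63);
}

// Advance an AES-128 full key by one round, in place.
// *rcon is the round constant for this step and is updated for the next one.
static void aes128_next_full_key(uint8_t key[16], uint8_t *rcon) {
  assert(key != 0 && rcon != 0);
  // SubWord(RotWord(last word)) XOR Rcon, to be folded into the first word.
  const uint8_t t[4] = {(uint8_t)(sbox(key[13]) ^ *rcon), sbox(key[14]),
                        sbox(key[15]), sbox(key[12])};
  for (int i = 0; i < 4; i++) key[i] ^= t[i];
  for (int i = 4; i < 16; i++) key[i] ^= key[i - 4];
  *rcon = gf_mul(*rcon, 0x02);
}

// Cycle through all full keys to obtain the start key for decryption,
// i.e. the round key of the final encryption round.
static void aes128_decryption_start_key(const uint8_t key_init[16],
                                        uint8_t key_dec[16]) {
  uint8_t rcon = 0x01;
  memcpy(key_dec, key_init, 16);
  for (int round = 0; round < 10; round++) {
    aes128_next_full_key(key_dec, &rcon);
  }
}
```

Only one full key is held at any time, which mirrors the storage saving described above; the price is the 10-step loop that has to run once per key change before decryption can start.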

Key expanding is the only operation in the AES unit for which the functionality depends on the selected key length.
Given a KEM that supports 128-bit key expansion, support for the 256-bit mode can be added at low overhead.
In contrast, the 192-bit mode requires much larger muxes.
Support for this mode is thus optional and can be enabled/disabled via a design-time parameter.

Once we have cost estimates in terms of gate count increase for 192-bit mode, we can decide whether or not to use it in OpenTitan.
Typically, systems requiring security above AES-128 go directly for AES-256.

### System Key-Manager Interface

By default, the AES unit is controlled entirely by the processor.
The processor writes both input data as well as the initial key to dedicated registers via the system bus interconnect.

Alternatively, the processor can configure the AES unit to use an initial key provided by the [key manager](../keymgr/README.md) via the key sideload interface without exposing the key to the processor or other hosts attached to the system bus interconnect.
To this end, the processor has to set the SIDELOAD bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) to `1`.
Any write operations of the processor to the Initial Key registers [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE1_7`](data/aes.hjson#key_share1_7) are then ignored.
In normal/automatic mode, the AES unit only starts encryption/decryption if the sideload key is marked as valid.
To update the sideload key, the processor has to 1) wait for the AES unit to become idle, 2) wait for the key manager to update the sideload key and assert the valid signal, and 3) write to the [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) register to start a new message.
After using a sideload key, the processor has to trigger the clearing of all key registers inside the AES unit (see [De-Initialization](#de-initialization) below).


Pirmin Vogelce136622019-10-09 15:51:06 +0100335
Pirmin Vogel2e6a6c62021-03-23 15:43:08 +0100336The AES unit employs different means at architectural, micro-architectural and physical levels for security hardening against side-channel analysis and fault injection.
Pirmin Vogelce136622019-10-09 15:51:06 +0100337
Pirmin Vogel2e6a6c62021-03-23 15:43:08 +0100338## Side-Channel Analysis
339
340To aggravate side-channel analysis (SCA), the AES unit implements the following countermeasures.
341
342### 1st-order Masking of the Cipher Core
343
344The AES unit employs 1st-order masking of the AES cipher core.
345More precisely, both the cipher and the key expand data path use two shares.
346As shown in the block diagram below, the width of all registers and data paths basically doubles.
347
Hugo McNallyaef0a662023-02-11 19:44:55 +0000348![Block diagram of the masked AES cipher core.](./doc/aes_block_diagram_cipher_core_masked.svg)
Pirmin Vogel2e6a6c62021-03-23 15:43:08 +0100349
350The initial key is provided in two shares via the register interface.
351The input data is provided in unmasked form and masked outside of the cipher core to obtain the two shares of the initial state.
352The pseudo-random data (PRD) required for masking the input data is provided by the pseudo-random number generator (PRNG) of the cipher core.
353Similarly, the two shares of the output state are combined outside the cipher core to obtain the output data.
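
The principle of two-share Boolean masking can be illustrated with a small C model.
This is only a sketch: the function names are invented for this example, the PRD is passed in by the caller rather than generated by a PRNG, and which share carries the raw PRD is an assumption made here for illustration.
It shows that the value is the XOR of both shares and that a linear operation such as AddRoundKey can be applied to each share independently.

```c
#include <assert.h>
#include <stdint.h>

#define AES_BLOCK_BYTES 16

// Split unmasked data into two shares such that share0 ^ share1 == data.
static void mask_input(const uint8_t data[AES_BLOCK_BYTES],
                       const uint8_t prd[AES_BLOCK_BYTES],
                       uint8_t share0[AES_BLOCK_BYTES],
                       uint8_t share1[AES_BLOCK_BYTES]) {
  assert(data != 0 && prd != 0 && share0 != 0 && share1 != 0);
  for (int i = 0; i < AES_BLOCK_BYTES; i++) {
    share0[i] = (uint8_t)(data[i] ^ prd[i]);
    share1[i] = prd[i];
  }
}

// Combine the two shares of the output state into the unmasked output data.
static void unmask_output(const uint8_t share0[AES_BLOCK_BYTES],
                          const uint8_t share1[AES_BLOCK_BYTES],
                          uint8_t data[AES_BLOCK_BYTES]) {
  for (int i = 0; i < AES_BLOCK_BYTES; i++) {
    data[i] = (uint8_t)(share0[i] ^ share1[i]);
  }
}

// AddRoundKey is linear, so it operates share-wise: state share i is
// XORed with key share i, and the shares are never combined.
static void add_round_key_shared(uint8_t state0[AES_BLOCK_BYTES],
                                 uint8_t state1[AES_BLOCK_BYTES],
                                 const uint8_t key0[AES_BLOCK_BYTES],
                                 const uint8_t key1[AES_BLOCK_BYTES]) {
  for (int i = 0; i < AES_BLOCK_BYTES; i++) {
    state0[i] ^= key0[i];
    state1[i] ^= key1[i];
  }
}
```

Non-linear operations such as the S-Box cannot be handled share-wise like this, which is why they need a dedicated masked implementation and fresh randomness.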

The same PRNG also generates the fresh randomness required by the masked SubBytes (16 masked S-Boxes) and the masked KeyExpand (4 masked S-Boxes).
The masking scheme selected for the S-Box can have a high impact on SCA resistance, circuit area, number of PRD bits consumed per cycle and per S-Box evaluation, and throughput.
The selection of the masked S-Box implementation can be controlled via a compile-time Verilog parameter.
By default, the AES unit uses domain-oriented masking (DOM) for the S-Boxes as proposed by [Gross et al.: "Domain-Oriented Masking: Compact Masked Hardware Implementations with Arbitrary Protection Order".](https://eprint.iacr.org/2016/486.pdf)
The provided implementation has a latency of 5 clock cycles per S-Box evaluation.
As a result, the overall latency for processing a 16-byte data block increases from 12/14/16 to 56/66/72 clock cycles in AES-128/192/256 mode, respectively.
The provided implementation further forwards partial, intermediate results among DOM S-Box instances for remasking purposes.
This reduces the circuit area related to generating, buffering and applying PRD without impacting SCA resistance.
Alternatively, the two original versions of the masked Canright S-Box can be chosen as proposed by [Canright and Batina: "A very compact "perfectly masked" S-Box for AES (corrected)".](https://eprint.iacr.org/2009/011.pdf)
These are fully combinational (one S-Box evaluation every cycle) and have lower area footprint, but they are significantly less resistant to SCA.
They are mainly included for reference but their usage is discouraged due to potential vulnerabilities to the correlation-enhanced collision attack as described by [Moradi et al.: "Correlation-Enhanced Power Analysis Collision Attack".](https://eprint.iacr.org/2010/297.pdf)

The masking PRNG is reseeded with fresh entropy via [EDN](../edn/README.md) automatically 1) whenever a new key is provided (see [`CTRL_AUX_SHADOWED.KEY_TOUCH_FORCES_RESEED`](data/aes.hjson#ctrl_aux_shadowed)) and 2) based on a block counter.
The rate at which this block counter initiates automatic reseed operations can be configured via [`CTRL_SHADOWED.PRNG_RESEED_RATE`](data/aes.hjson#ctrl_shadowed).
In addition, software can manually initiate a reseed operation via [`TRIGGER.PRNG_RESEED`](data/aes.hjson#trigger).

Note that the masking can be enabled/disabled via a compile-time Verilog parameter.
It may be acceptable to disable the masking when using the AES cipher core for random number generation e.g. inside [CSRNG.](../csrng/README.md)
When disabling the masking, an unmasked S-Box implementation must also be selected using the corresponding compile-time Verilog parameter.
When disabling masking, it is recommended to use the unmasked Canright or LUT S-Box implementation for ASIC or FPGA targets, respectively.
Both are fully combinational and allow for one S-Box evaluation every clock cycle.

It's worth noting that since input/output data are provided/retrieved via register interface in unmasked form, the AES unit should not be used to form an identity ladder where the output of one AES operation is used to form the key for the next AES operation in the ladder.
In OpenTitan, the [Keccak Message Authentication Code (KMAC) unit](../kmac/README.md) is used for that purpose.

### Fully-Parallel Data Path

Any 1st-order masking scheme primarily protects against 1st-order SCA.
Vulnerabilities against higher-order SCA might still be present.
A common technique to impede higher-order attacks is to increase the noise in the system e.g. by leveraging parallel architectures.
To this end, the AES cipher core uses a 128-bit parallel data path with a total of up to 20 S-Boxes (16 inside SubBytes, 4 inside KeyExpand) that are evaluated in parallel.

Besides more noise for increased resistance against higher-order SCA, the fully-parallel architecture also enables higher performance and flexibility.
It allows users to seamlessly switch out the S-Box implementation in order to experiment with different masking schemes.
To interface the data paths with the S-Boxes, a handshake protocol is used.

### Note on Reset vs. Non-Reset Flip-Flops

The choice of flip-flop type for registering sensitive assets such as keys can have implications on the vulnerability against e.g. combined reset glitch attacks and SCA.
Following the [OpenTitan non-reset vs. reset flops rationale](https://github.com/lowRISC/opentitan/issues/2603), the following observations can be made:
- If masking is enabled, key and state values are stored in two shares inside the AES unit.
  Neither the Hamming weights of the individual shares nor the summed Hamming weight are proportional to the Hamming weight of the secret asset.
- Input/output data and IV values are (currently) not stored in multiple shares but these are less critical as they are used only once.
  Further, they are stored in banks of 32 bits leaving a larger hypothesis space compared to when glitching e.g. an 8-bit register into reset.
  In addition, they could potentially also be extracted when being transferred over the TL-UL bus interface.

For this reason, the AES unit uses reset flops only.
However, all major key and data registers are cleared with pseudo-random data upon reset.

### Clearing Registers with Pseudo-Random Data

Upon reset or if initiated by software, all major key and data registers inside the AES module are cleared with pseudo-random data (PRD).
This helps to reduce SCA leakage both when writing these registers for reconfiguration and when clearing the registers after use.

In addition, the state registers inside the cipher core are cleared with PRD during the last round of every encryption/decryption.
This prevents Hamming distance leakage between the states of the last two rounds as well as between output and input data.

## Fault Injection

Fault injection (FI) attacks can be distinguished based on the FI target.

### Control Path

In cryptographic devices, fault attacks on the control path usually aim to disturb the control flow in a way that facilitates SCA or other attacks.
Example targets for AES include: switching to a less secure mode of operation (ECB), continuing to process the same input data, reducing the number of rounds/early termination, skipping particular rounds, and skipping individual operations in a round.

To protect against FI attacks on the control path, the AES unit implements the following countermeasures.

- Shadowed Control Register:
  The main control register is implemented as a shadow register.
  This means software has to perform two consecutive write operations to perform an update.
  Internally, a shadow copy is used that is constantly compared with the actual register.
  For further details, refer to the [Register Tool documentation.](../../../util/reggen/README.md#shadow-registers)

- Sparse encodings of FSM states:
  All FSMs inside the AES unit use sparse state encodings.

- Sparse encodings for mux selector signals:
  All main muxes use sparsely encoded selector signals.

- Sparse encodings for handshake and other important control signals.

- Multi-rail control logic:
  All FSMs inside the AES unit are implemented using multiple independent and redundant logic rails.
  Every rail evaluates and drives exactly one bit of sparsely encoded handshake or other important control signals.
  The outputs of the different rails are constantly compared to detect potential faults.
  The number of logic rails can be scaled up by means of relatively easy RTL modifications.
  By default, three independent logic rails are used.

- Hardened round counter:
  Similar to the cipher core FSM, the internal round counter is protected against FI through a multi-rail implementation.
  The outputs of the different rails are constantly compared to detect potential faults in the round counter.

If any of these countermeasures detects a fault, a fatal alert is triggered, the internal FSMs go into a terminal error state, and the AES unit does not release further data and locks up until reset.
Since the AES unit has no ability to reset itself, a system-supplied reset is required before the AES unit can become operational again.
Such a condition is reported in [`STATUS.ALERT_FATAL_FAULT`](data/aes.hjson#status).
Details on where the fault has been detected are not provided.
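
To illustrate the shadowed control register from the list above, the following C sketch gives a simplified behavioral model of such a register.
The type and function names are hypothetical, and several details (storing the shadow copy inverted, discarding an update when the two writes differ) are assumptions made for this illustration; the Register Tool documentation remains the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Simplified software model of a shadowed register (illustrative only).
typedef struct {
  uint32_t committed;  // Value seen by the hardware.
  uint32_t shadow;     // Redundant copy, assumed here to be stored inverted.
  uint32_t staged;     // Value of the first write, waiting for confirmation.
  bool second_phase;   // True if the next write is the confirming write.
  bool update_error;   // Set if the two writes of an update did not match.
} shadow_reg_t;

static void shadow_reg_init(shadow_reg_t *r, uint32_t reset_value) {
  assert(r != 0);
  r->committed = reset_value;
  r->shadow = ~reset_value;
  r->staged = 0;
  r->second_phase = false;
  r->update_error = false;
}

// Software write: the first write is only staged, the second one commits
// the value if and only if it matches the first.
static void shadow_reg_write(shadow_reg_t *r, uint32_t value) {
  if (!r->second_phase) {
    r->staged = value;
    r->second_phase = true;
    return;
  }
  r->second_phase = false;
  if (value == r->staged) {
    r->committed = value;
    r->shadow = ~value;
    r->update_error = false;
  } else {
    r->update_error = true;  // The update is discarded.
  }
}

// Continuous consistency check between the register and its shadow copy.
static bool shadow_reg_storage_ok(const shadow_reg_t *r) {
  return r->committed == (uint32_t)~r->shadow;
}
```

In this model, a fault that flips a bit in only one of the two copies makes `shadow_reg_storage_ok()` return false, which corresponds to the kind of condition that leads to a fatal alert.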

### Data Path

The aim of fault attacks on the data path is typically to extract information on the key by means of statistical analysis.
The current version of the AES unit does not employ countermeasures against such attacks, but future versions most likely will.


# Programmers Guide

This section discusses how software can interface with the AES unit.


## Clear upon Reset

Upon reset, the AES unit will first reseed the internal PRNGs for register clearing and masking via EDN, and then clear all key, IV and data registers with pseudo-random data.
The unit becomes idle (indicated in [`STATUS.IDLE`](data/aes.hjson#status)) only after this sequence has finished.
The AES unit is then ready for software initialization.
Note that at this point, the key, IV and data registers' values can no longer be expected to match the reset values.


## Initialization

Before initialization, software must ensure that the AES unit is idle by checking [`STATUS.IDLE`](data/aes.hjson#status).
If the AES unit is not idle, write operations to [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed), the Initial Key registers [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE1_7`](data/aes.hjson#key_share1_7) and initialization vector (IV) registers [`IV_0`](data/aes.hjson#iv_0) - [`IV_3`](data/aes.hjson#iv_3) are ignored.

To initialize the AES unit, software must first provide the configuration to the [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) register.
Since writing this register may initiate the reseeding of the internal PRNGs, software must check that the AES unit is idle before providing the initial key.
Then software must write the initial key to the Initial Key registers [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE1_7`](data/aes.hjson#key_share1_7).
The key is provided in two shares:
The first share is written to [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE0_7`](data/aes.hjson#key_share0_7) and the second share is written to [`KEY_SHARE1_0`](data/aes.hjson#key_share1_0) - [`KEY_SHARE1_7`](data/aes.hjson#key_share1_7).
The actual initial key used for encryption corresponds to the value obtained by XORing [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE0_7`](data/aes.hjson#key_share0_7) with [`KEY_SHARE1_0`](data/aes.hjson#key_share1_0) - [`KEY_SHARE1_7`](data/aes.hjson#key_share1_7).
Note that all registers are little-endian.
The key length is configured using the KEY_LEN field of [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed).
Independent of the selected key length, software must always write all 8 32-bit registers of both shares.
Each register must be written at least once.
The order in which the key registers are written does not matter.
Anything can be written to the unused key registers of both shares; however, random data is preferred.
For AES-128, the actual initial key used for encryption is formed by XORing [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE0_3`](data/aes.hjson#key_share0_3) with [`KEY_SHARE1_0`](data/aes.hjson#key_share1_0) - [`KEY_SHARE1_3`](data/aes.hjson#key_share1_3).
For AES-192, the actual initial key used for encryption is formed by XORing [`KEY_SHARE0_0`](data/aes.hjson#key_share0_0) - [`KEY_SHARE0_5`](data/aes.hjson#key_share0_5) with [`KEY_SHARE1_0`](data/aes.hjson#key_share1_0) - [`KEY_SHARE1_5`](data/aes.hjson#key_share1_5).
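
The preparation of the two key shares in software could look roughly like the sketch below.
It is an illustration under stated assumptions: the function name `make_key_shares` is made up, the key is assumed to be already packed into little-endian 32-bit words, and the 16 random words in `rnd` are assumed to come from a suitable random source that is outside the scope of this example.
The words that are unused for the selected key length are filled with random data, as recommended above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_KEY_SHARE_WORDS 8

// Build both key shares so that share0[j] ^ share1[j] == key[j] for all
// words of the actual key. key_words is 4, 6 or 8 for AES-128/192/256.
// rnd provides 16 random words: 8 masks plus filler for unused registers.
static void make_key_shares(const uint32_t *key, size_t key_words,
                            const uint32_t rnd[2 * AES_KEY_SHARE_WORDS],
                            uint32_t share0[AES_KEY_SHARE_WORDS],
                            uint32_t share1[AES_KEY_SHARE_WORDS]) {
  assert(key_words == 4 || key_words == 6 || key_words == 8);
  for (size_t j = 0; j < AES_KEY_SHARE_WORDS; j++) {
    share1[j] = rnd[j];
    if (j < key_words) {
      share0[j] = key[j] ^ rnd[j];  // Masked key word.
    } else {
      share0[j] = rnd[AES_KEY_SHARE_WORDS + j];  // Unused register: random.
    }
  }
}
```

The resulting arrays have the same layout as the `key_share0` and `key_share1` arrays written to the key registers in the block operation snippet further down.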

If running in CBC, CFB, OFB or CTR mode, software must also write the IV registers [`IV_0`](data/aes.hjson#iv_0) - [`IV_3`](data/aes.hjson#iv_3).
Since providing the initial key may initiate the reseeding of the internal PRNGs, software must check that the AES unit is idle before writing the IV registers.
These registers are little-endian, but the increment of the IV in CTR mode is big-endian (see [Recommendation for Block Cipher Modes of Operation](https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf)).
Each IV register must be written at least once.
The order in which these registers are written does not matter.
Note that the AES unit automatically updates the IV registers when running in CBC, CFB, OFB or CTR mode (after having consumed the current IV value).
To start the encryption/decryption of a new message, software must wait for the AES unit to become idle and then provide new values to the IV registers.
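
The difference between the little-endian register layout and the big-endian counter increment can be sketched as follows.
The mapping assumed here, namely that IV byte `i` ends up in register `IV_(i/4)` with the lowest-numbered byte in the least significant position, is an interpretation of "little-endian" for this example, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Pack the 16 IV bytes into four 32-bit words, little-endian per word.
static void iv_bytes_to_words(const uint8_t iv[16], uint32_t words[4]) {
  assert(iv != 0 && words != 0);
  for (int i = 0; i < 4; i++) {
    words[i] = (uint32_t)iv[4 * i] | ((uint32_t)iv[4 * i + 1] << 8) |
               ((uint32_t)iv[4 * i + 2] << 16) |
               ((uint32_t)iv[4 * i + 3] << 24);
  }
}

// Increment the IV as a 128-bit big-endian integer: byte 15 is the least
// significant byte and carries propagate towards byte 0.
static void ctr_increment_be(uint8_t iv[16]) {
  for (int i = 15; i >= 0; i--) {
    iv[i]++;
    if (iv[i] != 0) {
      break;  // No carry out of this byte.
    }
  }
}
```

With the initial counter block `f0 f1 ... fe ff` used in the CTR examples of the NIST recommendation, the increment yields `f0 f1 ... ff 00`; under the assumed mapping, the word for `IV_3` changes from `0xfffefdfc` to `0x00fffdfc`.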

## Block Operation

For block operation, software must initialize the AES unit as described in the previous section.
In particular, the AES unit must be configured to run in normal/automatic mode.
This is indicated by the MANUAL_OPERATION bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) reading as `0`.
It ensures that the AES unit:
1. Automatically starts encryption/decryption when new input data is available.
1. Does not overwrite previous output data that has not yet been read by the processor.

Then, software must:
1. Ensure that the INPUT_READY bit in [`STATUS`](data/aes.hjson#status) is `1`.
1. Write Input Data Block `0` to the Input Data registers [`DATA_IN_0`](data/aes.hjson#data_in_0) - [`DATA_IN_3`](data/aes.hjson#data_in_3).
   Each register must be written at least once.
   The order in which these registers are written does not matter.
1. Wait for the INPUT_READY bit in [`STATUS`](data/aes.hjson#status) to become `1`, i.e. wait for the AES unit to load Input Data Block `0` into the internal state register and start operation.
1. Write Input Data Block `1` to the Input Data registers.

Then for every Data Block `I=0,...,N-3`, software must:
1. Wait for the OUTPUT_VALID bit in [`STATUS`](data/aes.hjson#status) to become `1`, i.e., wait for the AES unit to finish encryption/decryption of Block `I`.
   The AES unit then directly starts processing the previously input block `I+1`.
2. Read Output Data Block `I` from the Output Data registers [`DATA_OUT_0`](data/aes.hjson#data_out_0) - [`DATA_OUT_3`](data/aes.hjson#data_out_3).
   Each register must be read at least once.
   The order in which these registers are read does not matter.
3. Write Input Data Block `I+2` into the Input Data register.
   There is no need to explicitly check INPUT_READY: in the same cycle in which OUTPUT_VALID becomes `1`, the current input is loaded into the internal state register, meaning INPUT_READY becomes `1` one cycle later.

Once all blocks have been input, the final data blocks `I=N-2,N-1` must be read out:
1. Wait for the OUTPUT_VALID bit in [`STATUS`](data/aes.hjson#status) to become `1`, i.e., wait for the AES unit to finish encryption/decryption of Block `I`.
2. Read Output Data Block `I` from the Output Data register.

Note that interrupts are not provided; the latency of the AES unit is such that they would be of little utility.

The code snippet below shows how to perform block operation.

```c
  // Enable autostart, disable overwriting of previous output data. Note the control register is
  // shadowed and thus needs to be written twice.
  uint32_t aes_ctrl_val =
      ((op & AES_CTRL_SHADOWED_OPERATION_MASK) << AES_CTRL_SHADOWED_OPERATION_OFFSET) |
      ((mode & AES_CTRL_SHADOWED_MODE_MASK) << AES_CTRL_SHADOWED_MODE_OFFSET) |
      ((key_len & AES_CTRL_SHADOWED_KEY_LEN_MASK) << AES_CTRL_SHADOWED_KEY_LEN_OFFSET) |
      (0x0 << AES_CTRL_SHADOWED_MANUAL_OPERATION_OFFSET);
  REG32(AES_CTRL_SHADOWED(0)) = aes_ctrl_val;
  REG32(AES_CTRL_SHADOWED(0)) = aes_ctrl_val;

  // Writing the control register may trigger a PRNG reseed: wait for the unit to become idle
  // before providing the key. AES_STATUS_IDLE is assumed to be defined analogously to the
  // other STATUS bit indices used below.
  while (!((REG32(AES_STATUS(0)) >> AES_STATUS_IDLE) & 0x1)) {
  }

  // Write key - Note: All registers are little-endian.
  for (int j = 0; j < 8; j++) {
    REG32(AES_KEY_SHARE0_0(0) + j * 4) = key_share0[j];
    REG32(AES_KEY_SHARE1_0(0) + j * 4) = key_share1[j];
  }

  // Providing the key may also trigger a PRNG reseed: wait for idle before writing the IV.
  while (!((REG32(AES_STATUS(0)) >> AES_STATUS_IDLE) & 0x1)) {
  }

  // Write IV.
  for (int j = 0; j < 4; j++) {
    REG32(AES_IV_0(0) + j * 4) = iv[j];
  }

  // Ensure INPUT_READY is set before writing Input Data Block 0.
  while (!((REG32(AES_STATUS(0)) >> AES_STATUS_INPUT_READY) & 0x1)) {
  }

  // Write Input Data Block 0.
  for (int j = 0; j < 4; j++) {
    REG32(AES_DATA_IN_0(0) + j * 4) = input_data[j];
  }

  // Wait for INPUT_READY bit
  while (!((REG32(AES_STATUS(0)) >> AES_STATUS_INPUT_READY) & 0x1)) {
  }

  // Write Input Data Block 1
  for (int j = 0; j < 4; j++) {
    REG32(AES_DATA_IN_0(0) + j * 4) = input_data[j + 4];
  }

  // For Data Block I=0,...,N-1
  for (int i = 0; i < N; i++) {

    // Wait for OUTPUT_VALID bit
    while (!((REG32(AES_STATUS(0)) >> AES_STATUS_OUTPUT_VALID) & 0x1)) {
    }

    // Read Output Data Block I
    for (int j = 0; j < 4; j++) {
      output_data[j + i * 4] = REG32(AES_DATA_OUT_0(0) + j * 4);
    }

    // Write Input Data Block I+2 - For I=0,...,N-3 only.
    if (i < N - 2) {
      for (int j = 0; j < 4; j++) {
        REG32(AES_DATA_IN_0(0) + j * 4) = input_data[j + 4 * (i + 2)];
      }
    }
  }
```


## Padding

For the AES unit to automatically start encryption/decryption of the next data block, software is required to always update all four Input Data registers [`DATA_IN_0`](data/aes.hjson#data_in_0) - [`DATA_IN_3`](data/aes.hjson#data_in_3) and read all four Output Data registers [`DATA_OUT_0`](data/aes.hjson#data_out_0) - [`DATA_OUT_3`](data/aes.hjson#data_out_3).
This is also true if the AES unit is operated in OFB or CTR mode, i.e., if the plaintext/ciphertext does not necessarily need to be a multiple of the block size (for more details refer to Appendix A of [Recommendation for Block Cipher Modes of Operation](https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf)).

In the case that the plaintext/ciphertext is not a multiple of the block size and the AES unit is operated in OFB or CTR mode, software can employ any form of padding for the input data of the last message block as the padding bits do not have an effect on the actual message bits.
It is recommended that software discards the padding bits after reading the output data.
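
A minimal sketch of how software might handle a final partial block in OFB or CTR mode is shown below.
Zero padding is chosen arbitrarily since any padding works in these modes, and the function names are made up for this example.
The padded block would then be written to the four Input Data registers, and only the first `tail_len` bytes of the corresponding output block are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_BYTES 16

// Copy the last tail_len message bytes (1..16) into a full block and fill
// the remainder with zeros so that all four input registers can be written.
static void pad_last_block(const uint8_t *tail, size_t tail_len,
                           uint8_t block[AES_BLOCK_BYTES]) {
  assert(tail_len >= 1 && tail_len <= AES_BLOCK_BYTES);
  memset(block, 0, AES_BLOCK_BYTES);
  memcpy(block, tail, tail_len);
}

// After reading all four output registers, keep only the message bytes
// and discard the bytes that correspond to the padding.
static void unpad_last_block(const uint8_t block[AES_BLOCK_BYTES],
                             size_t tail_len, uint8_t *out) {
  assert(tail_len >= 1 && tail_len <= AES_BLOCK_BYTES);
  memcpy(out, block, tail_len);
}
```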


## De-Initialization

After finishing operation, software must:
1. Prevent the AES unit from automatically starting encryption/decryption by setting the MANUAL_OPERATION bit in [`CTRL_SHADOWED`](data/aes.hjson#ctrl_shadowed) to `1`.
1. Clear all key registers, IV registers as well as the Input Data and the Output Data registers with pseudo-random data by setting the KEY_IV_DATA_IN_CLEAR and DATA_OUT_CLEAR bits in [`TRIGGER`](data/aes.hjson#trigger) to `1`.

The code snippet below shows how to perform this task.

```c
  // Disable autostart. Note the control register is shadowed and thus needs to be written twice.
  uint32_t aes_ctrl_val = 0x1 << AES_CTRL_SHADOWED_MANUAL_OPERATION;
  REG32(AES_CTRL_SHADOWED(0)) = aes_ctrl_val;
  REG32(AES_CTRL_SHADOWED(0)) = aes_ctrl_val;

  // Clear all key, IV, Input Data and Output Data registers.
  REG32(AES_TRIGGER(0)) =
      (0x1 << AES_TRIGGER_KEY_IV_DATA_IN_CLEAR) |
      (0x1 << AES_TRIGGER_DATA_OUT_CLEAR);
```

## Device Interface Functions (DIFs)

* [DIF Listings](../../../sw/device/lib/dif/dif_aes.h)

## Register Table

The AES unit uses 2x8 separate write-only registers for the two shares of the initial key, 2x4 separate write-only registers for the initialization vector and the input data, as well as 4 separate read-only registers for the output data.
All registers are little-endian.
Compared to first-in, first-out (FIFO) interfaces, having separate registers has a couple of advantages:

- Supported out-of-the-box by the register tool (the FIFO would have to be implemented separately).
- Usability: critical corner cases where software only partially updates the input data or the key are easier to avoid using separate registers and the `hwqe` signals provided by the Register Tool.
- Easier interaction with DMA engines.

Also, using a FIFO interface for something that is not actually FIFO (internally, 16B of input/output data are consumed/produced at once) is less natural.

For a detailed overview of the register tool, please refer to the [Register Tool documentation.](../../../util/reggen/README.md)

* [Register Table](data/aes.hjson#registers)