| --- |
| title: "Primitive Component: SRAM Scrambler" |
| --- |
| |
| # Overview |
| |
| The scrambling primitive `prim_ram_1p_scr` employs a reduced-round (5 instead of 11) PRINCE block cipher in CTR mode to scramble the data. |
| The PRINCE lightweight block cipher was selected for its low latency and low area; see [prim_prince]({{< relref "hw/ip/prim/doc/prim_prince" >}}) for more information on PRINCE. |
| The number of rounds is reduced to 5 in order to ease timing pressure and ensure single-cycle operation (the number of rounds can be increased if there turns out to be enough timing slack). |
| |
| In [CTR mode](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)), the block cipher encrypts a 64-bit IV with the scrambling key in order to create a 64-bit keystream block. |
| This keystream block is bitwise XORed with the data in order to transform plaintext into ciphertext and vice versa. |
| The IV is assembled by concatenating a nonce with the word address. |
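
The CTR construction described above can be sketched in a few lines of Python. This is a behavioral illustration only: `toy_block_cipher` is a hypothetical stand-in built on SHA-256, not PRINCE, and the placement of the nonce in the upper IV bits and the address in the lower bits is an assumption made for the example.

```python
import hashlib


def toy_block_cipher(iv: int, key: int) -> int:
    """Hypothetical stand-in for PRINCE: a keyed 64-bit function, NOT the real cipher."""
    msg = key.to_bytes(16, "little") + iv.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(msg).digest()[:8], "little")


def make_iv(nonce: int, addr: int, addr_width: int) -> int:
    """Concatenate nonce and word address into a 64-bit IV.

    Assumption: the nonce occupies the upper bits, the address the lower bits.
    """
    nonce_width = 64 - addr_width
    assert 0 <= addr < (1 << addr_width)
    assert 0 <= nonce < (1 << nonce_width)
    return (nonce << addr_width) | addr


def ctr_scramble(data: int, key: int, nonce: int, addr: int, addr_width: int) -> int:
    """XOR a 64-bit data word with the keystream block for this address.

    The same function scrambles and descrambles, since XOR is its own inverse.
    """
    keystream = toy_block_cipher(make_iv(nonce, addr, addr_width), key)
    return data ^ keystream
```

Because the keystream depends only on the key, the nonce and the address, it can be computed in parallel with the memory access, and applying `ctr_scramble` twice with the same arguments returns the original word.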
| |
| If the word width of the scrambled memory is smaller than 64 bits, the keystream block is truncated to fit the data width. |
| If the word width is wider than 64 bits, the scrambling primitive by default instantiates multiple PRINCE primitives in order to create a unique keystream for the full data width. |
| For area-constrained settings, the parameter `ReplicateKeyStream` in `prim_ram_1p_scr` can be set to 1 in order to replicate the keystream block generated by a single primitive instead of using multiple parallel PRINCE instances. |
| Note that this lowers the level of security. |
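
The three width cases (truncate, multiple instances, replicate) can be modeled as follows. This is a minimal sketch: `gen_block` is a hypothetical callback that models the i-th PRINCE instance, and keeping the lower bits when truncating is an assumption of the example rather than something stated here.

```python
from typing import Callable

MASK64 = (1 << 64) - 1


def build_keystream(gen_block: Callable[[int], int], width: int, replicate: bool) -> int:
    """Return a `width`-bit keystream assembled from 64-bit blocks.

    gen_block(i) models the output of the i-th cipher instance (hypothetical
    interface). With replicate=True only instance 0 is used and its block is
    copied, which saves area but repeats keystream bits within a word.
    """
    num_blocks = (width + 63) // 64
    keystream = 0
    for i in range(num_blocks):
        block = gen_block(0) if replicate else gen_block(i)
        keystream |= (block & MASK64) << (64 * i)
    # Truncate to the data width (assumption: the lower bits are kept).
    return keystream & ((1 << width) - 1)
```

With replication enabled, two 64-bit halves of a 128-bit word are XORed with the same keystream block, so the XOR of the two ciphertext halves equals the XOR of the two plaintext halves. That is one way to see why replication weakens the scheme.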
| |
| Since plain CTR mode does not diffuse the data bits due to the bitwise XOR, the scheme is augmented by passing each individual word through a two-layer substitution-permutation (S&P) network implemented with the `prim_subst_perm` primitive (the diffusion chunk width can be parameterized via the `DiffWidth` parameter). |
| The S&P network is similar to the one used in PRESENT and is explained in more detail [further below]({{< relref "#custom-substitution-permutation-network" >}}). |
| Note that if individual bytes need to be writable without having to perform a read-modify-write operation, the diffusion chunk width should be set to 8. |
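
To illustrate why a diffusion chunk width of 8 keeps bytes independently writable, the sketch below applies a per-chunk bijection to each `DiffWidth`-bit slice of a word. `toy_chunk_fn` is a hypothetical placeholder (a simple bit rotation), not the actual S&P network; only the chunking is the point here.

```python
from typing import Callable


def toy_chunk_fn(chunk: int) -> int:
    """Hypothetical stand-in for the per-chunk S&P network: rotate left by 3 bits."""
    return ((chunk << 3) | (chunk >> 5)) & 0xFF


def diffuse_word(word: int, width: int, diff_width: int,
                 chunk_fn: Callable[[int], int]) -> int:
    """Apply chunk_fn independently to every diff_width-bit chunk of the word."""
    assert width % diff_width == 0
    mask = (1 << diff_width) - 1
    out = 0
    for k in range(width // diff_width):
        chunk = (word >> (k * diff_width)) & mask
        out |= (chunk_fn(chunk) & mask) << (k * diff_width)
    return out


# Changing byte 1 of the input only changes byte 1 of the output.
a = diffuse_word(0x11223344, 32, 8, toy_chunk_fn)
b = diffuse_word(0x1122AA44, 32, 8, toy_chunk_fn)
changed_bits = a ^ b
```

Since each output byte depends only on the corresponding input byte, a single byte can be scrambled and written without reading the rest of the word. A wider chunk would spread each bit over more positions, but a partial write would then require a read-modify-write.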
| |
| Another CTR mode augmentation that is aimed at breaking the linear address space is SRAM address scrambling. |
| The same two-layer S&P network that is used for byte diffusion is leveraged to non-linearly remap the SRAM address as shown in the block diagram above. |
| Unlike the byte diffusion S&P networks, the address scrambling network additionally XORs in a nonce that has the same width as the address. |
| |
| ## Parameters |
| |
| The following table lists the instantiation parameters of the `prim_ram_1p_scr` primitive. |
| These are not exposed in the `sram_ctrl` IP, but have to be set directly when instantiating `prim_ram_1p_scr` at the top level. |
| |
| Parameter | Default (Max) | Top Earlgrey | Description |
| ----------------------------|-----------------------|--------------|--------------- |
| `Depth` | 512 | multiple | SRAM depth, needs to be a power of 2 if `NumAddrScrRounds` > 0. |
| `Width` | 32 | 32 | Effective SRAM width without redundancy. |
| `DataBitsPerMask` | 8 | 8 | Number of data bits per write mask. |
| `EnableParity` | 1 | 1 | This parameter enables byte parity. |
| `CfgWidth` | 8 | 8 | Width of SRAM attributes field. |
| `NumPrinceRoundsHalf` | 2 (5) | 2 | Number of PRINCE half-rounds. |
| `NumDiffRounds` | 2 | 2 | Number of additional diffusion rounds, set to 0 to disable. |
| `DiffWidth` | 8 | 8 | Width of additional diffusion rounds, set to 8 for intra-byte diffusion. |
| `NumAddrScrRounds` | 2 | 2 | Number of address scrambling rounds, set to 0 to disable. |
| `ReplicateKeyStream` | 0 (1) | 0 | If set to 1, the same 64-bit keystream is replicated if the data port is wider than 64 bits. Otherwise, multiple PRINCE primitives are employed to generate a unique keystream for the full data width. |
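
The constraint on `Depth` can be captured in a small check. The helper below is hypothetical and not part of the design. The rationale in its comment is an inference: the address scrambler permutes all codes of the address width, so a depth that is not a power of 2 could be mapped to a nonexistent location.

```python
def addr_width(depth: int, num_addr_scr_rounds: int) -> int:
    """Return the address width for `Depth`, rejecting illegal combinations.

    Hypothetical helper: address scrambling is a bijection over all
    2**addr_width codes, so every code must correspond to a real SRAM word.
    """
    if depth < 2:
        raise ValueError("Depth must be at least 2")
    is_pow2 = (depth & (depth - 1)) == 0
    if num_addr_scr_rounds > 0 and not is_pow2:
        raise ValueError("Depth must be a power of 2 when NumAddrScrRounds > 0")
    return (depth - 1).bit_length()
```

For the default `Depth` of 512 this yields a 9-bit address. A depth of 500 is only accepted when address scrambling is disabled.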
| |
| ## Signal Interfaces |
| |
| Signal | Direction | Type | Description |
| ---------------------------|------------------|------------------------------------|--------------- |
| `key_valid_i` | `input` | `logic` | Indicates whether the key and nonce are considered valid. New memory requests are blocked if this is set to 0. |
| `key_i` | `input` | `logic [127:0]` | Scrambling key. |
| `nonce_i` | `input` | `logic [NonceWidth-1:0]` | Scrambling nonce. |
| `req_i` | `input` | `logic` | Memory request indication signal (from TL-UL SRAM adapter). |
| `gnt_o` | `output` | `logic` | Grant signal for memory request (to TL-UL SRAM adapter). |
| `write_i` | `input` | `logic` | Indicates that this is a write operation (from TL-UL SRAM adapter). |
| `addr_i` | `input` | `logic [AddrWidth-1:0]` | Address for memory op (from TL-UL SRAM adapter). |
| `wdata_i` | `input` | `logic [Width-1:0]` | Write data (from TL-UL SRAM adapter). |
| `wmask_i` | `input` | `logic [Width-1:0]` | Write mask (from TL-UL SRAM adapter). |
| `intg_error_i` | `input` | `logic` | Indicates whether the incoming transaction has an integrity error. |
| `rdata_o` | `output` | `logic [Width-1:0]` | Read data output (to TL-UL SRAM adapter). |
| `rvalid_o` | `output` | `logic` | Read data valid indication (to TL-UL SRAM adapter). |
| `rerror_o` | `output` | `logic [1:0]` | Error indication (to TL-UL SRAM adapter). Bit 0 indicates a correctable and bit 1 an uncorrectable error. Note that at this time, only uncorrectable errors are reported, since the scrambling device only supports byte parity. |
| `raddr_o` | `output` | `logic [31:0]` | Address of the faulty read operation. |
| `cfg_i` | `input` | `logic [CfgWidth-1:0]` | Attributes for physical memory macro. |
| |
| ## Custom Substitution Permutation Network |
| |
| In addition to the PRINCE primitive, `prim_ram_1p_scr` employs a custom S&P network for byte diffusion and address scrambling. |
| The structure of that S&P network is similar to the one used in PRESENT, but it uses a modified permutation function that makes it possible to parameterize the network to arbitrary data widths, as shown in the pseudocode below. |
| |
| ```c++ |
| |
| NUM_ROUNDS = 2; |
| DATA_WIDTH = 8; // bitwidth of the data |
| |
| // Apply PRESENT Sbox4 on all nibbles, leave uppermost bits unchanged |
| // if the width is not divisible by 4. |
| state_t sbox4_layer(state_t state) { |
|   for (int i = 0; i < DATA_WIDTH/4; i++) { |
|     nibble_t nibble = get_nibble(state, i); |
|     nibble = present_sbox4(nibble); |
|     set_nibble(state, i, nibble); |
|   } |
|   return state; |
| } |
| |
| // Reverses the bit order. |
| state_t flip_vector(state_t state) { |
|   state_t state_flipped; |
|   for (int i = 0; i < DATA_WIDTH; i++) { |
|     state_flipped[i] = state[DATA_WIDTH-1-i]; |
|   } |
|   return state_flipped; |
| } |
| |
| // Gather all even bits and put them into the lower half. |
| // Gather all odd bits and put them into the upper half. |
| state_t perm_layer(state_t state) { |
|   // Initialize with input state. |
|   // If the number of bits is odd, the uppermost bit |
|   // will stay in position, as intended. |
|   state_t state_perm = state; |
|   for (int i = 0; i < DATA_WIDTH/2; i++) { |
|     state_perm[i] = state[i * 2]; |
|     state_perm[i + DATA_WIDTH/2] = state[i * 2 + 1]; |
|   } |
|   return state_perm; |
| } |
| |
| state_t prim_subst_perm(state_t data_i, state_t key_i) { |
| |
|   state_t state = data_i; |
|   for (int i = 0; i < NUM_ROUNDS; i++) { |
|     state ^= key_i; |
|     state = sbox4_layer(state); |
|     // The vector flip and permutation operations have the |
|     // combined effect that all bits will be passed through an |
|     // Sbox4 eventually, even if the number of bits in data_i |
|     // is not aligned with 4. |
|     state = flip_vector(state); |
|     state = perm_layer(state); |
|   } |
| |
|   return state ^ key_i; |
| } |
| |
| ``` |
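
For experimentation, the pseudocode above can be transcribed into a runnable Python model. This is a behavioral sketch, not a bit-exact reference for the RTL: the state is held in a plain integer, the width is passed explicitly, and the S-box table is the standard PRESENT 4-bit S-box. Using the model for address scrambling with the nonce supplied as `key` is an assumption based on the description of the address scrambling network.

```python
PRESENT_SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox4_layer(state: int, width: int) -> int:
    """Apply the PRESENT S-box to every full nibble; leftover top bits stay unchanged."""
    out = state
    for i in range(width // 4):
        nibble = (state >> (4 * i)) & 0xF
        out &= ~(0xF << (4 * i))
        out |= PRESENT_SBOX4[nibble] << (4 * i)
    return out


def flip_vector(state: int, width: int) -> int:
    """Reverse the bit order of a width-bit value."""
    out = 0
    for i in range(width):
        out |= ((state >> (width - 1 - i)) & 1) << i
    return out


def perm_layer(state: int, width: int) -> int:
    """Move even bits to the lower half and odd bits to the upper half."""
    half = width // 2
    out = state  # for odd widths the uppermost bit keeps its position
    for i in range(half):
        even_bit = (state >> (2 * i)) & 1
        odd_bit = (state >> (2 * i + 1)) & 1
        out &= ~((1 << i) | (1 << (i + half)))
        out |= (even_bit << i) | (odd_bit << (i + half))
    return out


def subst_perm(data: int, key: int, width: int, num_rounds: int = 2) -> int:
    """Model of the S&P network: key XOR, S-box, flip and permutation per round."""
    state = data
    for _ in range(num_rounds):
        state ^= key
        state = sbox4_layer(state, width)
        state = flip_vector(state, width)
        state = perm_layer(state, width)
    return state ^ key


# Address scrambling sketch: a 9-bit address (Depth = 512) with an example nonce.
scrambled = [subst_perm(addr, 0x155, 9) for addr in range(512)]
```

Every layer is a bijection on the state, so `scrambled` contains each of the 512 addresses exactly once. The same holds for byte diffusion with `width` set to 8, where each scrambled byte maps back to a unique plaintext byte.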