| --- |
| title: OpenTitan Big Number Accelerator (OTBN) Technical Specification |
| --- |
| |
| # Overview |
| |
| This document specifies the functionality of the OpenTitan Big Number Accelerator, or OTBN. |
| OTBN is a coprocessor for asymmetric cryptographic operations like RSA or Elliptic Curve Cryptography (ECC). |
| |
| This module conforms to the [Comportable guideline for peripheral functionality]({{< relref "doc/rm/comportability_specification" >}}). |
| See that document for an integration overview within the broader top-level system. |
| |
| ## Features |
| |
| * Processor optimized for wide integer arithmetic |
| * 32b wide control path with 32 32b wide registers |
| * 256b wide data path with 32 256b wide registers |
| * Full control-flow support with conditional branch and unconditional jump instructions, hardware loops, and hardware-managed call/return stacks. |
| * Reduced, security-focused instruction set architecture for easier verification and the prevention of data leaks. |
| * Built-in access to random numbers. |
| |
| ## Description |
| |
| OTBN is a processor, specialized for the execution of security-sensitive asymmetric (public-key) cryptography code, such as RSA or ECC. |
| Such algorithms are dominated by wide integer arithmetic, which is supported by OTBN's 256b wide data path, registers, and instructions that operate on these wide data words. |
| On the other hand, the control flow is clearly separated from the data, and reduced to a minimum to avoid data leakage. |
| |
| The data OTBN processes is security-sensitive, and the processor design centers around that. |
| The design is kept as simple as possible to reduce the attack surface and aid verification and testing. |
| For example, no interrupts or exceptions are included in the design, and all instructions are designed to be executable within a single cycle. |
| |
| OTBN is designed as a self-contained co-processor with its own instruction and data memory, which is accessible as a bus device. |
| |
| ## Compatibility |
| |
| OTBN is not designed to be compatible with other cryptographic accelerators. |
| It received some inspiration from assembly code available from the [Chromium EC project](https://chromium.googlesource.com/chromiumos/platform/ec/), |
| which has been formally verified within the [Fiat Crypto project](http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf). |
| |
| # Instruction Set |
| |
| OTBN is a processor with a custom instruction set. |
| The full ISA description can be found in our [ISA manual]({{< relref "isa" >}}). |
| The instruction set is split into two groups: |
| |
| * The **base instruction subset** operates on the 32b General Purpose Registers (GPRs). |
|   Its instructions are used for the control flow of an OTBN application. |
| The base instructions are inspired by RISC-V's RV32I instruction set, but not compatible with it. |
| * The **big number instruction subset** operates on 256b Wide Data Registers (WDRs). |
| Its instructions are used for data processing. |
| |
| ## Processor State |
| |
| ### General Purpose Registers (GPRs) |
| |
| OTBN has 32 General Purpose Registers (GPRs), each of which is 32b wide. |
| The GPRs are defined in line with RV32I and are mainly used for control flow. |
| They are accessed through the base instruction subset. |
| GPRs aren't used by the main data path; this operates on the [Wide Data Registers](#wide-data-registers-wdrs), a separate register file, controlled by the big number instructions. |
| |
| <table> |
| <tr> |
| <td><code>x0</code></td> |
| <td>Zero register. Reads as 0; writes are ignored.</td> |
| </tr> |
| <tr> |
| <td><code>x1</code></td> |
| <td> |
| |
| Access to the [call stack](#call-stack) |
| |
| </td> |
| </tr> |
| <tr> |
| <td><code>x2</code> ... <code>x31</code></td> |
| <td>General purpose registers</td> |
| </tr> |
| </table> |
| |
| Note: Currently, OTBN has no "standard calling convention," and GPRs other than `x0` and `x1` can be used for any purpose. |
| If a calling convention is needed at some point, it is expected to be aligned with the RISC-V standard calling conventions, and the roles assigned to registers in that convention. |
| Even without an agreed-on calling convention, software authors are encouraged to follow the RISC-V calling convention where it makes sense. |
| For example, good choices for temporary registers are `x6`, `x7`, `x28`, `x29`, `x30`, and `x31`. |
| |
| ### Call Stack |
| |
| OTBN has an in-built call stack which is accessed through the `x1` GPR. |
| This is intended to be used as a return address stack, containing return addresses for the current stack of function calls. |
| See the documentation for {{< otbnInsnRef "JAL" >}} and {{< otbnInsnRef "JALR" >}} for a description of how to use it for this purpose. |
| |
| The call stack has a maximum depth of 8 elements. |
| Each instruction that reads from `x1` pops a single element from the stack. |
| Each instruction that writes to `x1` pushes a single element onto the stack. |
| An instruction that reads from an empty stack or writes to a full stack causes OTBN to stop, raising an alert and setting the `ErrBitCallStack` bit in the {{< regref "ERR_BITS" >}} register. |
| |
| A single instruction can both read from and write to the stack. |
| In this case, the read is ordered before the write. |
| Provided the stack has at least one element, this is allowed, even if the stack is full. |
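The push/pop rules above can be sketched as a small behavioural model. This is an illustrative Python sketch, not the RTL or the reference simulator: the names `CallStack`, `CallStackError`, and `access` are hypothetical, and the exception stands in for OTBN stopping and reporting the call stack error.

```python
class CallStackError(Exception):
    """Stands in for OTBN stopping and flagging a call stack error."""


class CallStack:
    MAX_DEPTH = 8  # maximum depth stated in the text

    def __init__(self):
        self._entries = []

    def access(self, reads_x1=False, write_value=None):
        """Apply the x1 accesses of one instruction.

        The read (pop) is ordered before the write (push), so an
        instruction that does both succeeds even on a full stack.
        """
        popped = None
        if reads_x1:
            if not self._entries:
                raise CallStackError("read from empty call stack")
            popped = self._entries.pop()
        if write_value is not None:
            if len(self._entries) >= self.MAX_DEPTH:
                raise CallStackError("write to full call stack")
            self._entries.append(write_value)
        return popped

    def depth(self):
        return len(self._entries)
```

In this sketch, an instruction that only writes `x1` on a full stack raises the error, while one that reads and writes `x1` in the same instruction leaves the depth unchanged.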
| |
| ### Control and Status Registers (CSRs) {#csrs} |
| |
| Control and Status Registers (CSRs) are 32b wide registers used for "special" purposes, as detailed in their description; |
| they are not related to the GPRs. |
| CSRs can be accessed through dedicated instructions, {{< otbnInsnRef "CSRRS" >}} and {{< otbnInsnRef "CSRRW" >}}. |
| Writes to read-only (RO) registers are ignored; they do not signal an error. |
| All read-write (RW) CSRs are set to 0 when OTBN starts an operation (when 1 is written to {{< regref "CMD.start" >}}). |
| |
| <!-- This list of CSRs is replicated in otbn_env_cov.sv, wsr.py, the |
| RTL and in rig/model.py. If editing one, edit the other four as well. --> |
| <table> |
| <thead> |
| <tr> |
| <th>Number</th> |
| <th>Access</th> |
| <th>Name</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>0x7C0</td> |
| <td>RW</td> |
| <td>FG0</td> |
| <td> |
| Wide arithmetic flag group 0. |
| This CSR provides access to flag group 0 used by wide integer arithmetic. |
| <strong>FLAGS</strong>, <strong>FG0</strong> and <strong>FG1</strong> provide different views on the same underlying bits. |
| <table> |
| <thead> |
| <tr><th>Bit</th><th>Description</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>0</td><td>Carry of Flag Group 0</td></tr> |
| <tr><td>1</td><td>MSb of Flag Group 0</td></tr> |
| <tr><td>2</td><td>LSb of Flag Group 0</td></tr> |
| <tr><td>3</td><td>Zero of Flag Group 0</td></tr> |
| </tbody> |
| </table> |
| </td> |
| </tr> |
| <tr> |
| <td>0x7C1</td> |
| <td>RW</td> |
| <td>FG1</td> |
| <td> |
| Wide arithmetic flag group 1. |
| This CSR provides access to flag group 1 used by wide integer arithmetic. |
| <strong>FLAGS</strong>, <strong>FG0</strong> and <strong>FG1</strong> provide different views on the same underlying bits. |
| <table> |
| <thead> |
| <tr><th>Bit</th><th>Description</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>0</td><td>Carry of Flag Group 1</td></tr> |
| <tr><td>1</td><td>MSb of Flag Group 1</td></tr> |
| <tr><td>2</td><td>LSb of Flag Group 1</td></tr> |
| <tr><td>3</td><td>Zero of Flag Group 1</td></tr> |
| </tbody> |
| </table> |
| </td> |
| </tr> |
| <tr> |
| <td>0x7C8</td> |
| <td>RW</td> |
| <td>FLAGS</td> |
| <td> |
| Wide arithmetic flag groups. |
|         This CSR provides access to both flag groups used by wide integer arithmetic. |
| <strong>FLAGS</strong>, <strong>FG0</strong> and <strong>FG1</strong> provide different views on the same underlying bits. |
| <table> |
| <thead> |
| <tr><th>Bit</th><th>Description</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>0</td><td>Carry of Flag Group 0</td></tr> |
| <tr><td>1</td><td>MSb of Flag Group 0</td></tr> |
| <tr><td>2</td><td>LSb of Flag Group 0</td></tr> |
| <tr><td>3</td><td>Zero of Flag Group 0</td></tr> |
| <tr><td>4</td><td>Carry of Flag Group 1</td></tr> |
| <tr><td>5</td><td>MSb of Flag Group 1</td></tr> |
| <tr><td>6</td><td>LSb of Flag Group 1</td></tr> |
| <tr><td>7</td><td>Zero of Flag Group 1</td></tr> |
| </tbody> |
| </table> |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D0</td> |
| <td>RW</td> |
| <td>MOD0</td> |
| <td> |
| Bits [31:0] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D1</td> |
| <td>RW</td> |
| <td>MOD1</td> |
| <td> |
| Bits [63:32] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D2</td> |
| <td>RW</td> |
| <td>MOD2</td> |
| <td> |
| Bits [95:64] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D3</td> |
| <td>RW</td> |
| <td>MOD3</td> |
| <td> |
| Bits [127:96] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D4</td> |
| <td>RW</td> |
| <td>MOD4</td> |
| <td> |
| Bits [159:128] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D5</td> |
| <td>RW</td> |
| <td>MOD5</td> |
| <td> |
| Bits [191:160] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D6</td> |
| <td>RW</td> |
| <td>MOD6</td> |
| <td> |
| Bits [223:192] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D7</td> |
| <td>RW</td> |
| <td>MOD7</td> |
| <td> |
| Bits [255:224] of the modulus operand, used in the {{< otbnInsnRef "BN.ADDM" >}}/{{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D8</td> |
| <td>RW</td> |
| <td>RND_PREFETCH</td> |
| <td> |
| Write to this CSR to begin a request to fill the RND cache. |
| Always reads as 0. |
| </td> |
| </tr> |
| <tr> |
| <td>0xFC0</td> |
| <td>RO</td> |
| <td>RND</td> |
| <td> |
| An AIS31-compliant class PTG.3 random number with guaranteed entropy and forward and backward secrecy. |
| Primarily intended to be used for key generation. |
| |
| The number is sourced from the EDN via a single-entry cache. |
| Reads when the cache is empty will cause OTBN to be stalled until a new random number is fetched from the EDN. |
| </td> |
| </tr> |
| <tr> |
| <td>0xFC1</td> |
| <td>RO</td> |
| <td>URND</td> |
| <td> |
| A random number without guaranteed secrecy properties or specific statistical properties. |
| Intended for use in masking and blinding schemes. |
| Use RND for high-quality randomness. |
| |
| The number is sourced from an LFSR. |
| Reads never stall. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
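The relationship between the `MOD0` to `MOD7` CSRs and the 256b `MOD` WSR can be illustrated with a short Python sketch. The helper names `read_mod_csr` and `write_mod_csr` are hypothetical; the bit ranges and CSR numbers are taken from the table above.

```python
MOD_CSR_BASE = 0x7D0        # CSR number of MOD0, from the table
MASK32 = (1 << 32) - 1


def read_mod_csr(mod_wsr, index):
    """Return MOD<index>, i.e. bits [32*index+31 : 32*index] of MOD."""
    if not 0 <= index <= 7:
        raise ValueError("MOD CSR index must be in 0..7")
    return (mod_wsr >> (32 * index)) & MASK32


def write_mod_csr(mod_wsr, index, value):
    """Return the MOD value after writing a 32b value to MOD<index>."""
    if not 0 <= index <= 7:
        raise ValueError("MOD CSR index must be in 0..7")
    shift = 32 * index
    cleared = mod_wsr & ~(MASK32 << shift)
    return cleared | ((value & MASK32) << shift)
```

For example, writing `MOD7` (CSR number `MOD_CSR_BASE + 7`) only changes bits [255:224] of the modulus and leaves the other seven words untouched.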
| |
| ### Wide Data Registers (WDRs) |
| |
| In addition to the 32b wide GPRs, OTBN has a second "wide" register file, which is used by the big number instruction subset. |
| This register file consists of NWDR = 32 Wide Data Registers (WDRs). |
| Each WDR is WLEN = 256b wide. |
| |
| Wide Data Registers (WDRs) and the 32b General Purpose Registers (GPRs) are separate register files. |
| They are only accessible through their respective instruction subset: |
| GPRs are accessible from the base instruction subset, and WDRs are accessible from the big number instruction subset (`BN` instructions). |
| |
| | Register | |
| |----------| |
| | w0 | |
| | w1 | |
| | ... | |
| | w31 | |
| |
| ### Wide Special Purpose Registers (WSRs) {#wsrs} |
| |
| OTBN has 256b Wide Special Purpose Registers (WSRs). |
| These are analogous to the 32b CSRs, but are used by big number instructions. |
| They can be accessed with the {{< otbnInsnRef "BN.WSRR" >}} and {{< otbnInsnRef "BN.WSRW" >}} instructions. |
| Writes to read-only (RO) registers are ignored; they do not signal an error. |
| All read-write (RW) WSRs are set to 0 when OTBN starts an operation (when 1 is written to {{< regref "CMD.start" >}}). |
| |
| <!-- This list of WSRs is replicated in otbn_env_cov.sv, wsr.py, the |
| RTL and in rig/model.py. If editing one, edit the other four as well. --> |
| <table> |
| <thead> |
| <tr> |
| <th>Number</th> |
| <th>Access</th> |
| <th>Name</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>0x0</td> |
| <td>RW</td> |
| <td>MOD</td> |
| <td> |
| |
| The modulus used by the {{< otbnInsnRef "BN.ADDM" >}} and {{< otbnInsnRef "BN.SUBM" >}} instructions. |
| This WSR is also visible as CSRs `MOD0` through to `MOD7`. |
| |
| </td> |
| </tr> |
| <tr> |
| <td>0x1</td> |
| <td>RO</td> |
| <td>RND</td> |
| <td> |
| An AIS31-compliant class PTG.3 random number with guaranteed entropy and forward and backward secrecy. |
| Primarily intended to be used for key generation. |
| |
| The number is sourced from the EDN via a single-entry cache. |
| Reads when the cache is empty will cause OTBN to be stalled until a new random number is fetched from the EDN. |
| </td> |
| </tr> |
| <tr> |
| <td>0x2</td> |
| <td>RO</td> |
| <td>URND</td> |
| <td> |
| A random number without guaranteed secrecy properties or specific statistical properties. |
| Intended for use in masking and blinding schemes. |
| Use RND for high-quality randomness. |
| |
| The number is sourced from an LFSR. |
| Reads never stall. |
| </td> |
| </tr> |
| <tr> |
| <td>0x3</td> |
| <td>RW</td> |
| <td>ACC</td> |
| <td> |
| The accumulator register used by the {{< otbnInsnRef "BN.MULQACC" >}} instruction. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| ### Flags |
| |
| In addition to the wide register file, OTBN maintains global state in two groups of flags for use by wide integer operations. |
| Flag groups are named Flag Group 0 (`FG0`), and Flag Group 1 (`FG1`). |
| Each group consists of four flags. |
| Each flag is a single bit. |
| |
| - `C` (Carry flag). |
|   Set to 1 if an overflow occurred in the last arithmetic instruction. |
| |
| - `M` (MSb flag). |
|   The most significant bit of the result of the last arithmetic or shift instruction. |
| |
| - `L` (LSb flag). |
|   The least significant bit of the result of the last arithmetic or shift instruction. |
| |
| - `Z` (Zero flag). |
|   Set to 1 if the result of the last operation was zero; otherwise 0. |
| |
| The `M`, `L`, and `Z` flags are determined based on the result of the operation as it is written back into the result register, without considering the overflow bit. |
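As an illustration of the flag definitions, the following Python sketch derives the four flags for a plain unsigned 256b addition and packs them into the `FG0`/`FG1` and `FLAGS` CSR layouts. The function names are hypothetical, and using an unsigned add as the example operation is an assumption; the ISA manual remains the reference for how each instruction updates the flags.

```python
WLEN = 256
WLEN_MASK = (1 << WLEN) - 1


def add_with_flags(a, b):
    """Add two WLEN-bit values; return (result, flags)."""
    full = a + b
    result = full & WLEN_MASK          # value written back to the WDR
    flags = {
        "C": (full >> WLEN) & 1,         # overflow out of bit WLEN-1
        "M": (result >> (WLEN - 1)) & 1, # MSb of the written-back result
        "L": result & 1,                 # LSb of the written-back result
        "Z": int(result == 0),           # ignores the overflow bit
    }
    return result, flags


def pack_flag_group(flags):
    """Pack flags as in the FG0/FG1 CSRs: bit 0 C, 1 M, 2 L, 3 Z."""
    return (flags["C"] | (flags["M"] << 1)
            | (flags["L"] << 2) | (flags["Z"] << 3))


def pack_flags_csr(fg0, fg1):
    """Pack both groups as in the FLAGS CSR: FG0 in [3:0], FG1 in [7:4]."""
    return pack_flag_group(fg0) | (pack_flag_group(fg1) << 4)
```

Adding 1 to the all-ones value wraps to zero, so `C` and `Z` are both set while `M` and `L` are clear; this shows that `Z` is computed from the truncated result only.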
| |
| ### Loop Stack |
| |
| OTBN has two instructions for hardware-assisted loops: {{< otbnInsnRef "LOOP" >}} and {{< otbnInsnRef "LOOPI" >}}. |
| Both use the same state for tracking control flow. |
| This is a stack of tuples containing a loop count, start address and end address. |
| The stack has a maximum depth of eight and the top of the stack is the current loop. |
| |
| # Security Features |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Work in progress</h5> |
| |
| Work on OTBN is ongoing, including work on the specification and implementation of its security features. |
| Do not treat the following description (or anything in this documentation) as final, fully implemented, or verified. |
| </div> |
| |
| OTBN is a security co-processor. |
| It contains various security features and is hardened against side-channel analysis and fault injection attacks. |
| The following sections describe the high-level security features of OTBN. |
| Refer to the [Design Details]({{< relref "#design-details" >}}) section for a more in-depth description. |
| |
| ## Data Integrity Protection |
| |
| OTBN's data integrity protection is designed to protect the data stored and processed within OTBN from modifications through physical attacks. |
| |
| Data in OTBN travels along a data path which includes the data memory (DMEM), the load-store-unit (LSU), the register files (GPR and WDR), and the execution units. |
| Whenever possible, data transmitted or stored within OTBN is protected with an integrity protection code which guarantees the detection of at least three modified bits per 32 bit word. |
| Additionally, instructions and data stored in the instruction and data memory, respectively, are scrambled with a lightweight, non-cryptographically-secure cipher. |
| |
| Refer to the [Data Integrity Protection]({{<relref "#design-details-data-integrity-protection">}}) section for details of how the data integrity protections are implemented. |
| |
| ## Secure Wipe |
| |
| OTBN provides a mechanism to securely wipe all state it stores, including the instruction memory. |
| |
| The full secure wipe mechanism is split into three parts: |
| - [Data memory secure wipe]({{<relref "#design-details-secure-wipe-dmem">}}) |
| - [Instruction memory secure wipe]({{<relref "#design-details-secure-wipe-imem">}}) |
| - [Internal state secure wipe]({{<relref "#design-details-secure-wipe-internal">}}) |
| |
| A secure wipe is performed automatically in certain situations, or can be requested manually by the host software. |
| The full secure wipe is automatically initiated as a local reaction to a fatal error. |
| A secure wipe of only the internal state is performed whenever an OTBN operation is complete and after a recoverable error. |
| Finally, host software can manually trigger the data memory and instruction memory secure wipe operations by issuing an appropriate [command](#design-details-commands). |
| |
| Refer to the [Secure Wipe]({{<relref "#design-details-secure-wipe">}}) section for implementation details. |
| |
| ## Instruction Counter |
| |
| To detect and mitigate fault injection attacks on OTBN, the host CPU can read the number of executed instructions from {{< regref "INSN_CNT">}} and verify that it matches the expected value. |
| |
| # Theory of Operations |
| |
| ## Block Diagram |
| |
|  |
| |
| ## Hardware Interfaces |
| |
| {{< incGenFromIpDesc "../data/otbn.hjson" "hwcfg" >}} |
| |
| ### Hardware Interface Requirements |
| |
| OTBN connects to other components in an OpenTitan system. |
| This section lists requirements on those interfaces that go beyond the physical connectivity. |
| |
| #### Entropy Distribution Network (EDN) |
| |
| OTBN has two EDN connections: `edn_urnd` and `edn_rnd`. |
| What kind of randomness is provided on the EDN connections is configurable at runtime, but unknown to OTBN. |
| To maintain its security properties, OTBN requires the following configuration for the two EDN connections: |
| |
| * OTBN has no specific requirements on the randomness drawn from `edn_urnd`. |
| For performance reasons, requests on this EDN connection should be answered quickly. |
| * `edn_rnd` must provide AIS31-compliant class PTG.3 random numbers. |
| The randomness from this interface is made available through the `RND` WSR and intended to be used for key generation. |
| |
| ## Design Details {#design-details} |
| |
| ### Memories |
| |
| The OTBN processor core has access to two dedicated memories: an instruction memory (IMEM), and a data memory (DMEM). |
| Each memory is 4 kiB in size. |
| |
| The memory layout follows the Harvard architecture. |
| Both memories are byte-addressed, with addresses starting at 0. |
| |
| The instruction memory (IMEM) is 32b wide and provides the instruction stream to the OTBN processor. |
| It cannot be read from or written to by user code through load or store instructions. |
| |
| The data memory (DMEM) is 256b wide and read-write accessible from the base and big number instruction subsets of the OTBN processor core. |
| There are four instructions that can access data memory. |
| In the base instruction subset, there are {{< otbnInsnRef "LW" >}} (load word) and {{< otbnInsnRef "SW" >}} (store word). |
| These access 32b-aligned 32b words. |
| In the big number instruction subset, there are {{< otbnInsnRef "BN.LID" >}} (load indirect) and {{< otbnInsnRef "BN.SID" >}} (store indirect). |
| These access 256b-aligned 256b words. |
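The alignment and size rules for the four memory instructions can be summarised in a short Python sketch. The helper name `is_valid_dmem_access` is hypothetical; the access widths and the 4 kiB memory size come from the text above, and an access that fails this check corresponds to the `BAD_DATA_ADDR` software error.

```python
MEM_SIZE_BYTES = 4 * 1024  # each memory is 4 kiB

# Access width in bytes for each DMEM instruction.
ACCESS_BYTES = {
    "LW": 4,       # 32b word
    "SW": 4,
    "BN.LID": 32,  # 256b word
    "BN.SID": 32,
}


def is_valid_dmem_access(insn, addr):
    """Return True if a byte address is aligned and in bounds for insn."""
    size = ACCESS_BYTES[insn]
    aligned = addr % size == 0
    in_bounds = 0 <= addr and addr + size <= MEM_SIZE_BYTES
    return aligned and in_bounds
```

For instance, byte address 4 is valid for `LW` but not for `BN.LID`, which requires a multiple of 32.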
| |
| Both memories can be accessed through OTBN's register interface ({{< regref "DMEM" >}} and {{< regref "IMEM" >}}). |
| These accesses are ignored if OTBN is busy. |
| A host processor can check whether OTBN is busy by reading the {{< regref "STATUS">}} register. |
| All memory accesses through the register interface must be word-aligned 32b word accesses. |
| |
| ### Random Numbers |
| |
| OTBN is connected to the [Entropy Distribution Network (EDN)]({{< relref "hw/ip/edn/doc" >}}) which can provide random numbers via the `RND` and `URND` CSRs and WSRs. |
| |
| `RND` provides bits taken directly from the EDN connected via `edn_rnd`. |
| The EDN interface provides 32b of entropy per transaction and comes from a different clock domain to the OTBN core. |
| A FIFO is used to synchronize the incoming package to the OTBN clock domain. |
| Synchronized packages are then stacked up to a single `WLEN` value of 256b. |
| In order to service a single EDN request, a total of 8 transactions are required from the EDN interface. |
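The stacking of eight 32b EDN transactions into one `WLEN` value can be sketched as follows. The function name `stack_edn_words` is hypothetical, and the word order is an assumption: the text does not state whether the first transaction ends up in the least or most significant position, so the sketch arbitrarily places it in the least significant word.

```python
EDN_WORD_BITS = 32
WORDS_PER_REQUEST = 8  # 8 * 32b = WLEN = 256b


def stack_edn_words(words):
    """Combine eight 32b EDN words into one 256b value."""
    if len(words) != WORDS_PER_REQUEST:
        raise ValueError("expected exactly 8 EDN words")
    value = 0
    for i, word in enumerate(words):
        if not 0 <= word < (1 << EDN_WORD_BITS):
            raise ValueError("EDN word does not fit in 32 bits")
        # ASSUMPTION: the first word received is the least significant.
        value |= word << (EDN_WORD_BITS * i)
    return value
```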
| |
| As an EDN request can take time, `RND` is backed by a single-entry cache at the OTBN core level, containing the result of the most recent EDN request. |
| A read from `RND` empties this cache. |
| A prefetch into the cache, which can be used to hide the EDN latency, is triggered on any write to the `RND_PREFETCH` CSR. |
| Writes to `RND_PREFETCH` will be ignored whilst a prefetch is in progress or when the cache is already full. |
| If `RND` is read while the cache is empty, OTBN will stall until the request provides bits. |
| Both the `RND` CSR and WSR take their bits from the same cache. |
| `RND` CSR reads return 32 bits and simply discard the other 224 bits. |
| When stalling on an `RND` read, OTBN will unstall on the cycle after it receives WLEN RND data from the EDN. |
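The cache behaviour can be sketched with an untimed Python model. Everything named here (`RndCache`, `fetch_from_edn`, the `stalls` counter) is hypothetical. Because the model has no notion of cycles, it cannot represent a prefetch that is still in progress, and keeping the low 32 bits for a CSR read is an assumption, since the text does not say which 32 bits are returned.

```python
class RndCache:
    """Untimed sketch of the single-entry RND cache."""

    def __init__(self, fetch_from_edn):
        # fetch_from_edn: hypothetical callable returning a 256b integer.
        self._fetch = fetch_from_edn
        self._value = None   # None means the cache is empty
        self.stalls = 0      # number of reads that had to wait for EDN

    def prefetch(self):
        """Model a write to RND_PREFETCH; ignored if the cache is full."""
        if self._value is None:
            self._value = self._fetch()

    def read_wsr(self):
        """Model a read of the RND WSR; the read empties the cache."""
        if self._value is None:
            self.stalls += 1             # OTBN would stall here
            self._value = self._fetch()
        value, self._value = self._value, None
        return value

    def read_csr(self):
        """Model a read of the RND CSR (32b wide)."""
        # ASSUMPTION: the low 32 bits are kept, the rest is discarded.
        return self.read_wsr() & 0xFFFFFFFF
```

Issuing `prefetch()` well before the read is what hides the EDN latency: the later read finds the cache full and does not count as a stall.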
| |
| `URND` provides bits from an LFSR within OTBN; reads from it never stall. |
| The `URND` LFSR is seeded once from the EDN connected via `edn_urnd` when OTBN starts execution. |
| Each new execution of OTBN will reseed the `URND` LFSR. |
| The LFSR state is advanced every cycle when OTBN is running. |
| |
| ### Operational States {#design-details-operational-states} |
| |
| <!-- |
| Source: https://docs.google.com/drawings/d/1C0D4UriRk5pKGFoFtAXYLcJ1oBG1BCDd2omCLPYHtr0/edit |
| |
| Download the SVG from Google Draw, open it in Inkscape once and save it without changes to add width/height information to the image. |
| --> |
|  |
| |
| OTBN can be in different operational states. |
| OTBN is *busy* for as long as it is performing an operation. |
| OTBN is *locked* if a fatal error was observed. |
| Otherwise OTBN is *idle*. |
| |
| The current operational state is reflected in the {{< regref "STATUS" >}} register. |
| - If OTBN is idle, the {{< regref "STATUS" >}} register is set to `IDLE`. |
| - If OTBN is busy, the {{< regref "STATUS" >}} register is set to one of the values starting with `BUSY_`. |
| - If OTBN is locked, the {{< regref "STATUS" >}} register is set to `LOCKED`. |
| |
| OTBN transitions into the busy state as a result of host software [issuing a command](#design-details-commands); OTBN is then said to perform an operation. |
| OTBN transitions out of the busy state whenever the operation has completed. |
| In the {{< regref "STATUS" >}} register the different `BUSY_*` values represent the operation that is currently being performed. |
| |
| A transition out of the busy state is signaled by the `done` interrupt ({{< regref "INTR_STATE.done" >}}). |
| |
| The locked state is a terminal state; transitioning out of it requires an OTBN reset. |
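The state transitions described above can be sketched as a small Python model. The class `OtbnStateModel` and its method names are hypothetical. Two points are assumptions rather than statements from the text: that a command issued while OTBN is not idle is ignored, and that each `BUSY_*` value is formed by prefixing the command name (only `BUSY_EXECUTE` is named explicitly in this document).

```python
class OtbnStateModel:
    """Sketch of the idle / busy / locked operational states."""

    def __init__(self):
        self.status = "IDLE"
        self.done_interrupt = False

    def issue_command(self, command):
        """Host writes a command such as EXECUTE to the CMD register."""
        if self.status != "IDLE":
            return False  # ASSUMPTION: ignored unless OTBN is idle
        # ASSUMPTION: BUSY_* names mirror the command names.
        self.status = "BUSY_" + command
        return True

    def complete_operation(self):
        """The current operation finishes without a fatal error."""
        if self.status.startswith("BUSY_"):
            self.status = "IDLE"
            self.done_interrupt = True  # leaving busy raises `done`

    def fatal_error(self):
        """A fatal error is observed; it can occur in any state."""
        if self.status.startswith("BUSY_"):
            self.done_interrupt = True
        self.status = "LOCKED"

    def reset(self):
        """Only a reset leaves the locked state."""
        self.status = "IDLE"
        self.done_interrupt = False
```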
| |
| ### Operations and Commands {#design-details-commands} |
| |
| OTBN understands a set of commands to perform certain operations. |
| Commands are issued by writing to the {{< regref "CMD" >}} register. |
| |
| The `EXECUTE` command starts the [execution of the application](#design-details-software-execution) contained in OTBN's instruction memory. |
| |
| The `SEC_WIPE_DMEM` command [securely wipes the data memory](#design-details-secure-wipe). |
| |
| The `SEC_WIPE_IMEM` command [securely wipes the instruction memory](#design-details-secure-wipe). |
| |
| ### Software Execution {#design-details-software-execution} |
| |
| Software execution on OTBN is triggered by host software by [issuing the `EXECUTE` command](#design-details-commands). |
| The software then runs to completion, without the ability for host software to interrupt or inspect the execution. |
| An execution proceeds through the following steps: |
| |
| - OTBN transitions into the busy state, and reflects this by setting {{< regref "STATUS">}} to `BUSY_EXECUTE`. |
| - The internal randomness source, which provides random numbers to the `URND` CSR and WSR, is re-seeded from the EDN. |
| - The instruction at {{< regref "START_ADDR" >}} is fetched and executed. |
| - From this point on, all subsequent instructions are executed according to their semantics until either an {{< otbnInsnRef "ECALL" >}} instruction is executed, or an error is detected. |
| - A [secure wipe of internal state](#design-details-secure-wipe-internal) is performed. |
| - The {{< regref "ERR_BITS" >}} register is set to indicate either a successful execution (value `0`), or to indicate the error that was observed (a non-zero value). |
| - OTBN transitions into the [idle state](#design-details-operational-states) (in case of a successful execution, or a recoverable error) or the locked state (in case of a fatal error). |
| This transition is signaled by raising the `done` interrupt ({{< regref "INTR_STATE.done" >}}), and reflected in the {{< regref "STATUS" >}} register. |
| |
| ### Errors {#design-details-errors} |
| |
| OTBN is able to detect a range of errors, which are classified as *software errors* or *fatal errors*. |
| A software error is an error in the code that OTBN executes. |
| In the absence of an attacker, these errors are due to a programmer's mistake. |
| A fatal error is typically the violation of a security property. |
| All errors and their classification are listed in the [List of Errors](#design-details-list-of-errors). |
| |
| Whenever an error is detected, OTBN reacts locally, and informs the OpenTitan system about it by raising an alert. |
| OTBN generally does not try to recover from errors itself, and provides no error handling support to code that runs on it. |
| |
| OTBN gives host software the option to recover from some errors by restarting the operation. |
| All software errors are treated as recoverable and are handled as described in the section [Reaction to Recoverable Errors](#design-details-recoverable-errors). |
| |
| Fatal errors are treated as described in the section [Reaction to Fatal Errors](#design-details-fatal-errors). |
| |
| ### Reaction to Recoverable Errors {#design-details-recoverable-errors} |
| |
| Recoverable errors can be the result of a programming error in OTBN software. |
| Recoverable errors can only occur during the execution of software on OTBN, and not in other situations in which OTBN might be busy. |
| |
| The following actions are taken when OTBN detects a recoverable error: |
| |
| 1. The currently running operation is terminated, similar to the way an {{< otbnInsnRef "ECALL" >}} instruction [is executed](#writing-otbn-applications-ecall): |
| - No more instructions are fetched or executed. |
| - A [secure wipe of internal state](#design-details-secure-wipe-internal) is performed. |
| - The {{< regref "ERR_BITS" >}} register is set to a non-zero value that describes the error. |
| - The current operation is marked as complete by setting {{< regref "INTR_STATE.done" >}}. |
| - The {{< regref "STATUS" >}} register is set to `IDLE`. |
| 2. A [recoverable alert]({{< relref "#alerts" >}}) is raised. |
| |
| The host software can start another operation on OTBN after a recoverable error was detected. |
| |
| ### Reaction to Fatal Errors {#design-details-fatal-errors} |
| |
| Fatal errors are generally seen as a sign of an intrusion, resulting in more drastic measures to protect the secrets stored within OTBN. |
| Fatal errors can occur at any time, even when an OTBN operation isn't in progress. |
| |
| The following actions are taken when OTBN detects a fatal error: |
| |
| 1. A [secure wipe of the data memory](#design-details-secure-wipe-dmem) and a [secure wipe of the instruction memory](#design-details-secure-wipe-imem) are initiated. |
| 2. If OTBN [is not idle](#design-details-operational-states), then the currently running operation is terminated, similarly to how an operation ends after an {{< otbnInsnRef "ECALL" >}} instruction [is executed](#writing-otbn-applications-ecall): |
| - No more instructions are fetched or executed. |
| - A [secure wipe of internal state](#design-details-secure-wipe-internal) is performed. |
| - The {{< regref "ERR_BITS" >}} register is set to a non-zero value that describes the error. |
| - The current operation is marked as complete by setting {{< regref "INTR_STATE.done" >}}. |
| 3. The {{< regref "STATUS" >}} register is set to `LOCKED`. |
| 4. A [fatal alert]({{< relref "#alerts" >}}) is raised. |
| |
| Note that OTBN can detect some errors even when it isn't running. |
| One example of this is an error caused by an integrity error when reading or writing OTBN's memories over the bus. |
| In this case, the {{< regref "ERR_BITS" >}} register will not change. |
| This avoids race conditions with the host processor's error handling software. |
| However, every error that OTBN detects when it isn't running is fatal. |
| This means that the cause will be reflected in {{< regref "FATAL_ALERT_CAUSE" >}}, as described below in [Alerts]({{< relref "#alerts" >}}). |
| This way, no alert is generated without setting an error code somewhere. |
| |
| ### List of Errors {#design-details-list-of-errors} |
| |
| <table> |
| <thead> |
| <tr> |
| <th>Name</th> |
| <th>Class</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
|             <td><code>BAD_DATA_ADDR</code></td> |
|             <td>software</td> |
|             <td>A data memory access was out of bounds or unaligned.</td> |
| </tr> |
| <tr> |
|             <td><code>BAD_INSN_ADDR</code></td> |
|             <td>software</td> |
|             <td>An instruction memory access was out of bounds or unaligned.</td> |
| </tr> |
| <tr> |
|             <td><code>CALL_STACK</code></td> |
| <td>software</td> |
| <td>An instruction tried to pop from an empty call stack or push to a full call stack.</td> |
| </tr> |
| <tr> |
|             <td><code>ILLEGAL_INSN</code></td> |
|             <td>software</td> |
|             <td> |
|               An illegal instruction was about to be executed. |
|             </td> |
|         </tr> |
| <tr> |
|             <td><code>LOOP</code></td> |
| <td>software</td> |
| <td> |
| A loop stack-related error was detected. |
| </td> |
| </tr> |
| <tr> |
|             <td><code>IMEM_INTG_VIOLATION</code></td> |
| <td>fatal</td> |
| <td>Data read from the instruction memory failed the integrity checks.</td> |
| </tr> |
| <tr> |
|             <td><code>DMEM_INTG_VIOLATION</code></td> |
| <td>fatal</td> |
| <td>Data read from the data memory failed the integrity checks.</td> |
| </tr> |
| <tr> |
|             <td><code>REG_INTG_VIOLATION</code></td> |
| <td>fatal</td> |
| <td>Data read from a GPR or WDR failed the integrity checks.</td> |
| </tr> |
| <tr> |
|             <td><code>BUS_INTG_VIOLATION</code></td> |
| <td>fatal</td> |
| <td>An incoming bus transaction failed the integrity checks.</td> |
| </tr> |
| <tr> |
|             <td><code>ILLEGAL_BUS_ACCESS</code></td> |
| <td>fatal</td> |
| <td>A bus-accessible register or memory was accessed when not allowed.</td> |
| </tr> |
| <tr> |
|       <td><code>LIFECYCLE_ESCALATION</code></td> |
| <td>fatal</td> |
| <td>A life cycle escalation request was received.</td> |
| </tr> |
| </tbody> |
| </table> |
| |
| ### Alerts |
| |
| An alert is a reaction to an error that OTBN detected. |
| OTBN has two alerts, one recoverable and one fatal. |
| |
| A **recoverable alert** is a one-time triggered alert caused by [recoverable errors](#design-details-recoverable-errors). |
| The error that caused the alert can be determined by reading the {{< regref "ERR_BITS" >}} register. |
| |
| A **fatal alert** is a continuously triggered alert caused by [fatal errors](#design-details-fatal-errors). |
| The error that caused the alert can be determined by reading the {{< regref "FATAL_ALERT_CAUSE" >}} register. |
| If OTBN was running, this value will also be reflected in the {{< regref "ERR_BITS" >}} register. |
| A fatal alert can only be cleared by resetting OTBN through the `rst_ni` line. |
| |
| ### Reaction to Life Cycle Escalation Requests {#design-details-lifecycle-escalation} |
| |
| OTBN receives and reacts to escalation signals from the [life cycle controller]({{< relref "/hw/ip/lc_ctrl/doc#security-escalation" >}}). |
| An incoming life cycle escalation is a fatal error of type `lifecycle_escalation` and treated as described in the section [Fatal Errors](#design-details-fatal-errors). |
| |
| ### Idle |
| |
| OTBN exposes a single-bit `idle_o` signal, intended to be used by the clock manager to clock-gate the block when it is not in use. |
| This signal is in the same clock domain as `clk_i`. |
| The `idle_o` signal is high when OTBN [is idle](#design-details-operational-states), and low otherwise. |
| |
| OTBN also exposes another version of the idle signal as `idle_otp_o`. |
| This works analogously, but is in the same clock domain as `clk_otp_i`. |
| |
| TODO: Specify interactions between `idle_o`, `idle_otp_o` and the clock manager fully. |
| |
| ### Data Integrity Protection {#design-details-data-integrity-protection} |
| |
| OTBN stores and operates on data (state) in its dedicated memories, register files, and internal registers. |
| OTBN's data integrity protection is designed to protect all data stored and transmitted within OTBN from modifications through physical attacks. |
| |
| During transmission, the integrity of data is protected with an integrity protection code. |
| Data at rest in the instruction and data memories is additionally scrambled. |
| |
| In the following, the Integrity Protection Code and the scrambling algorithm are discussed, followed by their application to individual storage elements. |
| |
| #### Integrity Protection Code {#design-details-integrity-protection-code} |
| |
| OTBN uses the same integrity protection code everywhere to provide overarching data protection without regular re-encoding. |
| The code is applied to 32b data words, and produces 39b of encoded data. |
| |
| The code used is a (39,32) Hsiao "single error correction, double error detection" (SECDED) error correction code (ECC) [[CHEN08]({{< relref "#ref-chen08">}})]. |
| It has a minimum Hamming distance of four, so any combination of up to three bit errors in an encoded 32 bit word is guaranteed to be detected. |
| The code is used for error detection only; no error correction is performed. |
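
The detection-only use of a distance-four code can be illustrated with a small Python model.
This is a sketch under stated assumptions, not OTBN's implementation: the check matrix below is a hypothetical Hsiao-style matrix (one distinct weight-three column per data bit), and the names `COLUMNS`, `check_bits`, `encode`, and `syndrome` are made up for illustration.

```python
from itertools import combinations

DATA_BITS = 32
CHECK_BITS = 7

# ASSUMPTION: hypothetical check matrix, not the one OTBN actually uses.
# Each data bit gets a distinct 7-bit column with exactly three bits set.
COLUMNS = [
    sum(1 << bit for bit in combo)
    for combo in combinations(range(CHECK_BITS), 3)
][:DATA_BITS]


def check_bits(data):
    """XOR together the columns of all data bits that are set."""
    check = 0
    for i, column in enumerate(COLUMNS):
        if (data >> i) & 1:
            check ^= column
    return check


def encode(data):
    """Return the 39b codeword: 7 check bits stored above 32 data bits."""
    return (check_bits(data) << DATA_BITS) | data


def syndrome(word):
    """Return 0 for a valid 39b codeword and a non-zero value otherwise."""
    data = word & ((1 << DATA_BITS) - 1)
    stored_check = word >> DATA_BITS
    return check_bits(data) ^ stored_check
```

In this model every column has odd weight and all columns are distinct, so flipping one, two, or three bits of a codeword always yields a non-zero syndrome.
A detection-only scheme reports any non-zero syndrome as an error and never attempts to locate the flipped bit.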
| |
| #### Memory Scrambling {#design-details-memory-scrambling} |
| |
| Contents of OTBN's instruction and data memories are scrambled while at rest. |
| The data is bound to the address and scrambled before being stored in memory. |
| The addresses are randomly remapped. |
| |
| Note that data stored in other temporary memories within OTBN, including the register files, is not scrambled. |
| |
| Scrambling is used to obfuscate the memory contents and to diffuse the data. |
| Obfuscation makes passive probing more difficult, while diffusion makes active fault injection attacks more difficult. |
| |
| The scrambling mechanism is described in detail in the [section "Scrambling Primitive" of the SRAM Controller Technical Specification](/hw/ip/sram_ctrl/doc/#scrambling-primitive). |
| |
| The scrambling keys are rotated regularly; refer to the sections below for more details. |
| |
| #### Actions on Integrity Errors |
| |
| A fatal error is raised whenever a data integrity violation is detected, which results in an immediate stop of all processing and the issuing of a fatal alert. |
| The section [Error Handling and Reporting]({{< relref "#design-details-error-handling-and-reporting" >}}) describes the error handling in more detail. |
| |
| #### Register File Integrity Protection |
| |
| OTBN contains two register files: the 32b GPRs and the 256b WDRs. |
| The data stored in both register files is protected with the [Integrity Protection Code]({{< relref "#design-details-integrity-protection-code">}}). |
| Neither the register file contents nor register addresses are scrambled. |
| |
| The GPRs `x2` to `x31` store a 32b data word together with the Integrity Protection Code, resulting in 39b of stored data. |
| (`x0`, the zero register, and `x1`, the call stack, require special treatment.) |
| |
| Each 256b Wide Data Register (WDR) stores a 256b data word together with the Integrity Protection Code, resulting in 312b of stored data. |
| The integrity protection is done separately for each of the eight 32b sub-words within a 256b word. |
| |
| The register files can consume data protected with the Integrity Protection Code, or add it on demand. |
| Whenever possible the Integrity Protection Code is preserved from its source and written directly to the register files without recalculation, in particular in the following cases: |
| |
| * Data coming from the data memory (DMEM) through the load-store unit to a GPR or WDR. |
| * Data copied between WDRs using the {{< otbnInsnRef "BN.MOV" >}} or {{< otbnInsnRef "BN.MOVR" >}} instructions. |
| * Data conditionally copied between WDRs using the {{< otbnInsnRef "BN.SEL" >}} instruction. |
| * Data copied between the `ACC` and `MOD` WSRs and a WDR. |
| (TODO: Not yet implemented.) |
| * Data copied between any of the `MOD0` to `MOD7` CSRs and a GPR. |
| (TODO: Not yet implemented.) |
| |
| In all other cases the register files add the Integrity Protection Code to the incoming data before storing the data word. |
| |
| The integrity protection bits are checked on every read from the register files, even if the integrity protection is not removed from the data. |
| |
| Detected integrity violations in a register file raise a fatal `reg_error`. |
| |
| #### Data Memory (DMEM) Integrity Protection |
| |
| OTBN's data memory is 256b wide, but allows for 32b word accesses. |
| To facilitate such accesses, all integrity protection in the data memory is done on a 32b word granularity. |
| |
| All data entering or leaving the data memory block is protected with the [Integrity Protection Code]({{< relref "#design-details-integrity-protection-code">}}); |
| this code is not re-computed within the memory block. |
| |
| Before being stored in SRAM, the data word with the attached Integrity Protection Code, as well as the address are scrambled according to the [memory scrambling algorithm]({{< relref "#design-details-memory-scrambling">}}). |
| The scrambling is reversed on a read. |
| |
| The ephemeral memory scrambling key and the nonce are provided by the [OTP block]({{<relref "/hw/ip/otp_ctrl/doc" >}}). |
| They are set once when the OTBN block is reset, and changed whenever a [secure wipe]({{<relref "#design-details-secure-wipe-dmem">}}) of the data memory is performed. |
| |
| |
| The Integrity Protection Code is checked on every memory read, even though the code remains attached to the data. |
| A further check must be performed when the data is consumed. |
| Detected integrity violations in the data memory raise a fatal `dmem_error`. |
| |
| #### Instruction Memory (IMEM) Integrity Protection |
| |
| All data entering or leaving the instruction memory block is protected with the [Integrity Protection Code]({{< relref "#design-details-integrity-protection-code">}}); |
| this code is not re-computed within the memory block. |
| |
| Before being stored in SRAM, the instruction word with the attached Integrity Protection Code, as well as the address are scrambled according to the [memory scrambling algorithm]({{< relref "#design-details-memory-scrambling">}}). |
| The scrambling is reversed on a read. |
| |
| The ephemeral memory scrambling key and the nonce are provided by the [OTP block]({{<relref "/hw/ip/otp_ctrl/doc" >}}). |
| They are set once when the OTBN block is reset, and changed whenever a [secure wipe]({{<relref "#design-details-secure-wipe-imem">}}) of the instruction memory is performed. |
| |
| The Integrity Protection Code is checked on every memory read, even though the code remains attached to the data. |
| A further check must be performed when the data is consumed. |
| Detected integrity violations in the instruction memory raise a fatal `imem_error`. |
| |
| ### Secure Wipe {#design-details-secure-wipe} |
| |
| Applications running on OTBN may store sensitive data in the internal registers or the memory. |
| In order to prevent an untrusted application from reading any leftover data, OTBN provides the secure wipe operation. |
| This operation can be applied to: |
| - [Data memory]({{<relref "#design-details-secure-wipe-dmem">}}) |
| - [Instruction memory]({{<relref "#design-details-secure-wipe-imem">}}) |
| - [Internal state]({{<relref "#design-details-secure-wipe-internal">}}) |
| |
| The three forms of secure wipe can be triggered in different ways. |
| |
| A secure wipe of either the data or the instruction memory can be triggered from host software by issuing a `SEC_WIPE_DMEM` or `SEC_WIPE_IMEM` [command](#design-details-commands). |
| |
| A secure wipe of instruction memory, data memory, and all internal state is performed automatically when handling a [fatal error](#design-details-fatal-errors). |
| |
| A secure wipe of the internal state only is triggered automatically when OTBN [ends the software execution](#design-details-software-execution), either successfully, or unsuccessfully due to a [recoverable error](#design-details-recoverable-errors). |
| |
| #### Data Memory (DMEM) Secure Wipe {#design-details-secure-wipe-dmem} |
| |
| The wiping is performed by securely replacing the memory scrambling key, making all data stored in the memory unusable. |
| The key replacement is a two-step process: |
| |
| * Overwrite the 128b key of the memory scrambling primitive with randomness from URND. |
| This action takes a single cycle. |
| * Request new scrambling parameters from OTP. |
| The request takes multiple cycles to complete. |
| |
| Host software can initiate a data memory secure wipe by [issuing the `SEC_WIPE_DMEM` command](#design-details-commands). |
| |
| #### Instruction Memory (IMEM) Secure Wipe {#design-details-secure-wipe-imem} |
| |
| The wiping is performed by securely replacing the memory scrambling key, making all instructions stored in the memory unusable. |
| The key replacement is a two-step process: |
| |
| * Overwrite the 128b key of the memory scrambling primitive with randomness from URND. |
| This action takes a single cycle. |
| * Request new scrambling parameters from OTP. |
| The request takes multiple cycles to complete. |
| |
| Host software can initiate an instruction memory secure wipe by [issuing the `SEC_WIPE_IMEM` command](#design-details-commands). |
| |
| #### Internal State Secure Wipe {#design-details-secure-wipe-internal} |
| |
| OTBN provides a mechanism to securely wipe all internal state, excluding the instruction and data memories. |
| |
| The following state is wiped: |
| * Register files: GPRs and WDRs |
| * The accumulator register (also accessible through the ACC WSR) |
| * Flags (accessible through the FG0, FG1, and FLAGS CSRs) |
| * The modulus (accessible through the MOD0 to MOD7 CSRs and the MOD WSR) |
| |
| The wiping procedure is a two-step process: |
| * Overwrite the state with randomness from URND. |
| * Overwrite the state with zeros. |
| |
| Loop and call stack pointers are reset. |
| |
| Host software cannot explicitly trigger an internal secure wipe; it is performed automatically at the end of an `EXECUTE` operation. |
| |
| # Running applications on OTBN |
| |
| OTBN is a specialized coprocessor which is used from the host CPU. |
| This section describes how to interact with OTBN from the host CPU to execute an existing OTBN application. |
| The section [Writing OTBN applications]({{< ref "#writing-otbn-applications" >}}) describes how to write such applications. |
| |
| ## High-level operation sequence |
| |
| The high-level sequence by which the host processor should use OTBN is as follows. |
| |
| 1. Write the OTBN application binary to {{< regref "IMEM" >}}, starting at address 0. |
| 2. Optional: Write constants and input arguments, as mandated by the calling convention of the loaded application, to {{< regref "DMEM" >}}. |
| 3. Start the operation on OTBN by [issuing the `EXECUTE` command](#design-details-commands). |
| Now neither data nor instruction memory may be accessed from the host CPU. |
|    After it has been started, the OTBN application runs to completion without further interaction with the host. |
| 4. Wait for the operation to complete (see below). |
|    As soon as the OTBN operation has completed, the data and instruction memories can be accessed again from the host CPU. |
| 5. Check if the operation was successful by reading the {{< regref "ERR_BITS" >}} register. |
| 6. Optional: Retrieve results by reading {{< regref "DMEM" >}}, as mandated by the calling convention of the loaded application. |
| |
| OTBN applications are run to completion. |
| The host CPU can determine if an application has completed by either polling {{< regref "STATUS">}} or listening for an interrupt. |
| |
| * To poll for a completed operation, software should repeatedly read the {{< regref "STATUS" >}} register. |
| While the operation is in progress, {{< regref "STATUS" >}} is non-zero. |
| The operation is complete if {{< regref "STATUS" >}} is `IDLE`. |
| * Alternatively, software can listen for the `done` interrupt to determine if the operation has completed. |
| The standard sequence of working with interrupts has to be followed, i.e. the interrupt has to be enabled, an interrupt service routine has to be registered, etc. |
| The [DIF]({{<relref "#dif" >}}) contains helpers to do so conveniently. |
| |
| Note: This operation sequence only covers functional aspects. |
| Depending on the application additional steps might be necessary, such as deleting secrets from the memories. |
| |
| ## Device Interface Functions (DIFs) {#dif} |
| |
| {{< dif_listing "sw/device/lib/dif/dif_otbn.h" >}} |
| |
| ## Driver {#driver} |
| |
| A higher-level driver for the OTBN block is available at `sw/device/lib/runtime/otbn.h` ([API documentation](/sw/apis/lib_2runtime_2otbn_8h.html)). |
| |
| Another driver for OTBN is part of the silicon creator code at `sw/device/silicon_creator/lib/drivers/otbn.h`. |
| |
| ## Register Table |
| |
| {{< incGenFromIpDesc "../data/otbn.hjson" "registers" >}} |
| |
| # Writing OTBN applications {#writing-otbn-applications} |
| |
| OTBN applications are (small) pieces of software written in OTBN assembly. |
| The full instruction set is described in the [ISA manual]({{< relref "isa" >}}), and example software is available in the `sw/otbn` directory of the OpenTitan source tree. |
| |
| A hands-on user guide to develop OTBN software can be found in the section [Writing and building software for OTBN]({{<relref "doc/ug/otbn_sw.md" >}}). |
| |
| ## Toolchain support |
| |
| OTBN comes with a toolchain consisting of an assembler, a linker, and helper tools such as objdump. |
| The toolchain wraps a RV32 GCC toolchain and supports many of its features. |
| |
| The following tools are available: |
| * `otbn-as`: The OTBN assembler. |
| * `otbn-ld`: The OTBN linker. |
| * `otbn-objdump`: objdump for OTBN. |
| |
| Other tools from the RV32 toolchain can be used directly, such as objcopy. |
| |
| ## Passing of data between the host CPU and OTBN {#writing-otbn-applications-datapassing} |
| |
| Passing data between the host CPU and OTBN is done through the data memory (DMEM). |
| No standard or required calling convention exists; every application is free to pass data in and out of OTBN in whatever format it finds convenient. |
| All data passing must be done when OTBN [is idle](#design-details-operational-states); otherwise both the instruction and the data memory are inaccessible from the host CPU. |
| |
| ## Returning from an application {#writing-otbn-applications-ecall} |
| |
| The software running on OTBN signals completion by executing the {{< otbnInsnRef "ECALL" >}} instruction. |
| |
| Once OTBN has executed the {{< otbnInsnRef "ECALL" >}} instruction, the following things happen: |
| |
| - No more instructions are fetched or executed. |
| - A [secure wipe of internal state](#design-details-secure-wipe-internal) is performed. |
| - The {{< regref "ERR_BITS" >}} register is set to 0, indicating a successful operation. |
| - The current operation is marked as complete by setting {{< regref "INTR_STATE.done" >}} and clearing {{< regref "STATUS" >}}. |
| |
| The DMEM can be used to pass data back to the host processor, e.g. a "return value" or an "exit code". |
| Refer to the section [Passing of data between the host CPU and OTBN]({{<relref "#writing-otbn-applications-datapassing" >}}) for more information. |
| |
| ## Using hardware loops |
| |
| OTBN provides two hardware loop instructions: {{< otbnInsnRef "LOOP" >}} and {{< otbnInsnRef "LOOPI" >}}. |
| |
| ### Loop nesting |
| |
| OTBN permits loop nesting and branches and jumps inside loops. |
| However, it doesn't have support for early termination of loops: there's no way to pop an entry from the loop stack without executing the last instruction of the loop the correct number of times. |
| It can also only pop one level of the loop stack per instruction. |
| |
| To avoid polluting the loop stack and avoid surprising behaviour, the programmer must ensure that: |
| * Even if there are branches and jumps within a loop body, the final instruction of the loop body gets executed exactly once per iteration. |
| * Nested loops have distinct end addresses. |
| * The end instruction of an outer loop is not executed before an inner loop finishes. |
| |
| OTBN does not detect these conditions being violated, so no error will be signaled should they occur. |
| |
| (Note: indentation in the code examples is for clarity and has no functional impact.) |
| |
| The following loops are *well nested*: |
| |
| ``` |
| LOOP x2, 3 |
| LOOP x3, 1 |
| ADDI x4, x4, 1 |
| # The NOP ensures that the outer and inner loops end on different instructions |
| NOP |
| |
| # Both inner and outer loops call some_fn, which returns to |
| # the body of the loop |
| LOOP x2, 5 |
| JAL x1, some_fn |
| LOOP x3, 2 |
| JAL x1, some_fn |
| ADDI x4, x4, 1 |
| NOP |
| |
| # Control flow leaves the immediate body of the outer loop but eventually |
| # returns to it |
| LOOP x2, 4 |
| BEQ x4, x5, some_label |
| branch_back: |
| LOOP x3, 1 |
| ADDI x6, x6, 1 |
| NOP |
| |
| some_label: |
| ... |
| JAL x0, branch_back |
| ``` |
| |
| The following loops are not well nested: |
| |
| ``` |
| # Both loops end on the same instruction |
| LOOP x2, 2 |
| LOOP x3, 1 |
| ADDI x4, x4, 1 |
| |
| # Inner loop jumps into outer loop body (executing the outer loop end |
| # instruction before the inner loop has finished) |
| LOOP x2, 5 |
| LOOP x3, 3 |
|     ADDI x4, x4, 1 |
| BEQ x4, x5, outer_body |
| ADD x6, x7, x8 |
| outer_body: |
|   ADDI x9, x9, -1 |
| ``` |
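
The rule that nested loops must have distinct end addresses lends itself to a simple static check.
The following Python sketch is illustrative only: it assumes 4-byte instructions and that a loop instruction at address `A` with body size `N` ends on the instruction at `A + 4 * N`.
The helper names are hypothetical and not part of the OTBN toolchain.

```python
INSN_BYTES = 4  # ASSUMPTION: every instruction occupies 4 bytes


def loop_end_addr(loop_addr, bodysize):
    """Address of the last instruction in the body of a loop."""
    return loop_addr + bodysize * INSN_BYTES


def shared_loop_ends(loops):
    """Return pairs of loop addresses whose bodies end on the same instruction.

    `loops` is a list of (loop_addr, bodysize) tuples, one per LOOP/LOOPI.
    """
    first_loop_with_end = {}
    clashes = []
    for loop_addr, bodysize in loops:
        end = loop_end_addr(loop_addr, bodysize)
        if end in first_loop_with_end:
            clashes.append((first_loop_with_end[end], loop_addr))
        else:
            first_loop_with_end[end] = loop_addr
    return clashes


# First well-nested example: LOOP x2, 3 at address 0, LOOP x3, 1 at address 4.
well_nested = [(0, 3), (4, 1)]
# First badly nested example: LOOP x2, 2 at address 0, LOOP x3, 1 at address 4.
badly_nested = [(0, 2), (4, 1)]
```

Here `shared_loop_ends(well_nested)` returns an empty list, while `shared_loop_ends(badly_nested)` reports the pair `(0, 4)` because both loops end on the `ADDI` at address 8.
The check covers only the end-address rule; the other two rules depend on run-time control flow and are not captured by this sketch.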
| |
| ## Algorithmic Examples: Multiplication with BN.MULQACC |
| |
| The big number instruction subset of OTBN generally operates on WLEN bit numbers. |
| {{< otbnInsnRef "BN.MULQACC" >}} operates with WLEN/4 bit operands (with a full WLEN accumulator). |
| This section outlines two techniques to perform larger multiplies by composing multiple {{< otbnInsnRef "BN.MULQACC" >}} instructions. |
| |
| ### Multiplying two WLEN/2 numbers with BN.MULQACC |
| |
| This instruction sequence multiplies the lower half of `w0` by the upper half of |
| `w0`, placing the result in `w1`. |
| |
| ``` |
| BN.MULQACC.Z w0.0, w0.2, 0 |
| BN.MULQACC w0.0, w0.3, 64 |
| BN.MULQACC w0.1, w0.2, 64 |
| BN.MULQACC.WO w1, w0.1, w0.3, 128 |
| ``` |
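
The arithmetic behind this sequence can be checked with a small Python model.
It is a sketch that assumes each instruction adds the shifted product of two WLEN/4-bit quarter-words to a WLEN-bit accumulator, that `.Z` clears the accumulator first, and that `.WO` writes the accumulator to the destination register.
The function names are hypothetical.

```python
WLEN = 256
QUARTER = WLEN // 4
QUARTER_MASK = (1 << QUARTER) - 1
WLEN_MASK = (1 << WLEN) - 1


def quarter(value, index):
    """Return quarter-word `index` (0 = least significant) of a WLEN-bit value."""
    return (value >> (QUARTER * index)) & QUARTER_MASK


def mulqacc(acc, a, b, shift):
    """Assumed semantics: ACC += (a * b) << shift, truncated to WLEN bits."""
    return (acc + ((a * b) << shift)) & WLEN_MASK


def mul_lower_by_upper(w0):
    """Model the four-instruction sequence and return the value written to w1."""
    acc = 0  # BN.MULQACC.Z clears the accumulator before accumulating
    acc = mulqacc(acc, quarter(w0, 0), quarter(w0, 2), 0)
    acc = mulqacc(acc, quarter(w0, 0), quarter(w0, 3), 64)
    acc = mulqacc(acc, quarter(w0, 1), quarter(w0, 2), 64)
    acc = mulqacc(acc, quarter(w0, 1), quarter(w0, 3), 128)
    return acc  # BN.MULQACC.WO writes the accumulator to w1
```

Writing the lower half as `q0 + q1 * 2^64` and the upper half as `q2 + q3 * 2^64`, the product expands to `q0*q2 + (q0*q3 + q1*q2) * 2^64 + q1*q3 * 2^128`, which matches the four shifts used above.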
| |
| ### Multiplying two WLEN numbers with BN.MULQACC |
| |
| The shift out functionality can be used to perform larger multiplications without extra adds. |
| The table below shows how two registers `w0` and `w1` can be multiplied together to give a result in `w2` and `w3`. |
| The cells on the right show how the result is built up, where `a0:a3 = w0.0:w0.3` and `b0:b3 = w1.0:w1.3`. |
| The sum of a column represents WLEN/4 bits of a destination register, where `c0:c3 = w2.0:w2.3` and `d0:d3 = w3.0:w3.3`. |
| Each cell containing a multiply takes up two WLEN/4-bit columns to represent the WLEN/2-bit multiply result. |
| The current accumulator in each instruction is represented by highlighted cells, where the accumulator value is the sum of the highlighted cell and all cells above it. |
| |
| The outlined technique can be extended to arbitrary bit widths but requires unrolled code with all operands in registers. |
| |
| <table> |
| <thead> |
| <tr> |
| <th></th> |
| <th>d3</th> |
| <th>d2</th> |
| <th>d1</th> |
| <th>d0</th> |
| <th>c3</th> |
| <th>c2</th> |
| <th>c1</th> |
| <th>c0</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>BN.MULQACC.Z w0.0, w1.0, 0</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a0 * b0</code></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.0, 64</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a1 * b0</code></td> |
| <td style="background-color: orange"></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w2.l, w0.0, w1.1, 64</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a0 * b1</code></td> |
| <td style="background-color: orange"></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.0, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a2 * b0</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.1, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a1 * b1</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.0, w1.2, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a0 * b2</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.0, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a3 * b0</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.1, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a2 * b1</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.2, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a1 * b2</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w2.u, w0.0, w1.3, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a0 * b3</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.1, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a3 * b1</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.2, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a2 * b2</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.3, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a1 * b3</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.2, 64</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a3 * b2</code></td> |
| <td style="background-color: olive"></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w3.l, w0.2, w1.3, 64</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a2 * b3</code></td> |
| <td style="background-color: olive"></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w3.u, w0.3, w1.3, 0</code></td> |
| <td style="background-color: lightblue" colspan="2" rowspan="1"><code>a3 * b3</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| </tbody> |
| </table> |
| |
| Code snippets giving examples of 256x256 and 384x384 multiplies can be found in `sw/otbn/code-snippets/mul256.s` and `sw/otbn/code-snippets/mul384.s`. |
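
The sixteen-instruction sequence in the table can also be modeled in Python to confirm that it yields the full product.
This is a sketch under assumptions: `.SO` is taken to write the lower WLEN/2 bits of the accumulator to the named half of the destination register and then shift the accumulator right by WLEN/2 bits.
The step list mirrors the table rows; the function and variable names are hypothetical.

```python
WLEN = 256
HALF = WLEN // 2
QUARTER = WLEN // 4
QUARTER_MASK = (1 << QUARTER) - 1
HALF_MASK = (1 << HALF) - 1
WLEN_MASK = (1 << WLEN) - 1

# One entry per table row: (quarter of w0, quarter of w1, shift, shift-out target).
STEPS = [
    (0, 0, 0, None), (1, 0, 64, None), (0, 1, 64, "w2.l"),
    (2, 0, 0, None), (1, 1, 0, None), (0, 2, 0, None),
    (3, 0, 64, None), (2, 1, 64, None), (1, 2, 64, None),
    (0, 3, 64, "w2.u"),
    (3, 1, 0, None), (2, 2, 0, None), (1, 3, 0, None),
    (3, 2, 64, None), (2, 3, 64, "w3.l"),
    (3, 3, 0, "w3.u"),
]


def quarter(value, index):
    """Return quarter-word `index` (0 = least significant) of a WLEN-bit value."""
    return (value >> (QUARTER * index)) & QUARTER_MASK


def mul_wlen(w0, w1):
    """Model the table's instruction sequence and return (w2, w3)."""
    acc = 0  # the first instruction uses .Z, so the accumulator starts at zero
    regs = {"w2": 0, "w3": 0}
    for qa, qb, shift, target in STEPS:
        product = quarter(w0, qa) * quarter(w1, qb)
        acc = (acc + (product << shift)) & WLEN_MASK
        if target is not None:
            name, half = target.split(".")
            low = acc & HALF_MASK
            if half == "u":
                regs[name] = (regs[name] & HALF_MASK) | (low << HALF)
            else:
                regs[name] = (regs[name] & (HALF_MASK << HALF)) | low
            acc >>= HALF  # assumed .SO behaviour: drop the bits just written
    return regs["w2"], regs["w3"]
```

Under these assumptions, `(w3 << 256) | w2` equals `w0 * w1` for any pair of 256b inputs.
Each shift-out retires the lowest WLEN/2 bits of the running sum, so the accumulator only ever holds the carry into the next group of columns.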
| |
| # References |
| |
| <a name="ref-chen08">[CHEN08]</a> L. Chen, "Hsiao-Code Check Matrices and Recursively Balanced Matrices," arXiv:0803.1217 [cs], Mar. 2008 [Online]. Available: http://arxiv.org/abs/0803.1217 |