| --- |
| title: OpenTitan Big Number Accelerator (OTBN) Technical Specification |
| --- |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Note on the status of this document</h5> |
| |
| **This specification is work in progress and will see significant changes before it can be considered final.** |
| We invite input of all kind through the standard means of the OpenTitan project; a good starting point is filing an issue in our [GitHub issue tracker](https://github.com/lowRISC/opentitan/issues). |
| </div> |
| |
| # Overview |
| |
| This document specifies functionality of the OpenTitan Big Number Accelerator, or OTBN. |
| OTBN is a coprocessor for asymmetric cryptographic operations like RSA or Elliptic Curve Cryptography (ECC). |
| |
| This module conforms to the [Comportable guideline for peripheral functionality]({{< relref "doc/rm/comportability_specification" >}}). |
| See that document for integration overview within the broader top level system. |
| |
| ## Features |
| |
| * Processor optimized for wide integer arithmetic |
| * 32b wide control path with 32 32b wide registers |
| * 256b wide data path with 32 256b wide registers |
| * Full control-flow support with conditional branch and unconditional jump instructions, hardware loops, and hardware-managed call/return stacks. |
| * Reduced, security-focused instruction set architecture for easier verification and the prevention of data leaks. |
| * Built-in access to random numbers. |
| Note: The (quality) properties of the provided random numbers are not currently specified; this gap in the specification will be addressed in a future revision. |
| |
| ## Description |
| |
| OTBN is a processor, specialized for the execution of security-sensitive asymmetric (public-key) cryptography code, such as RSA or ECC. |
| Such algorithms are dominated by wide integer arithmetic, which are supported by OTBN's 256b wide data path, registers, and instructions which operate these wide data words. |
| On the other hand, the control flow is clearly separated from the data, and reduced to a minimum to avoid data leakage. |
| |
| The data OTBN processes is security-sensitive, and the processor design centers around that. |
| The design is kept as simple as possible to reduce the attack surface and aid verification and testing. |
| For example, no interrupts or exceptions are included in the design, and all instructions are designed to be executable within a single cycle. |
| |
| OTBN is designed as a self-contained co-processor with its own instruction and data memory, which is accessible as a bus device. |
| |
| ## Compatibility |
| |
| OTBN is not designed to be compatible with other cryptographic accelerators. |
| It received some inspiration from assembly code available from the [Chromium EC project](https://chromium.googlesource.com/chromiumos/platform/ec/), |
| which has been formally verified within the [Fiat Crypto project](http://adam.chlipala.net/papers/FiatCryptoSP19/FiatCryptoSP19.pdf). |
| |
| # Instruction Set |
| |
| OTBN is a processor with a custom instruction set. |
| The full ISA description can be found in our [ISA manual]({{< relref "hw/ip/otbn/doc/isa" >}}). |
| The instruction set is split into two groups: |
| |
| * The **base instruction subset** operates on the 32b General Purpose Registers (GPRs). |
| Its instructions are used for the control flow of a OTBN application. |
| The base instructions are inspired by RISC-V’s RV32I instruction set, but not compatible with it. |
| * The **big number instruction subset** operates on 256b Wide Data Registers (WDRs). |
| Its instructions are used for data processing. |
| |
| ## Processor State |
| |
| ### General Purpose Registers (GPRs) |
| |
| OTBN has 32 General Purpose Registers (GPRs). |
| Each GPR is 32b wide. |
| General Purpose Registers in OTBN are mainly used for control flow. |
| The GPRs are defined in line with RV32I. |
| |
| Note: GPRs and Wide Data Registers (WDRs) are separate register files. |
| They are only accessible through their respective instruction subset: |
| GPRs are accessible from the base instruction subset, and WDRs are accessible from the big number instruction subset (`BN` instructions). |
| |
| <table> |
| <tr> |
| <td><code>x0</code></td> |
| <td><strong>Zero</strong>. Always reads 0. Writes are ignored.</td> |
| </tr> |
| <tr> |
| <td><code>x1</code></td> |
| <td> |
| <strong>Return address</strong>. |
| Access to the call stack. |
| Reading <code>x1</code> pops an address from the call stack. |
| Writing <code>x1</code> pushes a return address to the call stack. |
| Reading from an empty call stack results in an alert. |
| </td> |
| </tr> |
| <tr> |
| <td><code>x2</code></td> |
| <td><strong>General Purpose Register 2</strong>.</td> |
| </tr> |
| <tr> |
| <td>...</td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>x31</code></td> |
| <td><strong>General Purpose Register 31</strong>.</td> |
| </tr> |
| </table> |
| |
| Note: Currently, OTBN has no "standard calling convention," and GPRs except for `x0` and `x1` can be used for any purpose. |
| If, at one point, a calling convention is needed, it is expected to be aligned with the RISC-V standard calling conventions, and the roles assigned to registers in that convention. |
| Even without a agreed-on calling convention, software authors are encouraged to follow the RISC-V calling convention where it makes sense. |
| For example, good choices for temporary registers are `x6`, `x7`, `x28`, `x29`, `x30`, and `x31`. |
| |
| ### Control and Status Registers (CSRs) |
| |
| Control and Status Registers (CSRs) are 32b wide registers used for "special" purposes, as detailed in their description; |
| they are not related to the GPRs. |
| CSRs can be accessed through dedicated instructions, `CSRRS` and `CSRRW`. |
| |
| <table> |
| <thead> |
| <tr> |
| <th>Number</th> |
| <th>Privilege</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>0x7C0</td> |
| <td>RW</td> |
| <td> |
| <strong>FLAGS</strong>. |
| Wide arithmetic flags. |
| This CSR provides access to the flags used in wide integer arithmetic. |
| <table> |
| <thead> |
| <tr><th>Bit</th><th>Description</th></tr> |
| </thead> |
| <tbody> |
| <tr><td>0</td><td>Carry of Flag Group 0</td></tr> |
| <tr><td>1</td><td>MSb of Flag Group 0</td></tr> |
| <tr><td>2</td><td>LSb of Flag Group 0</td></tr> |
| <tr><td>3</td><td>Zero of Flag Group 0</td></tr> |
| <tr><td>4</td><td>Carry of Flag Group 1</td></tr> |
| <tr><td>5</td><td>MSb of Flag Group 1</td></tr> |
| <tr><td>6</td><td>LSb of Flag Group 1</td></tr> |
| <tr><td>7</td><td>Zero of Flag Group 1</td></tr> |
| </tbody> |
| </table> |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D0</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD0</strong>. |
| Bits [31:0] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D1</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD1</strong>. |
| Bits [63:32] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D2</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD2</strong>. |
| Bits [95:64] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D3</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD3</strong>. |
| Bits [127:96] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D4</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD4</strong>. |
| Bits [159:128] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D5</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD5</strong>. |
| Bits [191:160] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D6</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD6</strong>. |
| Bits [223:192] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0x7D7</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD7</strong>. |
| Bits [255:224] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. |
| This CSR is mapped to the MOD WSR. |
| </td> |
| </tr> |
| <tr> |
| <td>0xFC0</td> |
| <td>R</td> |
| <td> |
| <strong>RND</strong>. |
| A random number. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| ### Wide Data Registers (WDRs) |
| |
| In addition to the 32b wide GPRs, OTBN has a second "wide" register file, which is used by the big number instruction subset. |
| This register file consists of NWDR = 32 Wide Data Registers (WDRs). |
| Each WDR is WLEN = 256b wide. |
| |
| Wide Data Registers (WDRs) and the 32b General Purpose Registers (GPRs) are separate register files. |
| They are only accessible through their respective instruction subset: |
| GPRs are accessible from the base instruction subset, and WDRs are accessible from the big number instruction subset (`BN` instructions). |
| |
| | Register | |
| |----------| |
| | w0 | |
| | w1 | |
| | ... | |
| | w31 | |
| |
| ### Wide Special Purpose Registers (WSRs) |
| |
| In addition to the Wide Data Registers, BN instructions can also access WLEN-sized special purpose registers, short WSRs. |
| |
| <table> |
| <thead> |
| <tr> |
| <th>Number</th> |
| <th>Privilege</th> |
| <th>Description</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td>0x0</td> |
| <td>RW</td> |
| <td> |
| <strong>MOD</strong> |
| Modulus. |
| To be used in the BN.ADDM and BN.SUBM instructions. |
| This WSR is mapped to the MOD0 to MOD7 CSRs. |
| </td> |
| </tr> |
| <tr> |
| <td>0x1</td> |
| <td>R</td> |
| <td> |
| <strong>RND</strong> |
| A random number. |
| </td> |
| </tr> |
| <tr> |
| <td>0x2</td> |
| <td>RW</td> |
| <td> |
| <strong>ACC</strong> |
| MAC Accumulator. |
| This gives direct access to the accumulator register used by the BN.MULQACC instruction. |
| </td> |
| </tr> |
| </tbody> |
| </table> |
| |
| ### Flags |
| |
| In addition to the wide register file, OTBN maintains global state in two groups of flags for the use by wide integer operations. |
| Flag groups are named Flag Group 0 (`FG0`), and Flag Group 1 (`FG1`). |
| Each group consists of four flags. |
| Each flag is a single bit. |
| |
| - `C` (Carry flag). |
| Set to 1 an overflow occurred in the last arithmetic instruction. |
| |
| - `L` (LSb flag). |
| The least significant bit of the result of the last arithmetic or shift instruction. |
| |
| - `M` (MSb flag) |
| The most significant bit of the result of the last arithmetic or shift instruction. |
| |
| - `Z` (Zero Flag) |
| Set to 1 if the result of the last operation was zero; otherwise 0. |
| |
| The `L`, `M`, and `Z` flags are determined based on the result of the operation as it is written back into the result register, without considering the overflow bit. |
| |
| ### Loop Stack |
| |
| The LOOP instruction allows for nested loops; the active loops are stored on the loop stack. |
| Each loop stack entry is a tuple of loop count, start address, and end address. |
| The number of entries in the loop stack is implementation-dependent. |
| |
| ### Call Stack |
| |
| A stack (LIFO) of function call return addresses (also known as "return address stack"). |
| The number of entries in this stack is implementation-dependent. |
| |
| The call stack is accessed through the `x1` GPR (return address). |
| Writing to `x1` pushes to the call stack, reading from it pops an item. |
| |
| ### Accumulator |
| |
| A WLEN bit wide accumulator used by the BN.MULQACC instruction. |
| |
| # Theory of Operations |
| |
| ## Block Diagram |
| |
|  |
| |
| ## Hardware Interfaces |
| |
| {{< hwcfg "hw/ip/otbn/data/otbn.hjson" >}} |
| |
| ## Design Details |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Note</h5> |
| |
| To be filled in as we create the implementation. |
| </div> |
| |
| By design, OTBN is a simple processor and has essentially no error handling support. |
| When anything goes wrong (an out-of-bounds memory operation, an invalid instruction encoding, etc.), OTBN will stop fetching instructions, and set the `ERR_CODE` register and the `err` bit of the `INTR_STATE` register. |
| |
| # Programmers Guide |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Note</h5> |
| |
| This section will be written as we move on in the design and implementation process. |
| </div> |
| |
| ## Memories |
| |
| The OTBN processor core has access to two dedicated memories: |
| an instruction memory (IMEM), and a data memory (DMEM). |
| Each memory is 4 kiB in size. |
| |
| The memory layout follows the Harvard architecture. |
| Both memories are byte-addressed, with addresses starting at 0. |
| |
| The instruction memory (IMEM) is 32b wide and provides the instruction stream to the OTBN processor; |
| it cannot be read or written from user code through load or store instructions. |
| |
| The data memory (DMEM) is 256b wide and read-write accessible from the base and big number instruction subsets of the OTBN processor core. |
| When accessed from the base instruction subset through the `LW` or `SW` instructions, accesses must read or write 32b-aligned 32b words. |
| When accessed from the big number instruction subset through the `BN.LID` or `BN.SID` instructions, accesses must read or write 256b-aligned 256b words. |
| |
| Both memories can be accessed through OTBN's register interface ({{< regref "DMEM" >}} and {{< regref "IMEM" >}}) only when OTBN is idle, as indicated by the {{< regref "STATUS.busy">}} flag. |
| All memory accesses through the register interface must be word-aligned 32b word accesses. |
| |
| ## Operation |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Note</h5> |
| |
| The exact sequence of operations is not yet finalized. |
| </div> |
| |
| Rough expected process: |
| |
| * Write {{< regref "IMEM" >}} |
| * Write {{< regref "DMEM" >}} |
| * Write `1` to {{< regref "CMD.start" >}} |
| * Wait for `done` interrupt |
| * Retrieve results by reading {{< regref "DMEM" >}} |
| |
| ## Error conditions |
| |
| <div class="bd-callout bd-callout-warning"> |
| <h5>Note</h5> |
| |
| To be filled in as we create the implementation. |
| </div> |
| |
| ## Register Table |
| |
| {{< registers "hw/ip/otbn/data/otbn.hjson" >}} |
| |
| ## Algorithic Example: Replacing BN.MULH with BN.MULQACC |
| |
| This specification gives the implementers the option to provide either a quarter-word multiply-accumulate instruction, `BN.MULQADD`, or a half-word multiply instruction, `BN.MULH`. |
| Four `BN.MULQACC` can be used to replace one `BN.MULH` instruction, which is able to operate on twice the data size. |
| |
| `BN.MULH w1, w0.l, w0.u` becomes |
| |
| ``` |
| BN.MULQACC.Z w0.0, w0.2, 0 |
| BN.MULQACC w0.0, w0.3, 64 |
| BN.MULQACC w0.1, w0.2, 64 |
| BN.MULQACC.WO r1, w0.1, w0.3, 128 |
| ``` |
| |
| ## Algorithmic Example: Multiplying two WLEN numbers with BN.MULQACC |
| |
| The big number instruction subset of OTBN generally operates on WLEN bit numbers. |
| However, the multiplication instructions only operate on half or quarter-words of WLEN bit. |
| This section outlines a technique to multiply two WLEN-bit numbers with the use of the quarter-word multiply-accumulate instruction `BN.MULQACC`. |
| |
| The shift out functionality can be used to perform larger multiplications without extra adds. |
| The table below shows how two registers `w0` and `w1` can be multiplied together to give a result in `w2` and `w3`. |
| The cells on the right show how the result is built up `a0:a3 = w0.0:w0.3` and `b0:b3 = w1.0:w1.3`. |
| The sum of a column represents WLEN/4 bits of a destination register, where `c0:c3 = w2.0:w2.3` and `d0:d3 = w3.0:w3.3`. |
| Each cell with a multiply in takes up two WLEN/4-bit columns to represent the WLEN/2-bit multiply result. |
| The current accumulator in each instruction is represented by highlighted cells where the accumulator value will be the sum of the highlighted cell and all cells above it. |
| |
| The outlined technique can be extended to arbitrary bit widths but requires unrolled code with all operands in registers. |
| |
| <table> |
| <thead> |
| <tr> |
| <th></th> |
| <th>d3</th> |
| <th>d2</th> |
| <th>d1</th> |
| <th>d0</th> |
| <th>c3</th> |
| <th>c2</th> |
| <th>c1</th> |
| <th>c0</th> |
| </tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><code>BN.MULQACC.Z w0.0, w1.0, 0</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a0 * b0</code></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.0, 64</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a1 * b0</code></td> |
| <td style="background-color: orange"></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w2.l, w0.0, w1.1, 64</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: orange"></td> |
| <td style="background-color: orange" colspan="2" rowspan="1"><code>a0 * b1</code></td> |
| <td style="background-color: orange"></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.0, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a2 * b0</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.1, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a1 * b1</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.0, w1.2, 0</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a0 * b2</code></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.0, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a3 * b0</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.1, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a2 * b1</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.2, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a1 * b2</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w2.u, w0.0, w1.3, 64</code></td> |
| <td></td> |
| <td></td> |
| <td style="background-color: yellow"></td> |
| <td style="background-color: yellow" colspan="2" rowspan="1"><code>a0 * b3</code></td> |
| <td style="background-color: yellow"></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.1, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a3 * b1</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.2, w1.2, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a2 * b2</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.1, w1.3, 0</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a1 * b3</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC w0.3, w1.2, 64</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a3 * b2</code></td> |
| <td style="background-color: olive"></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w3.l, w0.2, w1.3, 64</code></td> |
| <td style="background-color: olive"></td> |
| <td style="background-color: olive" colspan="2" rowspan="1"><code>a2 * b3</code></td> |
| <td style="background-color: olive"></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| <tr> |
| <td><code>BN.MULQACC.SO w3.u, w0.3, w1.3, 0</code></td> |
| <td style="background-color: lightblue" colspan="2" rowspan="1"><code>a3 * b3</code></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| <td></td> |
| </tr> |
| </tbody> |
| </table> |
| |
| Code snippets giving examples of 256x256 and 384x384 multiplies can be found in `sw/otbn/code-snippets/mul256.s` and `sw/otbn/code-snippets/mul384.s`. |