Where possible, OTBN assembly should follow the same principles as the [RISC-V assembly style guide]({{< relref “asm_coding_style” >}}). This guide describes additional OTBN-specific guidelines, and places where OTBN assembly can or should diverge from the RISC-V style guidelines.
There's no ABI, so OTBN registers cannot be referred to by ABI names.
Lower case for mnemonics and registers.
Example:
/* Correct */ bn.addi w3, w3, 2 /* Wrong */ BN.ADDI W3, W3, 2
Upper case for flags, flag groups, and half word identifiers in multiplication instructions. Example:
/* Correct */ bn.mulqacc.so w27.L, w30.0, w25.1, 64 bn.sel w3, w4, w2, FG1.C /* Wrong */ bn.mulqacc.so w27.l, w30.0, w25.1, 64 bn.sel w3, w4, w2, fg1.c
As in RISC-V, prefer pseudoinstructions (e.g. ret
, li
) in OTBN code where they exist. However, loop
and loopi
instructions require counting the number of instructions in the loop body, which can be tricky for pseudoinstructions that can expand to multiple instructions. (For example, li
can expand to either 1 or 2 instructions depending on the immediate value.) Therefore, it is permitted to avoid pseudoinstructions that can expand to multiple instructions within loop bodies.
Generally, OTBN routines should be written so that they finish in ret
. Use of ecall
should be restricted to thin wrapper files which serve as an interface for the more substantial assembly files, usually just reading one or two arguments and then calling subroutines from the other files.
The //
syntax is not currently available for OTBN because it is not supported by riscv32-unknown-as
. Use the /* */
syntax instead.
It is not required to use an L_
prefix for internal labels. Labels should not be indented.
Operands should be aligned within blocks of OTBN code.
The exact spacing betweeen mnemonic and operand is not important, as long as it is consistent within the block.
Example:
/* Correct */ bn.add w4, w7, w10 bn.addc w5, w8, w11 bn.sel w2, w2, w3, C /* Correct */ bn.add w4, w7, w10 bn.addc w5, w8, w11 bn.sel w2, w2, w3, C /* Wrong */ bn.add w4, w7, w10 bn.addc w5, w8, w11 bn.sel w2, w2, w3, C
Always document which registers and flag groups are clobbered by an OTBN “function”, and which flags have meaning.
OTBN subroutines should always document the registers and flag groups whose values they overwrite, including those used for output. In addition to Doxygen-style @param[in]
and @param[out]
notations, OTBN subroutines should also state whether the flags have meaning at the end of the subroutine. If a subroutine jumps to another subroutine that clobbers additional registers or flag groups, these additional names should be added to the caller's list.
Example:
/** * Compute the sum of two 2048-bit numbers. * * Returns C = A + B. * * Flags: When leaving this subroutine, the C flag of FG0 depends on the * addition. * * @param[in] [w10:w3]: A, first 2048-bit operand * @param[in] [w18:w11]: B, second 2048-bit operand * @param[in] w31: all-zero * @param[out] [w26:w19]: C, result * * clobbered registers: x9 to x13, w19 to w28 * clobbered flag groups: FG0 */ wide_sum: /* Prepare temporary registers. */ li x9, 3 li x10, 11 li x11, 19 li x12, 27 li x13, 28 /* Clear flags. */ bn.add w31, w31, 0 /* Addition loop. */ loopi 8, 4 /* w27 <= A[i] */ bn.movr x12, x9++ /* w28 <= B[i] */ bn.movr x13, x10++ /* w27 <= w27 + w28 */ bn.addc w27, w27, w28 /* C[i] <= w27 */ bn.movr x11++, x12 ret
Big-number load/store instructions (bn.lid
, bn.sid
) require DMEM addresses to be 256-bit aligned; use .balign 32
for any big-number inline binary data to ensure this. For non-big-number data, .balign 4
(32-bit alignment) is recommended.
Prefer .word
over alternatives such as .quad
.
The following guidelines address cryptography-specific concerns for OTBN assembly.
Prefer bn.mov <wrd>, <wrs>
to bn.addi <wrd>, <wrs>, 0
when copying data between registers. Because bn.addi
passes data through the ALU, it will strip off integrity protection, while bn.mov
will copy the integrity protection bits.
In cryptographic code, subroutines should always document whether they run in constant or variable time.
In situations where constant- and variable- time code is mixed, it is recommended to name variable-time subroutines something that makes it clear they are variable-time, such as a name that ends in _var
. If a piece of code is constant-time with respect to some inputs but not others (e.g. with respect to the MOD
register but not to the operands), this should be documented.
Example:
/** * Determine if two 2048-bit operands are equal. * * Returns 1 if A = B, otherwise 0. * * This routine runs in constant time. * * Flags: Flags have no meaning beyond the scope of this subroutine. * * @param[in] [w10:w3]: A, first 2048-bit operand * @param[in] [w18:w11]: B, second 2048-bit operand * @param[in] w31: all-zero * @param[out] x3: result, 1 or 0 * * clobbered registers: x2, x3, x6, x9 to x12 * clobbered flag groups: FG0 */ eq_2048: /* Prepare temporary registers. */ li x9, 3 li x10, 11 li x11, 27 li x12, 28 /* Check if all limbs are equal. */ li x3, 1 loopi 8, 4 /* w27 <= A[i] */ bn.movr x11, x9++ /* w28 <= B[i] */ bn.movr x12, x10++ /* x2 <= (w27 == w28) */ jal x1, eq_256 /* x3 <= x3 & x2 */ and x3, x2, x2 ret /** * Determine if two 2048-bit operands are equal. * * Returns 1 if A = B, otherwise 0. * * This routine runs in variable time. * * Flags: Flags have no meaning beyond the scope of this subroutine. * * @param[in] [w10:w3]: A, first 2048-bit operand * @param[in] [w18:w11]: B, second 2048-bit operand * @param[in] w31: all-zero * @param[out] x2: result, 1 or 0 * * clobbered registers: x2, x6, x9 to x12 * clobbered flag groups: FG0 */ eq_2048_var: /* Prepare temporary registers. */ li x9, 3 li x10, 11 li x11, 27 li x12, 28 /* Check if all limbs are equal. */ li x2, 1 loopi 8, 5 /* If x2 is 0, skip to the end of the loop. (It's still necessary to complete the same number of loop iterations to avoid polluting the loop stack, but we can skip all instructions except the last.) */ beq x2, x0, loop_end /* w27 <= A[i] */ bn.movr x11, x9++ /* w28 <= B[i] */ bn.movr x12, x10++ /* x2 <= (w27 == w28) */ jal x1, eq_256 loop_end: nop ret