---
title: "RISC-V Assembly Style Guide"
---

## Basics

### Summary

OpenTitan needs to implement substantial functionality directly in RISC-V assembly.
This document describes best practices for both assembly `.S` files and inline assembly statements in C and C++.
It also codifies otherwise unwritten style guidelines in one central location.

This document is not an introduction to RISC-V assembly; for that purpose, see the [RISC-V Assembly Programmer's Manual](https://github.com/riscv/riscv-asm-manual/blob/master/riscv-asm.md).

Assembly is typically very specialized; the following rules do not presume to describe every use-case, so use your best judgement.

This style guide is specialized for RV32IMC, the ISA implemented by Ibex.
As such, no advice is provided for other RISC-V extensions, though this style guide is written such that advice for other extensions could be added without conflicts.

## General Advice

### Register Names

***RISC-V registers must be referred to by their ABI names.***

See the [psABI Reference](https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#register-convention) for a list of these names.

Example:
```S
  // Correct:
  li a0, 42
  // Wrong:
  li x10, 42
```

This rule can be ignored when the ABI meaning of a register is unimportant, such as when clobbering all 31 general-purpose registers.

### Pseudoinstructions

***When performing an operation for which a pseudoinstruction exists, that pseudoinstruction must be used.***

Pseudoinstructions make RISC-V's otherwise verbose RISC style more readable; for consistency, these must be used where possible.

Example:
```S
  // Correct:
  sw t0, _my_global, t1
  // Wrong:
  la t1, _my_global
  sw t0, 0(t1)

  // Correct:
  ret
  // Wrong:
  jr ra
```

### Operation-with-immediate mnemonics

***Do not use aliases for operation-with-immediate instructions, like `add rd, rs, imm`.***

Assemblers usually recognize instructions like `add t0, t1, 5` as an alias for `addi`.
These should be avoided, since they are confusing and a potential source of errors.

Example:
```S
  // Correct:
  addi t0, t1, 0xf
  ori a0, a0, 0x4
  // Wrong:
  add t0, t1, 0xf
  or a0, a0, 0x4
```

### Loading Addresses

***Always use `la` to load the address of a symbol; always use `li` to load an address stored in a `#define`.***

Some assemblers allow `la` with an immediate expression instead of a symbol, allowing a form of symbol+offset.
However, support for this behavior is patchy, and the semantics of PIC `la` with immediate are unclear (in PIC mode, `la` should perform a GOT lookup, not a `pc`-relative load).

### Jumping into C

***Jumping into a C function must be done either with a `call` instruction, or, if that function is marked `noreturn`, a `tail` instruction.***

The RISC-V jump instructions take a "link register", which holds the return address (this should always be `zero` or `ra`), and a small `pc`-relative immediate.
For jumping to a symbol, there are two user-controlled settings: "near" or "far", and "returnable" (i.e., a link register of `zero` or `ra`).
The mnemonics for these are:
- `j sym`, for a near non-returnable jump.
- `jal sym`, for a near returnable jump.
- `tail sym`, for a far non-returnable jump (i.e., a non-unwinding tail-call).
- `call sym`, for a far returnable jump (i.e., function calls).

Far jumps are implemented in the assembler by emitting `auipc` instructions as necessary (since the jump-and-link instruction takes only a small immediate).
Jumps into C should always be treated as far jumps, and as such use the `call` instruction, unless the C function is marked `noreturn`, in which case `tail` can be used.

Example:
```S
  call _syscall_start

  tail _crt0
```

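The near/far distinction comes down to immediate widths, which the following C sketch tries to illustrate.
It assumes the standard RV32 encodings: `jal` carries a signed 21-bit, 2-byte-aligned offset, while a far jump pairs `auipc` (upper 20 bits) with `jalr` (signed 12-bit immediate).
The names `fits_in_jal`, `far_jump_imm_t`, and `split_far_offset` are hypothetical and exist only for this example; this is not how any particular assembler is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true if a pc-relative offset is reachable by a single
// `jal`, whose immediate is a signed 21-bit multiple of 2 (about +/-1 MiB).
bool fits_in_jal(int32_t offset) {
  return offset >= -(1 << 20) && offset < (1 << 20) && (offset % 2) == 0;
}

// Hypothetical pair of immediates for an `auipc` + `jalr` far jump.
typedef struct far_jump_imm {
  int32_t hi20;  // Immediate for `auipc`.
  int32_t lo12;  // Immediate for `jalr`, always in [-2048, 2047].
} far_jump_imm_t;

// Split a 32-bit pc-relative offset so that (hi20 << 12) + lo12 == offset.
// `jalr` sign-extends its immediate, so the low 12 bits are sign-extended
// first and the upper part absorbs the difference.
far_jump_imm_t split_far_offset(int32_t offset) {
  far_jump_imm_t imm;
  int32_t low = (int32_t)((uint32_t)offset & 0xfff);
  imm.lo12 = (low ^ 0x800) - 0x800;
  imm.hi20 = (int32_t)(((int64_t)offset - imm.lo12) / 4096);
  return imm;
}
```

Since the distance to a C function is generally not known when the assembly is written, treating every such jump as far sidesteps the `jal` range check entirely.
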
### Control and Status Register (CSR) Names

***CSRs defined in the RISC-V spec must be referred to by those names (like `mstatus`), while custom non-standard ones must be encapsulated in a `#define`.***

Naturally, if a pseudoinstruction exists to read that CSR (like `rdtime`), it should be used instead.

`#define`s for CSRs should be prefixed with `CSR_<design>_`, where `<design>` is the name of the design the CSR corresponds to.

Recognized CSR prefixes:
- `CSR_IBEX_` - A CSR specific to the Ibex core.
- `CSR_OT_` - A CSR specific to the OpenTitan chip, beyond the Ibex core.

Example:
```S
  csrr t0, mstatus

  #define CSR_OT_HMAC_ENABLED ...
  csrw CSR_OT_HMAC_ENABLED, 0x1
```

### Load and Store From Pointer in Register

***When loading and storing from a pointer in a register, prefer to use `n(reg)` shorthand.***

In the case that a pointer is being read without an offset, prefer `0(reg)` over `(reg)`.

```S
  // Correct:
  lw t3, 8(sp)
  sb t3, 0(a0)
  // Wrong:
  lw t3, sp, 8
  sb t3, a0
```

### Compressed Instruction Mnemonics

***Do not use compressed instruction mnemonics.***

While Ibex implements the RISC-V C extension, it is expected that the toolchain will automatically compress instructions where possible.

Of course, this advice should be ignored when it is necessary to prove that a certain block of instructions does not exceed a particular width.

### "Current Point" Label

***Do not use the current point (`.`) label.***

The current point label does not look like a label, and can be easily missed during review.

## `.S` Files

This advice applies specifically to `.S` files, as well as globally-scoped assembly in `.c` and `.cc` files.

While this is already implicit, we only use the `.S` extension for assembly files, not `.s` or `.asm`.

### Indentation

Assembly files must be formatted with all directives indented two spaces, except for labels.
Comments should be indented as usual.

Aligning instruction operands is not required.

Example:
```S
_trap_start:
  .globl _trap_start
  csrr a0, mcause
  sw x1, 0(sp)
  sw x2, 4(sp)
  // ...
```

### Comments

***Comments must use either the `//` or `/* */` syntaxes.***

Every label that is meant to be called like a function (*especially* `.globl`s) should be given a Doxygen-style comment.
While Doxygen is not suited for assembly, that style should be used for consistency.
See the [C/C++ style guide]({{< relref "c_cpp_coding_style" >}}) for more information.

All other advice for writing comments, as in the C/C++ style guide, also applies.

### Register Usage

***Register usage in a "function" that diverges from the RISC-V function call ABI must be documented.***

This includes non-standard calling conventions, non-standard clobbers, and other behavior not expected of a well-behaved RISC-V function.
Non-standard input and output registers should use Doxygen's `param[in] reg` and `param[out] reg` annotations, respectively.

Within a function, whether or not it conforms to RISC-V's calling convention, comments should be present to describe the assignment of logical values to registers.

Example:
```S
/**
 * Compute some stuff, outputting a 96-bit integer.
 *
 * @param[out] a0 bits [31:0] of the result.
 * @param[out] a1 bits [63:32] of the result.
 * @param[out] a2 bits [95:64] of the result.
 */
compute_stuff:
  .globl compute_stuff
  // a0 is to be used as an accumulator, which will be returned as-is.
  li a0, 0xdeadbeef
  // t0 is a loop variable.
  li t0, 0x0
1:
  // ...
  bnez t0, 1b

  li a1, 0xbeefcafe
  li a2, 0xcafedead
  ret
```

### Ending an Instruction Sequence

***Every code path within an assembly file must end in a non-linking jump.***

Assembly should be written such that the program counter can't wander off past the written instructions.
As such, all assembly should be ended with `ret` (or any of the protection ring returns like `mret`), an infinite `wfi` loop, or an instruction that is guaranteed to trap and not return, like an `exit`-like syscall or `unimp`.

Example:
```S
loop_forever:
  wfi
  j loop_forever
```

### Alignment Directives

***Do not use `.align`; use `.p2align` and `.balign` as the situation requires.***

The exact meaning of `.align` depends on architecture; rather than asking readers to second-guess themselves, use alignment directives with strongly-typed arguments.

Example:
```S
  // Correct:
  .balign 8 // 8-byte aligned.
  tail _magic_symbol

  // Wrong:
  .align 8 // Is this 8-byte aligned, or 256-byte aligned?
  tail _magic_symbol
```

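To make the two unambiguous readings concrete, here is a small C model of where the location counter would land under each directive.
It is only a sketch: the function names `balign` and `p2align` are hypothetical stand-ins for the directives, and the model assumes the byte count is a power of two.

```c
#include <stdint.h>

// Model of `.balign n`: advance `addr` to the next multiple of `n` bytes.
// Assumes `n` is a power of two.
uint32_t balign(uint32_t addr, uint32_t n) {
  return (addr + n - 1) & ~(n - 1);
}

// Model of `.p2align n`: advance `addr` to the next multiple of 2^n bytes.
uint32_t p2align(uint32_t addr, uint32_t n) {
  return balign(addr, UINT32_C(1) << n);
}
```

Starting from address `0x1001`, `balign(0x1001, 8)` gives `0x1008`, while `p2align(0x1001, 8)` gives `0x1100`; a bare `.align 8` leaves the reader guessing which of the two was intended.
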
### Inline Binary Directives

***Always use `.byte`/`.2byte`/`.4byte`/`.8byte` for inline binary data.***

`.word`, `.long`, and friends are confusing, for the same reason `.align` is.

## Inline Assembly

This advice applies to function-scope inline assembly in `.c` and `.cc` files.
For an introduction to this syntax, see [GCC's documentation](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html).

### When to Use

***Avoid inline assembly as much as possible, as long as correctness and readability are not impacted.***

Inline assembly is best reserved for when a high-level language cannot express what we need to do, such as expressing complex control flow or talking to the hardware.
If a compiler intrinsic can achieve the same effect, such as `__builtin_clz()`, then that should be used instead.

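As an illustration of preferring an intrinsic, the sketch below finds the highest set bit of a word without any inline assembly.
The helper name `highest_set_bit` is hypothetical, and the code assumes a GCC- or Clang-compatible compiler with a 32-bit `unsigned int`.

```c
#include <stdint.h>

// Hypothetical helper: index of the highest set bit, or -1 if `value` is 0.
// `__builtin_clz()` is undefined for an argument of 0, so that case is
// handled explicitly.
int highest_set_bit(uint32_t value) {
  if (value == 0) {
    return -1;
  }
  return 31 - __builtin_clz(value);
}
```

The compiler remains free to pick whatever instruction sequence suits the target, which a hand-written `asm` block would prevent.
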
> The compiler is *always* smarter than you; only in the rare case where it is not, assembly should be used instead.

### Formatting

Inline assembly statements must conform to the following formatting requirements, which are chosen to closely resemble how Google's clang-format rules format function calls.
- Neither the `asm` nor the `__asm__` keyword is specified in C; the former must be used, and should be `#define`d into existence if not supported by the compiler.
  C++ specifies `asm` to be part of the grammar, so it should be used exclusively.
- There should not be a space between the `asm` qualifiers and the opening parenthesis:
  ```c
  asm(...);
  asm volatile(...);
  ```
- Single-instruction `asm` statements should be written on one line, if possible:
  ```c
  asm volatile("wfi");
  ```
- Multiple-instruction `asm` statements should be written with one instruction per line, formatted as follows:
  ```c
  asm volatile(
    "my_label:"
    "  la sp, _stack_start;"
    "  tail _crt0;"
    ::: "memory");
  ```
- The colons separating register constraints should be surrounded with spaces, unless there are no constraints between them, in which case they should be adjacent.
  ```c
  asm("..." : "=a0"(foo) :: "memory");
  ```

### Non-returning `asm`

***Functions with non-returning `asm` must be marked as `noreturn`.***

C and C++ compilers are, in general, not supposed to introspect `asm` blocks, and as such cannot determine that they never return.
Functions marked as never returning should end in `__builtin_unreachable()`, which the compiler will usually turn into an `unimp`.

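A minimal sketch of this pattern follows, assuming a GCC- or Clang-compatible compiler.
The function name `halt_forever` is hypothetical, and the `abort()` branch is only a stand-in so that the example also builds on a non-RISC-V host.

```c
#include <stdlib.h>

// Hypothetical helper: parks the core forever and never returns.
__attribute__((noreturn)) void halt_forever(void) {
#if defined(__riscv)
  // The compiler cannot see inside this block, so it does not know that
  // control never leaves the loop.
  asm volatile(
    "1:"
    "  wfi;"
    "  j 1b;");
#else
  abort();  // Host-only stand-in for the RISC-V loop above.
#endif
  // Tells the compiler that nothing after the `asm` block is reachable.
  __builtin_unreachable();
}
```

Without the `noreturn` marking, the compiler would emit a normal function epilogue after the `asm` block and assume callers continue executing afterwards.
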