[otbn] Move loop nesting into programmers guide

The "Loop nesting" section in the OTBN documentation targets
programmers. Move it from the "Processor state" section to the
"Programmers guide" section, under a new "Using hardware loops"
heading.

Signed-off-by: Philipp Wagner <phw@lowrisc.org>
diff --git a/hw/ip/otbn/doc/_index.md b/hw/ip/otbn/doc/_index.md
index f345c07..1604e6c 100644
--- a/hw/ip/otbn/doc/_index.md
+++ b/hw/ip/otbn/doc/_index.md
@@ -361,72 +361,6 @@
 This is a stack of tuples containing a loop count, start address and end address.
 The stack has a maximum depth of eight and the top of the stack is the current loop.
 
-### Loop nesting
-
-OTBN permits loop nesting and branches and jumps inside loops.
-However, it doesn't have support for early termination of loops: there's no way to pop an entry from the loop stack without executing the last instruction of the loop the correct number of times.
-It can also only pop one level of the loop stack per instruction.
-
-To avoid polluting the loop stack or avoid surprising behaviour, the programmer must ensure that:
-* Even if there are branches and jumps within a loop body, the final instruction of the loop body gets executed exactly once per iteration.
-* Nested loops have distinct end addresses.
-* The end instruction of an outer loop is not executed before an inner loop finishes.
-
-OTBN does not detect these conditions being violated, so no error will be signalled should they occur.
-
-(Note indentation in the code examples is for clarity and has no functional impact).
-
-The following loops are *well nested*:
-
-```
-LOOP x2, 3
-  LOOP x3, 1
-    ADDI x4, x4, 1
-  # The NOP ensures that the outer and inner loops end on different instructions
-  NOP
-
-# Both inner and outer loops call some_fn, which returns to
-# the body of the loop
-LOOP x2, 5
-  JAL x1, some_fn
-  LOOP x3, 2
-    JAL x1, some_fn
-    ADDI x4, x4, 1
-  NOP
-
-# Control flow leaves the immediate body of the outer loop but eventually
-# returns to it
-LOOP x2, 4
-  BEQ x4, x5, some_label
-branch_back:
-  LOOP x3, 1
-    ADDI x6, x6, 1
-  NOP
-
-some_label:
-  ...
-  JAL x0, branch_back
-```
-
-The following loops are not well nested:
-
-```
-# Both loops end on the same instruction
-LOOP x2, 2
-  LOOP x3, 1
-    ADDI x4, x4, 1
-
-# Inner loop jumps into outer loop body (executing the outer loop end
-# instruction before the inner loop has finished)
-LOOP x2, 5
-  LOOP x3, 3
-    ADDI x4, x4 ,1
-    BEQ  x4, x5, outer_body
-    ADD  x6, x7, x8
-outer_body:
-  SUBI  x9, x9, 1
-```
-
 # Theory of Operations
 
 ## Block Diagram
@@ -596,6 +530,76 @@
 The DMEM can be used to pass data back to the host processor, e.g. a "return value" or an "exit code".
 Refer to the section [Passing of data between the host CPU and OTBN]({{<relref "#writing-otbn-applications-datapassing" >}}) for more information.
 
+## Using hardware loops
+
+OTBN provides two hardware loop instructions: `LOOP` and `LOOPI`.
+
+### Loop nesting
+
+OTBN permits loop nesting and branches and jumps inside loops.
+However, it doesn't have support for early termination of loops: there's no way to pop an entry from the loop stack without executing the last instruction of the loop the correct number of times.
+It can also only pop one level of the loop stack per instruction.
+
+To avoid polluting the loop stack and causing surprising behaviour, the programmer must ensure that:
+* Even if there are branches and jumps within a loop body, the final instruction of the loop body gets executed exactly once per iteration.
+* Nested loops have distinct end addresses.
+* The end instruction of an outer loop is not executed before an inner loop finishes.
+
+OTBN does not detect these conditions being violated, so no error will be signalled should they occur.
+
+(Note that indentation in the code examples is for clarity only and has no functional impact.)
+
+The following loops are *well nested*:
+
+```
+LOOP x2, 3
+  LOOP x3, 1
+    ADDI x4, x4, 1
+  # The NOP ensures that the outer and inner loops end on different instructions
+  NOP
+
+# Both inner and outer loops call some_fn, which returns to
+# the body of the loop
+LOOP x2, 5
+  JAL x1, some_fn
+  LOOP x3, 2
+    JAL x1, some_fn
+    ADDI x4, x4, 1
+  NOP
+
+# Control flow leaves the immediate body of the outer loop but eventually
+# returns to it
+LOOP x2, 4
+  BEQ x4, x5, some_label
+branch_back:
+  LOOP x3, 1
+    ADDI x6, x6, 1
+  NOP
+
+some_label:
+  ...
+  JAL x0, branch_back
+```
+
+The following loops are not well nested:
+
+```
+# Both loops end on the same instruction
+LOOP x2, 2
+  LOOP x3, 1
+    ADDI x4, x4, 1
+
+# Inner loop jumps into outer loop body (executing the outer loop end
+# instruction before the inner loop has finished)
+LOOP x2, 5
+  LOOP x3, 3
+    ADDI x4, x4, 1
+    BEQ  x4, x5, outer_body
+    ADD  x6, x7, x8
+outer_body:
+  SUBI  x9, x9, 1
+```
+
 ## Algorithic Examples: Multiplication with BN.MULQACC
 
 The big number instruction subset of OTBN generally operates on WLEN bit numbers.