Improve kelvin isa doc
- Improve aconv description
- Note the effect of v48 on acc getting/setting instructions
- Fix a typo in VSLIDEP
Change-Id: Ic4e6934aef3d6f1d43fd475640013dde43ffb40c
diff --git a/docs/kelvin_isa.md b/docs/kelvin_isa.md
index 4b9b06c..5aaac0c 100644
--- a/docs/kelvin_isa.md
+++ b/docs/kelvin_isa.md
@@ -85,6 +85,10 @@
where a stripmine register must use a mod4 base aligned register (eg. v0, v4,
v8, ...). Normal instruction and stripmine variants may be mixed together.
+Currently, neither the assembler nor kelvin_sim checks for invalid stripmine
+registers. Code using invalid registers (like v1) will compile and sim, but
+will cause FPGA to hang.
+
When stripmining is used in conjunction with instructions which use a register
index as a base to several registers, the offset of +4 (instead of +1) shall be
used. e.g., {vm0,vm1} becomes {{v0,v1,v2,v3},{v4,v5,v6,v7}}.
@@ -753,7 +757,7 @@
### ACONV
-Convolution ALU operation.
+Performs matmul vs1*vs3, accumulating into the accumulator.
**Encodings**
@@ -787,29 +791,32 @@
(signed(SData2,Data2{7:0}) + signed(Bias2{8:0})){9:0}){18:0}
```
-Length (stop - start + 1) is in 32bit accumulator lane count, as all inputs will
-horizontally reduce to this size.
+vs1 goes to the *narrow* port of the matmul. 8 vectors are always used.
-The Start and Stop definition allows for a partial window of input values to be
-transpose broadcast into the convolution unit.
+vs3 goes to the *wide* port of the matmul, up to 8 vectors are used.
+
+vx2 specifies control params used in the operation and has the following
+format:
Mode | Mode | Usage
:----: | :--: | :-----------------------------------------------:
Common | | Mode[1:0] Start[6:2] Stop[11:7]
s8 | 0 | SData2[31] SBias2[30:22] SData1[21] SBias1[20:12]
-```
-# SIMD256
-acc.out = {v48..55}
-narrow0 = {v0..7}
-narrow1 = {v16..23}
-narrow2 = {v32..39}
-narrow3 = {v48..55}
-wide0 = {v8..15}
-wide1 = {v24..31}
-wide2 = {v40..47}
-wide3 = {v56..63}
-```
+Start and Stop controls the window of input values to participate in the
+matmul:
+- On vs1 this is in 4-byte words on all 8 vectors at the same time.
+- On vs3 this is the register number to use (vs3+0 to vs3+7).
+- The operation takes (stop - start + 1) ticks to complete.
+
+When using SIMD256, the folling operands are valid:
+- vd: v48
+- vs1: v0, v16, v32, v48
+- vs3: v8, v24, v40, v56
+
+Notes:
+- v48 is used as vd but never written to.
+- v48-v55 will always be overwritten upon VCGET.
### VCGET
@@ -830,6 +837,8 @@
```
+v48 is the only valid vd in this instruction.
+
### ACSET
Copy general registers into convolution accumulators.
@@ -847,6 +856,8 @@
Accum{Y} = vd{Y}
```
+Note that v48 is used as vd but never written to.
+
--------------------------------------------------------------------------------
### ACTR
@@ -860,13 +871,15 @@
**Operation**
```
-assert(vd in {v48})
+assert(vd == 48)
assert(vs1 in {v0, v16, v32, v48}
for I in i32.typelen
for J in i32.typelen
ACCUM[J][I] = vs1[I][J]
```
+Note that v48 is used as vd but never written to.
+
--------------------------------------------------------------------------------
### VCLB
@@ -1813,7 +1826,7 @@
vslidep.[b,h,w].[1,2,3,4].vv vd, vs1, vs2 \
vslidevp.[b,h,w].[1,2,3,4].vv.m vd, vs1, vs2 \
-vslidevp.[b,h,w].[1,2,3,4].vv.m vd, vs1, xs2
+vslidevp.[b,h,w].[1,2,3,4].vx.m vd, vs1, xs2
**Operation**