{{% lowrisc-doc-hdr SPI Device HWIP Technical Specification }}
{{% regfile ../data/spi_device.hjson }}
{{% section1 Overview }}
{{% toc 3 }}
{{% section2 Features }}
- Single-bit wide SPI device interface implementing a raw data transfer protocol
termed "firmware mode"
- No address bits; data is transferred between the peripheral pins and an
internal buffer
- Intended to be used to bulk-load data into and out of the chip
- Not intended to support EEPROM or other addressing modes (functionality to
come in later versions)
- Supports clock polarity and reverse bit order configurations
- Flexible RX/TX Buffer size within an SRAM range
- Interrupts for RX/TX SRAM FIFO conditions (empty, full, designated level for
RX, TX)
{{% section2 Description }}
The SPI device module is a serial-to-parallel receive (RX) and
parallel-to-serial transmit (TX) full duplex design (single line mode) to communicate
with an outside host. This first version of the module supports operations
controlled by firmware to dump incoming single-line RX data (MOSI) to an
internal RX buffer, and send data from a transmit buffer to single-line TX
output (MISO). The clock for the peripheral data transfer uses the SPI
peripheral pin SCK. In this design the SCK is directly used to drive the
interface logic as its primary clock, which has performance benefits, but incurs
design complications described later.
{{% section2 Compatibility }}
The SPI device doesn't support emulating an EEPROM as of this initial version.
This version is mostly compatible with the Haven SPI Slave Generic Mode design.
{{% section1 Theory of Operations }}
{{% section2 General Data Transfer on Pins }}
Data transfers with the SPI device module involve four peripheral SPI pins: SCK,
CSB, MOSI, MISO. SCK is the SPI clock driven by an external SPI host. CSB (chip
select bar) is an active low enable signal that frames a transfer, driven by the
external host. Transfers with active SCK edges but inactive (high) CSB are
ignored. Data is driven into the SPI device on the MOSI pin ("Master Out Slave
In", though we're otherwise using host/device terminology) and driven out on
MISO. Any transfer length is legal, though higher level protocols typically
assume word width boundaries. See details on protocols and transfers that
follow. The diagram below shows a typical transfer, here for 8 bytes (64 cycles,
showing the beginning and end of the transfer). Configurability for active
edges, polarities, and bit orders is described later.
```wavejson
{ signal: [
{ name: 'CSB', wave: '10.........|....1.'},
{ name: 'SCK', wave: '0.p........|....l.'},
{ name: 'MOSI', wave: 'z.=..=.=.=.=.=.=.=.=.=|=.=.=.=.z....',
data:['R07','R06','R05','R04','R03','R02','R01','R00','R17',
'','R73','R72','R71','R70'], period:0.5, },
{ name: 'MISO', wave: 'z.=..=.=.=.=.=.=.=.=.=|=.=.=.=.z....',
data:['T07','T06','T05','T04','T03','T02','T01','T00','T17',
'','T73','T72','T71','T70'], period:0.5}],
head:{
text: 'Data Transfer',
tick: ['-2 -1 0 1 2 3 4 5 6 7 8 9 60 61 62 63 ']
}
}
```
{{% section2 Defining "Generic Mode" }}
Firmware mode, as implemented by this SPI device, is used to bulk copy data in
and out of the chip using the pins as shown above. In general, it is used to
load firmware into the chip, but can be used for any data transfer into or out
of the chip. The transfers are "generic" in the sense that there is no
addressing involved. Data transferred into the chip goes into a SPI Device
circular buffer implemented in an SRAM, and firmware decides what to do with the
data. Data transferred out of the chip comes out of a circular buffer in an
SRAM. Software can build any number of higher level protocols on top of this
basic mechanism. All transfers are by definition full duplex: whenever an active
SCK edge is received, a bit of RX data is latched into the peripheral, and a bit
of TX data is sent out of the peripheral. If transfers only require
unidirectional movement of data, the other direction can be ignored but will
still be active. For instance, if only receive data is needed in the transfer,
the device will still be transmitting data out on the TX ("MISO") pin.
### SPI Generic Mode
The primary protocol considered is one used by an external SPI host to send
chunks of firmware data into the device in the receive direction, confirming the
contents with an echo back of a hash of the received data in the transmit
direction. This is generally termed 'SPI Generic' mode, since SPI is used to
send firmware into the device flash, brokered by software confirming integrity
of the received firmware data. This special case will be described first, and
then a generic understanding of how firmware mode operates will follow.
The following diagram shows the expected data transfer in SPI Generic mode.
![data transfer in SPI Device](data_transfer.svg)
In this diagram, bursts of data transfer are shown as "pages" of firmware
content being driven into the device. The size of the page is not relevant,
though it must be less than the size of the internal SPI Device SRAM. (Typically
the SRAM is divided in half for RX and TX buffers, but the boundary is
configurable. The total size of RX and TX buffer must fit in the SPI device
SRAM.) Since the external SPI Host is in charge of the clock (SCK), it controls
all aspects of the transfer, including the size of the page. But it is done in
coordination with software running on the device that manages the higher level
protocol.
The protocol assumes that for each page written into the device, a response will
be prepared for the next page. But since the SPI Device is always transmitting
during every received page, the first transmitted page can be ignored. After the
first page is received, software will get alerted as to its completion (via an
RX interrupt), and will execute whatever integrity check is required on that
data. It can then prepare its response to page zero by writing into the SPI
Device TX buffer. What it writes into the TX buffer is the concern of the
higher level protocol. It could be a "good" indication, a full echo of the RX
data, or a hash of the received contents. The decision is not in scope for this
specification.
Clearly there is a potential race condition here as a new page could begin to be
received before software has prepared the transmit response to page zero
(including the time to read data out of the SRAM), but that is a condition that
the higher level protocol must prepare for. That protocol is not in scope for
this document, but some hints to its implementation are given in the
programmer's guide that follows.
The transfer continues until all data has been received and each page has been
responded to. In this protocol the last "received" page of data is a "don't
care" as long as the response is transmitted successfully.
### Generic Firmware Mode
Taking this example as a guide, we can see the generic method of the SPI
firmware mode. On every active SCK clock edge, data is received from the MOSI
pin into the SPI device, and data is transmitted on the MISO pin. Received data
is gathered into bytes and written into the RX circular buffer in the SPI Device
SRAM as it is accumulated. Whatever data exists in the TX circular buffer is
serialized and transmitted. Transfers are framed using the active low chip
select pin CSB. The cases where receive data arrives while the RX circular
buffer is full, or where a transmit encounters an empty TX circular buffer, are
error conditions discussed in the Design Details section that follows.
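
The full-duplex behaviour described above can be modelled in a few lines of Python. This is an illustrative sketch, not the hardware implementation: the function name `exchange` and the single `msb_first` flag are assumptions made for the example. The real design has separate !!CFG.rx_order and !!CFG.tx_order fields, and this document does not state which field value selects which order.

```python
def exchange(mosi_bits, tx_bytes, msb_first=True):
    """Model a framed transfer: one RX bit in and one TX bit out per active edge.

    mosi_bits: bits (0/1) in the order they appear on the MOSI pin.
    tx_bytes:  bytes waiting in the TX buffer; must cover every started byte.
    Returns (rx_bytes, miso_bits). A trailing partial RX byte is not returned.
    """
    rx_bytes, miso_bits = [], []
    shift = 0
    for n, mosi in enumerate(mosi_bits):
        pos = n % 8                            # bit position within the byte
        index = 7 - pos if msb_first else pos  # which bit of the byte this is
        # Transmit direction: serialize the current TX byte onto MISO.
        miso_bits.append((tx_bytes[n // 8] >> index) & 1)
        # Receive direction: gather MOSI bits into a byte.
        shift |= (mosi & 1) << index
        if pos == 7:                           # byte complete: goes to the RX buffer
            rx_bytes.append(shift)
            shift = 0
    return rx_bytes, miso_bits


rx, miso = exchange([0, 0, 0, 0, 1, 1, 1, 1], [0xC0])
```

Even when the host only cares about the received bytes, `miso_bits` is still produced on every edge, mirroring the always-active transmit direction.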
{{% section2 Block Diagram }}
![Block Diagram](block_diagram.svg)
The block diagram above shows how the SPI Device IP converts incoming
bit-serialized MOSI data into a valid byte, where the data bit is valid when the
chip select signal (CSB) is 0 (active low) and SCK is at positive or negative
edge (configurable, henceforth called the "active edge"). The bit order within
the byte is determined by the !!CFG.rx_order configuration register field. After a
byte is gathered, the interface module writes the byte data into a small FIFO
("RXFIFO") using SCK. It is read out of the FIFO and written into the
buffer SRAM ("DP_SRAM") using the system bus clock. If RXFIFO is full, this is
an error condition and the interface module discards the byte.
The interface module also serializes data from the small transmit FIFO
("TXFIFO") and shifts it out on the MISO pin when CSB is 0 and SCK is at the
active edge. The bit order within the byte can be configured with configuration
register field !!CFG.tx_order. It is expected that software has prepared TX data
as per the SPI Generic or general Firmware Mode described in the 'Defining
"Generic Mode"' section above. But because SCK is not under the control of
software or the device (it is driven by the external SPI host), it is possible
that there is no data ready in the TXFIFO when chip select becomes active and
the interface needs to send data on the MISO pin. If software has not
prepared TX data, or does not care about the contents of the TX data,
the hardware will send whatever lingering data is in the empty TXFIFO. If
this is a security risk, then software should at least soft-reset the contents
of the TXFIFO using the !!CONTROL.rst_txfifo register. The soft-reset signal
has no synchronizer to the SCK clock, so software must only drive the reset
signal while the SPI interface is idle.
### RXFIFO, TXFIFO, and DP_SRAM
The relationship between the Dual Port SRAM (DP_SRAM) and the RXFIFO and TXFIFO
is as follows. The SRAM is divided into a section for the transmit
direction, named TXF, and a section for the receive direction, named RXF. Each
section has its own read and write pointer. The SRAM may be read and written by
software at any time, but for correct normal operation software will only write the
empty area of the TXF (between the write pointer and read pointer) and only read
the full area of the RXF (between the read pointer and write pointer), with the
other areas used by the hardware.

It is first worth noting that the asynchronous nature of SCK, and the fact that
it may not be free running, complicate some of the logic. The full feature set
of the interface logic (clocked by SCK) includes the serial-to-parallel converter for RX data,
the parallel-to-serial converter for TX data, and the interfaces to RXFIFO and
TXFIFO. Before the first bit transfer and after the last, SCK is stopped, so
there is no clock for any of this logic. So, for instance, there is no guarantee
of the two clock edges normally required for asynchronous handshaking protocols.
The RXFIFO and TXFIFO exist to facilitate this situation.
In the receive direction, data gathered from the MOSI pin is written into the
RXFIFO (see details below) at appropriate size boundaries. This data is
handshake-received on the core clock side, gathered into byte or word quantity,
and written into the RX circular buffer of the dual-port SRAM. On each write,
the RXF write pointer (!!RXF_PTR.wptr) is incremented by hardware, wrapping at
the size of the circular buffer. Software can watch (via polling or interrupts)
the incrementing of this write pointer to determine how much valid data has been
received, and determine when and what data to act upon. Once it has acted upon
data, the software should update the RXF read pointer to indicate the space in
the SRAM is available for future writes by the hardware. If incrementing the
write pointer would result in it becoming equal to the read pointer then the RXF
is full and any subsequently received data will be discarded. Thus in normal
operation the RXF write pointer is updated automatically by hardware and the RXF
read pointer is managed by software. As an optimization the hardware will
normally only write to the 32-bit wide SRAM when an entire word can be written.
Since the end of the received data may not be aligned, there is a timer that
forces sub-word writes if data has been staged for too long. The timer value
(!!CFG.timer_v) represents the number of core clock cycles. For instance, if the
timer value is set to 0xFF, the RXF control logic will write the gathered
sub-word data after 255 cycles if no further bits are received from the SPI interface.
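
The word-gathering and timeout behaviour can be sketched as a small software model. This is only an illustration under stated assumptions: the class name `RxGather` and its methods are hypothetical, the timer is assumed to restart whenever a new byte arrives, and the RXF-full check against the read pointer is left out for brevity.

```python
class RxGather:
    """Toy model of RXF writes: full words immediately, partial words on timeout."""

    WORD_BYTES = 4  # the SRAM is 32 bits wide

    def __init__(self, size_bytes, timer_v):
        self.sram = bytearray(size_bytes)  # RX circular buffer
        self.size = size_bytes
        self.timer_v = timer_v             # timeout in core clock cycles
        self.wptr = 0                      # byte offset, wraps at buffer size
        self.staged = []                   # bytes received but not yet written
        self.idle_cycles = 0

    def _flush(self):
        # Write all staged bytes and advance the write pointer with wrap-around.
        for b in self.staged:
            self.sram[self.wptr] = b
            self.wptr = (self.wptr + 1) % self.size
        self.staged = []
        self.idle_cycles = 0

    def push_byte(self, b):
        """A byte arrives from the RXFIFO (assumed to restart the timer)."""
        self.staged.append(b)
        self.idle_cycles = 0
        if (self.wptr + len(self.staged)) % self.WORD_BYTES == 0:
            self._flush()                  # an entire word can now be written

    def tick(self):
        """One core clock cycle passes with no new data."""
        if self.staged:
            self.idle_cycles += 1
            if self.idle_cycles >= self.timer_v:
                self._flush()              # forced sub-word write


g = RxGather(size_bytes=16, timer_v=0xFF)
for b in (1, 2, 3, 4, 5, 6):
    g.push_byte(b)      # wptr is 4: two bytes are still staged
for _ in range(0xFF):
    g.tick()            # after 255 idle cycles the two bytes are written
```

In this model the write pointer only becomes visible to software in word steps unless the timer fires, which is why a short trailing transfer appears in the SRAM with a delay.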
In the transmit direction, things are a little more tricky. Since the pin
interface logic begins transmitting data on its very first SCK edge, there are
no previous clock edges on the interface side of the FIFO to allow an empty flag
to be updated. So the interface must *blindly* take whatever data is at the
read pointer of the TXFIFO. (In a typical asynchronous FIFO with free-running
clocks the pointers can always be sent across the asynchronous boundary to
determine if the FIFO is truly empty or not). Hence the need to potentially send
out garbage data if software has not prepared the TXFIFO in time.
The software writes data that it wants to transmit into the TXF circular buffer
of the DP_SRAM buffer. It then passes the data to the hardware by moving the TXF
write pointer to point to the next location after the data (this is the location
it will use to start the data for the next transmission). Hardware that manages
the TXFIFO detects the change in TXF write pointer and begins reading from the
SRAM and prefilling the TXFIFO until it is full or until all valid TXF data has
been read. This prepares the TXFIFO with the desired data for when the next SCK
data arrives. As the SCK domain logic pulls data out of the TXFIFO to transmit
on the MISO pin, that TXFIFO read is detected (after synchronization to the core
clock domain) and potentially another word of data is read from the SRAM and
written into the TXFIFO. Each time the SRAM is read the hardware increments the
TXF read pointer making the space available to software. Like above, though
conversely, in normal operation the TXF write pointer is managed completely by
software and the TXF read pointer is incremented by hardware.
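
As a rough sketch of the software side of this flow, the helper below copies a payload into the TXF region with wrap-around and returns the new write offset. The function name `txf_write` and its parameters are hypothetical, the SRAM is modelled as a plain `bytearray`, and the caller is assumed to have already checked that enough free space exists. The phase bit described in the Pointers section is deliberately left out here.

```python
def txf_write(sram, base, size, wr_offset, payload):
    """Copy payload into the circular TXF region and return the new write offset.

    sram:      bytearray modelling the dual-port SRAM
    base:      byte address where the TXF region starts
    size:      size of the TXF region in bytes
    wr_offset: current write offset within the region
    """
    if not 0 <= wr_offset < size:
        raise ValueError("write offset outside the TXF region")
    if len(payload) > size:
        raise ValueError("payload larger than the TXF region")
    # First chunk: from the write offset up to the end of the region.
    first = min(len(payload), size - wr_offset)
    sram[base + wr_offset:base + wr_offset + first] = payload[:first]
    # Second chunk (if any): wraps around to the start of the region.
    rest = len(payload) - first
    sram[base:base + rest] = payload[first:]
    # Software hands the data to hardware by publishing this new offset.
    return (wr_offset + len(payload)) % size


sram = bytearray(1024)
new_offset = txf_write(sram, base=0x200, size=0x200, wr_offset=0x1FE,
                       payload=b"\x01\x02\x03\x04")
```

Writing the data first and moving the write pointer last matters: the hardware starts prefilling the TXFIFO as soon as it sees the pointer change.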
All reads and writes to/from the SRAM for RXF and TXF activity are managed by
direct reads and writes through the TLUL bus interface, managed by the
auto-generated register file control logic.
{{% section2 Hardware Interfaces }}
{{% hwcfg spi_device }}
{{% section1 Design Details }}
{{% section2 Clock and Phase }}
The SPI device module has two programmable register bits to control the SPI
clock, !!CFG.CPOL and !!CFG.CPHA. CPOL controls clock polarity and CPHA controls the clock
phase. The resulting clock modes are illustrated in the Wikipedia timing diagram linked below.
[File:SPI_timing_diagram2.svg](https://en.wikipedia.org/wiki/Serial_Peripheral_Interface#/media/File:SPI_timing_diagram2.svg)
{{% section2 SPI Device Generic Mode }}
As described in the Theory of Operations above, in generic mode, the SPI device
writes incoming data directly into the SRAM (through RXFIFO) and updates the SPI
device SRAM write pointer (!!RXF_PTR.wptr). It does not parse a command byte or
address bytes; analyzing incoming data is left to the firmware implementation of a
higher level protocol. Data is sent from the TXF SRAM contents via TXFIFO.
The data path inside the block must meet timing within half an SCK cycle.
Because SCK is shut off right after the last bit of the
last byte is received, the hardware module cannot latch the MOSI signal for that
bit. Instead, MOSI is directly connected to RXFIFO, and the other bits [7:1] are
the previously shifted-in values of MOSI. This is detailed in the waveform below.
```wavejson
{ signal: [
{ name: 'CSB', wave: '10.||...|..1'},
{ name: 'SCK', wave: '0.p||...|..l', node:'......b' },
{ name: 'MOSI', wave: '0.=..=|=|=.=.=.=|=.=.z..', data:['7','6','5','1','0','7','6','1','0'], period:0.5, },
{ name: 'BitCount', wave: '=...=.=|=|=.=.=.=|=.=...', data:['7','6','5','1','0','7','6','1','0','7'], period:0.5},
{ name: 'RX_WEN', wave: '0....|....1.0.|...1.0...' , period:0.5},
{ name: 'RXFIFO_D', wave:'x.=.=================.x.', node: '...........a',period:0.5},
],
head:{
text: 'Read Data to FIFO',
tick: ['-2 -1 0 1 . 30 31 32 33 n-1 n n+1 n+2 '],
},
}
```
As shown above, the RXFIFO write request signal (`RX_WEN`) is asserted when
BitCount reaches 0h. BitCount is reset by CSB asynchronously, returning to 7h
for the next round. RXFIFO input data changes on the half clock cycle. RXFIFO
latches WEN at the positive edge of SCK, so Bit 0 (or Bit 7 in the reverse order
case, see the !!CFG.rx_order configuration field) cannot be latched. When BitCount
is 0h, bit 0 of the FIFO data shows the bit 1 value for the first half clock cycle,
then shows the correct value after the MOSI value has changed.
TXFIFO is similar. TX_REN is asserted when Tx BitCount reaches 1, and the
current entry of TXFIFO is popped at the negative edge of SCK. It results in a
change of MISO value at the negative edge of SCK. MISO_OE is bounded by the CSB
signal: when CSB goes high, MISO returns to the High-Z state.
```wavejson
{ signal: [
{ name: 'CSB', wave:'10.||...|..1'},
{ name: 'SCK', wave:'0...p.|.|...|l' , node:'.............a', period:0.5},
{ name: 'MISO', wave:'x.=..=|=|=.=.=.=|=.=.x..', data:['7','6','5','1','0','7','6','1','0'], period:0.5, },
{ name: 'MISO_OE', wave:'0.1...................0.', period:0.5},
{ name: 'BitCount', wave:'=....=.=|=|=.=.=.=|=.=..', data:['7','6','5','1','0','7','6','1','0','7'], period:0.5},
{ name: 'TX_REN', wave:'0.....|..1.0...|.1.0....' , node:'..........c',period:0.5},
{ name: 'TX_DATA_i',wave:'=.....|....=.......=....',data:['D0','Dn','Dn+1'], node:'...........b', period:0.5},
],
edge: ['a~b', 'c~b t1'],
head:{
text: 'Write Data from FIFO',
tick: ['-2 -1 0 1 . 30 31 32 33 n-1 n n+1 n+2 '],
},
}
```
Please keep in mind that in the mode 3 configuration, the logic cannot
pop the entry from the TX async FIFO at the end of the last bit of the last byte
in a transaction, because in mode 3 no further SCK edge is given after the last
bit is sent and before CSB is de-asserted. The design therefore pops the entry at the
7th bit position. This introduces the unavoidable behavior of dropping the last
byte if CSB is de-asserted before a byte transfer is completed. If CSB is
de-asserted in bit positions 1 to 6, the FIFO entry isn't popped and the TX logic
will re-send the byte in the next transaction. If CSB is de-asserted in the 7th or
8th bit position, the data is dropped and the next transaction begins with the next byte.
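
The rule above can be restated as a tiny decision function. This is only a reading aid: the name `tx_byte_fate` and the string results are made up for the example, and bit positions are counted from 1 to 8 in transmission order, as in the paragraph above.

```python
def tx_byte_fate(bit_position):
    """What happens to the current TX byte if CSB de-asserts at bit_position.

    bit_position: 1..8, the bit position reached within the byte being sent.
    Returns "resent" if the FIFO entry was not popped (byte is sent again in
    the next transaction) or "dropped" if it was already popped.
    """
    if not 1 <= bit_position <= 8:
        raise ValueError("bit_position must be in 1..8")
    # The entry is popped at the 7th bit position, so anything earlier
    # leaves the byte in the TXFIFO.
    return "resent" if bit_position <= 6 else "dropped"


fates = [tx_byte_fate(p) for p in range(1, 9)]
```

A host that always ends transactions on byte boundaries only ever sees the position 8 case, where the byte has been transmitted in full and the next transaction starts with the next byte.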
### RXFIFO control
![RXF CTRL State Machine](rxf_ctrl_fsm.svg)
The RXFIFO Control module controls data flow from RXFIFO to SRAM. It connects
two FIFOs having different data widths: RXFIFO is one byte wide, while the SRAM
that stores incoming data for FW has the TL-UL interface width.
To reduce traffic to SRAM, the control logic gathers FIFO entries up to full
SRAM data width, then does a full-word SRAM write. A programmable timer covers
the case where only part of a word has been received at the end of a transfer. If the
timer expires while bytes are still in the RXFIFO, the logic writes a partial
word to SRAM. Data received afterwards triggers a read-modify-write operation to
update the partially written word.
![State Machine](rxf_ctrl_fsm_table.png)
### TXFIFO control
The TXFIFO control module reads data from SRAM then pushes to TXFIFO whenever
there is space in TXFIFO and when the TXF wptr and rptr indicate there is data
to transmit. Data is written into the TXF SRAM by software which also controls
the TXF write pointer.
![TXF CTRL Data Path](txf_ctrl_dp.svg)
The TXFIFO control module latches the write pointer and then uses it internally. This
scheme prevents HW from using incorrect data from sram_rdata if the write pointer
and read pointer are pointing at the same location in the SRAM. It is
recommended that software update the write pointer at the SRAM data width
granularity if it has more than one DWord of data to send out. If software updates
the write pointer every byte, HW tries to fetch data from SRAM every time it hits
the write pointer, leading to inefficient SRAM access.
If TXFIFO is empty, the HW module repeatedly sends the current entry of the TXFIFO
output, as explained in the "Theory of Operations" section. It cannot use an empty
signal from TXFIFO due to asynchronous timing constraints.
So, if software wants to send specific dummy data, it should prepare the required
amount of data with that value. As shown in the Theory Of Operations figure, for
example, internal software could prepare FFh values for the first page.
![State Machine](txf_ctrl_fsm_table.png)
{{% section2 Data Storage Sizes }}
SPI Device IP uses a 2kB internal Dual-Port SRAM. Firmware can resize the RX and TX
circular buffers within the SRAM size. For example, the firmware is able to set
the RX circular buffer to 1.5kB and the TX circular buffer to 512B.
To increase the SRAM size, the `SramAw` local parameter in `spi_device.sv`
should be changed. It cannot exceed 13 (32kB) due to the read and write
pointers' widths.
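
The limit of 13 can be checked with a little arithmetic. The sketch below assumes that `SramAw` is the address width in 32-bit words, which is consistent with 13 corresponding to 32kB; the default value of 9 for the 2kB SRAM is inferred from that assumption rather than stated in this document. The helper name `sram_geometry` is hypothetical.

```python
def sram_geometry(sram_aw):
    """Return (size_bytes, offset_bits, pointer_bits) for a given SramAw.

    Assumes SramAw counts address bits of 32-bit words.
    """
    size_bytes = 4 * (1 << sram_aw)  # 4 bytes per 32-bit word
    offset_bits = sram_aw + 2        # byte offset: word address plus 2 low bits
    pointer_bits = offset_bits + 1   # one extra most-significant phase bit
    return size_bytes, offset_bits, pointer_bits


default_geometry = sram_geometry(9)   # (2048, 11, 12)
largest_geometry = sram_geometry(13)  # (32768, 15, 16)
```

With `SramAw` at 13 the pointer needs all 16 bits available in the register description, so a larger SRAM would not fit.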
{{% section1 Programmers Guide }}
{{% section2 Initialization }}
By default, the RX SRAM FIFO base and limit addresses (in the !!RXF_ADDR register) are
set to 0x0 and 0x1FC, giving 512 bytes. The TX SRAM FIFO base and limit addresses (in
the !!TXF_ADDR register) are 0x200 and 0x3FC. If FW wants bigger spaces, it can
change the values of the !!RXF_ADDR and !!TXF_ADDR registers.
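
As a sketch of how firmware might derive these values when resizing the buffers, the helper below computes base and limit byte addresses for back-to-back RX and TX regions. It assumes, based on the default values above, that the limit is the address of the last 32-bit word in the region. The function name `fifo_regions` is hypothetical, and how base and limit are packed into the register fields is not shown.

```python
def fifo_regions(rx_bytes, tx_bytes, sram_bytes=2048):
    """Return ((rx_base, rx_limit), (tx_base, tx_limit)) as byte addresses.

    The RX region starts at 0 and the TX region follows directly after it.
    The limit is assumed to be the address of the last word in the region.
    """
    for n in (rx_bytes, tx_bytes):
        if n <= 0 or n % 4:
            raise ValueError("region sizes must be positive multiples of 4")
    if rx_bytes + tx_bytes > sram_bytes:
        raise ValueError("RX and TX regions do not fit in the SRAM")
    rx_base, rx_limit = 0, rx_bytes - 4
    tx_base = rx_bytes
    tx_limit = tx_base + tx_bytes - 4
    return (rx_base, rx_limit), (tx_base, tx_limit)


defaults = fifo_regions(512, 512)   # ((0x000, 0x1FC), (0x200, 0x3FC))
resized = fifo_regions(1536, 512)   # ((0x000, 0x5FC), (0x600, 0x7FC))
```

The second call corresponds to the 1.5kB RX and 512B TX split mentioned under Data Storage Sizes.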
Software can configure the timer value !!CFG.timer_v to change the delay between
data being received from the SPI interface and being written into the SRAM. The
value of the field is the number of core clock cycles that the logic waits. Note
that if the receiver gathers a full SRAM word (4 bytes), it writes to the RX SRAM
FIFO regardless of the timer.
{{% section2 Pointers }}
The RX and TX SRAM FIFOs each have read and write pointers, in !!RXF_PTR and !!TXF_PTR. Those
pointers are used to manage circular FIFOs inside the SRAM. The pointer width in
the register description is 16 bits but the number of valid bits in the pointers
depends on the size of the SRAM.
The current SRAM size is 2kB and the pointer width is 12 bits: 11 bits
representing a byte offset and 1 most-significant bit indicating the phase of
the FIFO. Since they represent bytes, the low 2 bits indicate the offset within
the 32-bit wide SRAM word. The pointers indicate the offset into the area
described by the base and limit values, so the lower bits (11 bits in this case)
of a pointer shall not exceed the size in bytes (4 * (limit address - base
address)) reserved for the region (RXF or TXF) that the pointer is in. For
instance, if FW sets RXFIFO depth to 128 (default value), it shall not update
the read pointer outside the range 0x000 - 0x1FF (128*4 = 512 bytes), ignoring
the phase bit, bit 11.
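
A minimal sketch of how software could interpret these pointers for the 2kB SRAM follows. It assumes the conventional meaning of a phase bit, which this document does not spell out: the bit toggles each time a pointer wraps, so equal offsets with equal phase mean empty and equal offsets with different phase mean full. The function names are hypothetical.

```python
OFFSET_BITS = 11  # byte offset width for the 2kB SRAM; bit 11 is the phase bit


def split_ptr(ptr):
    """Split a 12-bit pointer into (phase, byte_offset)."""
    return (ptr >> OFFSET_BITS) & 1, ptr & ((1 << OFFSET_BITS) - 1)


def fifo_depth(wptr, rptr, region_bytes):
    """Number of valid bytes between rptr and wptr (assumed phase semantics)."""
    w_phase, w_off = split_ptr(wptr)
    r_phase, r_off = split_ptr(rptr)
    if w_off >= region_bytes or r_off >= region_bytes:
        raise ValueError("pointer offset exceeds the region size")
    if w_phase == r_phase:
        return w_off - r_off               # write pointer has not wrapped past read
    return region_bytes - r_off + w_off    # write pointer is one wrap ahead


pending = fifo_depth(wptr=0x804, rptr=0x1F0, region_bytes=512)  # 20 bytes
```

The range check mirrors the rule above that the offset part of a pointer must stay inside the region, which is 0x000 to 0x1FF for the default 512-byte RXF.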
{{% section2 Register Table }}
{{% registers x }}