hw/ip/usbdev/doc/usbdev.md

{{% lowrisc-doc-hdr Simple USB Full Speed Device IP Technical Specification }} {{% regfile ../data/usbdev.hjson }}

USB Full-Speed (12Mbps) Device interface
2kbyte Interface buffer
Up to 12 Endpoints (including required Endpoint 0)
Support for USB packet size up to 64 bytes
Support SETUP, IN and OUT transactions
Support for Bulk, Control and Interrupt endpoints and transactions
Initial version does not support Isochronous transfers larger than 64 bytes (later extension)
Streaming possible through software
Interrupts for packet reception and transmission

The USB device module is a simple software-driven gneric USB device interface for Full-Speed USB operation. The IP includes the physical layer interface (capable of using 3.3V I/O pads, but could be extended to a more formal USB PHY), the low level USB protocol and a packet buffer interface to the software.

The USB device programming interface is not based on any existing interface.

A useful quick reference for USB Full-Speed is http://www.usbmadesimple.co.uk/ums_3.htm

The block diagram shows a high level view of the simple USB Device including the main register access paths.

Block Diagram

The USB Full-Speed interface runs at a 12MHz datarate. The interface runs at four times this and must be clocked from an accurate 48MHz clock source. The USB specification for a Full-Speed device requires the average bit-rate is 12Mbps +/- 0.25%, so the clock needs to support maximum error of 2,500ppm.

Control transfers pass through asynchronous FIFOs or have a ready bit synchronized across the clock domain boundary. A dual-port asynchronous buffer SRAM is used for data transfers between the bus clock and USB clock domains.

The interface pin table summarizes the external pins.

The USB 2.0 Full Speed specification uses a bidirectional serial interface that can be implemented with pseudo-differential 3.3V GPIO pins and an oversampling receiver for recovery of the bitstream and clock alignment. (Use of a differential receiver would improve signal integrity and may be required for full spec compliance.) The IP interface for the D+ and D- pins is presented using a pair of transmit signals, a pair of receive signals and a transmit enable. External to the IP these should be combined to drive the pins when transmit is enabled and receive when transmit is not enabled. Note that to avoid confusing the clock recovery, the receive signals should be set to receive p = 1 and receive n = 0 when transmission is enabled (this is done outside the IP block to allow interfacing of alternative PHYs). Using standard 3.3V I/O pads allows use on most FPGAs although the drive strength and series termination resistors may need to be adjusted to meet the USB signal eye. On a Xilinx Artix-7 (and less well tested Spartan-7) part, setting the driver to the 8mA, FAST setting seems to work well with a 22R series termination (and with a 0R series termination).

A Full-Speed device identifies itself by providing a 1.5k pullup resistor (to 3.3V) on the D+ line. The IP block produces a signal usb_pullup that is asserted when this resistor should be presented. This signal will be asserted whenever the interface is enabled. In an FPGA implementation this signal can drive a 3.3V output pin that is driven high when the signal is asserted and set high impedance when the signal is deasserted, and the output pin used to drive a 1.5k resistor connected on the board to the D+ line. Alternatively, it can be used to enable an internal 1.5k pullup on the D+ pin.

The USB Host will identify itself to the device by enabling the 5V VBUS power. It may do a hard reset of a port by removing and reasserting VBUS (the Linux driver will do this when it finds a port in an inconsistent state or a port that generates errors during enumeration). The IP detects VBUS through the usb_sense pin. This pin is always an input and should be externally connected to detect the state of the VBUS. Note that this may require a resistor divider or (for USB-C where VBUS can be up to 20V) active level translation to an acceptable voltage for the input pin.

The USB link has a number of states. These are detected and reported in !!usbstat.link_state and state changes are reported using interrupts.

State	Description
Disconnected	The link is disconnected. This is signalled when the VBUS is not driven by the host, which results in the sense input pin being low. An interrupt is raised on entering this state.
Reset	The link is reset whenever the D+ and D- are both low (an SE0 condition) for an extended period. The host will assert reset for a minumum of 10ms, but the USB specification allows the device to detect and respond to a reset after 2.5us. The implementation here will report the reset state and raise an interrupt when the link is in SE0 for 3us.
Suspened	The link is suspended when at idle (a J condition) for more than 3ms. An interrupt is generated when the suspend is detected and a resume interrupt is generated when the link exits the suspend state.
Active	The link is active when it is running normally
Host Lost	The host lost interrupt will be signalled if the link is active but a start of frame has not been received from the host in 4.096ms. The host is required to send a SOF every 1ms. This is not an expected condition.

The USB 2.0 Full Speed Protocol Engine is provided by the common USB interface code and is not part of this module.

At the lowest level of the USB stack the transmit bitstream is serialized, converted to NRZI encoding with bit-stuffing and sent to the transmitter. The received bitstream is recovered, clock aligned and decoded and has bit-stuffing removed. The recovered clock alignment is used for transmission

The higher level protocol engine forms the bitstream into packets, performs CRC checking and recognizes IN, OUT and SETUP transactions. These are presented without buffering to this module which must accept or provide data when requested. The protocol engine may cancel a transaction because of a bad CRC or request a retry if an ACK was not received.

A 2k Byte SRAM is used to hold data between the system and the USB interface. This is divided up into 32 buffers each containing 64 bytes. This is an asynchronous dual-port SRAM with the software accessing from the bus clock domain and the USB interface accessing from the USB 48MHz clock domain.

The 64 byte size for the buffers satisfies the maximum USB packet size for a Full Speed interface for Control transfers (max may be 8, 16, 32 or 64 bytes), Bulk Transfers (max is 64 bytes) and Interrupt transfers (max is 64 bytes). It is small for Isochronous transfers (which have a max of 1023 bytes) and the interface will need extending for high rate isochronous use (would probably allow up to 8 or 16 buffers to be aggregated as the isoc buffer).

The software provides buffers for packet reception through a 4-entry available-buffer FIFO. (More study needed but 4 seems about right: one just returned to software, one being filled, one ready to be filled and one for luck.) The !!RxEnable register is used to indicate which endpoints will accept data from the host using SETUP or OUT transactions. When a packet is transferred from the host to the device (using an OUT or SETUP transaction) and reception of that type of transaction is enabled for the requested endpoint, the next receive buffer is pulled from the FIFO and will be filled with the data from the packet. If the packet is correctly received then an ACK is returned to the host and the buffer number, the packet size, an out/setup flag and the endpoint number are passed back to software using the receive FIFO and a pkt_received interrupt raised. The driver should immediately provide a free buffer for future reception by writing its buffer number to the available-buffer FIFO, then can process the packet and eventually return the received buffer to the free pool. This allows streaming on a single endpoint or across a number of endpoints. If the packets cannot be consumed at the rate they are received then software can implement selective flow-control by disabling OUT or SETUP transactions for a particular endpoint, which will result in a request to that endpoint being NACKed. In the unfortunate event that the available-buffer FIFO is empty or receive FIFO is full then all OUT/SETUP transactions are NACKed. (Do we need to be able to STALL? Currently in INCfg but it may need to be its own register)

To send data to the host in response to an IN transaction the software writes the data into a free buffer and writes the buffer number, data length and a ready flag to the !!configin register for the endpoint that the data is from. When the host next does an IN transaction to that endpoint the data will be sent from the buffer. On receipt of the ACK from the host the ready flag in the !!configin register will be cleared and the bit corresponding to the endpoint number will be set in the !!in_sent register which will cause a pkt_sent interrupt. Software can return the buffer to the free pool and write a 1 to clear the bit in the !!in_sent register. Note that streaming can be achieved if the next buffer has been prepared and is written to the !!configin register when the interrupt is received.

A Control transfer may need an IN data transfer. Therefore when a SETUP transaction is received for an endpoint the ready bit is cleared in !!configin to cancel any buffer that was pending being sent to the host from the Endpoint. When this is done the pending bit will be set. The transfer must be queued again after the Control transfer is completed.

A link level reset will also cancel pending IN transactions by clearing the ready bit and setting the pending bit.

In general the 32 buffers will be allocated with one being processed following reception, 4 in the available-buffer FIFO and 12 (worst case) waiting transmission in the !!configin registers. This leaves 15 for preparation of next transmission (which would need 12 in the worst case of one per endpoint) and the free pool.

The basic hardware initialization is to (in any order) fill the Available buffer FIFO, enable reception of SETUP and OUT packets on Endpoint 0 (this is the control endpoint that the host will use to configure the interface), enable reception of SETUP and OUT packets on any endpoints that accept them and enable any required interrupts. Finally the interface is enabled by setting the !!usbctrl.enable bit. Setting this bit will cause the USB device to assert the Full-Speed pullup on the D+ line, which is used by the host to detect the device. In most cases there is no need to configure the device ID (!!usbctrl.device_address) at this point -- the line will be in reset and the hardware will have forced the device ID to zero.

The second stage of initialization is done under control of the host, which will use control transfers (always beginning with SETUP transactions) to Endpoint 0. Initially these will be sent to device ID 0. When a Set Address request is received the device ID received must be stored in the !!usbctrl.device_address register. Note that device 0 is used for the entire control transaction setting the new device ID, so writing the new ID to the register should not be done until the ACK for the Status stage (see USB specification) has been received.

The driver needs to manage the buffers in the interface SRAM. Each buffer can hold the maximum length packet (64 bytes). Other than for data movement this is most likely done based on their buffer identifier which is a small integer between zero and (SRAM size-in-bytes)/(MaxPacket-in-bytes).

In order to avoid unintentionally stalling the interface there must be buffers available when the host sends data to the device (an OUT or STATUS transaction). The driver needs to ensure (1) there are always buffer identifiers in the Available buffer FIFO (2) the receive FIFO is not full. If the AV FIFO is empty or the RX FIFO is full when data is received a NAK will be returned to the host, requesting the packet be retried later. Generating NAK with this mechanism is generally to be avoided (for example the host expects a device will always accept STATUS packets to endpoint 0).

Keeping the AV FIFO full can be done with a simple loop, adding buffers from the software managed free-pool until the FIFO is full. A simpler policy of just adding a buffer to the AV FIFO everytime one is removed from the RX FIFO should work on average, but will be slightly worse when a burst of packets are received.

Flow control (using NAKs) may be done per-endpoint using the !!rxenable register. If this does not indicate OUT packet reception is enabled then any packet will receive a NAK to request a retry later. This should only be done for short durations or the host may timeout the transaction.

The host will send OUT or SETUP transactions when it wants to transfer data to the device. The data packets are directed to a particular endpoint, and the maximum packet size is set per-endpoint in its Endpoint Descriptor (this must be the same or smaller than the maximum packet size supported by the device). A received interrupt is raised whenever there is one or more packets in the RX FIFO. The driver should pop the information from the FIFO by reading the !!rxfifo register, which gives (a) the buffer ID that the data was received in (b) the data length, in bytes, received (c) the endpoint to which the packet was sent (d) an indication if the packet was sent with an OUT or STATUS transaction. Note that the data length could be between 0 and the maximum packet size -- in some situations a zero length packet is used as an acknowledgement or end of transmission.

The data length does not include the packet CRC. (The CRC bytes are written to the buffer if they fit within the maximum buffer size.) Packets with a bad CRC will not be transferred to the RX FIFO, the hardware will request a retry.

Data is transferred to the host based on the host requesting a transfer with an IN transaction. The host will only generate IN requests if the endpoint is declared as an IN endpoint in its Endpoint Descriptor (note that two descriptors are needed if the same endpoint is used for both IN and OUT transfers). The Endpoint Descriptor also includes a description of the frequency the endpoint should be polled.

Data is queued for transmission by writing the corresponding !!configin register with the buffer ID containing the data, the length in bytes of data (0 to maximum packet length) and setting the Rdy bit. This data (with the packet CRC) will be sent as a response to the next IN transaction on the endpoint. When the host ACKs the data the Rdy bit is cleared, the bit corresponding to the endpoint is set in the !!in_sent register and a transmit interrupt is raised. If the host does not ACK the data then the packet will be retried. When the packet transmission has been noted by the driver the endpoint bit should be cleared by writing a 1 to it in the !!in_sent register.

Note that the !!configin for an endpoint is a single register, so no new data packet should be queued until the previous packet has been acknowledged. This causes a problem if a Control Transaction is received on an endpoint with a transmission pending because the Control Transaction may require an IN packet. Therefore the hardware will clear the rdy bit if an enabled SETUP transaction is received on any endpoint and set the pending bit if there was data pending. The driver must remember the pending transfer and, after the Control transaction is complete, write it back to the !!configin register with the rdy bit set.

The !!stall register is used to Stall an endpoint. This is used if it is shutdown for some reason, or to signal certain error conditions. Unused endpoints can have their !!stall register bit left clear, so in many cases there is no need to use the !!stall register. If the stall bit is set for an endpoint then the STALL response will be provided to any (IN, OUT or SETUP) request on that endpoint.