| {{% lowrisc-doc-hdr Simple USB Full Speed Device IP Technical Specification }} |
| {{% regfile ../data/usbdev.hjson }} |
| |
| {{% section1 Overview }} |
| |
| {{% toc 3 }} |
| |
| |
| {{% section2 Features }} |
| |
| - USB Full-Speed (12Mbps) Device interface |
| |
| - 2kbyte Interface buffer |
| |
| - Up to 12 Endpoints (including required Endpoint 0) |
| |
| - Support for USB packet size up to 64 bytes |
| |
| - Support SETUP, IN and OUT transactions |
| |
| - Support for Bulk, Control and Interrupt endpoints and transactions |
| |
| - Initial version does not support Isochronous transfers larger than |
| 64 bytes (later extension) |
| |
| - Streaming possible through software |
| |
| - Interrupts for packet reception and transmission |
| |
| |
| {{% section2 Description }} |
| |
| The USB device module is a simple software-driven generic USB device |
| interface for Full-Speed USB operation. The IP includes the physical |
| layer interface (capable of using 3.3V I/O pads, but could be extended |
| to a more formal USB PHY), the low level USB protocol and a packet |
| buffer interface to the software. |
| |
| |
| {{% section2 Compatibility }} |
| |
| The USB device programming interface is not based on any existing |
| interface. |
| |
| |
| {{% section1 Theory of Operations }} |
| |
| A useful quick reference for USB Full-Speed is |
| [http://www.usbmadesimple.co.uk/ums_3.htm](http://www.usbmadesimple.co.uk/ums_3.htm) |
| |
| The block diagram shows a high level view of the simple USB Device |
| including the main register access paths. |
| |
|  |
| |
| |
| {{% section2 Clocking }} |
| |
| The USB Full-Speed interface runs at a 12Mbps data rate. The interface |
| logic runs at four times this rate and must be clocked from an accurate |
| 48MHz clock source. The USB specification requires a Full-Speed device |
| to keep its average bit rate within 12Mbps +/- 0.25%, so the clock must |
| have a maximum error of 2,500ppm. |
| |
| Control transfers pass through asynchronous FIFOs or have a ready bit |
| synchronized across the clock domain boundary. A dual-port |
| asynchronous buffer SRAM is used for data transfers between the bus |
| clock and USB clock domains. |
| |
| {{% section2 USB Interface Pins }} |
| |
| The interface pin table summarizes the external pins. |
| |
| |
| <table> |
| <tr> |
| <td>External Pin |
| </td> |
| <td>Internal signal |
| </td> |
| <td>Notes |
| </td> |
| </tr> |
| <tr> |
| <td>USB D+ |
| <p> |
| USB D- |
| </td> |
| <td>usb_dp_i, usb_dn_i |
| <p> |
| usb_dp_o, usb_dn_o |
| </td> |
| <td>External pins are bidirectional. When the pins are driven the input should be stable at dp=1, dn=0. |
| </td> |
| </tr> |
| <tr> |
| <td> |
| </td> |
| <td>usb_dp_en_o |
| <p> |
| usb_dn_en_o |
| </td> |
| <td>Enable driving the external pins. |
| </td> |
| </tr> |
| <tr> |
| <td>[usb_pullup] |
| </td> |
| <td>usb_pullup_o |
| </td> |
| <td>When the usb_pullup_o asserts a 1.5k pullup resistor should be connected to D+. This can be done inside the chip or with an external pin. |
| </td> |
| </tr> |
| <tr> |
| <td>usb_sense |
| </td> |
| <td>usb_sense_i |
| </td> |
| <td>The sense pin indicates the presence of VBUS from the host. |
| </td> |
| </tr> |
| </table> |
| |
| |
| The USB 2.0 Full Speed specification uses a bidirectional serial |
| interface that can be implemented with pseudo-differential 3.3V GPIO |
| pins and an oversampling receiver for recovery of the bitstream and |
| clock alignment. (Use of a differential receiver would improve signal |
| integrity and may be required for full spec compliance.) The IP |
| interface for the D+ and D- pins is presented using a pair of transmit |
| signals, a pair of receive signals and a transmit enable. External to |
| the IP these should be combined to drive the pins when transmit is |
| enabled and receive when transmit is not enabled. Note that to avoid |
| confusing the clock recovery, the receive signals should be set to |
| receive p = 1 and receive n = 0 when transmission is enabled (this is |
| done outside the IP block to allow interfacing of alternative |
| PHYs). Using standard 3.3V I/O pads allows use on most FPGAs although |
| the drive strength and series termination resistors may need to be |
| adjusted to meet the USB signal eye. On a Xilinx Artix-7 (and less |
| well tested Spartan-7) part, setting the driver to the 8mA, FAST |
| setting seems to work well with a 22R series termination (and with a |
| 0R series termination). |
| |
| A Full-Speed device identifies itself by providing a 1.5k pullup |
| resistor (to 3.3V) on the D+ line. The IP block produces a signal |
| usb_pullup that is asserted when this resistor should be |
| presented. This signal will be asserted whenever the interface is |
| enabled. In an FPGA implementation this signal can drive a 3.3V output |
| pin that is driven high when the signal is asserted and set high |
| impedance when the signal is deasserted, and the output pin used to |
| drive a 1.5k resistor connected on the board to the D+ |
| line. Alternatively, it can be used to enable an internal 1.5k pullup |
| on the D+ pin. |
| |
| The USB Host will identify itself to the device by enabling the 5V |
| VBUS power. It may do a hard reset of a port by removing and |
| reasserting VBUS (the Linux driver will do this when it finds a port |
| in an inconsistent state or a port that generates errors during |
| enumeration). The IP detects VBUS through the usb_sense pin. This pin |
| is always an input and should be externally connected to detect the |
| state of the VBUS. Note that this may require a resistor divider or |
| (for USB-C where VBUS can be up to 20V) active level translation to an |
| acceptable voltage for the input pin. |
| |
| {{% hwcfg usb_dev }} |
| |
| {{% section2 USB Link State }} |
| |
| The USB link has a number of states. These are detected and reported |
| in !!usbstat.link_state and state changes are reported using |
| interrupts. |
| |
| |State| Description | |
| |-----|-------------| |
| |Disconnected | The link is disconnected. This is signalled when the VBUS is not driven by the host, which results in the sense input pin being low. An interrupt is raised on entering this state.| |
| |Reset | The link is reset whenever the D+ and D- are both low (an SE0 condition) for an extended period. The host will assert reset for a minimum of 10ms, but the USB specification allows the device to detect and respond to a reset after 2.5us. The implementation here will report the reset state and raise an interrupt when the link is in SE0 for 3us.| |
| |Suspended| The link is suspended when at idle (a J condition) for more than 3ms. An interrupt is generated when the suspend is detected and a resume interrupt is generated when the link exits the suspend state.| |
| |Active| The link is active when it is running normally.| |
| |Host Lost| The host lost interrupt will be signalled if the link is active but a start of frame has not been received from the host in 4.096ms. The host is required to send a SOF every 1ms. This is not an expected condition.| |
| |
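The timing thresholds in the table can be expressed as counts of the 48MHz USB clock. The following is a minimal sketch of that arithmetic plus a software model of the state classification. The names (`usb_ns_to_cycles`, `usb_link_classify`, `link_state_t`) and the priority order between states are assumptions made for illustration; they do not describe how the hardware is implemented.

```c
#include <stdint.h>

/* Assumption: durations are measured in cycles of the 48 MHz USB clock. */
#define USB_CLK_HZ 48000000u

/* Convert a duration in nanoseconds to 48 MHz clock cycles (rounded down). */
static inline uint32_t usb_ns_to_cycles(uint64_t ns) {
  return (uint32_t)((ns * USB_CLK_HZ) / 1000000000u);
}

/* Hypothetical state names mirroring the rows of the table. */
typedef enum {
  LINK_DISCONNECTED,
  LINK_RESET,
  LINK_SUSPENDED,
  LINK_HOST_LOST,
  LINK_ACTIVE
} link_state_t;

/*
 * Illustrative classification only. The order in which the conditions are
 * checked is an assumption, not something stated in the specification.
 *   vbus             - non-zero when the sense pin sees VBUS
 *   se0_cycles       - how long D+ and D- have both been low
 *   idle_cycles      - how long the link has been idle (J condition)
 *   cycles_since_sof - time since the last start of frame was received
 */
static inline link_state_t usb_link_classify(int vbus, uint32_t se0_cycles,
                                             uint32_t idle_cycles,
                                             uint32_t cycles_since_sof) {
  if (!vbus) return LINK_DISCONNECTED;
  if (se0_cycles >= usb_ns_to_cycles(3000u)) return LINK_RESET;          /* 3us */
  if (idle_cycles > usb_ns_to_cycles(3000000u)) return LINK_SUSPENDED;   /* 3ms */
  if (cycles_since_sof >= usb_ns_to_cycles(4096000u)) return LINK_HOST_LOST;
  return LINK_ACTIVE;
}
```

With these definitions the 3us reset threshold is 144 cycles, the 3ms suspend threshold is 144,000 cycles and the 4.096ms host-lost timeout is 196,608 cycles.
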
| |
| {{% section2 USB Protocol Engine }} |
| |
| The USB 2.0 Full Speed Protocol Engine is provided by the common USB |
| interface code and is not part of this module. |
| |
| At the lowest level of the USB stack the transmit bitstream is |
| serialized, converted to NRZI encoding with bit-stuffing and sent to |
| the transmitter. The received bitstream is recovered, clock aligned |
| and decoded and has bit-stuffing removed. The recovered clock |
| alignment is used for transmission. |
| |
| The higher level protocol engine forms the bitstream into packets, |
| performs CRC checking and recognizes IN, OUT and SETUP |
| transactions. These are presented without buffering to this module |
| which must accept or provide data when requested. The protocol engine |
| may cancel a transaction because of a bad CRC or request a retry if an |
| ACK was not received. |
| |
| {{% section2 Buffer Interface }} |
| |
| A 2kbyte SRAM is used to hold data between the system and the USB |
| interface. This is divided up into 32 buffers each containing 64 |
| bytes. This is an asynchronous dual-port SRAM with the software |
| accessing from the bus clock domain and the USB interface accessing |
| from the USB 48MHz clock domain. |
| |
| The 64 byte size for the buffers satisfies the maximum USB packet size |
| for a Full Speed interface for Control transfers (max may be 8, 16, 32 |
| or 64 bytes), Bulk Transfers (max is 64 bytes) and Interrupt transfers |
| (max is 64 bytes). It is small for Isochronous transfers (which have a |
| max of 1023 bytes) and the interface will need extending for high rate |
| isochronous use (would probably allow up to 8 or 16 buffers to be |
| aggregated as the isoc buffer). |
| |
| The software provides buffers for packet reception through a 4-entry |
| available-buffer FIFO. (More study needed but 4 seems about right: one |
| just returned to software, one being filled, one ready to be filled |
| and one for luck.) The !!rxenable register is used to indicate which |
| endpoints will accept data from the host using SETUP or OUT |
| transactions. When a packet is transferred from the host to the device |
| (using an OUT or SETUP transaction) and reception of that type of |
| transaction is enabled for the requested endpoint, the next receive |
| buffer is pulled from the FIFO and will be filled with the data from |
| the packet. If the packet is correctly received then an ACK is |
| returned to the host and the buffer number, the packet size, an |
| out/setup flag and the endpoint number are passed back to software |
| using the receive FIFO and a pkt_received interrupt raised. The driver |
| should immediately provide a free buffer for future reception by |
| writing its buffer number to the available-buffer FIFO, then can |
| process the packet and eventually return the received buffer to the |
| free pool. This allows streaming on a single endpoint or across a |
| number of endpoints. If the packets cannot be consumed at the rate |
| they are received then software can implement selective flow-control |
| by disabling OUT or SETUP transactions for a particular endpoint, |
| which will result in a request to that endpoint being NAKed. In the |
| unfortunate event that the available-buffer FIFO is empty or the receive |
| FIFO is full then all OUT/SETUP transactions are NAKed. (Do we need |
| to be able to STALL? Currently in INCfg but it may need to be its own |
| register) |
| |
| To send data to the host in response to an IN transaction the software |
| writes the data into a free buffer and writes the buffer number, data |
| length and a ready flag to the !!configin register for the endpoint |
| that the data is from. When the host next does an IN transaction to |
| that endpoint the data will be sent from the buffer. On receipt of the |
| ACK from the host the ready flag in the !!configin register will be |
| cleared and the bit corresponding to the endpoint number will be set |
| in the !!in_sent register which will cause a pkt_sent |
| interrupt. Software can return the buffer to the free pool and write a |
| 1 to clear the bit in the !!in_sent register. Note that streaming can be |
| achieved if the next buffer has been prepared and is written to the |
| !!configin register when the interrupt is received. |
| |
| A Control transfer may need an IN data transfer. Therefore when a |
| SETUP transaction is received for an endpoint the ready bit is cleared |
| in !!configin to cancel any buffer that was pending being sent to the |
| host from the Endpoint. When this is done the pending bit will be |
| set. The transfer must be queued again after the Control transfer is |
| completed. |
| |
| A link level reset will also cancel pending IN transactions by |
| clearing the ready bit and setting the pending bit. |
| |
| In general the 32 buffers will be allocated with one being processed |
| following reception, 4 in the available-buffer FIFO and 12 (worst |
| case) waiting transmission in the !!configin registers. This leaves 15 |
| for preparation of next transmission (which would need 12 in the worst |
| case of one per endpoint) and the free pool. |
| |
| |
| {{% section1 Design Details }} |
| |
| |
| {{% section1 Programmers Guide }} |
| |
| |
| {{% section2 Initialization }} |
| |
| The basic hardware initialization is to (in any order) fill the |
| Available buffer FIFO, enable reception of SETUP and OUT packets on |
| Endpoint 0 (this is the control endpoint that the host will use to |
| configure the interface), enable reception of SETUP and OUT packets on |
| any endpoints that accept them and enable any required |
| interrupts. Finally the interface is enabled by setting the |
| !!usbctrl.enable bit. Setting this bit will cause the USB device to |
| assert the Full-Speed pullup on the D+ line, which is used by the host |
| to detect the device. In most cases there is no need to configure the |
| device ID (!!usbctrl.device_address) at this point -- the line will be |
| in reset and the hardware will have forced the device ID to zero. |
| |
| The second stage of initialization is done under control of the host, |
| which will use control transfers (always beginning with SETUP |
| transactions) to Endpoint 0. Initially these will be sent to device |
| ID 0. When a Set Address request is received the device ID received |
| must be stored in the !!usbctrl.device_address register. Note that |
| device 0 is used for the entire control transaction setting the new |
| device ID, so writing the new ID to the register should not be done |
| until the ACK for the Status stage (see USB specification) has been |
| received. |
| |
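The deferred address update described above can be sketched as follows. This is an illustration under stated assumptions: `usb_addr_ctx_t` and both function names are hypothetical, and the `dev_addr_reg` member is a plain variable standing in for the !!usbctrl.device_address field rather than a real register access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state for the Set Address sequence. */
typedef struct {
  uint8_t dev_addr_reg;  /* stands in for the device_address register field */
  uint8_t pending_addr;  /* address carried by the Set Address request */
  bool    addr_pending;  /* true until the Status stage has been ACKed */
} usb_addr_ctx_t;

/* Called when the SETUP packet of a Set Address request has been decoded. */
static inline void usb_on_set_address(usb_addr_ctx_t *ctx, uint8_t new_addr) {
  ctx->pending_addr = new_addr & 0x7fu;  /* USB device addresses are 7 bits */
  ctx->addr_pending = true;
  /* The driver would queue the zero-length Status stage IN packet here. */
}

/* Called when the host has ACKed the Status stage packet on Endpoint 0. */
static inline void usb_on_ep0_status_acked(usb_addr_ctx_t *ctx) {
  if (ctx->addr_pending) {
    ctx->dev_addr_reg = ctx->pending_addr;  /* only now switch address */
    ctx->addr_pending = false;
  }
}
```

The point of the two-step structure is that the device keeps answering on device ID 0 until the whole control transfer has finished.
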
| |
| {{% section2 Buffers }} |
| |
| The driver needs to manage the buffers in the interface SRAM. Each |
| buffer can hold the maximum length packet (64 bytes). Other than for |
| data movement, buffers are most likely managed by their buffer |
| identifier, which is a small integer between zero and (SRAM |
| size-in-bytes)/(MaxPacket-in-bytes) - 1, so 0 to 31 for the 2kbyte SRAM. |
| |
| In order to avoid unintentionally stalling the interface there must be |
| buffers available when the host sends data to the device (an OUT or |
| SETUP transaction). The driver needs to ensure (1) there are always |
| buffer identifiers in the Available buffer FIFO (AV FIFO) and (2) the |
| receive FIFO (RX FIFO) is not full. If the AV FIFO is empty or the RX |
| FIFO is full when data is received a NAK will be returned to the host, |
| requesting the packet be retried later. Generating NAK with this |
| mechanism is generally to be avoided (for example the host expects a |
| device will always accept SETUP packets to endpoint 0). |
| |
| Keeping the AV FIFO full can be done with a simple loop, adding |
| buffers from the software managed free-pool until the FIFO is full. A |
| simpler policy of just adding a buffer to the AV FIFO every time one is |
| removed from the RX FIFO should work on average, but will be slightly |
| worse when a burst of packets is received. |
| |
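The "fill until full" policy can be sketched with a bitmap free pool. This is a software model only: `av_fifo_model_t` is a plain struct standing in for the hardware available-buffer FIFO, and all type and function names are hypothetical. A real driver would test the FIFO-full status and write the buffer number to the hardware instead of touching `entries`.

```c
#include <stdint.h>

#define NUM_BUFFERS   32u  /* 2kbyte SRAM divided into 64-byte buffers */
#define AV_FIFO_DEPTH 4u   /* entries in the available-buffer FIFO */

/* Software stand-in for the hardware available-buffer FIFO. */
typedef struct {
  uint8_t  entries[AV_FIFO_DEPTH];
  uint32_t count;
} av_fifo_model_t;

/* Free pool: bit n set means buffer n is owned by software and unused. */
typedef struct {
  uint32_t free_mask;
} buf_pool_t;

static inline void pool_init(buf_pool_t *p) { p->free_mask = 0xffffffffu; }

/* Take the lowest-numbered free buffer; returns -1 if the pool is empty. */
static inline int pool_alloc(buf_pool_t *p) {
  for (uint32_t i = 0; i < NUM_BUFFERS; i++) {
    if (p->free_mask & (1u << i)) {
      p->free_mask &= ~(1u << i);
      return (int)i;
    }
  }
  return -1;
}

/* Return a buffer to the pool once its contents have been processed. */
static inline void pool_free(buf_pool_t *p, uint32_t id) {
  if (id < NUM_BUFFERS) p->free_mask |= (1u << id);
}

/* Top up the AV FIFO until it is full or the pool runs dry.
 * Returns the number of buffers pushed. */
static inline uint32_t av_fifo_fill(av_fifo_model_t *av, buf_pool_t *p) {
  uint32_t pushed = 0;
  while (av->count < AV_FIFO_DEPTH) {
    int id = pool_alloc(p);
    if (id < 0) break;
    av->entries[av->count++] = (uint8_t)id;
    pushed++;
  }
  return pushed;
}
```

Calling `av_fifo_fill` at initialization and again each time the RX FIFO is serviced gives the loop-based policy; the simpler one-in, one-out policy corresponds to a single `pool_alloc` per received packet.
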
| Flow control (using NAKs) may be done per-endpoint using the |
| !!rxenable register. If OUT packet reception is not enabled for an |
| endpoint then any OUT packet sent to it will receive a NAK to request a |
| retry later. This should only be done for short durations or the host may |
| time out the transaction. |
| |
| |
| {{% section2 Reception }} |
| |
| The host will send OUT or SETUP transactions when it wants to transfer |
| data to the device. The data packets are directed to a particular |
| endpoint, and the maximum packet size is set per-endpoint in its |
| Endpoint Descriptor (this must be the same or smaller than the maximum |
| packet size supported by the device). A received interrupt is raised |
| whenever there are one or more packets in the RX FIFO. The driver |
| should pop the information from the FIFO by reading the !!rxfifo |
| register, which gives (a) the buffer ID that the data was received in |
| (b) the data length, in bytes, received (c) the endpoint to which the |
| packet was sent (d) an indication if the packet was sent with an OUT |
| or SETUP transaction. Note that the data length could be between 0 |
| and the maximum packet size -- in some situations a zero length packet |
| is used as an acknowledgement or end of transmission. |
| |
| The data length does not include the packet CRC. (The CRC bytes are |
| written to the buffer if they fit within the maximum buffer size.) |
| Packets with a bad CRC will **not** be transferred to the RX FIFO; the |
| hardware will request a retry instead. |
| |
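Unpacking the four items read from the !!rxfifo register might look like the sketch below. The bit positions and widths used here are placeholders chosen for illustration and are NOT taken from the register table; the real field layout is defined by the register description. The names `rx_entry_t` and `rxfifo_decode` are likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder field layout: assumed for illustration, not from the spec. */
#define RXFIFO_BUFFER_SHIFT 0u
#define RXFIFO_BUFFER_MASK  0x1fu  /* enough for 32 buffer IDs */
#define RXFIFO_SIZE_SHIFT   8u
#define RXFIFO_SIZE_MASK    0x7fu  /* enough for 0 to 64 bytes */
#define RXFIFO_SETUP_SHIFT  19u
#define RXFIFO_EP_SHIFT     20u
#define RXFIFO_EP_MASK      0xfu   /* enough for 12 endpoints */

/* The four items the driver gets from one RX FIFO entry. */
typedef struct {
  uint8_t buffer;  /* (a) buffer ID holding the data */
  uint8_t size;    /* (b) data length in bytes, CRC not included */
  uint8_t ep;      /* (c) endpoint the packet was sent to */
  bool    setup;   /* (d) true for SETUP, false for OUT */
} rx_entry_t;

/* Split one raw value popped from the RX FIFO into its fields. */
static inline rx_entry_t rxfifo_decode(uint32_t raw) {
  rx_entry_t e;
  e.buffer = (uint8_t)((raw >> RXFIFO_BUFFER_SHIFT) & RXFIFO_BUFFER_MASK);
  e.size   = (uint8_t)((raw >> RXFIFO_SIZE_SHIFT) & RXFIFO_SIZE_MASK);
  e.ep     = (uint8_t)((raw >> RXFIFO_EP_SHIFT) & RXFIFO_EP_MASK);
  e.setup  = ((raw >> RXFIFO_SETUP_SHIFT) & 1u) != 0u;
  return e;
}
```

A zero `size` is a valid result and should be handled as a zero length packet rather than as an error.
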
| |
| {{% section2 Transmission }} |
| |
| Data is transferred to the host based on the host requesting a |
| transfer with an IN transaction. The host will only generate IN |
| requests if the endpoint is declared as an IN endpoint in its Endpoint |
| Descriptor (note that two descriptors are needed if the same endpoint |
| is used for both IN and OUT transfers). The Endpoint Descriptor also |
| includes a description of the frequency the endpoint should be polled. |
| |
| Data is queued for transmission by writing the corresponding |
| !!configin register with the buffer ID containing the data, the |
| length in bytes of data (0 to maximum packet length) and setting the |
| Rdy bit. This data (with the packet CRC) will be sent as a response to |
| the next IN transaction on the endpoint. When the host ACKs the data |
| the Rdy bit is cleared, the bit corresponding to the endpoint is set |
| in the !!in_sent register and a transmit interrupt is raised. If the |
| host does not ACK the data then the packet will be retried. When the |
| packet transmission has been noted by the driver the endpoint bit |
| should be cleared by writing a 1 to it in the !!in_sent register. |
| |
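The queue-then-acknowledge sequence can be modelled as below. This is a sketch under assumptions: `usb_regs_model_t` is ordinary memory standing in for the memory-mapped registers, the !!configin bit layout is a placeholder that is NOT taken from the register table, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ENDPOINTS 12u
#define MAX_PACKET    64u

/* Placeholder configin layout: assumed for illustration only. */
#define CONFIGIN_BUFFER_MASK 0x1fu
#define CONFIGIN_SIZE_SHIFT  8u
#define CONFIGIN_RDY_BIT     (1u << 31)

/* Plain-memory stand-in for the configin and in_sent registers. */
typedef struct {
  uint32_t configin[NUM_ENDPOINTS];
  uint32_t in_sent;
} usb_regs_model_t;

/* Queue one packet for the next IN transaction on endpoint ep.
 * Refuses bad arguments and refuses to overwrite a packet that is
 * still waiting to be sent (Rdy still set). */
static inline bool usb_queue_in(usb_regs_model_t *r, uint32_t ep,
                                uint32_t buf, uint32_t len) {
  if (ep >= NUM_ENDPOINTS || buf > CONFIGIN_BUFFER_MASK || len > MAX_PACKET) {
    return false;
  }
  if (r->configin[ep] & CONFIGIN_RDY_BIT) return false;
  r->configin[ep] = buf | (len << CONFIGIN_SIZE_SHIFT) | CONFIGIN_RDY_BIT;
  return true;
}

/* Read which endpoints have completed and acknowledge them.
 * Returns a bit mask with one bit per completed endpoint. */
static inline uint32_t usb_collect_in_sent(usb_regs_model_t *r) {
  uint32_t done = r->in_sent;
  /* On hardware this would be a write of `done` to in_sent (write 1 to
   * clear); the model emulates the effect by clearing the bits. */
  r->in_sent &= ~done;
  return done;
}
```

For each bit set in the returned mask the driver can return that endpoint's buffer to the free pool, or immediately queue the next prepared buffer to stream data.
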
| Note that the !!configin for an endpoint is a single register, so no |
| new data packet should be queued until the previous packet has been |
| acknowledged. This causes a problem if a Control Transaction is |
| received on an endpoint with a transmission pending because the |
| Control Transaction may require an IN packet. Therefore the hardware |
| will **clear the rdy bit** if an enabled SETUP transaction is received |
| on any endpoint and **set the pending bit** if there was data |
| pending. The driver must remember the pending transfer and, after the |
| Control transaction is complete, write it back to the !!configin |
| register with the rdy bit set. |
| |
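One way for the driver to remember a cancelled transfer is to keep a shadow copy of what it last queued. The sketch below assumes placeholder positions for the rdy and pending bits (NOT taken from the register table) and uses hypothetical names throughout. How the pending bit itself is cleared in hardware is not covered here and should be checked against the register description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions: assumed for illustration only. */
#define CONFIGIN_PEND_BIT (1u << 30)
#define CONFIGIN_RDY_BIT  (1u << 31)

/* Per-endpoint shadow of the last transfer the driver queued. */
typedef struct {
  uint32_t saved;     /* buffer ID and length fields last written */
  bool     deferred;  /* true when that transfer must be queued again */
} in_shadow_t;

/* Call when a SETUP is received: configin_now is the value read back
 * from the endpoint's configin register. */
static inline void in_note_setup(in_shadow_t *s, uint32_t configin_now) {
  if (configin_now & CONFIGIN_PEND_BIT) s->deferred = true;
}

/* Call once the Control transfer has completed. Returns the value to
 * write back to configin, or 0 if nothing was cancelled. */
static inline uint32_t in_requeue_value(in_shadow_t *s) {
  if (!s->deferred) return 0u;
  s->deferred = false;
  return (s->saved & ~(CONFIGIN_PEND_BIT | CONFIGIN_RDY_BIT)) | CONFIGIN_RDY_BIT;
}
```
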
| |
| {{% section2 Stall }} |
| |
| The !!stall register is used to Stall an endpoint. This is used if the |
| endpoint is shut down for some reason, or to signal certain error |
| conditions. Unused endpoints can have their !!stall register bit left |
| clear, so in many cases there is no need to use the !!stall |
| register. If the stall bit is set for an endpoint then the STALL |
| response will be provided to any (IN, OUT or SETUP) request on that |
| endpoint. |
| |
| |
| {{% section2 Register Table }} |
| |
| {{% registers x }} |