Welcome to the Analyze USB Traffic with Wireshark session. My name is Tomasz Moń and I work as a senior firmware engineer at Nordic Semiconductor. During office hours I mostly work on USB support in Zephyr. You can reach out to me in the USB channel on the Zephyr Discord server.

The presentation starts with an introduction covering basic terminology, speeds, connectors, and why USB 2 is still relevant. I will mention the USB transfer types and device classes. Then I will show USB traffic capture options and outline the main difference between software and hardware sniffers. I picked USB mass storage as an example because a USB memory stick is a really well-known device, the mass storage USB protocol layer is simple, and there is a mass storage dissector in Wireshark. At the end there will be a short summary. Please write down any questions so I can answer them later.

Let's start with the terminology. Both USB and networking use the same words, but with a slightly different meaning. The analogies I present might not be quite exact, but I consider them good enough to get the big picture. A USB host can be seen as a requester, because it initiates all the communication, and also as the DHCP server, because it assigns addresses to devices. An example host is a PC or laptop. A USB device is a responder because it responds to host requests. Even when the device is sending data to the host, it is essentially responding to host requests. An example device is a mouse. Port in USB means the physical port connector. A host can have multiple ports. Every device requires one port. If the host does not have enough ports, then we can use a hub. It acts similarly to a switch or hub known from the networking world. Thanks to a hub we can connect more devices, like a keyboard or USB memory, to a single host port. In order for communication to work, every device needs an address. The address is similar to a local IP address, except the range is much smaller.
After reset, the device defaults to address 0, and then the host sets the device address to a value from 1 to 127. An endpoint is essentially a buffer. From the addressing point of view, the endpoint number can be seen as analogous to a TCP or UDP port. Each endpoint operates using one of the four available transfer types. The USB class pretty much defines the communication protocol. The descriptor is like a data sheet that the host reads to know what type of device it is talking to. Vendor ID is a 16-bit vendor code assigned by the USB Implementers Forum and product ID is a 16-bit product code. However, unlike MAC addresses, the vendor ID and product ID pair only identifies the device model, not a particular unit.

When USB was introduced back in 1996, the A and B connectors were used. A is at the host side, B is at the device. Initially, the connector had only four pins: 5-volt VBUS, ground, D-plus and D-minus. The D-plus and D-minus signals form a differential pair. Because there is just a single differential pair in USB 2.0, only half-duplex communication is possible.

Media access in USB is simple. If the host doesn't ask the device for data, the device cannot send anything. When the host asks for data, the device has to respond pretty much instantly because the timeouts are pretty short. Full and high-speed timeouts are in the hundreds of nanoseconds range. At high speed, in the worst-case scenario, when there are five hubs in between the host and the device, the timeout occurs if the host does not start receiving a response within 1.7 microseconds. Thankfully, the USB peripherals can handle the timeouts in hardware. For example, when the host reads data, the peripheral will NAK the transaction unless the firmware has already armed the endpoint with data. Similarly, when the host writes data, the device will NAK if the endpoint buffer isn't empty.

USB 3 adds two separate SuperSpeed differential pairs, SSTX and SSRX. When the device operates in backward compatibility mode, it uses the dedicated USB 2 differential pair.
The SuperSpeed traffic happens solely on SSTX and SSRX. USB 3 is dual-simplex, and it has one differential pair in each direction.

USB 2, published in April 2000, featured three transmission speeds: low speed at 1.5 megabits, full speed at 12 megabits, and high speed at 480 megabits per second. USB 3 is not so simple. USB 3.0, published in November 2008, featured 5 gigabits transmission speed. The 3.1, published in July 2013, doubled the speed to 10 gigabits using the same connectors. The 3.2, published in September 2017, introduced two-lane transmission requiring the new USB Type-C connector. The 10 gigabits can be achieved either on a single lane by doubling the frequency or by using two 5 gigabits lanes simultaneously. The 20 gigabits connection uses two lanes, each operating at 10 gigabits. SuperSpeed devices operate on a completely separate bandwidth from USB 2.0 devices. All heavy bandwidth users, like network adapters and storage devices, generally use SuperSpeed nowadays. USB4, published in August 2019, added 40 gigabits, and USB4 version 2, published in October 2022, added 80 gigabits symmetric and 120 gigabits asymmetric. Asymmetric makes sense if there is more traffic in one direction, for example when we have multiple monitors connected. USB4 requires the Type-C connector and is essentially a tunneling protocol.

The USB Type-C connector is reversible, so we don't have to worry which side is up, as it works either way. In the middle, there are the D-plus and D-minus signals used for USB 2.0. These are used by the Zephyr USB device stack. The CC1 and CC2 pins form the configuration channel and are used for USB Power Delivery, implemented by the Zephyr USB-C stack. USB Power Delivery can be used to change the voltage on the VBUS pins. The voltage can go up to 20 V in standard power range and up to 48 V in extended power range. The maximum current is 5 A. To deliver the standard power range maximum of 100 W, that is 20 V at 5 A, or to use extended power range, an electronically marked cable assembly is needed.
The electronically marked cable includes an e-marker chip in the plug that responds to the USB PD Discover Identity command. USB Power Delivery is also used to enter USB4 or to configure an alternate mode like Thunderbolt 3 or DisplayPort. A USB 3 device supporting alternate modes can, for example, agree with the host to use TX1 and RX1 for DisplayPort and TX2 and RX2 for USB communication. The SBU1 and SBU2 pins are sideband use pins used for alternate modes. DisplayPort alternate mode uses the SBU pins as the auxiliary channel. USB4 uses the SBU pins to negotiate USB4 link parameters and to manage line equalization.

USB4 devices are already available on the market and you might be wondering if it's worthwhile to get familiar with USB 2.0. The truth is that USB 2.0 is not going anywhere. The backwards compatibility is achieved by dual bus, and the upper layers are pretty much the same. Every USB 3 hub contains both a USB 2 and a USB 3 hub inside. The USB hub is the only device that can operate at USB 2 and USB 3 speeds simultaneously. The new connectors, including USB Type-C, contain dedicated USB 2 D-plus and D-minus signals. All USB 2 rules apply on the D-plus and D-minus signals. There are a lot of devices that are fine with USB 2.0 speeds: keyboards, mice or controllers. For example, the Nintendo Switch Pro controller comes with a USB Type-C connector, but it is in fact a full-speed device.

USB stands for Universal Serial Bus. To be universal, it must be able to support many devices. Many devices mean different needs. All possible transfers are generalized into four types. USB supports plug and play and is able to detect what type of device is connected. Plug and play is possible thanks to control transfers. Every USB device knows how to respond to the Get Descriptor command. The descriptor contains basic information about the device that the host can use to know how to talk to it. Control transfers can also be used for vendor commands, for example for volume adjustment. Control transfer is the only mandatory transfer type.
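To make the Get Descriptor command concrete, here is a minimal Python sketch of the 8-byte setup data that a host sends when it asks for the device descriptor. The field layout and values follow chapter 9 of the USB 2.0 specification; the function names are made up for this illustration and do not come from any library.

```python
import struct

# Standard request and descriptor type codes (USB 2.0 specification, chapter 9).
GET_DESCRIPTOR = 0x06
DESCRIPTOR_TYPE_DEVICE = 0x01

# bmRequestType 0x80: device-to-host, standard request, recipient is the device.
DEVICE_TO_HOST_STANDARD_DEVICE = 0x80


def build_setup_packet(bm_request_type, b_request, w_value, w_index, w_length):
    """Pack the five setup fields into 8 little-endian bytes."""
    return struct.pack("<BBHHH", bm_request_type, b_request,
                       w_value, w_index, w_length)


def get_device_descriptor_setup(length=18):
    """Setup data for Get Descriptor (device); the device descriptor is 18 bytes."""
    # wValue carries the descriptor type in the high byte and the index in the low byte.
    w_value = (DESCRIPTOR_TYPE_DEVICE << 8) | 0
    return build_setup_packet(DEVICE_TO_HOST_STANDARD_DEVICE, GET_DESCRIPTOR,
                              w_value, 0, length)


setup = get_device_descriptor_setup()
print(setup.hex(" "))  # 80 06 00 01 00 00 12 00
```

These are the same eight bytes that show up as the setup data of the first control transfer when a device is plugged in.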
Interrupt transfer has nothing to do with interrupts in the classical sense. Interrupt transfer is intended to handle things that used to be handled via interrupts in the past. The host will periodically poll the device for interrupt data. The polls will happen often enough to meet the latency requirement. Failed polls will be retried. Example use cases for interrupt transfer are human interface devices like a keyboard or mouse. Isochronous transfers are good for streaming audio or video. Isochronous transfer is periodic with guaranteed bandwidth, but there isn't any retry or guarantee of delivery. To transfer large data, bulk transfer should be used. The data can be transferred the fastest using bulk transfer. The catch is that there is no guarantee about latency or bandwidth, but for plenty of applications it doesn't matter. An example use case for bulk transfer is mass storage or a network adapter.

The USB class defines the language the host talks with the device. There are some USB-specific classes that are not similar to other protocols, like hub or human interface device. The HID class is actually pretty complex, but in my opinion it has successfully solved the configuration issues for basic input peripherals. We can pretty much be sure that the basic functionality of a USB mouse will simply work after connecting it to the computer. Some classes are just a simple wrapper protocol. For example, mass storage usually wraps SCSI; communication device class wraps AT commands, Ethernet frames or just plain serial data; printer class wraps IEEE 1284. There are also vendor-specific classes. For example, FTDI USB-to-serial converters use a vendor-specific protocol.

Moving on to traffic capture. The traffic can be captured in software: on Linux with the usbmon module, on Windows using USBPcap and on Mac using the XHC interfaces. There are also open-source hardware USB 2 sniffers available.
There is OpenVizsla, which is not only open source but also an open hardware project, and there is the LambdaConcept USB 2 sniffer, for which only the software is open source. If you have a logic analyzer, you can decode low and full speed USB signaling with sigrok. To my best knowledge there are no open-source USB 3 hardware sniffers. If you are working on one, please let me know.

After loading the usbmon module, if we have permission to access usbmon, the usbmon interfaces appear in the Wireshark interfaces list. The usbmon0 interface is a special interface that groups all root hubs. The other usbmon instances correspond to host controller interfaces. When we list devices using lsusb, we can see multiple Linux Foundation root hubs. These are phony devices. Inside the computer there is at least one host controller interface chip. All its hub functionality in Linux is modeled with a phony root hub device. The extensible host controller is modeled as two root hubs, one for USB 2 and one for USB 3.

When we want to capture some specific device, for example the Zephyr MSC sample, we can find it in the lsusb output. Here it is device 17 on bus 3. Therefore mass storage traffic can be captured on the usbmon3 interface. lsusb shows 2FE3 as Nordic Semiconductor because the linux-usb usb.ids file has an incorrect entry. 2FE3 is in fact the Zephyr Project vendor ID, and the Nordic Semiconductor vendor ID is 1915. I have reported this to linux-usb, but it will take time before it is fixed.

Note that the device number displayed for devices connected to an xHCI host controller does not necessarily match the address assigned to the device. This is because the address is assigned by hardware, while the kernel controls the device numbers. usbmon captures contain the device numbers, so don't be surprised when debugging firmware that the address the firmware reports doesn't match the values shown in lsusb or in the Wireshark capture.
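The lookup from an lsusb line to the matching usbmon interface can be sketched in a few lines of Python. This is only an illustration: the sample line is hypothetical (in particular the product ID 0008 and the trailing description are assumptions, only the bus, device and vendor ID values come from the talk), and the helper name is made up.

```python
import re

# Matches the leading part of a typical lsusb line:
#   Bus 003 Device 017: ID 2fe3:0008 <description>
LSUSB_LINE = re.compile(
    r"^Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})"
)


def usbmon_for_lsusb_line(line):
    """Return (usbmon interface name, device number, vendor ID, product ID)."""
    match = LSUSB_LINE.match(line)
    if match is None:
        raise ValueError(f"not an lsusb device line: {line!r}")
    bus = int(match.group(1))      # decimal, with leading zeros in lsusb output
    device = int(match.group(2))   # kernel device number, not always the USB address
    vendor_id = int(match.group(3), 16)
    product_id = int(match.group(4), 16)
    return f"usbmon{bus}", device, vendor_id, product_id


# Hypothetical sample line; the product ID and description are placeholders.
sample = "Bus 003 Device 017: ID 2fe3:0008 Example mass storage sample"
interface, device, vid, pid = usbmon_for_lsusb_line(sample)
print(interface, device, hex(vid))  # usbmon3 17 0x2fe3
```

The device number is what a Wireshark display filter on the usbmon capture has to use, which is why it is returned separately from the interface name.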
When a Thunderbolt 3 docking station is connected, chances are that Thunderbolt is used to tunnel PCI Express data to an extensible host controller embedded in the docking station. In such a case there will be two additional USB root hubs visible in the lsusb output: one for USB 2 devices connected to the docking station and one for USB 3 devices. When a Thunderbolt 4 docking station is connected to a Thunderbolt 4 capable host, then it is unlikely for a new root hub to appear. Devices connected to Thunderbolt 4 docks are visible in the system under the USB 2 and USB 3 host controller belonging to the USB4 controller.

Capture engines can be integrated into Wireshark using the extcap interface. I have USBPcapCMD and ovextcap copied into the Wireshark extcap directory. This makes it possible to see USBPcap and OpenVizsla interfaces in the Wireshark interfaces list. This screenshot is made on the same computer as the one on the previous slide. You can see that on Windows there are only two USBPcap instances, while on Linux there were five usbmon instances. This is because USBPcap does not have an equivalent of the special usbmon0 interface, and Windows does not logically split the extensible host controller into two separate root hubs. Just like on Linux, connecting a Thunderbolt 3 dock results in a new USBPcap interface, while devices connected to a USB4 dock will end up on the existing USBPcap interfaces.

When we click on the gear icon next to the USBPcap interface, we can configure capture options. The options include snapshot length, which is how many bytes of a single record to save. Unless we know that the application or device driver submits large requests, we should probably keep the default snapshot length value. We can also configure the capture buffer length. We can increase it in case we notice missing data, but keep in mind that this buffer is allocated in kernel space within the non-paged pool memory.
By allocating the capture buffer in the non-paged pool, it is guaranteed that the buffer will always reside in physical RAM. Non-paged pool is never swapped to disk, so you should keep the capture buffer size reasonable. Next we might want to select what we want to capture. I usually don't capture from all devices, but instead I capture from newly connected devices. I start the capture and only then plug in the device. This way the capture contains all the necessary descriptors that dissectors can use. The option to inject already connected devices' descriptors into the capture data is useful when we are capturing data from a device that is embedded into a system and we have no easy way to disconnect it. USBPcap will then fake the device and configuration descriptor requests into the pcap data. If we don't capture from all devices, we can select the individual device we want to capture from. The number in square brackets is the device address.

Capturing USB in software on Mac is somewhat complicated. The capture is performed on fake network adapters that are visible in the system when booted with System Integrity Protection disabled. On an Intel MacBook Pro from 2019 there are four such interfaces: VHC128, which corresponds to the Apple T2 bus that connects the Touch Bar, internal keyboard, trackpad, headset, ambient light sensor, FaceTime camera and T2 controller; XHC0, which captures SuperSpeed traffic for devices connected to the USB-C ports on the MacBook Pro left side; XHC1, which captures SuperSpeed traffic for devices connected to the USB-C ports on the MacBook Pro right side; and XHC20, which captures low, full and high speed traffic for devices connected to any USB-C port on either side. To capture, the interface has to be brought up using ifconfig. Just like on Linux, lsusb can be used to determine to which bus the device is connected. The Zephyr mass storage sample is on bus 20 because it is a full-speed device. The macOS interface list is quite long, and there are also four Thunderbolt interfaces visible.
The Thunderbolt interfaces are visible regardless of System Integrity Protection state. However, a Thunderbolt interface can only be used to capture Ethernet traffic on the local network interface that is brought up when two Thunderbolt 3 or Thunderbolt 4 capable devices are connected. The Thunderbolt interfaces won't show any actual Thunderbolt layer packets. There is just the Ethernet traffic and no USB, PCI Express or DisplayPort data. On the screenshot, traffic is observed on the Thunderbolt 4 interface because the rear USB-C port on the right side was connected to a USB-C port on a Thunderbolt 4 compatible laptop running Linux.

The other option is to capture in hardware. Capturing in hardware is useful when you are debugging host controller driver issues, or if both host and device are microcontroller based and we simply cannot capture in software. ovextcap presents a separate interface for each USB 2 speed. Therefore there are separate interfaces for low, full and high speed capture. Packet filtering can be enabled in the ovextcap options. It is quite useful to filter NAKed transactions unless you are debugging some weird bug in the driver or the device firmware. Filtering NAKed transactions will significantly reduce the number of captured packets, making the capture file much smaller. When capturing full or high speed traffic, we can also filter Start of Frame packets. Every second there are 1000 Start of Frame packets on a full-speed link and 8000 Start of Frame packets on a high-speed link.

So how does OpenVizsla work? On the left side there is a USB Type-B connector that connects to the capture host. The monitored device is connected to the USB Type-A connector on the right. The USB Type-B connector on the right connects to the target host. The link between the target host and the monitored device is decoded by a USB transceiver operating in passive mode. The USB transceiver translates the differential signaling to ULPI. The FPGA receives the ULPI data and extracts packets from there. The data is buffered in SDRAM.
An FTDI USB-to-serial converter connects the capture host with the FPGA. The FTDI FT2232H has two channels. On OpenVizsla, one channel is used to load the bitstream into the FPGA and the other channel is used to transfer the capture data. The only non-volatile memory on OpenVizsla is the EEPROM chip that stores the FTDI configuration. The FPGA bitstream is always loaded from the capture host.

Wireshark shows what the capture engine provided. For example, usbmon provides USB packets with the Linux header and padding. USBPcap provides USB packets with the USBPcap header. OpenVizsla provides the actual USB packets. The USB packets are described in the USB 2 specification, chapter 8. Software sniffers capture URBs submitted to the host controller. URB stands for USB Request Block. The Linux header and the USBPcap header contain OS-specific URB information. If you develop a software sniffer for another system and want to use Wireshark for dissection, specify the OS-dependent pseudo header and request a link-layer header type for it on the tcpdump-workers mailing list.

So what do software sniffers really show? The device driver submits a URB; the HCD handles the URB and reports back to the device driver. All software sniffer packets contain OS-specific metadata: URB ID, among others. Software sniffers spy on the interactions between device drivers and the HCD. Both when sending and receiving data, the usbmon, USBPcap and macOS XHC interfaces capture two packets. The first one is from host to device and contains the information that the device driver submitted. The second is after the host controller driver has finished processing the data. For control transfers, the first packet always contains the setup data. If the data travels from host to device, the first packet will contain both the setup data and the payload. If the data is transferred from device to host, then the second packet will contain the data read. For interrupt, bulk and isochronous transfers, the payload is only in one of the packets.
For writes, the payload is in the first packet, while for reads it is in the second packet. If the read fails or is cancelled, then the second packet will contain only OS-specific metadata. For reads, the first packet indicates that the device driver requested the host controller driver to start read attempts. This is useful when debugging host software, because if there isn't any data coming from the device, it might be that the host is not asking for it. If the host doesn't ask for data, the device cannot send anything.

Now let's finally start Wireshark. I will capture the Zephyr USB mass storage sample simultaneously with USBPcap and OpenVizsla. Let me just open all the Wireshark instances and let's go. So on the left, I will be capturing with USBPcap. We can open the options. I have capture from newly connected devices selected, and on this generic USB hub I have a USB mass storage device, Zephyr flash, connected. However, I will be capturing only from newly connected devices. So now I will disconnect the device. I disconnected it and it no longer appears under the generic USB hub. So let's start the USBPcap capture. And here on the right side, let's start the OpenVizsla full-speed capture with filtering of NAKed transactions and Start of Frame packets. However, I will also be dumping all the unfiltered data so we can compare the unfiltered and filtered captures. Let's start. Okay, now we can connect the device. Okay, I can disconnect the device now because I have a lot of traffic to analyze, and I can stop it now.

So let's start the analysis by comparing the filtered with the unfiltered data. When we scroll, with the filtered data on the right and the unfiltered on the left, we can see that there are a lot of Start of Frame packets. There are so many of them. If we keep scrolling, we can see Start of Frame every now and then. For example, here. And here, there are a lot of them. A lot, really a lot. So in Wireshark, we can filter them out with Apply as Filter, Not Selected.
I picked wrong. We want the PID: Apply as Filter, Not Selected. Okay. And now we can also change the filter to be more human readable, because 0xA5 is maybe not the clearest. So we can just type SOF. So we can filter out all the SOFs. And this removes over 2000 packets. However, the difference between the filtered and unfiltered capture is still well over 6000 packets. So why are there so many NAKs? Basically, the way it works, the xHCI host controller is so fast that it keeps asking the device for data as fast as it can. The device obviously cannot cope, especially this Zephyr device on the Nordic low-power nRF52. So the peripheral does NAK the packets multiple times before the software finally arms the endpoint, and then it responds with the actual data. Similarly in the other direction: when the host is sending data to the device, the device can NAK the data, and then the host simply retries. Finally, when the device is ready, it acknowledges the data. So this is the big difference between the unfiltered and filtered USB link layer captures. We can see that NAK is the most common packet. This is perfectly fine on USB, but there are a lot of NAKs.

So let's now compare USBPcap to the filtered data. First, USBPcap starts with the Get Descriptor request to address 12, but the link layer capture contains a Get Descriptor request to device zero. This is because USBPcap does not capture this data at all. But the host uses it to know what device is connected and to know the device's max packet size zero. This is needed to correctly reassemble the link layer data packets into transfers, because the max packet size zero for a full-speed device can be 8, 16, 32, or 64. If it was eight, then this data packet would contain only the first eight bytes from here, and the following data would be in the DATA0 packets that the host would have to read afterwards.
Without knowing the max packet size zero, it's really impossible for a host to tell if the device intends to stop the transfer or if there is one more data packet coming. So when the host knows what device is connected, it issues Set Address and sets address 12. We can see that, further down the line, the device accepted the address, and then we see the request for the device descriptor: Get Descriptor, device, length 18. So this is most likely the same request. And then when we see the response, the response is the same. Only this metadata here is nowhere to be seen in the link layer capture, and this PID and CRC is nowhere to be seen in USBPcap. This lower layer is basically what's different, but the higher layer, the actual payload, is essentially the same between the two. Here we see that in this OpenVizsla capture there are a lot of intermediate packets between the data that clearly corresponds to this USBPcap capture. We can simply filter it with the usb filter. So now it looks more or less the same.

Let's now move to the configuration request. We asked for nine bytes of the configuration descriptor, but here the host was asking for 255. So this is some different request. It asks for a string descriptor and for the device qualifier. This is a separate part of the USB host stack that is not captured with USBPcap. Only here, when it asks for the configuration descriptor, this is exactly the same request that we captured with USBPcap. And we get the same response. We get only nine bytes of the configuration descriptor because we only asked for nine bytes. The total length is 32. While USB allows us to request basically any length that can fit in 16 bits, many devices fail when the request is too large. Therefore, the host asks for nine bytes and only then asks for the remaining data. Let's look at the configuration descriptor, because there we have all the interfaces. This example is just the mass storage.
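The two-step configuration descriptor read described above can be sketched as follows. This is a minimal Python illustration of parsing the first nine bytes to learn the total length for the second request. The field layout is from chapter 9 of the USB 2.0 specification; the sample bytes are hypothetical except for the total length of 32 seen in the capture, and the function name is made up.

```python
import struct

DESCRIPTOR_TYPE_CONFIGURATION = 0x02
CONFIGURATION_HEADER_LENGTH = 9


def parse_configuration_header(data):
    """Parse the 9-byte configuration descriptor header and return its fields."""
    if len(data) < CONFIGURATION_HEADER_LENGTH:
        raise ValueError("need at least 9 bytes of configuration descriptor")
    (b_length, b_descriptor_type, w_total_length, b_num_interfaces,
     b_configuration_value, i_configuration, bm_attributes,
     b_max_power) = struct.unpack("<BBHBBBBB", data[:CONFIGURATION_HEADER_LENGTH])
    if b_descriptor_type != DESCRIPTOR_TYPE_CONFIGURATION:
        raise ValueError("not a configuration descriptor")
    return {
        "bLength": b_length,
        "wTotalLength": w_total_length,  # header plus all interface/endpoint descriptors
        "bNumInterfaces": b_num_interfaces,
        "bConfigurationValue": b_configuration_value,
        "iConfiguration": i_configuration,
        "bmAttributes": bm_attributes,
        "bMaxPower": b_max_power,
    }


# Hypothetical 9-byte response; only wTotalLength = 32 (20 00) matches the talk.
first_response = bytes.fromhex("09022000010100e032")
header = parse_configuration_header(first_response)

# The host would now repeat Get Descriptor with wLength set to wTotalLength.
second_request_length = header["wTotalLength"]
print(second_request_length)  # 32
```

Asking for the short header first keeps the initial request small, which sidesteps the devices that misbehave when the requested length is too large.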
So we can see that it's mass storage with the SCSI transparent command set. So we will be seeing SCSI commands, and it's using the bulk-only transport. The bulk-only transport uses two endpoints. One is IN, to get the data and also the status from device to host, and the other one is OUT, to get the data and commands from host to device. Both are number one, but we can see that one is, in hexadecimal, 0x81 and the other is 0x01. These are really separate endpoints. One will always be accessed with the IN token. The other one will always be accessed with the OUT token on the actual USB bus.

Then we can see the string descriptor request. We are asking for two bytes, and both captures show this malformed packet, but this is really a problem in Wireshark, because this string descriptor is four bytes but we only asked for two. So we got two, but Wireshark tries to dissect it further. Only when we ask for the four bytes do we get the full response, and Wireshark is happily showing that this device supports the English (United States) language for all string descriptors. Then we ask for the string descriptor at index three, and in the configuration descriptor, no, in the device descriptor, we can see that the serial number is under index three. So we are asking for the serial number, and the serial number is here. We can see that we get the serial number in both captures.

Then the mass storage class driver is asking for the maximum LUN on the device. We get just value zero, which means that we have only one LUN, which is the default case for the Zephyr mass storage sample. And then we proceed to SCSI Inquiry. We can see that this is from host to endpoint one. It's SCSI Inquiry; the mass storage part is just a simple wrapper and the actual SCSI is here. Here we can see that there is this URB bulk out, but it's not anywhere in the link layer capture.
This is because the link layer has this ACK handshake packet, but in URBs we are only capturing the instant that the request has finished successfully. Here it means we get this status success only after the device acknowledges the OUT data. If the device NAKed the data, then the host will keep retrying until either a timeout occurs, in which case we would see a timeout here, or until the device finally acknowledges, in which case we see the success.

So let's move to some URB with more data, like this read, some bigger read with 16 blocks. We can easily find it in the other window by looking for the actual command by the transfer length. Let's Apply as Filter, Selected, and let's copy this here. So we can find it. It's this, the second request; these are the same. So let's see what the difference between the two is. Here we have read, bulk out, bulk in, where the host will start sending the IN tokens, reading the data, and then we have these 8 kilobytes of payload. And here we have 64 bytes of payload, and we keep getting 64 bytes of payload until it is finally reassembled, and it shows where it is reassembled. We can see that this data here is exactly the same as the data here, because it's the data that was read from the disk. And then we have the mass storage status. Here it's bundled together with this reassembled URB, but on Windows it is in a separate URB. This is because the host, when submitting this IN request, knew what the total transfer length would be, but here Wireshark doesn't really pass this information to the dissector, so the dissector keeps reassembling the data until it encounters a short packet or a zero-length packet. In this case, it is a short packet, and it's here, and this is the status response from this mass storage device.

Okay, so I pretty much covered what I wanted, so let's move back to the presentation so we can get to the summary. Okay, to the right.
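The reassembly behaviour seen in the link layer capture can be reproduced with a small Python sketch. It is a simplified illustration, not the Wireshark dissector code: data packets are joined until a packet shorter than the endpoint's max packet size (including a zero-length packet) arrives. The function name is hypothetical; the 13-byte status size is the Command Status Wrapper length from the bulk-only transport specification.

```python
def reassemble_transfers(payloads, max_packet_size):
    """Group data packet payloads into transfers using the short-packet rule.

    Returns (completed transfers, bytes still waiting for a terminating packet).
    """
    transfers = []
    current = bytearray()
    for payload in payloads:
        if len(payload) > max_packet_size:
            raise ValueError("payload exceeds the endpoint max packet size")
        current += payload
        # A short packet or a zero-length packet ends the transfer.
        if len(payload) < max_packet_size:
            transfers.append(bytes(current))
            current = bytearray()
    return transfers, bytes(current)


# 16 blocks of 512 bytes read over a full-speed bulk endpoint (64-byte packets),
# followed by the 13-byte Command Status Wrapper.
disk_data = bytes(16 * 512)
packets = [disk_data[i:i + 64] for i in range(0, len(disk_data), 64)]
packets.append(bytes(13))

transfers, pending = reassemble_transfers(packets, 64)
print(len(transfers), len(transfers[0]))  # 1 8205
```

Because 8192 is an exact multiple of 64, no short packet ends the data phase, so the status ends up bundled with the data into a single 8205-byte transfer. A host-side capture avoids this because the URB already carries the requested transfer length.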
To sum it up, USB 2.0 is still relevant today, and most likely will be forever, not only because there are multiple applications where USB 2.0 speeds are sufficient, but also because USB backwards compatibility with USB 2.0 is achieved by dual bus. The host initiates all communication. IN and OUT is always from the host's perspective. The device cannot send data unless the host asks for it, that is, when the driver submits an IN URB. Software sniffers capture URBs. Every URB is captured as two URB packets. Driver to HCI includes the data payload from host to device, if any, and HCI to driver includes the data payload from device to host, if any. URB-level capture is sufficient for general use. However, understanding the USB packet level helps make sense of the URB packets.

So that's all for now. I'm waiting for your questions, so I will answer them. You will receive the presentation slides and also the captures that I took during this presentation, so you can check them out in depth later on. That's all for now. Take care. Bye.