All right. Thank you, everybody, for stopping by my talk. This is more of a higher-level talk compared to the previous one that Johann gave: basically an overview of my experience with a client implementing a custom USB device controller driver using Zephyr RTOS. First, a little bit about myself. I'm an embedded software consultant based in New York City. I've been working professionally as an embedded software engineer for about 15 years, and I started my consulting company in 2019, so I've been doing consulting work for about four and a half years now. The design work I've been involved in has spanned a number of different verticals, including medical devices, scientific instruments (mainly in the agricultural space), automotive, defense, and, because of where I am geographically, a lot of consumer electronics for startups. More recently my expertise has been with RTOS-based systems, including Zephyr, but for most of my career a lot of my work has been in embedded Linux, mainly using Yocto Project based BSPs to deliver software solutions for my clients. On the application side, I've worked mainly with Qt-based applications. So I've run the entire gamut of the embedded software stack: RTOS-based systems, Linux kernel-space work, and now a lot of user-space applications as well. You can check out my social media, LinkedIn and Twitter. I hope to get connection requests and follows from all of you; it'd be great to connect and have a conversation online. I've also recently been getting involved in trainings and workshops.
You can sign up for my newsletter, where in addition to getting the schedule for these workshops and training courses, you also get insight into some of the things I've learned over the course of my project work. And of course, you can reach out to me via email and check out my website. So here's a simple agenda for today. First, we're going to talk about the project: the system architecture of what we were trying to do and the problem statement. Then we'll go over a really brief USB primer, just to level-set the conversation and understand what we were aiming to achieve. Then we'll talk about the initial approach we took in implementing this custom driver, the challenges we faced, how we debugged those issues, what steps we took, what tools we used, and what the final resolution was. In my opinion, that wasn't the greatest resolution, but we had to do what we had to do to get a solution out to the customer. Then next steps: if I were to redo this project, which I'm actually doing right now on my own time, what could I have done to improve the final solution, and what am I working on next? And then we'll conclude with questions. So, quickly going over the project and what it entailed. The client came to me with an Enclustra board, which is Xilinx-based. Enclustra makes production-ready modules: they're not development boards, they're boards you can place into final products that are ready to ship. It's essentially a Zynq-based platform with a hard ARM core, and initially they just had a UART console and Ethernet on it.
So those were the main peripherals they initially had on their system. Here's a diagram of the system architecture. On the bottom right we have the hard ARM block, which originally ran their reference implementation, with all the peripherals of their MCU-based design living in the FPGA. Their goal was to take an off-the-shelf RISC-V core, not a custom one, and build an application around it for their own project. They used Xilinx IP as the USB device controller, again a hard block, and a UART controller, also Xilinx IP and another hard block. Both of those were tied to the RISC-V core. The ARM core wasn't only used for debugging; it also loaded the RISC-V implementation by accessing the RISC-V memory, and it had access to the registers of the other hard blocks as well. So we could peek and poke at the registers of the USB controller and the UART controller to see what state the hardware was in as we implemented and tested our code, and what the registers actually reported, for debugging purposes. So, as I said, the existing system was based on an Enclustra Xilinx FPGA board. It had a hard ARM core, a UART console, Ethernet, and then the big component for this project, USB CDC. From a software perspective, their existing system used FreeRTOS, and they had two versions of the implementation. One was the final version, a simple implementation with just the few peripherals on their system. They had a Blinky, basically just an LED that demonstrated the system was up and running.
They also had the USB CDC, which was the focus of this project, and a network interface. Then the debugging implementation, as I said, had access not only to the RISC-V memory but also to the registers of the other hard blocks, like USB, UART, et cetera. And of course it had a network interface, because, going back to the diagram, that's how you loaded the software. In this case we were targeting the RISC-V core for the Zephyr implementation: the host would communicate over Ethernet with the ARM core, you'd transfer the Zephyr binary to the ARM core, issue a command, and the ARM side would write the Zephyr application into the RISC-V core's memory. That's why we needed Ethernet, a network interface, in the FreeRTOS implementation on the ARM side. If you sat in on Anna's keynote this morning, one of the problems with a lot of RTOSes, especially FreeRTOS, is that they're not cohesive, unlike Zephyr, where a USB stack is available to you. So they needed to incorporate a separate stack, and they used TinyUSB for this. It has USB host and device capability and it's targeted at embedded systems. It's memory-safe: there are no dynamic allocations. It's thread-safe: all interrupts are deferred, so you have to implement some sort of task that goes through and handles the interrupts that have occurred. And it's cross-platform and open source. This is similar to a lot of USB stacks, including, as we saw in Johann's talk, how it's implemented in Zephyr. At the top you have the application, which is our custom implementation. In the middle region we have the stack itself. And at the bottom we have the actual MCU port.
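To make the "interrupts are deferred" point concrete, here's a minimal host-runnable sketch of the pattern, not TinyUSB's actual code: the ISR only records an event in a queue and returns, and a task drains the queue later. The event names, the queue and `device_task_step()` are all hypothetical; the sketch assumes a single core with one producer (the ISR) and one consumer (the task), and it omits the memory barriers a multicore system would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical events a device-controller ISR might report. */
typedef enum { EVT_BUS_RESET, EVT_SETUP_RECEIVED, EVT_XFER_COMPLETE } usb_evt_type_t;

typedef struct {
    usb_evt_type_t type;
    uint8_t ep_addr; /* endpoint address; bit 7 set means IN */
} usb_evt_t;

#define EVT_QUEUE_LEN 8u
static_assert((EVT_QUEUE_LEN & (EVT_QUEUE_LEN - 1u)) == 0, "queue length must be a power of two");

typedef struct {
    usb_evt_t buf[EVT_QUEUE_LEN];
    volatile uint32_t head; /* advanced only by the ISR (producer) */
    volatile uint32_t tail; /* advanced only by the task (consumer) */
} evt_queue_t;

/* ISR side: record the event and return immediately; no USB processing here. */
static bool evt_queue_push(evt_queue_t *q, usb_evt_t evt) {
    if (q->head - q->tail == EVT_QUEUE_LEN) {
        return false; /* full: a real driver would count the overflow */
    }
    q->buf[q->head & (EVT_QUEUE_LEN - 1u)] = evt;
    q->head++;
    return true;
}

/* Task side: take the oldest pending event, if there is one. */
static bool evt_queue_pop(evt_queue_t *q, usb_evt_t *out) {
    if (q->head == q->tail) {
        return false;
    }
    *out = q->buf[q->tail & (EVT_QUEUE_LEN - 1u)];
    q->tail++;
    return true;
}

/* One pass of the device task: drain everything the ISR deferred. */
static unsigned device_task_step(evt_queue_t *q) {
    unsigned handled = 0;
    usb_evt_t evt;
    while (evt_queue_pop(q, &evt)) {
        /* A real stack would dispatch on evt.type here (reset, setup, transfer). */
        handled++;
    }
    return handled;
}
```

Keeping the ISR this short is the point of the design: all the protocol work happens in thread context, where the stack can block, take locks and call back into the application.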
So in their FreeRTOS implementation on the ARM, they only implemented the two ends: a task at the top, and the interface to the Xilinx hard block at the bottom. That's all they really needed to do. Let's quickly look at how TinyUSB implements its abstractions, not only for OSes but also for MCUs. This is important because we'll see later what we ended up doing to get this working with Zephyr, which was a poor decision on my part, but we'll get to that. Essentially, all you need to do is set a macro to enable the OS abstraction you want. If you want to use FreeRTOS, you specify the appropriate macro, and TinyUSB then has all the essential FreeRTOS primitives it needs. For example, here we see the OSAL millisecond-to-tick helper, and with FreeRTOS enabled, that's what we get: a really thin layer over the underlying FreeRTOS implementation, in this case converting milliseconds to ticks. TinyUSB also supports different MCUs. Again, to leverage microcontroller-specific implementations, such as hardware-based synchronization primitives, you specify the appropriate MCU type. For Xilinx, they just added another option in TinyUSB and set that configuration option. They had a thin layer implementing some of the Xilinx primitives, and that was that: in the TinyUSB tree they just added another directory for Xilinx, and that's all that was really needed.
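The compile-time switch works roughly like this. TinyUSB's real option is `CFG_TUSB_OS` (set to, for example, `OPT_OS_FREERTOS`); the sketch below uses hypothetical `DEMO_*` names instead so that it compiles on its own. It shows how one macro selects a thin backend, here a millisecond-to-tick conversion like the one on the slide. The 100 Hz tick rate is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backend IDs, standing in for TinyUSB's OPT_OS_* options. */
#define DEMO_OS_NONE     1
#define DEMO_OS_RTOS_100 2 /* an RTOS with an assumed 100 Hz kernel tick */

#ifndef DEMO_OS
#define DEMO_OS DEMO_OS_RTOS_100 /* the single switch the application sets */
#endif

#define DEMO_WAIT_FOREVER UINT32_MAX

#if DEMO_OS == DEMO_OS_NONE
/* Bare metal: the "tick" is a 1 ms counter, so no conversion is needed. */
static inline uint32_t demo_osal_ms2tick(uint32_t msec) { return msec; }
#elif DEMO_OS == DEMO_OS_RTOS_100
#define DEMO_TICK_HZ 100u
static_assert(DEMO_TICK_HZ > 0u, "tick rate must be non-zero");
/* RTOS backend: milliseconds to kernel ticks, rounding up so that a short
   non-zero timeout never collapses to zero ticks. */
static inline uint32_t demo_osal_ms2tick(uint32_t msec) {
    if (msec == DEMO_WAIT_FOREVER) {
        return UINT32_MAX; /* keep "forever" as forever */
    }
    uint64_t ticks = ((uint64_t)msec * DEMO_TICK_HZ + 999u) / 1000u;
    return (uint32_t)ticks;
}
#else
#error "Unsupported DEMO_OS backend"
#endif
```

Building with `-DDEMO_OS=1` would select the bare-metal backend instead, where the conversion is the identity; the rest of the stack calls `demo_osal_ms2tick()` without knowing which one it got.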
On the other side, at the application layer, they implemented the appropriate tasking to handle the device interrupts, as well as the mechanisms for the CDC implementation. So not only did we have to enable the Xilinx port, we also had to set the config option for FreeRTOS, enable the CDC class, and implement the relevant application code to manage the CDC interface and the device interface. To summarize, the application ended up being just an echo application: a task receives data over USB CDC and transmits it straight back over USB. At the low level, the MCU port has the interrupt handlers and, for initialization, writes the appropriate controller registers in response to TinyUSB function calls. So the problem they came to me with was: we have a RISC-V core, and we want to use the Zephyr USB stack to interface with the Xilinx USB device controller and do the same thing we have now, a CDC device that echoes data. Ultimately, beyond just echoing, they wanted to take performance measurements. Later on, an iperf application was built to exercise the link, because for higher speeds and bigger packets we need DMA instead of CPU-driven register transfers. So they used the Xilinx DMA controller as well and wanted to validate how well it was working: is everything configured properly, or are there bottlenecks we're not aware of? But this talk focuses on the non-DMA path: sending a single character, a really small USB packet, and making sure that's functional.
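The echo logic itself is tiny. Here's a host-runnable sketch in which two fixed-size FIFOs stand in for the CDC receive and transmit buffers; in a TinyUSB build the reads and writes would presumably go through `tud_cdc_available()`, `tud_cdc_read()`, `tud_cdc_write()` and `tud_cdc_write_flush()` instead. `fifo_t` and `cdc_echo_step()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_SIZE 64u

typedef struct {
    uint8_t data[FIFO_SIZE];
    size_t count; /* bytes currently stored, oldest at data[0] */
} fifo_t;

static size_t fifo_write(fifo_t *f, const uint8_t *src, size_t len) {
    size_t room = FIFO_SIZE - f->count;
    size_t n = len < room ? len : room;
    memcpy(&f->data[f->count], src, n);
    f->count += n;
    return n;
}

static size_t fifo_read(fifo_t *f, uint8_t *dst, size_t max) {
    size_t n = f->count < max ? f->count : max;
    memcpy(dst, f->data, n);
    memmove(f->data, &f->data[n], f->count - n); /* shift the remainder down */
    f->count -= n;
    return n;
}

/* One pass of the echo task: move what the host sent (RX) back out (TX).
   Stops early if TX fills up; leftover RX bytes wait for the next pass. */
static size_t cdc_echo_step(fifo_t *rx, fifo_t *tx) {
    uint8_t chunk[16];
    size_t total = 0;
    while (rx->count > 0) {
        size_t room = FIFO_SIZE - tx->count;
        if (room == 0) {
            break;
        }
        size_t max = room < sizeof chunk ? room : sizeof chunk;
        size_t n = fifo_read(rx, chunk, max);
        size_t written = fifo_write(tx, chunk, n);
        assert(written == n); /* guaranteed because max <= room */
        total += n;
    }
    return total;
}
```

Never reading more than the TX side can accept is a simple form of backpressure: unsent data stays in RX rather than being dropped.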
So that was the goal of the project: implement this using Zephyr. Before we dive into what we did, a quick overview of how USB works. It all starts when you plug your device into, say, your PC. The host detects that something is connected, so it starts interrogating the device to identify what type of device it is and what services it provides. Once it knows what kind of device it is and what services, from a USB perspective, the device offers, it loads the appropriate host driver on its end. There are four USB transfer types: control, which we'll get into later, interrupt, bulk, and isochronous. So how does that interrogation process work? It's built on USB transactions, and a transaction is made up of a few packets. First there's the token packet, which is essentially a header telling the device: there's more coming your way, get ready to process it. Then there's the data packet, which is optional and contains the actual payload. And finally there's the handshake packet, sent by the device or the host, which says that what you asked for was successful, or that I'm not ready yet, come back to me later, which is essentially a NAK, so I can keep processing that data. The handshake also plays a role in error handling. When I first got started with USB, there was one mindset I needed to adopt: everything starts with the host.
It's a bit weird, because when you're coming from other protocols it takes getting used to. When you're looking at USB traffic and trying to understand what's going on, you have to realize that the host controls everything: the device can't do anything unless the host requests it. This is also reflected in the terminology: much of the language in USB is defined from the host's perspective, so keep that in mind. Looking quickly at a USB packet, there are some standard fields. There's the sync field; almost every communication scheme, wired or wireless, has something like it to tell the other party: this is where a packet starts, use it to synchronize, I want to start communicating now. There's the packet ID, which tells the receiver what type of packet it is. There's the address: because USB is a bus, the host needs to tell a device that this packet is meant for it. And then there's the endpoint, which we'll get into later. As for token packets, again keep the terminology in mind: IN means the host wishes to read data from the device, so data flows from the device to the host, and OUT means the host wants to write, or transmit, data to the device. Transmit and receive are hard to conceptualize here, but I think understanding these specific terms is helpful. Finally, SETUP is an instruction from the host to the device telling it that the host wants to start a control transfer.
And as I said, the final portion of any USB transaction is the handshake. An ACK indicates that the packet was received and everything is good to go. A NAK is usually like: you asked me to do something earlier and I'm still working on it. When the host asks the device to do something, it wants to know whether it's done, and when it wants to read information from the device, it asks: where's the information I asked you for? The device can answer: I'm still processing, it's taking some time, come back to me in a little bit. And a STALL is usually when something has gone horribly wrong, for example the endpoint is halted or the request isn't supported: hey host, you need to step in and correct things. So here's a quick graphic of the general gist of a USB transaction. The token packet comes in, then the data packet, which is optional and may carry a payload, and finally the handshake packet, which contains the result of the transaction. Now briefly about endpoints. If you look at the arrows, everything is initiated by the host: the host is always either asking the device for something or telling it something. Let's say the device gets an interrupt indicating that the host has sent data on endpoint 1 OUT. The device recognizes that interrupt and goes to the buffer that corresponds not only to that endpoint but also to that direction. Once it's done processing, it places its response in, say, the endpoint 1 IN buffer. Now that data is sitting there, and there's no mechanism for the device to tell the host: you can read now.
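To tie those packet fields to actual bits, here's a small sketch of how a packet ID byte and the address and endpoint fields of a token packet are laid out, following the USB 2.0 specification: the 4-bit PID is followed by its one's complement so a receiver can detect a corrupted PID, and a token carries a 7-bit device address and a 4-bit endpoint number. The CRC5 that ends a token is left out, and the helper names are illustrative, not from any particular stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4-bit packet IDs defined by the USB 2.0 specification. */
enum usb_pid {
    USB_PID_OUT   = 0x1, USB_PID_IN    = 0x9, USB_PID_SOF   = 0x5, USB_PID_SETUP = 0xD,
    USB_PID_DATA0 = 0x3, USB_PID_DATA1 = 0xB,
    USB_PID_ACK   = 0x2, USB_PID_NAK   = 0xA, USB_PID_STALL = 0xE,
};

/* PID byte = PID in the low nibble, its one's complement in the high nibble. */
static uint8_t usb_pid_byte(enum usb_pid pid) {
    assert(((unsigned)pid & ~0xFu) == 0);
    return (uint8_t)(pid | ((~(unsigned)pid & 0xFu) << 4));
}

/* A received PID byte is valid only if the two nibbles are complements. */
static bool usb_pid_check(uint8_t byte) {
    return ((byte >> 4) ^ (byte & 0xFu)) == 0xFu;
}

/* Token packets (IN, OUT, SETUP): ADDR in bits 0-6, ENDP in bits 7-10. */
static uint16_t usb_token_fields(uint8_t addr, uint8_t endp) {
    assert(addr <= 0x7F && endp <= 0xF);
    return (uint16_t)(addr | ((uint16_t)endp << 7));
}
```

These are the byte values you'll see in a protocol analyzer capture: 0x2D for SETUP, 0x69 for IN, 0xE1 for OUT, 0xD2 for ACK and 0x5A for NAK.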
It's whenever the host decides to read later: it checks whether data is now in endpoint 1 IN, and if so, it's right there for the host to retrieve. There's no mechanism for the device to say: yes, I'm done, here you go. Another important thing to keep in mind is that a data packet with no payload is called a zero-length packet, or ZLP. Let's say the host sends an OUT token to send data to the EP1 OUT buffer, as we saw on the previous slide, because it wants the device to do something. To see whether the device has received and handled that data, the host then sends an IN token. There's no information to return, so if everything is fine the device answers with a zero-length data packet, which the host ACKs, and the device continues processing the data from endpoint 1 OUT. If the device is still busy, it answers with a NAK, and if there was an error, say the device didn't expect that data, it answers with a STALL. Similarly, on the other side, the host wants to know: enough time has passed, do you have data for me to read? So it sends an IN token to read data from the EP1 IN buffer. Afterwards it sends an OUT token, again with a zero-length data packet, to confirm that the transfer is complete and to check that the device is ready for the next transaction, and the device ACKs it. So one thing I learned during this project is that these zero-length data packets are pretty important.
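The host-driven model is easier to see in code. This is a simplified, host-runnable model of one OUT and one IN endpoint buffer, not any real controller: the device can only answer the token the host sends, so an IN token on an empty buffer gets a NAK and the host simply polls again, and a full OUT buffer NAKs new data until firmware consumes it. If firmware never arms the IN buffer, every IN gets a NAK. The types and functions are hypothetical, and the model assumes the host's ACK of a DATA packet always arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EP_MAX_PACKET 64u

typedef enum { HS_ACK, HS_NAK, HS_STALL, HS_DATA } usb_reply_t;

typedef struct {
    uint8_t buf[EP_MAX_PACKET];
    size_t len;
    bool full;    /* OUT: holds unread host data; IN: armed with data for the host */
    bool stalled; /* endpoint halted */
} endpoint_t;

/* Host sends an OUT token plus DATA: accept only if the buffer is free. */
static usb_reply_t ep_on_out(endpoint_t *ep, const uint8_t *data, size_t len) {
    assert(len <= EP_MAX_PACKET);
    if (ep->stalled) return HS_STALL;
    if (ep->full) return HS_NAK; /* "not ready, come back later" */
    memcpy(ep->buf, data, len);
    ep->len = len;
    ep->full = true;
    return HS_ACK;
}

/* Host sends an IN token: the device can only return what is already armed. */
static usb_reply_t ep_on_in(endpoint_t *ep, uint8_t *dst, size_t *len) {
    if (ep->stalled) return HS_STALL;
    if (!ep->full) return HS_NAK; /* nothing queued yet; the host will poll again */
    memcpy(dst, ep->buf, ep->len);
    *len = ep->len;
    ep->full = false; /* assumes the host ACKs the DATA packet */
    return HS_DATA;
}

/* Firmware: move EP1 OUT data into EP1 IN so the host can read it back. */
static bool firmware_echo(endpoint_t *ep_out, endpoint_t *ep_in) {
    if (!ep_out->full || ep_in->full) return false;
    memcpy(ep_in->buf, ep_out->buf, ep_out->len);
    ep_in->len = ep_out->len;
    ep_in->full = true;
    ep_out->full = false;
    return true;
}
```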
Together with NAKs, they're essentially how the host side enforces flow control in USB. Some more information about USB: all devices must support endpoint zero. A device receives all control and status requests during enumeration on endpoint zero, and this endpoint lets the host identify the functionality provided by the device and determine which other endpoints are available. From my experience with this project, the device descriptor request is one of the first things that happens during enumeration. You have the SETUP token, and the data following it is the device descriptor request. Next, in the data stage, the host reads back the descriptor the device returns. Then the OUT token with a zero-length packet is really just: okay, I have what I asked for, are you ready for the next transaction? So for the initial approach, leveraging what Johann talked about, we implemented the device controller callbacks on the lower side and used the low-level and high-level APIs. I just started implementing these callbacks. On the host side, we used Windows with its CDC host driver and plugged in the device. First we used Windows Device Manager to check whether it showed up as a COM port, and then we checked in Tera Term whether characters were being echoed. We used printk to check the essential functionality: whether all the callbacks and functions were being exercised, whether they were being called at all. But we're not done. Of course, as we all know, it fails the first time. We plug it in, and Windows fails during the initial enumeration process.
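Here's a sketch of what the device has to do with that first request on endpoint zero: decode the 8-byte SETUP data (little-endian fields), recognize GET_DESCRIPTOR for the device descriptor (bmRequestType 0x80, bRequest 0x06, descriptor type 1 in the high byte of wValue), and never return more than wLength bytes. It also shows which direction the zero-length status stage runs. The descriptor contents are placeholders (VID 0x1234 and PID 0x5678 are not real IDs), and the handler covers only this one request; anything else is treated as unsupported and would be STALLed, which a real device could not get away with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} setup_packet_t;

#define REQ_GET_DESCRIPTOR 0x06u
#define DESC_TYPE_DEVICE   0x01u

/* The 8 bytes following a SETUP token; multi-byte fields are little-endian. */
static setup_packet_t setup_parse(const uint8_t raw[8]) {
    setup_packet_t s;
    s.bmRequestType = raw[0];
    s.bRequest = raw[1];
    s.wValue  = (uint16_t)(raw[2] | (raw[3] << 8));
    s.wIndex  = (uint16_t)(raw[4] | (raw[5] << 8));
    s.wLength = (uint16_t)(raw[6] | (raw[7] << 8));
    return s;
}

/* 18-byte device descriptor with placeholder IDs. */
static const uint8_t device_desc[18] = {
    18, DESC_TYPE_DEVICE,
    0x00, 0x02,       /* bcdUSB 2.00 */
    0x02, 0x00, 0x00, /* bDeviceClass CDC; subclass/protocol defined per interface */
    64,               /* bMaxPacketSize0 */
    0x34, 0x12,       /* idVendor (placeholder) */
    0x78, 0x56,       /* idProduct (placeholder) */
    0x00, 0x01,       /* bcdDevice 1.00 */
    1, 2, 3,          /* string indices: manufacturer, product, serial number */
    1                 /* bNumConfigurations */
};

/* Returns the data-stage length, or -1 meaning "STALL this request". */
static int handle_setup(const setup_packet_t *s, const uint8_t **reply) {
    if (s->bmRequestType == 0x80 && s->bRequest == REQ_GET_DESCRIPTOR &&
        (s->wValue >> 8) == DESC_TYPE_DEVICE) {
        *reply = device_desc;
        /* Never send more than the host asked for. */
        return (s->wLength < sizeof device_desc) ? (int)s->wLength : (int)sizeof device_desc;
    }
    return -1;
}

/* The status stage is a ZLP in the direction opposite the data stage:
   device-to-host requests with data end with an OUT ZLP; host-to-device
   and no-data requests end with an IN ZLP from the device. */
static bool status_stage_is_out(const setup_packet_t *s) {
    return (s->bmRequestType & 0x80u) != 0 && s->wLength != 0;
}
```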
During the device descriptor request, it just fails immediately. So what do we do? There's only so far you can get troubleshooting with printk. We needed a way to independently track the USB transactions, so we used this tool, a USB protocol analyzer. It's a pretty powerful tool: it plugs in line with your USB device, and you can capture all the USB traffic between the host and the device and see where things are going wrong. This is a capture of the traffic going back and forth between our device and the Windows host. There's a lot of information here at first, but you start to see a pattern. I'll get to what we actually saw in those captures in a moment. The other thing we ended up using was Xilinx's Integrated Logic Analyzer (ILA). This is a hardcore FPGA diagnostic tool, and we needed it because the printks weren't enough, and the protocol analyzer couldn't tell us why the FPGA was responding the way it was. The ILA lets you see the actual signals inside the FPGA with really fine time precision, to help you determine what's going on from the controller side. And the other thing we used was the ARM core. As I said, the ARM core is hooked up to the USB controller. This is a really naive and barbaric way of debugging, but we would put infinite while loops at different stages of the USB initialization in Zephyr and then, from the ARM side, where we had access to the USB controller's register space, start poking around to see what the controller thought should be happening. That was another mechanism we used to troubleshoot what was going on.
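The "peek and poke" approach from the ARM side boils down to volatile reads and writes at the controller's base address plus a register offset. In this sketch the offset and the `USBC_EP0_READY` bit are hypothetical (the actual Xilinx register map isn't shown in the talk), and a RAM array stands in for the register block so it runs on a host; on the target, `base` would be the controller's address in the ARM's memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout; the real controller's offsets and bits will differ. */
#define USBC_REG_EP0_CFG 0x04u
#define USBC_EP0_READY   (1u << 0)

static inline uint32_t reg_read(uintptr_t base, uint32_t off) {
    assert((off & 3u) == 0); /* 32-bit registers must be word-aligned */
    return *(volatile const uint32_t *)(base + off);
}

static inline void reg_write(uintptr_t base, uint32_t off, uint32_t val) {
    assert((off & 3u) == 0);
    *(volatile uint32_t *)(base + off) = val;
}

/* The question we kept asking by hand: is EP0 configured to answer,
   or will every request from the host be NAKed? */
static bool ep0_is_ready(uintptr_t base) {
    return (reg_read(base, USBC_REG_EP0_CFG) & USBC_EP0_READY) != 0;
}
```

The `volatile` qualifier matters here: without it the compiler may cache or drop reads of a register that the hardware changes underneath the program.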
What we learned through these troubleshooting steps was that we hadn't set up the Xilinx controller properly: it was always sending out NAKs. The host would try, try, try again to do the initial device descriptor request, and because we hadn't configured the controller properly, we kept sending NAKs. Eventually the host has to give up; it can't keep retrying forever. So we realized we hadn't configured the registers correctly. Again, we also used the ARM core to peek and poke the controller registers to see what they actually contained. So what did we do? Unfortunately, we were running out of time. We'd tried, and we couldn't figure out how to properly implement our driver. So we took a really hard pill to swallow: we decided to port TinyUSB to Zephyr. We completely sidestepped the Zephyr stack. I'm ashamed of this implementation; I literally had a bag over my head for a couple of days, and my daughter was asking me why I was so sad. But it wasn't too bad. We created a new OS port for Zephyr; that's why I pointed out all those different layers for the different RTOSes earlier. We created an OSAL for Zephyr, implementing the primitives TinyUSB uses for tasking, delays, and getting ticks with their Zephyr equivalents. At the application level, we created two Zephyr threads. One manages the device side: it sets up the controller in the FPGA, and when interrupts fire, they're deferred, and this thread comes back around and acts on them.
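Much of a Zephyr OSAL like this comes down to mapping TinyUSB's millisecond timeouts onto Zephyr timeouts. Here's a sketch of that mapping, assuming TinyUSB's convention that 0 means don't wait and `UINT32_MAX` means wait forever (worth checking against your TinyUSB version's `osal.h`). In the actual port each case would become `K_NO_WAIT`, `K_FOREVER` or `K_MSEC(msec)`, passed to calls such as `k_sem_take()` or `k_msgq_get()`. Zephyr's `k_timeout_t` is replaced by a small hypothetical tagged struct here so the logic runs and can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-in for Zephyr's k_timeout_t. In the real port:
   TMO_NO_WAIT -> K_NO_WAIT, TMO_FOREVER -> K_FOREVER, TMO_MSEC -> K_MSEC(msec). */
typedef enum { TMO_NO_WAIT, TMO_FOREVER, TMO_MSEC } tmo_kind_t;
typedef struct {
    tmo_kind_t kind;
    uint32_t msec; /* meaningful only when kind == TMO_MSEC */
} tmo_t;

/* Assumed TinyUSB-side convention for OSAL timeouts. */
#define OSAL_MS_NO_WAIT 0u
#define OSAL_MS_FOREVER UINT32_MAX

static tmo_t osal_to_zephyr_timeout(uint32_t msec) {
    tmo_t t = { TMO_MSEC, msec };
    if (msec == OSAL_MS_NO_WAIT) {
        t.kind = TMO_NO_WAIT;  /* poll: return immediately if not available */
    } else if (msec == OSAL_MS_FOREVER) {
        t.kind = TMO_FOREVER;  /* block until the object is available */
    }
    return t;
}
```

Getting the two special values right is the part worth testing: treating `UINT32_MAX` as an ordinary millisecond count would turn "wait forever" into a wait of roughly 49 days, which looks like it works until it doesn't.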
The other thread was a simple application thread that echoes the CDC data: CDC data comes in, and we transmit it back on the Xilinx side by writing it to the controller. When we did that, it worked. With that implementation we got a COM port to open on Windows, we could echo characters from our Xilinx implementation back to the Windows host, and everything seemed fine. So we're done? For now, yes. As for next steps: this was one of the first times I got involved with Zephyr, about a year, a year and a half ago, so I was naive at the time. I thought: the documentation says just implement these callbacks. Thinking back, I should have taken the opportunity to dig deeper, and I think I've grown over the past year in learning to dig through the code and get a sense of what's going on. At that time I was afraid. So one lesson: it's just code. It's not going to bite you. Nothing's going to blow up, hopefully; at worst it just won't work. Going back, I would have read more code to understand how the USB stack works and the best way to integrate the underlying Xilinx device driver into it. One of their other requirements was to have Kconfig and devicetree options, and that worked, and I'm happy I did that: when you ran the Kconfig GUI, the Xilinx driver showed up, which was nice. And the customer was happy. They actually reached out to me just before I hopped on a plane to come here and asked if I was free in mid-July to work on another Zephyr project. So it's not the perfect solution, but in the commercial world you learn at some point that you do whatever works and makes your client happy.
You've got to deliver on time. But I'm not done yet. Sure, the client is happy, but I'm not entirely happy, so there were two options. One is to port TinyUSB to Zephyr properly, since I just hacked in the OSAL. But that's pointless, because we already have a pretty neat USB device stack in Zephyr, so why not leverage it? So my next step is to properly integrate the Xilinx USB driver into the Zephyr stack. I won't use the RISC-V core, because I don't have access to it, but I can use the ARM core: create a simple implementation on the hard ARM core of a Zynq, incorporate the device controller, and implement my driver the proper way. I have a Beagle analyzer, so I can go through the same verification process. I don't have access to the Enclustra board, but I do have plenty of Zynq development boards, so that's my next step. Any questions? No? Cool. One last pitch, everybody... Yeah, cool, thank you.