So my name's Luke, and this is Asumu, and we're here to tell you about the joys and the virtues of writing your own Ethernet device drivers and how we do this in the Snabb project. Snabb, very briefly, is a networking framework that you can think of as similar to DPDK, but it's a much smaller project. It's written completely in Lua, and it's driven by small independent application developers rather than large equipment vendors. And the three main reasons that we write our own Ethernet device drivers are the fear of eternal damnation, the pursuit of our own righteous destiny, and our insatiable lust for power, which I can expand on slightly.

So the fear of eternal damnation comes from the belief that there is such a place as driver heaven, and there is such a place as driver hell, and we can describe these places quite vividly. In driver heaven, when you have a networking application and you need to support a new Ethernet adapter for some reason, because you have a new requirement and you need some new hardware, you'll find a promising vendor, you'll go to their website, you'll click the link that says "download device driver specification here", and you'll get a file that's about 20 pages long, because it's not rocket science to just take packets in and out of memory with some light multiplexing. You'll read the spec and you'll say, okay, I understand that this is going to work fine for me, and I can write a driver in something like 500 lines of code, about the complexity of a JSON parser, because it's just packets in and out of memory. But before you do that, you'd go out to GitHub and see who else has written drivers already. In driver heaven, GitHub has a lot of drivers, and you're going to look around for something that suits your application, and that's going to depend critically on what domain you're in. You might be very sensitive to throughput, or to latency, or to packet loss, and those requirements really influence which driver would suit you. You're also going to be very sensitive to the platforms that are supported. Which operating system do you want to deploy on? Do you want to be in kernel space or in user space? Do you want to be in containers? And what programming language do you want to use? Because these days, as an application developer, you can use any programming language that you like, and everything can run in user space, so you could be programming in C, in Java, in Rust, in Go, in Lua, in whatever, and that's really your choice, and you would find a driver that suits you. So then on GitHub, you either find the driver you want, you embed it very quickly, you're up and running, and you join that community; or if you don't find it, you just write it yourself, it's only 500 lines of code, and then you put it up, and the next people come along and join your community. So that's driver heaven.

Driver hell is a bit different. In driver hell, you need to support a new network interface, you go to the vendor's website, you click around furiously everywhere you can, and you don't find any link to download the host-to-device interface.
So you open a support request and say, well, can I please have a copy of it, and you don't get anything back for some weeks. Then you make some calls and pull some strings, and you find an account manager you can escalate through to get some attention, and then they say, okay, very secretly, just for your eyes only, here's a description of the interface. And it's 1,000 pages long, not 20 pages long. And now you've got a problem, right? Because you have a requirement, you need to ship with new hardware support, and you've got a deadline. Can you really put it on your critical path to read a 1,000-page manual, then implement the driver, and then not be able to share the spec with anybody else, so that you're stuck maintaining it all by yourself forever and ever? You can't do that, of course, so you're forced to give up on the idea of writing your own driver. Then you go to GitHub and see what drivers are on offer, because you've got to use something off the shelf, and of course nobody else has written a driver either; everybody else made the same calculation that you did, and there's nothing there. So the vendor driver is the only game in town. Okay, so you get the vendor's driver, and it's going to be written in some programming language, it's going to support some platforms, it may or may not work in containers, and you're never going to understand it. And any time you have any kind of a problem, if your performance isn't at the level that meets your expectations, or you get some really strange production bugs that may involve the device driver, you're going to have to resolve these through conference calls with your vendor, and that's just no way to live, right? So that's driver hell, and this brings us to the pursuit of our own righteous destiny.

So we're standing at a crossroads when we write an application. We have driver hell and we have driver heaven, and we get to decide for ourselves which route we're going down. The very first step that everything follows from is whether you're prepared to use hardware that you don't have any documentation for, where you can't see how it works. Once you've done that, you can't write your own drivers, and you're also not going to be able to use drivers developed independently by other people. So you take the next step, and you're using something that's just from the vendor, and you're never going to understand it because you don't have the docs, and neither are your colleagues. And from there, the vendor's going to realize that nobody else is really reading their code, it's all just going blindly upstream, they don't get any kind of constructive feedback from anybody, and everything just comes off the rails. So we don't want to do that.

The alternative is harder. Driver heaven is not very far away, but it's a steep climb, a bit of a vigorous path. The first step is to say: we won't use any hardware that isn't publicly documented. If you can't go to the vendor's website and click a link and get the spec, then it's just not a valid option, and that's tough, right? Like, when we started, we could only use Intel cards, because Intel was the only company that put good specifications online for everybody to see. And then you have to read and understand these specifications, right?
Because even the lovely Intel manuals are 1,000 pages, not 20 pages, and it might only be 20 or 30 pages of that that's really relevant to you, but it takes a lot of time to read through it, understand it, condense it down, and see what the relevant subset is. And then you need to write drivers, and you've got to do this in a group-wise way. You can't just have one person who wrote all the drivers and is the only person who understands them; you need to spread the work around, and this is something we've done a lot in the Snabb world. We started with a driver, somebody else came along and needed support for a related card, somebody else needed some features, and we've spread it around, so we have a lot of people who have done work on the drivers in one way or another and understand parts of them. And you need to engage with the vendors. You need to be part of a constructive dialogue. As part of trying to get into this driver heaven, we engaged last year with Mellanox, together with Deutsche Telekom, and convinced them to take the driver interface for their ConnectX NIC, make a public version of the specification, and put it on their website. So now, if you go to mellanox.com, you can click the link that says Programming Reference Manual, get the spec, and write your own driver for the ConnectX cards. And a couple of people have contacted me and told me that they did exactly that, because the specification was now available, and it was fantastic. And you need to seek out kindred spirits, because there's strength in numbers, and that's why we're here, right? So we're trying to get to driver heaven. We're not all the way there yet, we're climbing as hard as we can, and we've got some stuff on GitHub. Maybe you could have some code on GitHub too. Maybe one day you'll find yourself in a position where it would make sense to write some drivers and join this community, and that would be really cool.

So that's driver heaven and hell, and that's maybe reason enough to write drivers. But thankfully, as a bonus, writing your own drivers also means there are some applications you can write that you couldn't write at all using off-the-shelf drivers. It's a really nice thing: once you drop down and actually understand what the hardware's capabilities are, there are some things you can do that you just couldn't do before. And I have three examples from Snabb land. The first one is a program called Packet Blaster, which is a load generator with infinite capacity. It transmits packets onto the network, and the basic property is that you give it one CPU core per processor socket and it is always I/O bound; it will never run out of CPU cycles. I have a screenshot here that you probably can't read, but it's on a server with 20 10-gig ports. This is socket zero, and it's sending 14.88 million packets per second on each port, and this is the same thing on socket one, so it's about 300 million packets per second in total going out, just generating load for testing, for stressing some application. And in htop down here, we have 100% on one core there and 100% on one core there, and nothing on any of the other 22 cores. So it's a nice thing when you need to generate a lot of load for cheap, like when you have a server that you want to benchmark or stress test in loopback mode. Another application we have is Firehose.
Oh, sorry, the trick with Packet Blaster, the way that it works quickly, is that it never does any per-packet work. When it starts up, it fills all of the transmit descriptor rings with all of the traffic that you want, and then it just puts them into a loop. So it takes no time at all to keep telling the card to keep on doing what it's doing, and it's less than one instruction executed per packet transmitted. That's the trick there. And then Firehose is kind of the reverse of Packet Blaster. This is a packet capture application that starts up, statically allocates packet buffers in memory, statically initializes all of the receive descriptors to point to that memory, and then just runs them in a loop. And every time a packet is available, it synchronously calls a C callback. So again, it's only a couple of instructions executed per packet, and every other cycle is available for the application. These are both applications where it's just not conceivable to match the efficiency with any framework that does any work per packet at all. And finally there's an application called Sidespy, something new that I'm working on, which does a kind of side-channel attack against an existing device driver. Side-channel attacks are cool now, right? This is solving a problem where you have a server with a bunch of network cards, and the network cards can be used in different ways: the kernel might have one card, VPP might have another, a VM another, Snabb another; everyone's got different cards. So you have no unified way to control them. But if you drop down to the hardware level, look in physical memory, and inspect PCI registers directly, then everything is the same. So if you have a sideways monitoring application, it can see all of the traffic passing through, because at the DMA level, at the hardware level, it's all the same. And it has to do this without actually disturbing the applications. And of course, you can't do that with an off-the-shelf driver. So that's the why.

All right, hi, I'm Asumu. So Luke told you about why we want to get to driver heaven, and I'm here to talk about how we can start to get there, and in particular how Snabb's drivers work and how they're on the path to driver heaven. This part just gives a flavor of the implementation, because for time reasons I won't be able to get into too much detail. But let's start with the big picture of the Snabb driver world. I'm going to be talking about Snabb's Intel NIC driver, and that's about 1,485 lines of Lua code, and that code is pretty high level. So it's not quite 500 lines of code, but it's getting there. And the nice thing about this driver is that we're using an implementation of Lua called LuaJIT. Lua is quite high level, so the code is quite easy to understand, and LuaJIT, because of its tracing JIT compiler, is quite performant. So we're able to get the abstractions that Lua has at relatively low cost. I'm going to show you some code showing how this driver is implemented, and we'll just talk about how the receive part of the functionality works. So briefly, a Snabb driver is an app, like everything else in Snabb. And when I say everything else: basically, a Snabb program is composed of a bunch of apps that are hooked up in a graph like this. So for example, this one has two instances of driver apps, and they're connected to some filter apps.
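The slide itself isn't reproduced in this transcript, but a graph like the one described might be wired up roughly as in the sketch below, assuming Snabb's core.config and core.app (engine) APIs; the Filter app and the PCI addresses are hypothetical placeholders, not taken from the talk.

    -- Minimal sketch of a Snabb app graph: two NIC driver apps, each
    -- feeding a filter app. Filter and the PCI addresses are hypothetical.
    local config     = require("core.config")
    local engine     = require("core.app")
    local Intel82599 = require("apps.intel.intel_app").Intel82599
    local Filter     = require("apps.example.filter").Filter -- hypothetical app

    local c = config.new()
    config.app(c, "nic0", Intel82599, {pciaddr = "0000:01:00.0"})
    config.app(c, "nic1", Intel82599, {pciaddr = "0000:01:00.1"})
    config.app(c, "filter0", Filter)
    config.app(c, "filter1", Filter)
    -- Hook the apps up: packets received by each NIC flow into a filter.
    config.link(c, "nic0.tx -> filter0.input")
    config.link(c, "nic1.tx -> filter1.input")

    engine.configure(c)            -- instantiate the apps and links
    engine.main({duration = 10})   -- run the graph for ten seconds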
And you can create various combinations of apps in interesting graphs that give you the functionality that you want. A Snabb app is really just a Lua object that has a particular set of methods. So for example, an app might have a new method, which does the initialization for the app, and push and pull methods, which do things like transmit and receive for that app. If we're talking about a driver, a driver is also just an object that has some methods. In particular, let's consider the pull method, which is the part that implements the receive functionality. So what the driver does is maintain a ring buffer called the descriptor ring, and the NIC will use DMA to send packets via the descriptor ring into memory allocated by the driver. How that looks is kind of like this: the diagram here on the right side shows a descriptor ring. The first pointer there is the head pointer, and there's also a tail pointer, the second pointer there. Basically, the blue portion between the two pointers is the part that is available for use: empty slots that the NIC can send packets into. And the gray parts are the occupied portion. The driver has to maintain this ring by allocating it somewhere in memory and then manipulating the registers on the NIC to set the base address for the ring, and it also has to maintain things like the tail register and make sure it's pointing at the right spot. So for example, if the driver consumes a packet and makes another slot empty, it moves the tail pointer down like this. And the actual driver code that manipulates these registers is pretty straightforward. In the code, to access a register you use a call like the one here, which accesses the self object, which is the driver object itself, and then its .r field, which is a table of all the registers that the driver uses. Then you can access the RDT field in that, which is an object that you can call to get the value that's currently in the register. Similarly, you can call it with a value to set the register. This uses MMIO underneath, via a support library that Snabb provides to do all the actual low-level work. And this line here just increments the tail pointer, making sure that if it goes past the end of the ring, it comes back around.

OK, so in addition to manipulating the registers for the descriptor ring, you also have to allocate the memory for it. For the Intel card, each entry in the descriptor ring looks like this: half of it is an address to a packet buffer that's allocated by the driver, and the other half is some metadata that the NIC provides as well. In the actual driver, we represent one of these entries using this data type declaration. This uses the LuaJIT FFI, which lets you basically use C data structures as Lua objects that you can manipulate easily. And then to allocate the descriptor ring, you can use these two lines. The first line just computes the size, as calculated by the LuaJIT FFI, and the second line uses some support libraries that Snabb provides to allocate DMA-friendly memory that we can use for the descriptor ring. And given that setup, we can write the main method that does the receive functionality for this driver. And it's just this code. This is a little simplified from the actual code, for ease of putting on a slide.
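The slide code itself isn't in the transcript; based on the description that follows, a sketch of the pull method looks roughly like this. The helper names (sync_receive, can_receive, receive, add_receive_buffers) are assumed from the spoken description, not copied from the real source, and engine.pull_npackets stands in for "the maximum number of packets that fit on a link at once".

    -- Sketch of the driver's pull (receive) method as described in the talk.
    function Intel82599:pull ()
       local l = self.output.tx         -- the link to the next app in the graph
       self:sync_receive()              -- reconcile driver's and NIC's ring pointers
       -- Move at most one link's worth of packets per call.
       for i = 1, engine.pull_npackets do
          if not self:can_receive() then break end
          link.transmit(l, self:receive())  -- hand each packet downstream
       end
       self:add_receive_buffers()       -- replace the buffers we consumed
    end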
But it's pretty similar to the actual code. This first line here is the method declaration, saying it's a pull method for the driver. Then in the first line of the body, we're calling a sync_receive method to synchronize the driver's copy of the ring pointers with the NIC's view of the pointers. Then there's a main loop, which is basically just looping over the maximum number of packets that we can put on an app link at once. On each loop iteration, we check if there are packets available in the descriptor ring, and if there aren't, we just break out of the loop. If there are packets available, then we call this receive method to actually get the packet from the packet buffer we've allocated and send it off to the next app in the graph in our Snabb program. And then finally, after we've done the receive, we allocate new buffers to replace the ones that we've read off of the descriptor ring. And I'm just going to show you one of these helper functions that's used in this main method, to give you an idea of what it's like (a sketch of it appears below). This is the receive method, and this is the one that actually fetches the packet that you want to read. And it's pretty short. All you do is first take a copy of the tail register by reading off this RDT register. Then, using that register value, you index into the descriptor ring to get some metadata about the packet, and you use the same index to look up the actual packet buffer in the self.rxpackets array. After you do that, you delete the packet from the array, because you no longer need it there and we'll allocate a new one. And then you use this self.rdt call to increment the tail pointer. So it's pretty straightforward: the code's pretty easy to read and it's very short. All the helper functions on the previous slide, like sync_receive and receive and so on, are about the same length as this method. So it's pretty short. And all this basically shows that it's pretty easy to write a driver like this using LuaJIT. And using this kind of approach, you can do this in your favorite programming language and put it up on GitHub too.

OK, so let me talk now a little bit about recent work we've been doing on this driver and some future work that we want to do. Recently, some colleagues and I added support for RSS and VMDq to the driver I was mentioning. The advantage here is that RSS basically lets you scale more easily to multiple cores. The idea is that RSS hashes flows so that you can distribute them to separate queues. Pictorially, it looks like this: a packet enters the NIC, the NIC hashes the packet based on its flow characteristics and sends it to one of multiple queues on the NIC, and then it DMAs that into the memory that the driver has allocated in RAM. So basically, the idea is that we can have separate instances of the Snabb driver app running on different cores, and that's how you scale to multiple cores in Snabb in the current release. I should also mention that a lot of the work for the RSS support was done by Pete Bristow. And then finally, talking about some future work: the current Intel driver supports some 1G cards, and it also supports the Intel 82599. In the future, we'd like to work on supporting more NICs, for example the XL710 as well. Yep, so that's it. Thank you for listening.
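For reference, here is a sketch of the receive helper described above, reconstructed from the spoken description; the field names (rxdesc for the descriptor ring, rxpackets for the buffer array, rdt for the cached tail index, the wb write-back metadata, and the num_descriptors ring-size constant) are assumptions, and the real code may differ in detail.

    -- Sketch of the receive helper: fetch one packet and advance the tail.
    function Intel82599:receive ()
       local index = self.rdt                    -- copy of the tail pointer
       local desc = self.rxdesc[index]           -- metadata the NIC wrote back
       local p = self.rxpackets[index]           -- buffer we gave the NIC earlier
       p.length = desc.wb.pkt_len                -- (hypothetical write-back field)
       self.rxpackets[index] = nil               -- drop our reference; a fresh buffer replaces it
       self.rdt = (index + 1) % num_descriptors  -- increment tail, wrapping at ring end
       return p
    end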
Any questions? In the back there? Yeah, right. So, getting the Mellanox PRM released was like a two-year project. And the way we got it done was not actually us: it was Norman Kowalewski and Rainer Schatzmayer at Deutsche Telekom, who said, these guys are writing a driver for your card; if you don't give them the docs, the project's going to fail and we're not going to be able to buy it, so the ball's in your court. So I don't know; the only way we've succeeded is to get a very big customer who understands what's going on. Oh, sorry, I should repeat the question. The question was: as a community, how can we get vendors to release their interface descriptions? Normally it's just like talking to a brick wall. To frame the problem, I would say vendors don't care about random people contacting them off the street. They're driven by their key account managers on their big accounts. And the problem there is that the vendors will fall over themselves to help the big accounts, so the big accounts are already getting what they want; the problem is that this isn't widely distributed. So the only thing that works, I think, is that you need to find someone important, a Deutsche Telekom, a Google, a Facebook, someone like that, who will go and say: you've got to do this. In the Mellanox case, it escalated all the way up to the CEO to sign off on putting it out, and they had to make a whole new revision of the manual with just the subset they were willing to publish. So talk to your friends who are executives in big companies. And tell them that when the specifications are open, there's a lot of development being done; a lot of the actual interesting innovation comes from the little guys, and the big fish are really big beneficiaries of all of this. Think of all the code they get from Linux and everything. I think a lot of companies are probably not conscious that this is a problem for the little guys, because the vendors are so sweet with them. So maybe we need to build awareness with the big companies about how easily they can solve the problem and what the upside will be for them.

So the question was: how big is the support library that the driver is using? It's actually pretty small itself. I don't have a number of lines off the top of my head. Luke, do you know? Any other questions? In the back over there? Can you repeat the question again? Oh, I see. The question is: what happens if two workers try to process the same packets? That's not an issue in this case, because two workers shouldn't be working on the same receive queue in this setup; each receive queue gets its own app. Any other questions? Oh, in the middle. Sorry, can you say that one more time? Oh yes, the question is: the driver code that I showed is very synchronous; do we need to use asynchronous programming in other cases? Not in the particular kind of code I showed here, but in the configuration code for the NIC, or for the driver, sorry, we sometimes need to use more asynchronous styles of programming. We sometimes need to coordinate between different instances of a driver that are using the same configuration registers, and there we need to use more concurrent-programming ideas. Yep. Any other questions? Okay, I don't see any other questions. Thanks.