Okay, so my name is Ray Kinsella and I've been working on open source since around 1997 in one form or another. For the past 10 years I've been working on user space networking. And user space networking has gone from something that was hackery in the extreme in the very early days to something that's reaching some level of maturity now. So I thought it would be cool to have a brief presentation on what user space networking is, why it's a thing, and what maturity for user space networking really means. Right, so if we could show the slides. Okay, so everybody's familiar with the kernel, whether it's the BSD kernel or the Windows kernel or the Linux kernel, taking care of your networking needs. In your kernel there is a network stack. What user space networking is, essentially, is unplugging the network cards from your kernel, whatever flavor of operating system you're using, and giving them to a user space application. So pulling them out of the kernel, so those network cards are no longer visible to the kernel, and handing them to a user space application. Now, why the hell would you do that? The kernel's been around for years. It's really well tested. It's millions of lines of code. It supports every network protocol that you could think of. It's a real Swiss army knife of networking. Why would you pull the servicing of the network cards out of something so robust, with such longevity, and give it to an application in user space? If I dial the clock back nearly 10 or 11 years, that's what we were struggling with. We were trying to understand how fast we could really get software networking performance to go. And we spent a lot of time trying to optimize the kernel. The kernel, whether you're talking about the BSD kernel or the Windows kernel or the Linux kernel, is a fantastic invention.
It's a huge amount of code. But that was really the challenge: trying to optimize the kernel, you were pulling on one string in a giant knot of strings and never really understanding where your performance was going. If anybody has spent time trying to performance-optimize a really huge code base, you'll have some understanding of the kind of challenges we had at that time. We spent a long time tuning here, tuning there, tweaking here, tweaking there, but we never really got a substantial performance improvement, which is what user space networking finally allowed us to do. It was a green field that let us go back to real basics and start out with just device drivers. How fast could we get device drivers with no network stack to go? It turns out we could get them to go really, really, really fast. It was about containing the problem statement. If you've ever done a substantial amount of software benchmarking or debugging, you'll know it's very helpful to reduce the problem statement to just the problematic algorithm. And that's essentially what we were doing with DPDK back in the early days: reducing the problem statement down to something more manageable. Because the kernel, at millions of lines of code, wasn't manageable; it was too much for us to understand all at the same time. With user space networking we also got a user space tool chain. Suddenly we no longer had to use esoteric tools like kdb. We could use something simpler, like GDB. I presume everybody in the room has played with GDB over the years; it's a user space tool, and user space tool chains are typically very friendly to use. Whereas debugging things in kernel space has a very high barrier to entry; it's quite hard, and particularly 10 years ago it was quite hard to do.
Now, things have significantly improved in that respect over the years. Most important was performance, in that we were able to use all kinds of microprocessor optimizations in user space that weren't permitted in the kernel. At that time, and again things have improved since, you were only really able to use GCC optimization level two (-O2) in the kernel. But in user space we could do all kinds of stuff. We could use vector instructions, which gave us parallelism and allowed us to process multiple packets at a time. We could do something called poll mode: instead of having interrupts turned on, which were quite expensive, we could poll the NIC for packets as they arrived. We could do other things like stop context switching, because context switching is relatively expensive. We could do memory tricks like improving cache locality, so that we made more efficient use of the cache. There were all these kinds of tricks that we could build in user space which at that time were much harder to do in kernel space. It also gave us a tool to separate the control plane, the management plane, and the data plane. Typically, if you're building a network device, you'll have a control plane and a management plane that you manage the device with, and then quite separately you'll have your data plane. Now if some nefarious source decides to do something like a denial-of-service attack on your data plane, and we hear about those more frequently, unfortunately, you don't want that to take out your whole machine or your whole network appliance. So you give the control and management plane to the kernel, but take the data plane and put it in user space, protected by an IOMMU. An IOMMU is a piece of hardware, typically built into the silicon, that gives you very strong and robust isolation.
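The poll-mode, batch-processing idea can be sketched without any real hardware. The following is a minimal illustration, not the DPDK API: `struct ring`, `rx_burst`, and the sizes are all invented for this example, but the shape, spinning on a queue and dequeuing packets in bursts rather than taking an interrupt per packet, is the same pattern a poll-mode driver's receive path follows.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 1024
#define BURST_SIZE 32   /* process packets in batches, as poll-mode drivers do */

/* Illustrative stand-in for a NIC receive ring; a real poll-mode driver
 * reads packet descriptors that the NIC has DMA'd into memory. */
struct ring {
    uint32_t head, tail;           /* consumer and producer positions */
    uint16_t lens[RING_SLOTS];     /* per-packet lengths (toy payload) */
};

/* Poll the ring: dequeue up to `max` packet lengths without ever blocking.
 * Returns the number of packets dequeued (0 if the ring was empty), so the
 * caller can simply spin on it in a tight loop. */
static size_t rx_burst(struct ring *r, uint16_t *out, size_t max)
{
    size_t n = 0;
    while (n < max && r->head != r->tail) {
        out[n++] = r->lens[r->head % RING_SLOTS];
        r->head++;
    }
    return n;
}
```

Batching amortizes per-call overhead across 32 packets, and polling trades one busy core for the latency and cost of interrupts, which is exactly the bargain described above.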
So suddenly that data plane can fail and you can spin it back up again very, very quickly, and it allows you to implement features like high availability that give you a very robust system. If instead you had everything co-located in your kernel, your control and management planes and your data plane all in the kernel, then somebody achieves a denial-of-service attack on the kernel and suddenly you've lost all three, whereas by achieving separation, by putting your data plane in user space, you have a more robust system. And then finally, on licensing. This isn't true for BSD, obviously, but for Linux the GPLv2-ness of Linux was definitely an issue. There was a large group out there who were really interested in doing networking innovation, but the encumbrance of the GPL, of having to give their software back under the GPL, was a challenge, whereas having something up in user space with more permissive licensing, something like a BSD license, enabled a lot more innovation. And then finally, community velocity. This isn't a criticism of anything; it's really a reflection of how successful communities like FreeBSD and Linux have become. FreeBSD and Linux are everywhere, they've got huge penetration, and as a consequence they're huge communities, really, really huge communities, and as a consequence they move slower; they have to move slower, they have to be more risk-averse, naturally. So typically what you find with user space networking communities is that they're much smaller communities by nature, and they typically move faster; it's much easier to go from an idea to an implementation. Good Lord, how long have I been talking? Five minutes, okay. So here's a selection of user space network stacks, and there's DPDK, which is the granddaddy of them all. We also have vswitches in user space, like the OpenFlow vswitch OVS with DPDK (OVS-DPDK), Tungsten Fabric, and Snabb.
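That "spin the data plane back up" property is, at its core, process supervision. Here's a deliberately tiny sketch, my own toy model, not how any real deployment does it (real systems lean on systemd or an orchestrator, plus VFIO/IOMMU isolation), of a control process that respawns a failed data-plane worker while itself staying up:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy supervisor: if the "data plane" child dies, restart it; the
 * "control plane" (this process) keeps running. Returns the number of
 * restarts performed, giving up after max_restarts failed runs. */
static int run_with_restarts(int (*worker)(void), int max_restarts)
{
    int restarts = 0;
    for (;;) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                   /* fork failed */
        if (pid == 0)
            _exit(worker());             /* child: run the data plane */
        int status = 0;
        waitpid(pid, &status, 0);
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;             /* clean exit: we're done */
        if (++restarts > max_restarts)
            return restarts;             /* crashed too often: give up */
    }
}

/* Example workers: stand-ins for a real packet-processing loop. */
static int healthy_worker(void)  { return 0; }
static int crashing_worker(void) { return 1; }
```

The point of the sketch is the blast radius: a crash in the worker costs one respawn, not the whole box, which is the high-availability argument made above.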
Snabb, interestingly enough, is developed in Lua. We also have high-performance traffic generators, and we have user space network stacks, and these tend to be complete layer 2, layer 3, layer 4 implementations. We have a ground-up design called FD.io VPP, and then we have another design called F-Stack that reuses the FreeBSD network stack and puts it on top of DPDK. And then we have a number of high-performance TCP implementations, Seastar and mTCP. So we've gone from the early days, when we just offered a toolkit, hey, here's a toolkit, build your own network stack, to a whole diverse community of different network stacks that do layer 2, layer 3, layer 4 implementations and above, that you can go pick up and start innovating on top of. When I talk about performance, what performance can you typically get? Well, in DPDK, performance is typically measured in the tens of millions of packets per second per core. So if you throw a single microprocessor core at it, which might be an Intel core or an ARM core, you can typically achieve data rates measured in the tens of millions of packets a second, which is very fast, using all of the optimizations I talked about earlier. And just as a proof point, look at the performance of IPv4 routing in FD.io VPP: you can see we get nearly 18 million packets a second for IPv4 routing on VPP on an Intel Skylake server. So performance is typically measured in the tens of millions of packets a second. This is really high-performance stuff. Okay, so finally, on to challenges. Well, user space networking is somewhat of an island. It doesn't interact well with kernel space networking, and that's an unfortunate thing. When you run ip link, you don't see the user space networking interfaces. When you run ip route or ip address, you can't manipulate the user space network stack.
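Those packet rates translate into a brutally small per-packet budget, which is worth making concrete. A back-of-the-envelope calculation (the 3 GHz clock is my assumption for illustration; the talk only quotes roughly 18 Mpps on Skylake):

```c
#include <stdint.h>

/* Per-packet CPU budget: core clock divided by packet rate.
 * At 3 GHz and 18 million packets/second that's ~166 cycles per packet,
 * which is why interrupts, context switches, and cache misses, each
 * costing tens to hundreds of cycles, have to be engineered away. */
static uint64_t cycles_per_packet(uint64_t core_hz, uint64_t pps)
{
    return core_hz / pps;
}
```

A single cache miss to DRAM can eat most of that 166-cycle budget on its own, which is why the cache-locality and batching tricks mentioned earlier aren't optional at these rates.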
So the kind of tool chain you use to manage kernel space networking is not compatible with the tool chain you use to run user space networking, and that causes user-experience difficulties. There's a high barrier to entry: achieving those millions of packets a second I was talking about earlier means you really have to understand microprocessor design very well. The barrier to entry is quite high for understanding how you get really, really good performance. You just don't get it out of the box; it takes a bit of work. And then the last challenge is really the portability and the lack of standardization of APIs, in that we have a huge community, but we have very few standards within that community. There's a lot of diversity in projects, but the kind of standard, reliable interfaces that the kernel gives you, things like the BSD sockets API, which has been around since the early 1980s and is a very well-known standard, or the Netlink interface, again a very well-known standard, we don't have those kinds of standards yet in user space networking, which is, again, a bit of a problem. So if I develop something against F-Stack, which is one user space network stack, it's not portable to FD.io VPP, and that can be a challenge in itself. So there's a huge, diverse community, from vswitches to traffic generators to complete network stacks from layer 2 to layer 3 to development toolkits, that gives you absolutely fantastic performance up in user space. There are challenges, but we've gone from something that was hackery in the extreme in the early days to something that a really large community of users has built up around and is consuming and building and innovating on. So, any questions? Go ahead, sir. Yes. Should I repeat the questions? Those are two very good questions.
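For contrast, this is what that kernel-side stability looks like in practice: the sketch below, a minimal UDP socket bound to loopback, compiles unchanged against essentially any BSD-sockets implementation, which is exactly the kind of portable contract the user space stacks haven't yet converged on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket on the loopback interface, letting the kernel pick
 * an ephemeral port; return the fd and write the chosen port to *port_out.
 * Every call here is 1980s-vintage BSD sockets API. */
static int open_udp_loopback(uint16_t *port_out)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* 0 = kernel assigns a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    socklen_t len = sizeof addr;             /* read back the assigned port */
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

An equivalent program written against one user space stack's native API would need rewriting to run on another, which is the portability gap being described.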
So the first question was: how do I see the future of SR-IOV? And the second question: how does this interact with AF_XDP and XDP/eBPF, those kinds of technologies? So AF_XDP is a great innovation. In some ways it starts to resolve the challenges I talked about on the previous slide. It offers, for the first time, an interface that lets you use the kernel stack in a modular way, but then also deliver packets to user space in a very high-performance way. That wasn't achievable before; you used to have to run user space drivers. It has the potential to give you the best of both worlds, and people are finding that very interesting. It might also be able to give us some kind of compatibility and interoperability. So AF_XDP is very exciting. We had the SDN room next door earlier today, and when the AF_XDP presentation, which was the first one, was on, it was standing room only. There was a huge amount of interest in it. As for the future of SR-IOV: I think we're definitely moving towards more granular interfaces. With VMs, these things typically give you 64 or 128 virtual functions. Now we're moving to containers, and you're talking about containers numbered in the thousands, compared to VMs that were numbered in the tens. So we're definitely going to need to see a more granular, more lightweight interface in order to support those kinds of containerized use cases. Something that gives you SR-IOV performance, but maybe not as heavyweight as SR-IOV. And there's a lot of innovation going on in that area at the moment. Okay, I'm one minute over. Thank you so much.