Okay, so we shall start now. Thanks all for coming for more networking fun in this room. I'd like to welcome Thadeu Cascardo, who's going to tell us more about Open vSwitch in user space. Thanks, and go ahead. — Thanks a lot. So I'm going to talk about Open vSwitch: the architecture, the design of how it works, how the daemons work. I had a hard time figuring out at which level to show you that. I didn't want to show you lines of code; that's too boring. Lucky you, I won't. If you want to, you can start digging, because it's about 400,000 lines of C code, about 800 files including C headers, plus another 600 files of tests, build infrastructure, and documentation. I only know a very tiny piece of that, so I am wrong on the internet right now, right? So please correct me if you get a chance later. And sorry, this is not a talk about DPDK, though I'm going to mention it. So, for those who don't know, a short introduction to what I'm going to talk about. Open vSwitch can be programmed with OpenFlow. It's a network switch, mostly for virtual networks, but it can be used for real hardware switches, and I'm going to show you how that could be done. So if you have plans in your future to work on a hardware switch with OpenFlow: good luck. It supports data path acceleration on Linux; Jiri has talked about that. I'm going to talk more about user space here, but I'll show you how the design allows that data path acceleration to work. And since it's a virtual switch, you don't have the ports hard-coded as in hardware, so you need a way to configure those ports, and OpenFlow does not allow you to do that. So they built another protocol. It's an RFC. It uses JSON-RPC. So there you go: JSON is embedded in your switches right now. It's called OVSDB, which I'll show you a little of. So here's a, well, not a 10,000-mile view; it's three kilometers. Three kilometers of Open vSwitch. You'll see there that ovs-vswitchd is the main component here.
When you're using Linux, it will set up the flows, so there's that communication right there. There's the OVSDB server, which serves all the configuration of the switch. And the yellow boxes are the utilities that you can use. ovs-ofctl is basically a very, very small OpenFlow controller that you can use to program ovs-vswitchd. ovs-vsctl is a very small OVSDB client; it's used to program the OVSDB server, which talks to ovs-vswitchd. It notifies ovs-vswitchd that the configuration has changed, and then the daemon reacts to that. And ovs-dpctl allows you to query or change the state of the data path acceleration in the kernel itself. So here's your three-kilometer view. And before I forget to mention OpenFlow controllers: that's how you're supposed to program ovs-vswitchd. So here's a better view. I considered putting the complete call graph of the code here, but I guess it would take many hours to generate. So here's a little sketch of some of the things. I won't bother you with this one, because it's hard to read; there are better ways to dig into this. I'm going to show you some ASCII art, a better view of exactly this, and explain what every piece here does. But there are many things missing here, because from this far away you can't see them; it's all zoomed out. So you'll see all of these and then realize that's a lot of stuff. You've got to take a step back and start looking at how everything works. I'm going to mention some of the features that you may be interested in, and I may give some pointers on where to find the files to work with them. So here we go: OVSDB. What is it? It's the protocol; as I mentioned, it's documented in RFC 7047. It allows you to create new data paths, new bridges; configure the controllers that are going to be used to communicate with those bridges; create and add ports to bridges; create queues; configure quality of service; and collect stats.
So I'm going to give a quick demo of how I use OVSDB. Let's take a quick look here. ovsdb-client is the OVSDB client that talks to the database. Here we are not exactly talking to the switch daemon; we are talking to the database server. It shows just a summary of the data paths and bridges that we have. We have a bridge br0, which has port foo1 and vxlan0, which has some options there; and bridge outbr0, which has two ports: an internal one and the Ethernet interface. So here's how you get, for example, the stats for ens3: you have zero collisions, the number of bytes received, bytes transmitted, and all this stuff. So OVSDB allows you to query stats as well. The database has a schema and so on. So `get` lets me reach a table, which is called Interface; I want the row whose name is ens3, and I want the statistics column. Then you can set, for example, ofport_request. It lets you say that, for a given port, you want OpenFlow to recognize that port as number three. You may have seen a lot of OpenFlow rules that specify that a given flow goes out port three. So what is port three? What's that port? That's how you specify that ens3 is going to be given that port number. Now, OpenFlow on ovs-vswitchd is not persistent, so you're supposed to use an OpenFlow controller for that. Packets can flow to and from the controller itself: the switch daemon may receive a packet and have a flow there that tells it, well, I don't know what to do with this, or the controller has more logic to decide what to do with it. It's given to the controller in-band or out-of-band, and then you get it back. OVS has a lot of extensions, most of them really, really needed, because OpenFlow is not enough for a lot of things that OVS does.
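To make that OVSDB interaction concrete, here is a hedged sketch in Python of the JSON-RPC message that a query like the statistics lookup above boils down to. The `transact` method and `select` operation come from RFC 7047, and `Interface` is a table in the Open_vSwitch schema; the helper function itself is made up for illustration and just builds the request, it does not talk to a server.

```python
import json

def select_stats_request(ifname, msg_id=0):
    """Build a JSON-RPC request roughly equivalent to
    `ovs-vsctl get Interface <ifname> statistics` (illustrative helper)."""
    op = {
        "op": "select",
        "table": "Interface",
        "where": [["name", "==", ifname]],
        "columns": ["statistics"],
    }
    # RFC 7047 "transact": params are the database name plus the operations.
    return json.dumps({"method": "transact",
                       "params": ["Open_vSwitch", op],
                       "id": msg_id})

req = select_stats_request("ens3")
```

In the real protocol this string would be written to the OVSDB server's socket, and the reply would carry the statistics column as a map of counters.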
One example: there's a hard problem where a packet sent to the controller comes back to the switch daemon, which should not go through all the tables all over again; it should continue from where it stopped. So there's an extension planned for that, probably going into 2.6. Basically, OpenFlow allows you to configure a set of matches and actions: for a given packet that matches a given flow, you execute some actions. One of the funniest actions there is the NORMAL action, because the OpenFlow specification does not say much about what you're supposed to do with it. Basically: oh, it's what a normal switch would do. So, what does a normal switch do? In the OVS case, we do IGMP snooping, for example; that's one of the things the NORMAL action is going to do. And if you do not use it, you're going to lose it. So, here are some examples. You've probably been to three talks already that showed some OpenFlow rules; I'm going to spare you that. So, porting. Here we go into more detail on how the switch daemon works. There's this file, PORTING, a markdown file you can read. For most of the things I'm going to talk about here, I'm going to try to go into some detail and show you how exactly they work. The switch daemon basically reads from the database and talks to ofproto. ofproto is the layer below the daemon itself. ASCII art: much better than the other one, right? It's the same thing as the previous diagram, but this one is much, much better. So basically the switch daemon is talking to the OVSDB server, getting all the configuration, and then it uses ofproto to tell the underlying layer to set up the bridges as they're supposed to be. Okay, that's easy, right? But what does ofproto provide? Basically, it talks to OpenFlow controllers and talks to an ofproto provider. The ofproto provider can be almost anything; for example, it could be a hardware switch.
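Since the match/action model is central to everything that follows, a toy sketch may help. This is not the real OVS flow table code, just the shape of the idea in a few lines, with a NORMAL-style fallback standing in for "do what a regular switch would do"; all names are illustrative.

```python
# Toy model of OpenFlow-style match/action flows. These are not the
# actual OVS data structures, just the concept.
flows = [
    # (match fields, actions), checked in order, first match wins
    ({"in_port": 1, "eth_type": 0x0800}, ["output:2"]),
    ({"in_port": 2}, ["NORMAL"]),  # "act like a normal learning switch"
]

def lookup(packet):
    """Return the actions for the first flow whose fields all match."""
    for match, actions in flows:
        if all(packet.get(field) == value for field, value in match.items()):
            return actions
    return []  # table miss: punt to the controller, or drop
```

A packet arriving on port 1 with an IPv4 ethertype would pick up `["output:2"]`; anything unmatched falls through, which is exactly the case where a controller gets involved.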
It could even be another switch that you talk to using OpenFlow itself, okay? That would be a very nice driver for ofproto: you receive OpenFlow and talk to another OpenFlow switch. It would be very fun to do that. Well, the other thing that ofproto does is talk to netdev, which is a very simple library that wraps communication with a netdev provider, and there are a lot of them. So if you want to add a port to a switch, you set it up in the database. Now the switch daemon knows that you need to add a port and tells that to ofproto; ofproto creates a netdev device, gets its MAC address, and saves it in the database. Okay? So there are all these things going on here, but at this level it looks very, very simple. But then you have this one, right? It shows almost the same thing, but then you realize that below ofproto we have a real provider, and below netdev we have real providers. This ASCII art comes from the PORTING file as well; a very great piece. So, how do you implement an ofproto provider? There are a lot of layers inside that, okay? That's why you need 400,000 lines for OVS. More about dpif, the data path interface: it's the only ofproto provider that we have upstream. Of course, we have a lot of OVS users around; many people have probably ported OVS to hardware switches, but they didn't contribute anything upstream. So that's the only provider we have the code for, to look at and see what an ofproto provider is supposed to do. And dpif is very, very complicated. As you notice here, you have ofproto-dpif, then you have dpif, and then you have the dpif provider: three layers of drivers in OVS. The good thing is that the dpif provider interface is very, very nice, and it works with netdev as well. netdev, besides allowing you to create ports, allows you to receive and send packets on those network devices. If you have such a netdev provider, then you're all set: you can use OVS.
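The netdev provider contract described here (create a device, report its MAC address, send and receive packets) can be sketched as a small class. The real interface is a C vtable of function pointers in the OVS tree, so everything below is a simplified stand-in with made-up names, including the MAC address.

```python
class NetdevProvider:
    """Sketch of the operations a netdev provider has to supply.
    The real thing is a C struct of function pointers, not a class."""
    def construct(self, name): ...
    def get_etheraddr(self): ...
    def send(self, packet): ...
    def rxq_recv(self): ...

class LoopbackNetdev(NetdevProvider):
    """A trivial in-memory device: whatever you send, you receive back."""
    def construct(self, name):
        self.name, self.queue = name, []
    def get_etheraddr(self):
        return "02:00:00:00:00:01"  # locally administered address, made up
    def send(self, packet):
        self.queue.append(packet)
    def rxq_recv(self):
        return self.queue.pop(0) if self.queue else None
```

The point of the talk's argument is that a provider this small is, in principle, all you need to plug a new packet source into OVS: dpif-netdev drives it through exactly these operations.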
I'll show you some of the drawbacks of using that, but it's really possible to use only that and get everything set up and working really, really nicely. DPDK uses that. So how do you introduce DPDK support into OVS? You just write a netdev provider, and that's it. You don't need to worry about writing a new dpif provider or a new ofproto provider; everything is done for you. Basically, one of the dpif providers is dpif-netdev, and the way it works is that it reads packets directly from the interface, from the netdev provider, and then deals with them. I'll go into more detail on that. One of the advantages of using that is the number of features you get: for example, bonding, LACP, multicast snooping, conntrack — all of that is available if you're using dpif. If you're going to write a new ofproto provider, you're going to do all of that by yourself, okay? So this set of features, and a lot, lot more, is only available if you're using dpif. If not, if you're going to write your own ofproto provider, then you need to reimplement all those features, or count on your data path switch to provide them. Here we can see a clear separation between the control plane and the data plane, and that separation is ofproto: ofproto talks to the control plane using OpenFlow, and it talks to the data plane below that. So your data plane needs to support all of those features, and that's what dpif does. One of the dpif providers is dpif-netlink. It talks to the Linux OVS kernel module and sets up ports and flows; basically, that's what it does. It receives the packets that miss those flows, and then sets up new flows. I'm not sure if I have an image showing that — in fact, I do, it's here. So the data path is here, in the kernel itself. Whenever you receive a packet on an Ethernet port, on a physical link, for example, that packet is going to be matched against some flow there. If it matches, okay, there's a set of actions that need to be processed.
If, for example, it's just output to the other link, okay? The data path is going to do that, all in kernel space, really, really fast. But if there's no flow set up there for that particular packet, then it goes to user space. Now dpif-netlink receives that packet and gives it to ofproto-dpif. This ofproto provider knows about OpenFlow, right? So it goes through all the OpenFlow rules, all the OpenFlow actions, builds a lot of data structures, and pushes them down to the kernel: now I need to execute this set of actions for that particular packet as well. So that's how it works. But it needs to go through all the OpenFlow tables, because all the acceleration is right there in the kernel data path. For dpif-netdev, that's not how it works: the kernel is not involved in the data path. Of course you still have device drivers in kernel space; in those cases, you are either using tap devices or AF_PACKET sockets to read all those packets from the driver. In that case, dpif-netdev uses the netdev library and the netdev providers. It uses those to read every packet, so all packets go through user space. If you're using netdev-dpdk, you're using DPDK, so you do not have to go through kernel space at all: it's all user space. You have a device driver running in user space, reading all those packets from the hardware and giving them to OVS. So we have netdev-dpdk, and netdev-linux, which uses tap devices and AF_PACKET. It reads those packets, and they go to dpif-netdev. It has some caching of the flows there, so there's some acceleration here, what we call the exact match cache, EMC. If you find the flow there, that's the fast path: it processes the packet, learns what the set of actions is, and executes them; outputs to another port, eventually. So it optimizes flow lookups.
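The fast-path/slow-path split in dpif-netdev can be sketched as a two-level lookup. Everything here is illustrative: the real EMC is a fixed-size hash keyed on the packet's header fields, and `slow_path` stands in for ofproto-dpif running the full OpenFlow pipeline.

```python
calls = {"slow": 0}

def slow_path(key):
    # Stand-in for ofproto-dpif translating the OpenFlow tables into
    # a concrete action list (the expensive part).
    calls["slow"] += 1
    return ["output:2"]

emc = {}  # exact-match cache: full header tuple -> actions

def process(key):
    actions = emc.get(key)        # fast path: exact match on header fields
    if actions is None:
        actions = slow_path(key)  # miss: full rule translation...
        emc[key] = actions        # ...then install the result in the cache
    return actions
```

Only the first packet of a flow pays for the translation; every later packet with the same header tuple hits the cache, which is why a miss-heavy workload on a slow CPU hurts so much.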
If there's a miss there, then you've got to go all the way up to ofproto-dpif, which processes all the OpenFlow rules and sets up the exact match cache. So that's great, but you need a really fast CPU. If you're using DPDK, all those things happen in software, so your CPU needs to be really, really fast. Imagine you have a hardware switch doing all the fast processing, and then you need to fall back to software running on a control CPU, which you're not even sure has more than a single core: you're out of luck in that case. So be careful about using dpif-netdev, or even dpif at all, because if you have a dpif provider and you have any misses, you're going to process them in ofproto-dpif, all in software. There have been cases reported of people trying to port OVS: oh, whenever I miss some flow, it's pretty damn slow. They had a very slow control CPU. So it's not going to fly. Okay, any questions so far? Okay. So, I promised a demo involving VMs, containers, and IPv6. I'm going to do it, but not as I planned; there's going to be some IPv4 involved. The pure user space data path means that we don't have the Linux kernel involved, so we don't have the vxlan driver on Linux. Packets are coming in from DPDK, for example, and now you need to send them through a tunnel. But what's a tunnel? It's not a Linux network device here. You need to just prepend a header and send the packets through another interface, and you need to do all of that in software, in user space, in OVS. So I'll show you how it works. Here's my setup: I have two VMs, and two network namespaces, one in each VM. And that amounts to this complex setup. It should be very simple.
But we have here a port in the container, which is connected to a port on the host (which is in fact a VM), which is connected to a bridge, which has a tunnel port, which goes through another bridge, which goes to another network port, which is in fact virtual. That goes to a tap device on the real host, and then through another tunnel there. Okay, and then we have the same setup on the other VM. You may be asking yourselves: why didn't I use plotnetcfg? I did. So this is one of the VMs. You won't have time to read it all, but yeah. It shows our network namespace here; it's really nice. There's some bug here that I didn't get to fix, right? It's the Fedora 23 version. So here you have a demonstration of plotnetcfg. It's really, really nice, because you can see here that you have the network namespace, which is named foo, and it got that right. So we have a loopback device there; we have a virtual Ethernet device there that's connected to another one outside of the network namespace, where we also have a loopback device. It didn't quite catch the switches, right? So let's see here, yeah. When you're using the user space data path — and I'm using the user space data path — it's not a master, yeah? Need to work on that, yeah? Not supported, okay? So we were supposed to see that this interface here belongs to this bridge, and these other two interfaces here, foo1 and vxlan0, belong to br0. The reason we don't see that is that plotnetcfg, in its current version, queries the kernel to see if an interface is a master or a slave of another interface, which is not the case here, because we're not using the kernel to connect those interfaces. But you'll notice — well, I'm impressed that it got vxlan0, because it's not a Linux network device at all; it exists only inside OVS. What else have we got here? Okay. So, in fact, here's the setup. We have this bridge here, okay, and let's see what it's mapped to. br0, you'll see, is a tun device.
If you're using the kernel data path, there's a special driver, openvswitch, so it would not be a tun device but an openvswitch device. The same happens with the other bridge. ens3 is a virtual NIC, so the user space daemon reads packets from it using AF_PACKET, a special socket family. vxlan0 is not a Linux network device; it's just virtual information there in ovs-vswitchd. The ovs-netdev data path, well, that's... whenever you're going to create a new data path, OVS tries to create a single data path for the whole system. So you can have multiple bridges, like we have outbr0 and br0, but even if you don't have any of those, if you're using the netdev dpif, it's going to create ovs-netdev; if you're using the kernel data path, it's going to create ovs-system. These are basically the ways to communicate with the data plane on OVS and create new bridges. The fact that we see ovs-netdev here is an artifact of that; it shouldn't be here. So if anyone has any intention of contributing to OVS, there you go, we have a chance right here. And for plotnetcfg, there's a good debugging session in there, right? And supporting the user space data path would be a nice contribution as well. Okay, so let me show you here. I have this outbr bridge because, if you're going to use a tunnel with a user space bridge, it needs to go through another OVS bridge. OVS requires that; it's kind of a limitation, and if anyone wants to talk about that, we can talk about it later. But it needs to go through another OVS bridge, which is in user space as well. So here we have outbr, which is connected to the single NIC that we have on this virtual machine. This other bridge has the tunnel, which goes through this remote IP address. There could be multiple hops here, but it's directly connected to the Linux bridge outside of the VM. And there's this foo interface, shown by plotnetcfg.
We notice that it's connected to some if2, something like that; so here's our container, with veth0, which is the single interface there. It has this IP address here, 200.1. And then we have the same setup on the other VM. So what if I run tcpdump inside the container, and then I ping from inside the other container here? Okay, it's working. But what if tcpdump is run on the bridge itself? Okay, so that bridge is connected to the virtual Ethernet port outside of the container, and connected to the tunnel. But it's not going through the bridge, because it's not going through the local port of the bridge. The port is a virtual Ethernet device, which is read via AF_PACKET. The vswitch daemon gets that packet and realizes that it goes through a tunnel. It's not going through another Linux network device; the tunnel is completely inside the daemon itself, and it does all the encapsulation by itself. So it realizes that it needs to go through outbr, which is the other bridge. But as you'll notice, it keeps running here, right? It's not going through that bridge either, because the vswitch daemon realized: okay, it's going through this bridge, but which port of that bridge? It doesn't need to go through the local port inside the kernel, which is a tap device; and if it did go there, it would need to come back to user space anyway, so it's useless to send it there to reach the tunnel. So it realizes it needs to go straight out that port. And here we have it. In fact, what you see here is not an ICMP packet; you see it's a UDP packet, a VXLAN packet. tcpdump knows how to show the inside of VXLAN packets, and it shows us that, well, there's an ICMP echo request inside that VXLAN packet, okay? So notice here that the kernel is only involved on the first port and the last port; you don't have to go through the local ports of the tunnels at all. It receives the packet from the virtual Ethernet device and sends it out through the NIC.
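The encapsulation the daemon performs here can be illustrated with the 8-byte VXLAN header from RFC 7348. This is only a sketch: the outer Ethernet/IP/UDP headers (UDP destination port 4789) that ovs-vswitchd also prepends are omitted, and the helper names are made up.

```python
import struct

VXLAN_FLAGS = 0x08000000  # "valid VNI" flag, per RFC 7348

def vxlan_header(vni):
    """8-byte VXLAN header: a 32-bit flags word, then the 24-bit VNI
    in the upper bits of the second 32-bit word."""
    return struct.pack("!II", VXLAN_FLAGS, vni << 8)

def encapsulate(inner_frame, vni):
    # The daemon would also prepend outer Ethernet/IP/UDP here;
    # only the VXLAN header itself is shown.
    return vxlan_header(vni) + inner_frame
```

This is exactly the nesting tcpdump unwrapped in the demo: strip the outer UDP and these 8 bytes, and the original Ethernet frame with the ICMP echo request is inside.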
So basically, that's the flow the packet traverses. It goes through the container's veth endpoint, then the VM endpoint of the container, goes through user space, which sends it to the VM NIC. Then it goes through vnet0 and vnet1 on my host itself, which are connected through a Linux bridge. So, any questions? Address them to discuss@openvswitch.org, okay? Seriously, any questions? — Yeah, because the Linux bridge is using STP. I didn't want those to show up; I wanted to make it clear that we had the ICMP packets here, and I didn't want them to show up and make it confusing. It's on the host itself, so yeah. So, outside of the VM, on the host itself, we have vnet0, vnet1, and the NIC connected to the virbr0 bridge, which is a Linux bridge, not Open vSwitch, okay? Okay, we are out of time, so thanks a lot. Thank you. — Hello, everyone. I would like to ask you to squeeze into the middle so we can pack as many people as possible into the room. So if you have a free seat next to you, please move to the middle of the room. Find new friends. Don't be shy. Welcome, everyone. Let me introduce Rajat Chopra, and he will give you a talk about networking in containers. The floor is yours. — Thank you, everyone, for showing up. I'm going to be talking about networking with containers.
I had my name put here, and then I removed it, because I was supposed to do a demo and it wasn't working; but hopefully it's still working, or not working, whatever. Don't blame me if it doesn't work — that's why my name is out.