A tun or tap device connects a file descriptor in user space to a network interface in the kernel. The difference between them is that if you have a tun device you read and write IP packets, and if you have a tap device you read and write raw Ethernet frames. This is used by virtualization: QEMU, for instance, will create a tap device when it launches a VM. It attaches the kernel side of the tap to its imaginary local network, which we'll get to later, and then it uses the user-space side to emulate a network card for the virtual machine. So you can see here we have a computer running virtual machines, and each one has a tap device which connects it directly to the real kernel.

With containers we tend to use veth pairs instead. A veth is a virtual Ethernet interface. When you create one with ip link, it actually creates a pair of interfaces rather than only one like with the tun/tap device, and those two interfaces form a tunnel, so that any packet that goes into one of them comes out the other, and vice versa. Normally that would be completely useless, but with containers you can move one side of the pair into a different network namespace, into a container, and then you basically have a tunnel from inside the container out into the main network namespace.

The third possibility for an imaginary network card is to actually use a real network card. You can assign a real network card directly to a virtual machine or a container. If you're using virtualization, there are CPU extensions like VT-d that allow the virtual machine to talk directly to your PCI card or whatever. If you're using containers, you can just move your real Ethernet card into the container's network namespace, just like you would do with a veth pair, and then the container can access it. The problem with this is that you probably have more virtual machines than you have network cards, so they came up with something called single-root I/O virtualization, SR-IOV, where a very expensive network card can pretend to be multiple virtual network cards, usually either 16 or 64. You can then assign one of the virtual network cards to each virtual machine or container. Mostly virtual machines; people aren't really using this with containers much yet. Each of those virtual network cards essentially connects directly to the physical Ethernet without going through the kernel or hypervisor or anything else, which on the one hand is efficient, but on the other hand means that you can't do extra firewalling or filtering on that card. So it's only useful in a situation where you want the virtual machine to have complete access to the local network.

OK, so we have an imaginary computer with an imaginary network card. Now we need to be able to talk to other imaginary computers. There are two basic ways that people do this: using Linux bridges, which is the more basic, older technology, or using Open vSwitch. Either way the idea is basically the same: you have a bunch of VMs or a bunch of containers, they're all connected to a bridge which lets them talk to each other, and possibly the bridge is connected to the outside world somehow.

With Linux bridges, and I say Linux bridges just to differentiate them from hardware bridges or Open vSwitch or anything like that, it's just a simple network switch in the kernel. You attach network interfaces to it, and they're called ports, and then the kernel deals with routing traffic between the different ports. You can add a real Ethernet interface to your bridge as well if you want external connectivity.
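As a rough sketch of that veth-plus-bridge setup, the commands look something like the following; the interface names, the "blue" namespace standing in for a container, and the addresses are all invented for illustration:

    ip link add blue0 type veth peer name blue0-host   # creates both ends of the pair
    ip netns add blue                                   # stand-in for a container's network namespace
    ip link set blue0 netns blue                        # move one end "inside" the container
    ip netns exec blue ip addr add 10.0.0.2/24 dev blue0
    ip netns exec blue ip link set blue0 up

    ip link add br0 type bridge                         # a Linux bridge in the main namespace
    ip link set br0 up
    ip link set blue0-host master br0                   # the host end becomes a bridge port
    ip link set blue0-host up
    ip link set eth0 master br0                         # optionally, a real NIC (assumed here to be eth0)
                                                        # for external connectivity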
Sometimes, instead of doing that, people will just use NAT, with iptables rules to connect the bridge to the external Ethernet device. That's the default behavior for virtual machines with libvirt a lot of the time, just because it requires less configuration. You can use iptables to a limited extent to control the flow of traffic along the bridge, but for the most part the traffic just flows freely: every machine can talk to every other machine.

If you want to do more complicated things, you can use something like Open vSwitch. Open vSwitch is a combined user-space and kernel network switch. You can program it using a language called OpenFlow, which lets you control the traffic flow. There's an example OpenFlow rule here; I'll be talking more about that a little bit later, so the details don't matter, but you can see this one is matching IP packets that are going to a certain network, does a bunch of stuff with them, and then eventually outputs them to port one at the end there. So Open vSwitch has both a user-space component and a kernel component: the user-space part manages the database of all the rules and interprets the rules when new traffic comes in, and the kernel part routes traffic more quickly once the user-land part has already figured out where packets matching a given rule should go.

OK, so we have an imaginary network on the local machine. Now we want to connect to other machines, so that you can have a whole cluster of computers running virtual machines or containers and have them all talking to each other. There are a handful of protocols for this. Actually, there are a zillion protocols; everyone keeps coming up with their own new protocol for doing this. One of the most popular right now is VXLAN, the Virtual Extensible LAN. It allows you to take arbitrary Ethernet packets, wrap them up in a UDP packet, and then just send them to any other computer. Each packet also has a 24-bit virtual network identifier, which works like VLAN identifiers, so that you can have different flows of traffic all using the same VXLAN tunnel. In OpenShift we use that to do isolation between containers that are owned by different projects: we assign different VNIDs to them and then use that to control which traffic can go into which containers on the other side.

One problem with doing this is that because you have an inner packet and an outer packet, you need two different checksums, or two different sets of checksums: one for the inner Ethernet and IP packet and one for the outer Ethernet and IP packet. Hardware cards that do checksum offloading initially could not deal with that; they could do one set of checksums but not the other, and so you ended up losing a lot of performance on your network. Newer Ethernet cards now have special handling for VXLAN to get that performance back.
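Just to make the encapsulation concrete, here is roughly what setting up a VXLAN tunnel with the kernel's own VXLAN support looks like; the VNI of 42, the device names, and the addresses are all made up:

    ip link add vxlan42 type vxlan id 42 dstport 4789 local 192.0.2.10 remote 192.0.2.11 dev eth0
    ip link set vxlan42 up
    ip link set vxlan42 master br0    # Ethernet frames crossing br0 are now wrapped
                                      # in UDP packets carrying VNI 42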
Sorry, no, this isn't checksum offloading, this is TCP segmentation offload. So another thing that some network cards do is automatic TCP segmentation: when you send big packets, the card will break them up for you and re-checksum them and all that. Again, encapsulation like VXLAN breaks that. VMware came up with something called Stateless Transport Tunneling, STT, where they wrap the packets inside other packets that look like TCP packets even though they aren't really, and then the network card gets fooled into doing TCP segmentation offload for them automatically. No one other than VMware really uses this, though.

Microsoft, meanwhile, is using NVGRE, which is an extension to the GRE tunneling protocol. Again, it has a 24-bit identifier; everyone seems to have pretty much settled on 24-bit identifiers. It's used by Microsoft, and of course we all know what happens when you have multiple competing standards. So now there's a standard called Geneve, Generic Network Virtualization Encapsulation, which is mostly the same as VXLAN except they've now added variable-length extensible headers. Everyone seems convinced that we're going to need these at some point. I haven't actually seen any examples of what they will be used for, but everyone agrees we want to have extensions. It's designed so that the routers don't need to understand the extensions and can still pass on the packets, and you can still have hardware that will deal with the checksum offloading without understanding all of the extensions. There is hardware that does that already, even though no one is really using it yet, but everyone is sure that we're going to be using it at some point.

At the moment OpenShift, OpenStack, and various other things are using Open vSwitch for their local network and then VXLAN for connectivity between different computers, and everyone is doing it slightly differently with their own set of rules. So the Open vSwitch people are now working on something called OVN, Open Virtual Network, which will provide a generic implementation of this that hopefully everyone can use; it's going to use Geneve. I'm not sure exactly how close to done it is. And then there are all these other projects: Weave and Flannel, which were both created for use with Kubernetes, which is Google's container orchestration system that OpenShift is built on top of, and Calico and Contrail are other projects.
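For a sense of what that Open vSwitch plus VXLAN combination looks like on a single host, a hypothetical setup might be the following; the bridge name, port name, and remote address are invented:

    ovs-vsctl add-br br0
    ovs-vsctl add-port br0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.0.2.11 options:key=flow
    # "key=flow" lets the OpenFlow rules choose the virtual network identifier
    # per packet (the tun_id field), which is what makes the per-tenant
    # isolation described below possible.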
Neutron is OpenStack's networking layer. It has its own architecture, but then has plugins within that, so you can have an Open vSwitch plugin or a Calico plugin or a Contrail plugin. Eventually they decided that there were too many plugins, so they came up with the Modular Layer 2 plugin (whoops, that should be "modular", not "module"), which sort of encapsulates the ideas that were common to all of their plugins and then has plugins inside it. So it's a plugin that has plugins, but it's currently the recommended networking plugin for OpenStack.

OpenShift, as I said, is an orchestration framework for containers. It's built on top of Kubernetes, which is Google's basic orchestration system, which in turn is built on top of Docker, which is the system for running containers on a single local machine but doesn't worry as much about the problem of multiple containers on multiple machines. OpenShift uses a networking implementation called OpenShift SDN that, like I said before, uses Open vSwitch and VXLAN. Initially the OpenShift SDN implementation allowed all containers to talk to all other containers. The more recent version of it is multi-tenant, which means that each project (a project is an OpenShift concept for dividing up the containers) has its own ID, which then gets used on the VXLAN and in the OpenFlow rules so that containers in different projects can't talk to each other. Therefore you can have a single OpenShift cluster with multiple customers that don't necessarily trust each other, and all of their traffic is kept separate.

So it uses Open vSwitch, which uses OpenFlow rules, and this is sort of an example of the kinds of rules that it uses. OK, that's not really that visible. For traffic coming out of a container, you see we match in_port, which identifies which port, which interface, the traffic is coming in from. We check that it has the right source address, to make sure that people aren't spoofing and are using the IP address that we expect them to. This load instruction here sets a register to this value, which is the tenant ID or network ID for this particular container, and then goto_table tells it to go to a different table within the OpenFlow rules where more rules get processed. Later on we have this rule, which directs traffic to another node: it says if you have IP traffic which is destined for any of the IP addresses in this network range, then move the network ID into the tunnel ID field, set the tunnel destination to the IP address of the other host where this virtual network is hosted, and then output it on port one, which happens to be the VXLAN port. And then as traffic arrives coming in on port one, we go to another table where we verify that it has the right destination, load the tunnel ID into register zero, and go to a different table. Eventually we get to the table where we match on the container IP address, and match that the tunnel ID is what we expect it to be, and if it is, then we output it on the right port and it goes to the container.
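Roughly, the kind of thing those rules look like in text form is shown below. These particular ones are an approximation for illustration, not the exact rules OpenShift installs; the bridge name, port numbers, subnets, and the VNID value 0x2d are all invented:

    # Egress from a local container: verify its source IP, tag the traffic with
    # the tenant's network ID in register 0, continue processing in a later table.
    ovs-ofctl -O OpenFlow13 add-flow br0 \
        "table=0, priority=100, in_port=5, ip, nw_src=10.128.0.3, actions=load:0x2d->NXM_NX_REG0[], goto_table:2"

    # Destination is a subnet hosted on another node: copy the network ID into the
    # VXLAN tunnel ID, aim the tunnel at that host, send it out port 1 (the VXLAN port).
    ovs-ofctl -O OpenFlow13 add-flow br0 \
        "table=2, priority=100, ip, nw_dst=10.129.0.0/23, actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31], set_field:192.0.2.12->tun_dst, output:1"

    # Traffic arriving over VXLAN: recover the network ID into register 0...
    ovs-ofctl -O OpenFlow13 add-flow br0 \
        "table=0, priority=100, in_port=1, actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[], goto_table:4"

    # ...and only deliver to the container if both the destination IP and the
    # network ID match what we expect.
    ovs-ofctl -O OpenFlow13 add-flow br0 \
        "table=4, priority=100, ip, nw_dst=10.128.0.3, reg0=0x2d, actions=output:5"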
For a lot more details on how OpenShift networking works, there's another talk later today, Networking in a Container World, by Rajat Chopra, who is up there, and that is at 4:30, downstairs from here. And that is all I have. Do people have questions?

Yes? You don't count, you're my boss. OK, so he's asking if there is a drive to make the syntax simpler, because this is way too complicated. You can use simpler syntax if you want to do simple things, I guess. You know, it makes sense once you learn it all; it's pretty simple. There are different criteria that you can match on: here we're matching IP packets, this is the source network, the source address.

No, OpenFlow is a standard. It's not only used by Open vSwitch; it's actually also used by some hardware routers, and it's being extended with more syntax and more different possibilities. So the question is whether we've done any work with external OpenFlow routers or switches talking to the containers. No. Calico uses BGP, thank you, to control routing tables on routers. But yeah, if we were going to use OpenFlow to talk to external routers, that would limit what sort of hardware people could use OpenShift with, and so we're not worrying about that. We're just assuming IP connectivity between the different hosts holding the containers.

So he's asking whether there is a way to not have all of these hard-coded magic numbers, like in_port 5 and output 1 and table 5 and all of that. The answer is no, although I've been thinking about at least making it so that internally, when we're referring to all the rules inside OpenShift, we have something that lets you substitute in variable names or something, and then we would just translate those to the raw numbers before passing them to ovs-ofctl.

OK, what happens when you... I missed the very beginning. OK, so he's asking what happens when you add a new node to the OpenShift cluster. Each node listens for notifications from the OpenShift master, and when it sees that a new node has been added, it adds new rules like this one. Basically there's one of these rules for each node, so that it knows which container subnet is on which node, and it has to add a rule for each one. So: the master tells the nodes when a new node has been created, and then each node adds a rule to its own local Open vSwitch database.

I'm supposed to be throwing scarves at people, but they don't really throw very well, so let me try tying one in a knot.

OK, so you're asking about security, specifically what technology we would recommend for connecting Open vSwitches over long distances. The nice thing about VXLAN, as opposed to something like VLAN, is that it's layer three, so as long as you have IP connectivity between the different OpenShift nodes, they can send packets to each other. VXLAN is not encrypted, so there are people who are looking at using IPsec and sending the VXLAN packets over an IPsec tunnel. We don't have any support for configuring that automatically, but it's definitely something that we're looking at. For now it's generally assumed that all of the nodes are on the same network, basically, and that you more or less trust that network, or you have some sort of VPN or something connecting your different data centers so that the packets get encrypted by that.

What about, I don't know, virtual routers or virtual firewalls that would be shipped with OpenShift or something like that?
At the moment, the plan with OpenShift... OK, so the question is whether we have plans for more security, or other fancy connectivity stuff, whatever, between OpenShift nodes. At the moment we're just working on the simple case, basically. But OpenShift is built on top of Kubernetes, which has its own network plugin system, and so people will be able to write other network plugins that they can substitute in. There are people we're working with who are creating more complicated systems, or systems that plug into their own networking infrastructure. I'm not sure if Cisco is one of them or not, but OK.

"I have a question, I don't want to start... So basically, I've been working with OpenShift inside OpenStack, so there's a lot of this stuff nested inside itself. With each of those layers, the MTU drops and drops and drops. Is there any mechanism or anything to automatically detect that? Because I've had real problems with MTU, when something sets the MTU higher than the MTU of the lower layer and it drops big packets."

So the question is about using OpenShift inside OpenStack and problems with MTU and things like that. One way that people are deploying OpenShift is on OpenStack-based clouds, so you have OpenStack with its Open vSwitch and tunnels, and then OpenShift running on the virtual machines with its own tunnels, and its tunnels are going through OpenStack's tunnels. You lose a little bit of MTU at each step, not to mention that it's horribly inefficient. We have a Trello card open about creating a network plugin where essentially OpenShift would be aware of OpenStack's networking, so that you could avoid having the double tunneling. But yeah, it's not something that we're actively working on at the moment; it's something that we know should be addressed.

So, that's plotnetcfg. The question is... so we have this tool, plotnetcfg, that graphs complicated networking configurations with bridges and tunnels and all of that stuff. Rashid, it doesn't actually look at OpenFlow rules, right? So it can plot things at the level where it can say, yes, you have these containers and they're connected to this bridge and the bridge is connected to the Ethernet and all of that. It doesn't recognize the flow rules, so it wouldn't be able to say, well, the WordPress container can talk to the Rails container but not the Apache container, because it doesn't know anything about the... well, there you go: the author of plotnetcfg says that there's a plan to add that in the future. Although that would require parsing arbitrary rules; I guess it could just know OpenShift's rules specifically. Or maybe when everybody moves to OVN, that will solve that problem.

Anyone else? Questions? I still have one scarf to get rid of. OK, do we support IPv6? Oh, do we support IPv6? No. At the moment Docker supports IPv6, but Kubernetes does not. The issue is that if we wanted to support just IPv6, it would probably be pretty easy, but supporting IPv6 most likely means supporting dual stack, and so you need to add a second IP address field to all these different data structures. So again, we have a Trello card for it; Trello is the system that we use for tracking all of our work on OpenShift.

So Docker and Kubernetes both have their own solutions for this, involving what they call services.
And so in OpenShift in particular, there's a SkyDNS pod that knows about all of the services, so you can look up the services by name and it will resolve to the right IP addresses. Oh, sorry, I should have repeated the question: the question was how we deal with DNS, because obviously these containers can't all be referring to each other by IP address. That's it.
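As a made-up example of that kind of name-based lookup, resolving a service from inside a container looks something like the following; the service name, project, cluster domain, and the returned address are all invented:

    dig +short database.myproject.svc.cluster.local
    # 172.30.21.75   (the service's cluster IP; an invented address)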