Hi, my name is Guido. I'm a Debian developer. I work for Google. I'm involved in the Ganeti project, which is a cluster virtualization system, and today we're talking about interesting networking features in the Linux kernel. It's mostly kernel-related things, plus a couple of things outside. Basically it's experiences that I've had in the last year and that I wanted to share with you guys.

The reason behind this is that before, we used to have these huge pieces of network equipment, like big routers, big switches and so on, and then Linux started entering on servers. Debian now runs in a lot of interesting places, like we heard in some talks before. But, yeah, we scared away a lot of people who thought that we weren't ready to do this, that we weren't serious enough and so on, and the networking people maintained control over the network until now. We're starting to see very interesting things happening in Linux. Unfortunately, this is not very well documented. Some documentation exists only in mailing list posts, some documentation exists only in source code. I had to look a few times at the iproute source code to figure out these things, because it was like, how do I get this? And yeah, I wasn't courageous enough to start writing HOWTOs or documentation. So I said, why don't I just talk to people about it, and at least I'll write some slides, I'll put some things up, it will be on the network, and then we'll figure it out, right? And yeah, I think networking is very fun. It's something
that's nice to play with. Just small precautions, for people who will read the slides later: you can play with this as much as you want on your own machine or something. If you propose it in a corporation, be careful who you propose it to and how, because they might take it very badly and start screaming and things like that.

So, just to start with very basic things which we all know. We'll be using iproute all around the talk. No ifconfig and all that stuff, because that's outdated and we should forget about it. Hands up if you have never seen something like that. I guess everybody has. Good. So we're just adding IP addresses to a device, setting the device up, creating a bridge, adding a device to the bridge we created, managing the routing table, and remembering always to enable forwarding. Otherwise we'll ask ourselves why nothing gets routed.

What are we going to go through? What are the technologies we're going to talk about? Well, we're going to talk about VLAN tagging, tunneling and a few interesting things about that, policy routing and asymmetric routing, routing daemons and anycast, load balancing, and network namespaces. Is everybody already familiar with all of these technologies? Who's familiar with everything? No? Okay. One or two of them? One of them, okay. Two of them, okay, good. Three? We're fine, we're fine. I mean, just to see. Okay. I tried to structure them in some useful order, to put them together, and also from easier to harder maybe, or older to newer or something. There is some order; I don't know which one it is.

So what's VLAN tagging? What we can do is use one single network interface and, rather than just transmitting one Ethernet over it, we start transmitting more than one. We have to agree with the switch about how we're going to do this. Basically there's a protocol which is already set up.
It's 802.1Q. And what happens is that once we have an agreement with the switch that our port is trunked, we can say, okay, now send this packet to this VLAN, now send this packet to that VLAN. Which means that on a single interface we get a view of many virtual Ethernets. This is useful to act as a router if we only have one interface, or for example if we have one interface or a bonded interface or something, and we connect it to a switch, and then we have VMs hosted on our machine. Remember I said that I work on Ganeti, so I'm playing with these things over VMs. You can connect your VMs to different physical, I mean virtual but segmented, Ethernets, which means that they won't see broadcasts between each other, they will be on different networks and things like that. It's useful.

How do we do that? Well, we can create a new link over our main network interface, eth0. I mean, it could be whatever we name it; it doesn't have to be that name. And then the trick is type vlan id 3. So we tell the kernel: this interface now has a tag, VLAN ID 3, and you'll call it eth0.3. And after that I can send packets and you're going to talk to the switch that way. Again, if there's a switch. Unless the switch is very, very dumb, or you have another Linux machine on the other side which understands this, the switch is going to throw away your packets if you're not on an allowed trunk port, because usually network people don't like people trunking without them knowing. So you have to convince them first. Unless you run the switch, which is very handy.

After that we can just set up an IP on the network interface as we've seen before and set it up, and then we can use it for connecting to a bridge. We've seen before how we do that with brctl. We can add routes pointing to it or things like that, and mix it with the other technologies we have. Again, if we add it to a bridge, for example, then it's very easy to connect virtual
machines to it. Ganeti, KVM, Xen, any virtualization technology will allow you to connect a virtual machine to a bridge. So this way you can connect various virtual machines to different VLANs.

Okay, let's move to tunneling. This is a bit more fun and challenging. What we are doing is just transmitting IP packets over IP packets. So we're not restricted to VLANs anymore, to Ethernet anymore, and we're doing this on any kind of network. We don't need support from the networking people. We just need them not to filter our GRE-type or IPIP-type packets. We're using GRE; we'll see why. And basically this can change the shape of a network. So for example, if we have a rack and we only have in that rack as many IPs as the machines we have in that rack, we can use this to add IPs to the rack and tunnel to them the traffic we need, so we add virtual machines where there was no space for them. We can also use it for mobility, for example. We can have a host which is tunneled to another host, then we move it to another host and we start tunneling the traffic to the other one, and, hop, we moved the machine and an IP address from one building to another, one data center to another, without the networking people realizing, if we had the other tunnel endpoint with the IP configured.

How do we do this? Well, quite easy. This is the very basic initial stuff. We just create a tunnel: again ip tunnel add, the name gre0 (we can call it whatever), mode gre; it could be mode gre or mode ipip. We're using GRE because we can do more advanced stuff with it. We can specify the local address and the remote address, a key (the key is interesting later, I'll explain why), and the device on which the tunnel actually lives. Then we can add addresses on the tunnel, so now the tunnel is locally 4.1 and the other side is 4.2. We set it up, and then comes the other side.
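As a rough sketch of the point-to-point tunnel just described, the following Python helper only builds the iproute2 command lines for one end; it does not run them. The interface name gre1, the key value, and all addresses (192.0.2.x for the real hosts, 10.4.0.x standing in for the "4.1" and "4.2" of the talk) are placeholders assumed for illustration, not values taken from the slides.

```python
import shlex


def gre_tunnel_cmds(name, local_real, remote_real, local_tun, remote_tun,
                    key=1, dev="eth0"):
    """Build the commands for one end of a point-to-point GRE tunnel.

    All names and addresses are caller-supplied placeholders.
    """
    return [
        # Create the tunnel between the two real (underlay) addresses.
        ["ip", "tunnel", "add", name, "mode", "gre",
         "local", local_real, "remote", remote_real,
         "key", str(key), "dev", dev],
        # Put the tunnel-internal address on it, naming the peer's address.
        ["ip", "addr", "add", local_tun, "peer", remote_tun, "dev", name],
        # Bring the tunnel interface up.
        ["ip", "link", "set", name, "up"],
    ]


def render(cmds):
    """Render argv lists as shell-quoted command lines."""
    return "\n".join(shlex.join(cmd) for cmd in cmds)


# The second host uses the same call with local and remote swapped.
side_a = gre_tunnel_cmds("gre1", "192.0.2.1", "192.0.2.2", "10.4.0.1", "10.4.0.2")
side_b = gre_tunnel_cmds("gre1", "192.0.2.2", "192.0.2.1", "10.4.0.2", "10.4.0.1")
print(render(side_a))
```

Generating the commands rather than executing them keeps the sketch safe to run without root; the printed lines can be reviewed and pasted into a root shell on a test machine.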
On the other side we do the opposite, so 4.2 and 4.1, and we swap the real IPs of these two Linux machines. At this point we can ping between 4.1 and 4.2. Nothing special, but we can also add routes to them. And no, we can't add them to bridges, because these are IP tunnels and bridges bridge Ethernet, so don't do that. Well, it won't work.

More interesting tunnels. So this tunnel doesn't have any local or remote address. It has the device eth0, so the local address will be the one of eth0. If eth0 has many addresses, maybe you want to specify an address, or it will use the first one. But the interesting thing is that it has no remote address. So basically it's an unbound tunnel. It can tunnel to anywhere on the internet and on your intranet. Of course, any reachable IP is fair game to have the other endpoint. So you do this on every host. You create this gre0 on all your 40 hosts; they can be in the same rack, they can be in different racks in the data center without broadcast access to each other, they can be around the internet. Then you add these fake IPs and then you set the link up, and nothing works, because how does the tunnel know
where to send the traffic for 4.2, like what's the real internet address? Well, it turns out that the kernel will look it up in the neighbor table. So there are two ways to do this. If you have multicast between all your nodes, you can subscribe them all to a multicast address. So you specify your local IP (just the remote doesn't work, you have to do both local and remote), then you specify the same multicast address everywhere, and the nodes will automatically do multicast lookups saying "where is 4.3?" and 4.3 will respond. So it will basically do ARP, a fake ARP protocol over multicast, and then you have your network already set up automatically.

Now, supposing you don't have that, because someone disabled multicast, or multicast is just broadcast on the network and you don't want to do that, or for whatever reason, then you can actually push your neighbor maps permanently into your neighbor table. So in this case, on all the nodes, we have to specify that node 4.3 is at real IP blah, on the device gre0, which is the one we created. And at that point, after we've done it statically, it will all work. Now, of course, we don't want to maintain this statically, but we can write a script or have software that integrates with our virtual machine environment. We have an example of this in Ganeti NBMA, which is a small side software to Ganeti, but it's basically very easy. After that it's also easy to replace this. So once you move a virtual machine from one physical machine to another, this command needs to be run everywhere, and our daemon that keeps the mapping up to date will automatically do that, and we have our new mapping.

There are also other specialized protocols to deal with this. One of them is NHRP. It's implemented in this opennhrp package. It's not packaged for Debian, unfortunately; just source code is available.
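Returning to the static neighbor mapping described above, here is a minimal sketch of what a mapping script might generate. It builds command lines for an unbound (no remote) keyed GRE tunnel and for permanent neighbor entries that map each tunnel address to the real address of the host currently holding it. The name gre1, the key, and every address are assumptions made for this example; the real slides may have used different values, and the helper functions are hypothetical, not part of Ganeti NBMA.

```python
import shlex


def nbma_tunnel_cmds(name, overlay_addr, key=1, dev="eth0"):
    """Commands for an unbound GRE tunnel: no 'remote' is given on purpose."""
    return [
        ["ip", "tunnel", "add", name, "mode", "gre",
         "key", str(key), "dev", dev],
        ["ip", "addr", "add", overlay_addr, "dev", name],
        ["ip", "link", "set", name, "up"],
    ]


def neighbor_cmds(name, overlay_to_real):
    """Permanent neighbor entries: tunnel address -> real address of its host.

    'replace' both creates an entry and overwrites a stale one, so the same
    call can be reused after a virtual machine moves to another host.
    """
    return [
        ["ip", "neigh", "replace", overlay, "lladdr", real,
         "nud", "permanent", "dev", name]
        for overlay, real in sorted(overlay_to_real.items())
    ]


# Hypothetical mapping: which real host currently answers for each address.
mapping = {"10.4.0.2": "192.0.2.12", "10.4.0.3": "192.0.2.13"}
setup = nbma_tunnel_cmds("gre1", "10.4.0.1/16")
static = neighbor_cmds("gre1", mapping)

# Simulated migration: 10.4.0.3 now lives behind 192.0.2.12.
moved = neighbor_cmds("gre1", {"10.4.0.3": "192.0.2.12"})

for cmd in setup + static + moved:
    print(shlex.join(cmd))
```

The "moved" commands are the part that would have to be run on every node after a migration, which is the job the talk assigns to the mapping daemon.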
I was thinking of packaging it, but apparently it needs ARPD support in the kernel, so we couldn't really use it, because, well, we didn't have ARPD, and ARPD is both experimental and deprecated, so I wasn't going to suggest adding it. So we just went for the user-space daemon and neighbor table lookup. The niceness of that solution is that, it's true, the daemon is user space, but it's only called when the kernel doesn't know, or actually it's the daemon updating the kernel. But then when actual traffic is coming, everything happens in the kernel. The neighbor table is stored in the kernel, it scales very well to thousands or hundreds of thousands of entries, and the kernel doesn't need to talk to user space, so it's very fast to actually send the traffic and look up where it should go.

Okay, let's move to policy routing. What's policy routing? Has anybody done policy routing around? Yeah, yeah, good. So what we can do is, rather than having only one routing table, which is our normal routing table, we maintain more than one. We give a number to each routing table, 100, 101, whatever we want, and then we can specify rules by which different packets get looked up in different routing tables. If you do routing on your virtual machines and, for example, you have your tunnels from before... By the way, sorry, I said that I was going to say what this key was. Well, suppose you want multiple GRE tunnels with different networks. With them being unbound, you wouldn't know, when you receive a GRE packet, which network interface it is for. This key is going to specify that for you.

Now, back to policy routing. If we have more than one of these tunnels on which we want to tunnel traffic, then we can associate the machine with a particular routing table, and that is going to route it to the right tunnel or to the other machines that are part of its same network or administrative domain. Okay, so how does this work?
Well, we're adding a few interfaces. Basically there is, other than ip route, ip rule and tables. So what we say here is: packets coming from device gre0 and tap0 and tap2, which are probably virtual machines (we can tell Ganeti or our KVM networking script to do this for us when it turns up the virtual machine, we don't need to do it manually), get looked up in table 100. Then we can add routes to table 100. I'm using replace because I can use that both to add them and to change them later, so I don't need to figure out if there are entries already there, which maybe are associated to another interface and I just want to change because I restarted the virtual machine. Whatever the old entry was, I'm replacing it in table 100.

I see, I should have probably... Yeah. Well, no, it should be fine. So we can basically add a static route to that network over device gre0, via this other link. I think I had to cut the other route there. But basically the idea is that we can now route to IPs on various interfaces. So for example we can say this machine lives on tap0: 4.3 lives on tap0, 4.2 lives on tap2, and everything else routes via gre0. And then gre0 will be looked up over the neighbor table we had before. This is very handy because it follows the usual rule for a routing table: stricter entries, like for a single IP, get looked up before broader entries, and we're all sorted. The last example is to say that there is another network, 5.0, and you have to go through our GRE interface to a gateway there, which is going to route us to this other network or to the internet. This is the example of the internet.
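The rules and routes described so far for table 100 can be sketched as below. This is only a command generator under assumed values: the interface names follow the talk (tap0, tap2, and gre1 standing in for the GRE device), while the 10.4.0.x and 10.5.0.0/16 prefixes and the 10.4.0.254 gateway are placeholders for the "4.x" and "5.0" networks on the slides.

```python
import shlex


def policy_routing_cmds(table, in_ifaces, host_routes, net_routes):
    """Build 'ip rule' and 'ip route replace' commands for one routing table.

    in_ifaces   -- interfaces whose incoming packets are looked up in `table`
    host_routes -- (address, device) pairs for locally attached machines
    net_routes  -- (prefix, gateway_or_None, device) triples for broader routes
    """
    t = str(table)
    cmds = [["ip", "rule", "add", "iif", iface, "lookup", t]
            for iface in in_ifaces]
    for dest, dev in host_routes:
        # Single-address routes win over broader prefixes in the same table.
        cmds.append(["ip", "route", "replace", dest, "dev", dev, "table", t])
    for dest, gateway, dev in net_routes:
        cmd = ["ip", "route", "replace", dest]
        if gateway:
            cmd += ["via", gateway]
        cmds.append(cmd + ["dev", dev, "table", t])
    return cmds


cmds = policy_routing_cmds(
    100,
    in_ifaces=["gre1", "tap0", "tap2"],
    host_routes=[("10.4.0.3", "tap0"), ("10.4.0.2", "tap2")],
    net_routes=[
        ("10.4.0.0/16", None, "gre1"),           # everything else on the overlay
        ("10.5.0.0/16", "10.4.0.254", "gre1"),   # another network via a gateway
    ],
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Using "replace" for every route means the generator can be rerun when a virtual machine restarts on a different tap device without first checking what is already in the table.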
So we can say that the default route is via an endpoint on our GRE network. So we can actually say that the machines don't have access to the nodes at all, and the nodes don't do routing for the machines in general, but send them all to a central place where it can be policed. For example, there can be a central firewall or something like that.

A bit more policy routing. So that was the basic part. Another interesting thing is that we can add a rule that matches a firewall mark and routes it to a particular table. Then we can add iptables rules in the mangle table that, for some types of packets, set this mark. So in this case, for example, what happens is that ICMP fragmentation-needed packets get set to mark 100, and this gets sent to our table 100, which handily gets delivered to our virtual machines. This particular example is useful because when you have your GRE interface you usually run into MTU problems. The kernel will create fragmentation-needed packets, for instance, but it won't know where to route them unless you told it with rules. So in this case, if in general we don't want the node to speak with the virtual machines, we can say that only the fragmentation-needed packets do, which are needed because they're generated by the kernel and we want them to reach their destination. Otherwise your copying of files won't work and people will see very strange things.

This other interesting thing is asymmetric policy routing. That's just the IP address. So, well, I should have gone on the other line, but anyway, the line is almost complete. What happens is that in the table
you can add a route which says throw as a destination, which means that basically you stop looking up things in that table and you go to the default table, or you go on to further rules, which at some point by default go to the default table, or maybe you'll hit another rule for generic defaults or something else. This is interesting because it allows you to create exceptions. So you can say, for example: okay, I want my default gateway to be this one, but for some internal networks do asymmetric routing, don't go over the GRE but just route locally. In order to do that we can just use our normal kernel routing table, or use another specialized table for all our different GRE tunnel tables. Of course I've done this example with table 100, but I can have five of them, ten of them, and this will all work nicely and insulated.

Routing daemons. Most of us have probably played with them at some point. They're better documented than the rest. They're user space and not in kernel, and they allow you to integrate yourself with the rest of your networking environment, if you have a big dynamic network, or a small dynamic network, but not only static routes. They allow you to acquire routes from the network and to push your own. So for example, again, if you're hosting VMs, it's nice to be able to say, well, route the traffic for these three VMs to this node and for these four VMs to this node, and when you move VMs around, well, your routing daemon is going to see the local routes on your machine and push them to the network, and all the packets are going to flow to the right place. So we can push, for example, routes for, as we said, VMs, or for an NBMA network. We said before that we have this big network of GRE tunnels and we have one endpoint which acts as a gateway (or it could be done with anycast or in some other ways; we'll see anycast later). But basically what we can do is, from that point, say: oh, our network, which in the example was 4.0, is here, at this machine,
and then it can route it over the NBMA to the actual destination nodes. Yes? Oh, it stands for... So the question was what's NBMA, and it's non-broadcast multiple access. This GRE network is a non-broadcast network, since you can't easily broadcast (you would basically have to unicast to all your nodes), and it's multiple access because, well, since it's unbound, you can have many people participating in this network.

So how to learn a routing daemon? Well, I didn't get very far on the example, but install Quagga, or there are others, BIRD and a few more. There are nice examples. You can test it with VMs, so you don't need to integrate with the network immediately. Start four VMs, route them or connect them to a bridge, and then start exchanging routing tables between them to see that your configuration is correct, your authentication is correct and things like that. Quagga, for example, supports most routing protocols. You probably want OSPF or OSPF v6 or BGP, depending on what your organization runs, because in the end your goal is probably to integrate with the routers of your organization. And yeah, you can try things with static routes, which means that if you create a static route on your machine, then it gets read by the daemon and pushed to the rest of the peers participating in this OSPF or BGP network. Or you can integrate Quagga with your own daemon, for example with your virtual machine management, to say dynamically where your virtual machines are and push the routes correctly. Although if your virtual machine management creates the static routes, then you don't need a particular daemon or whatever code.

Additionally, anycasting. This is very neat and easy. Anycasting is just publishing the same route from two different places. The network just believes that these two are both connected to the destination. What you're actually doing is providing the same service, but separately, so you're cheating the network somehow. But it's very useful to increase
availability, because when one of these goes down, especially if you do your service correctly, meaning that if the service is down the network daemon stops advertising the network, then your traffic automatically gets routed to other places. You don't need to wait for DNS propagation to update, just for network route propagation. You can fail over things gracefully, for example saying: well, I'm going to stop advertising on this machine, and then when the machine is drained, when it's not receiving traffic anymore because the network updated, I can easily turn down the service and do updates on it, and nobody's going to notice that out of my five anycast servers this one is down. So yeah, it's a very nice technique and it's very easy to implement, and it's basically a trick on the network. We didn't need to add anything to our existing routing protocols; we just started using them in a different way.

Load balancing. We have this Linux Virtual Server, probably the worst name ever, since it gets confused with everything else which is virtual, and actually it's talking about sending traffic to real servers. So why virtual? Well, somehow what it means is that the front end is a virtual server and it actually sends the traffic to some back ends. For once, it has very good documentation if you go to that website.
It's very easy to set up. You need your kernel support, but then there's extended documentation, a long, long HOWTO with everything you might need. It can be used to balance via NAT, so for example you can have a router which balances to a few hosts which are behind it, NATted. Or you can balance via tunneling, so you can use the GRE tunnels we've seen from before and send your balanced traffic over them. Or you can use direct routing, which is useful if you have a physical Ethernet or a VLAN, and it means that basically you don't need to do encapsulation of any type or any changing of the packet which you receive. You just send the packets with a different MAC address, and the destination machine is going to receive it and handle it because it has that address as an alias, which is not published by ARP, so it doesn't confuse the network. But once you send it with the correct MAC address, because you're the load balancer and you decide where to send it, it's going to receive it and handle it and send it back directly to the customer, so to the client which tried to contact it. So it basically halves the traffic need of your load balancer, which is very nice. With tunneling, for example, you can't always do that.

And just to close, well, this is the last thing we're going to talk about: network namespaces. This was a recent addition in the Linux kernel.
It doesn't work before 2.6.29, and I'd recommend something newer anyway. While I was playing with the commands I have in the next slide I managed to crash 2.6.34 a couple of times on my laptop, and then I started trying this on a virtual machine because I was sick of rebooting every time the network stack was completely gone.

So this is basically a way of insulating processes, and that way you can create basically jails. What happens is that if you pass this flag to clone, CLONE_NEWNET, the process is going to see a different network environment in which your network interfaces don't exist and just a localhost exists, which is different from the one of the host. After that you can create new interfaces and share them with the host, or move an interface which the host has to a namespace, and this allows you to insulate processes network-wise. Now this would not be enough, but luckily there are quite a few CLONE_NEW things under clone, so you can unshare the process IDs, the file system and a lot of other things to actually create a full jail. The LXC software automates all of this for you and has nice configuration files, and then can start an installed container. So, for example, kind of a virtual machine which is actually in kernel, like, again, jails or zones in Solaris or things like that. This is quite new, so it's probably less stable than those. But it's quite nice in the way in which it is done, because you don't actually need to do all of this. So you can create a full jail or zone, but you can also just unshare a couple of namespaces which you need. In this example we're just unsharing the network, because we're playing with networking and not everything else, and this is not a talk about LXC or container virtualization.

So, how do we do that? Well, we can easily use lxc-unshare to start a bash in a new network container. This is just going to basically call clone with that flag and then call exec on /bin/bash.
So our bash now, in this shell, doesn't have any interface except lo, localhost. Localhost is down, everything is down, so we need to bring it up, and then we'll want another interface to communicate with our external world. So in the external world we can create a veth link, which is a virtual Ethernet, or we can create it giving names to the interfaces. I probably meant to just have the second line and I left the first line by mistake. But basically this way we say: I want a veth0, of type veth, and then I want a peer named veth1. And then that last thing, which you can't see, is netns and the PID of the bash. So in bash we can do echo $$, or anyway find out what that bash PID is. And it means that the peer is going to appear in the namespace of that PID. If we don't do that, we can use ip link set netns and, again, move an existing network interface, or one that we just created, to that namespace. And yeah, for example, this is totally not documented. If you do man ip, nothing that tells you how to do this is there. I had to have a quick look at the source code in order to figure it out. And then I said, well, let's publish it at least on slides, so maybe someone will Google it one day.

And then you can add an IP address on this virtual Ethernet interface, which is basically just a local tunnel. And on the other side, at that point, we waited for shark to finish. So now we have this veth1 in this case, and we can add it with the opposite IP addresses, set the link up, and at that point we can ping from within the container to out of the container. And at that point this is a virtual Ethernet. We can bridge it, I suppose; I haven't tried, but I'm pretty sure we can bridge it. I'm totally sure we can route on it. So we can send it to our NBMA or our normal routing tables or our policy routing, and basically insulate the various containers in the same way that we would virtual machines.

There's a bit more which you can do in user space.
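Going back to the veth setup from the namespace part of the talk, the steps can be sketched as two small command lists: one to run on the host, one to run in the shell that lives inside the new network namespace. The PID is whatever echo $$ printed in that shell; the value 4242 and the 10.9.0.x addresses below are invented for illustration, and the helper itself is hypothetical.

```python
import shlex


def veth_cmds(ns_pid, host_if="veth0", peer_if="veth1",
              host_addr="10.9.0.1/24", peer_addr="10.9.0.2/24"):
    """Return (host_cmds, inside_cmds) for linking a namespace to the host.

    ns_pid is the PID of a process already running in the new network
    namespace, for example the bash started with lxc-unshare.
    """
    host_cmds = [
        # Create the pair and place the peer end directly in the namespace.
        ["ip", "link", "add", host_if, "type", "veth",
         "peer", "name", peer_if, "netns", str(ns_pid)],
        ["ip", "addr", "add", host_addr, "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
    ]
    inside_cmds = [
        # Inside the namespace even loopback starts out down.
        ["ip", "link", "set", "lo", "up"],
        ["ip", "addr", "add", peer_addr, "dev", peer_if],
        ["ip", "link", "set", peer_if, "up"],
    ]
    return host_cmds, inside_cmds


host_cmds, inside_cmds = veth_cmds(4242)
print("# on the host")
for cmd in host_cmds:
    print(shlex.join(cmd))
print("# inside the namespace")
for cmd in inside_cmds:
    print(shlex.join(cmd))
```

An interface that already exists can instead be moved afterwards with ip link set DEVICE netns PID, which is the alternative the talk mentions.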
I just brushed on this; all of these packages have good documentation. But just as pointers: OpenVPN, everybody has used it, I'm sure. It's nice because you can actually encrypt your IP tunnels or Ethernet tunnels. GRE you should only use in a trusted environment or over IPsec, because otherwise basically anybody can spoof your network. If you need something secure, OpenVPN. VDE is a virtual distributed switch that runs in user space. You can connect processes or virtual machines to it, and then you can connect switches together on various hosts on the internet. So in the end it does the same job as our NBMA tunnel, at a slower performance, but you can add, again, encryption between the various places and things like that. And last but not least, it's not only related to networking, but socat is basically nc, except you can do a lot more with it. You can use SSL, you can connect to Unix sockets, you can connect a port to a Unix socket, and all these kinds of things which nc doesn't allow you to do.

And finally, Q&A, and also suggestions: did I miss any interesting technologies which you think are very cool and should be in this kind of environment? Do you have any suggestions or further hints which I haven't brushed on, or any other question? Although I don't know if I will have the answer or not. Again, it's just sharing what I played with.

Thanks very much. I just had a question about your anycast example.

What, the anycast? Well, there wasn't any example.

Yeah, right, but how do you avoid horizon problems with... So you're essentially putting the same service on the same IP in two different places.

Yeah.

At some point there's a router that's going to have to make a decision about how to get there. So I'm just curious, could you give an example of where you use that?

Well, basically, I'm not exactly sure. Sorry.

It's all right. I'm just trying to figure out how it seems.
I mean, it's a cool...

I'm sure it's not a problem, because it's used in production for lots of things. Many DNS servers nowadays run with anycast, so you think you're reaching an IP address, but you're actually reaching servers which are near to you. But I'm not sure how exactly this is done. Probably BGP has ways to deal with it.

Okay, yeah.

And if someone else has the answer for him, please go ahead.

Hi. As you already mentioned, this is used in lots of root DNS servers to provide some more locality to the network location where you are. I think it just announces several prefixes in BGP, and then just the normal BGP selection process for which network is nearer kicks in and picks out the one which is nearer to you from where you are, without any extra effort, which is why it's so nice.

Yeah, it probably handles the equal path by choosing one. Good enough for our case. As long as it doesn't send one packet here, one packet there, you'll be fine for most services. Anybody else? Going once...

I remember iptables at some point had this ability to filter by the command itself. You could provide a filter, not by PID, but by the name of the command, which kind of interfaces, but then it was removed. Are namespaces kind of a good alternative to jail processes, so I could provide a specific per-process iptables firewall?

I suppose you can do that. If you insulate your process in a different network namespace then you can easily do that, but you can probably do it more easily with users, since iptables can just filter depending on the user. So if you run your processes under different usernames, then you can apply different iptables rules, and in this case you wouldn't add the overhead of an additional veth interface and an additional routing table lookup and things like that, if you just want to apply different rules to different processes.

Yeah, let's say Firefox.
I want to limit it and I still want it to be under my user.

Yeah, then you probably may want to try that. The only problem is that you need to be root in order to start these containers. But then you can drop the privileges inside the container, start Firefox, and then you can easily say on the host, so on the external part: well, traffic that comes from this virtual Ethernet interface is firewalled this way.

Okay, thank you.

The things that are missing maybe are tc and maybe some traffic shaping, and then link layer discovery things like LLDP or CDP or things like that.

Okay, can you send me an email with these names and I'll look it up for the next time I give this talk. Another thing which I found today in the .35 kernel, which was just released last week or something, was that there is this iptables TEE support to duplicate packets, which I'm sure I'm going to have lots of fun with.

If we have reached the wish list things here, then fair queuing.

Okay. Yeah. Every time I google about it I find things in Polish, and I don't read Polish. Oh, I mean... Yeah, yeah, totally makes sense. Yes.

QoS.

Hi. I have a question regarding something in between anycast and IP tunneling. Is there a way to use IP tunneling, mapping a unicast address to a broadcast one, and then maybe with UDP packets I would get the lowest latency response and...

I totally got lost halfway through the question. Maybe we want to talk later with a diagram, because I totally got lost. There were too many layers of interaction. Sorry. Okay, we'll talk after the talk. More questions? Otherwise we can go early, have a coffee, relax, let the video team rest. Okay.