Hi. This is Guido Trotter, and he'll talk to us about owning the network with Linux, or advanced networking.

Thank you very much. So first of all, I already gave some form of this talk in New York; this is an updated version from two years ago. How many people were there? A few. Okay, so we can restart from scratch, I guess, rather than just going to the updates.

The idea of this talk came out of some work I did for the Ganeti project, and it was basically about doing more things at the network layer with Linux. How many things do you already do with virtualization that somehow go into the domain of switches and network administrators, the people who basically own the network, or used to own the network until now? Somehow we're getting into that space. So we're looking at what we can do at the Linux level right now, without bringing in all the virtualization part. Let's just discuss the network.

We used to have big Cisco switches and small Cisco switches, and other brands as well, managing the network, but those were all proprietary boxes, basically with a full operating system on them but not modifiable at all. We might have some free switches, but I don't know much about them. Now we have free routers, at least to start. We started using Linux on servers, and nowadays somehow... Can you still hear me? Yes, okay. I couldn't hear myself anymore, it was strange. Okay. So somehow we scared off all the proprietary server people and we own the server world, and now let's scare off some networking people as well.
So let's start integrating and taking over their area. Why? Well, networking is fun. Much of this is not very well documented: you'll be able to do things with ip, for example, that the ip man page forgets to mention, and that the ip online help kind of tells you something about, but not really. And I didn't want to write documentation, so I just wrote a talk, and you can all try it, and then someone someday will write documentation, maybe. So yeah, you can try this, it's quite safe. Before doing it at your office, try to talk to people, not over lunch, and make sure they're okay with what you're trying to do, because they're definitely not expecting it.

So these are just basic things that you're all used to doing. We can easily add an address to an interface; set the link of an interface up or down, so activate or deactivate the interface; do simple bridging, which is what we normally do to create virtual machines and then attach them to a bridge; and then we can do basic routing: just enable routing, and this will allow traffic from one interface to go to the other interface. We all know this, hopefully, but just to put things in context.

So what are we going to talk about? A bit about VLANs, tunneling, policy routing, what policy routing is, and some asymmetric routing; a bit about routing daemons, some load balancing, network namespaces, and some Open vSwitch, which is kind of the update that I had.

So, VLAN tagging. Over a single port we can have many VLANs. We need to have our switch cooperate with this: it either has to be very, very dumb, or it needs to somehow know about tagging and allow us to trunk, so the port needs to be configured as trunking and also explicitly allow the VLANs that we're going to use, or allow all of them. You can easily add a link to your network card, so you can say: ip link add link eth0 name eth0.3 type vlan id 3. You can create an interface in /etc/network/interfaces that will do that.
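A minimal sketch of the VLAN sub-interface just described. The parent interface eth0 and VLAN ID 3 come from the talk; the address 192.0.2.10/24 is a placeholder from the documentation range, and the `run` helper is only an illustration device: it prints the commands unless `APPLY=1` is set, so nothing changes on the machine by accident.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Run with APPLY=1 as root to actually execute the commands.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Assumptions: parent NIC eth0, VLAN ID 3, placeholder address 192.0.2.10/24.
vlan_setup() {
    # Tagged sub-interface on top of eth0
    run ip link add link eth0 name eth0.3 type vlan id 3
    # Address that lives inside VLAN 3
    run ip addr add 192.0.2.10/24 dev eth0.3
    # Activate the new interface
    run ip link set eth0.3 up
}

vlan_setup
```

The switch port still has to be configured as a trunk that allows VLAN 3, as mentioned above; these commands only cover the Linux side.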
So ifupdown will help you set that up, or you can do it manually. Then you can add an IP on that interface and set it up, and now you basically have a second interface which is kind of isolated from your eth0 and is on a VLAN. Now, a fun thing is that your eth0 can appear on another VLAN, according to the switch configuration. So what's untagged traffic for you might be tagged traffic from the point of view of the switch. So you'll have to be careful how this is all configured, and not tag the traffic for the interface you're untagged on, and things like that. So that's just it.

By the way, at any point, if you have questions or doubts, or you don't understand something, or indeed you have comments or helpful suggestions, feel free to interrupt me, or feel free to ask me questions at the end, however you feel.

So, tunneling is something at layer 3. Basically, at the VLAN level we are limited to the data center level, or anywhere at the switching level. At layer 3 we can transmit IP over IP, so we can create overlay networks and basically allow traffic mobility, and change things under the networking people. And this we can do without some people noticing.
Although once they find out, they're going to hate us.

So this is a very basic example. We can add a tunnel on our host zero; it's a GRE tunnel. We set up a peer, so we decide where this goes, and then we bring it up. On the server network, which is at layer 3, hopefully (well, in this case they might even be in the same network, but let's say that they are on two different /24s), you add another GRE tunnel back to this network, and the two cooperate. So now you basically have the gre0 interfaces on the two hosts connected, and you can ping 4.2 from 4.1 and vice versa. And you can also route traffic over this link. You can then do things like bridging traffic over it, so encapsulating Ethernet over IP over IP over Ethernet, which is kind of... well, you can, anyway.

Another cool thing you can do: suppose we have this (no, no, go back, go back), suppose we have this, but we have many nodes. If we have three, well, we can create tunnels between all of them. If we have five, it starts to not scale: basically you have n-squared tunnels. You couldn't maintain them, even if you have an automated system to maintain them.
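To make the two-host example concrete, here is a hedged sketch of one end of a point-to-point GRE tunnel. The underlay addresses 192.0.2.1 and 198.51.100.1 and the overlay network 10.0.4.0/24 (chosen so that the hosts end up as .4.1 and .4.2) are assumptions, not values from the talk. The tunnel is named gre1 here rather than gre0, because gre0 is the fallback device that the kernel's GRE module creates for itself. The `run` helper only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Run with APPLY=1 as root to actually execute the commands.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# usage: gre_setup LOCAL_UNDERLAY_IP REMOTE_UNDERLAY_IP OVERLAY_ADDRESS/PREFIX
# All addresses used below are placeholders.
gre_setup() {
    # Tunnel with a fixed peer: packets are wrapped in GRE and sent to REMOTE
    run ip tunnel add gre1 mode gre local "$1" remote "$2" ttl 64
    # Overlay address carried inside the tunnel
    run ip addr add "$3" dev gre1
    # Bring the tunnel up
    run ip link set gre1 up
}

# Host A; host B runs the mirror image:
#   gre_setup 198.51.100.1 192.0.2.1 10.0.4.2/24
gre_setup 192.0.2.1 198.51.100.1 10.0.4.1/24
```

With fixed peers like this, a full mesh of n hosts needs n(n-1)/2 tunnels, so five hosts already mean ten tunnels and four tunnel interfaces per host.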
It's going to be a bit of a mess. You can start creating concentrators, but then you have single points of failure, or you have to create more of them, and this is going to complicate things a bit.

So one thing you can do is a full mesh, by creating a tunnel with just maybe a key, which means that you can have more than one tunnel on the same machine, but without specifying what the remote endpoint is. At this point you can decide what the remote endpoint is by creating a neighbor table entry with an IP address as its destination. Normally a neighbor table entry is something where you have an IP address and an Ethernet MAC address on the other side. If for this device you create an entry with an IP address as the destination rather than a MAC address, this will mean that once you ping an address like, I don't know, 4.1, it will be looked up in the neighbor table and then encapsulated over GRE, over IP, to the other device that we specified there.

This means that we need to create only one GRE tunnel per device that we want to connect. So we have, I don't know, these five Linux boxes: we create exactly one GRE tunnel on each, and then we can dynamically reconfigure the neighbor table, either manually, or with some user-space daemon that updates it, or with an ARP daemon, to actually redirect traffic dynamically where we want, depending on uptime or, for example, on where our virtual machine is. The ARP daemon requires a kernel patch that is not enabled in many kernels, I think, and it's a bit more flaky. But the other option, just injecting the neighbor table entries, works quite well.

So, policy routing. What's policy routing?
Well, it's just routing, really. The difference is that with normal routing we can only route on the destination IP, because the people who first thought about routing wanted to keep it simple. But nowadays we could say: well, I want to route differently depending on the source IP, or on the protocol. Maybe I have a slow link and a fast link, or maybe I want something completely different. Does anybody have another one of these keyboards and is making a joke? Okay, nothing, it's just moving.

So basically, yeah, we can select different routing tables depending on some values, and then in the routing table decide the actual destination. This is just a double lookup that is done in Linux to simplify, or complicate, this thing, I guess.

So how does it work? You first add a rule that says: look, if the device this packet comes from is gre0, use table 100. So don't use the default routing table that the node's own traffic uses, but start using something else. And then in table 100 you can decide to do quite a lot of interesting things: you can basically replace routes and do different things. This means that the traffic coming from some interface will be routed differently, without affecting the network traffic from the host or on the host itself. This will make it very fun to debug.

And yeah, these are just examples. We can for example decide that... basically we can put things on the same... I don't remember what this was, but it works, and it allows you to route traffic differently for this.

You can also do it for specific packets, rather than just for the interface the traffic comes from. We can use the help of iptables by saying: okay, if a packet matches some things. And we know that iptables can match on basically everything, including, it's not really regex, but bit-field matching on the packet itself. So it's basically anything you can dream of on the packet.
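A hedged sketch of the interface-based rule and the double lookup described above. Table number 100 is the one used in the talk; the incoming interface name gre1, the next hop 192.0.2.254 and the outgoing interface eth1 are assumptions made up for the illustration. The `run` helper only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Run with APPLY=1 as root to actually execute the commands.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Assumptions: gre1, eth1 and 192.0.2.254 are placeholders.
policy_setup() {
    # First lookup: packets that ENTER through gre1 consult table 100
    run ip rule add iif gre1 table 100
    # Second lookup: table 100 has its own default route
    run ip route add default via 192.0.2.254 dev eth1 table 100
    # Inspect the result: the rule list, then the contents of table 100
    run ip rule show
    run ip route show table 100
}

policy_setup
```

Traffic generated by the host itself never matches the `iif gre1` rule, so it keeps using the main table. When debugging, `ip route get` with an `iif` argument can help show which route a forwarded packet would take.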
We can say: well, mark it with, like, number 100. And then we have this rule that says that if the mark from the firewall is 100, use table 100. This allows us to basically do marking for any type of packet.

Asymmetric policy routing allows us to... I don't remember this one either. I should have gone through this more, I guess. From what I remember, throw throws this to a different table, to the main table. Sorry, it just throws this network to the main table. Okay, so it allows you, for some things, to fall back to the main table. So it allows you to decide that this table only overrides some of the rules, and the rest goes normally. Thank you.

So, routing daemons. These will basically allow you to configure this by making various boxes talk to each other, both Linux to Linux or Linux to router, and acquire routes and push them either to your main routing table or to some particular tables that you then use the way we said before. So for example, at this point we can use these routes just for our hosts' VMs, or for our NBMA networks, which are the networks of the GRE tunnels we were talking about. So we could use the network daemon, for example, to acquire where our VMs are and then push these neighbor table entries for them. Or for any anycast service that we run and load balance on: for example, on our machine we have an anycast IP address, we receive traffic, and then we want to know where to direct it. We could use a routing daemon for that, as well as the other technologies we were talking about before, or later.

An easy one, well, an easy and quite good one, is Quagga.
It's quite easy to install. Then you can look at the examples. You can for example do a quick test just on your laptop or on a single server by installing Quagga on multiple virtual machines, then create routes on them and make them talk to each other and basically share the routing table over OSPF or BGP. OSPF is more used inside organizations; BGP is the global Internet one. But some organizations prefer BGP for everything, mostly because they already have it and they think that it's useless to run two different protocols. And then you want to make your routing daemon interact with the static routes that you set up before, either by acquiring them, or by pushing your routes through the routing daemon, and then the routing daemon will update the routing table.

So what's anycast? With anycast we can run an IP in multiple locations. Basically, we just use Quagga, or another network daemon, or indeed a proprietary one on some router, to publish this from more than one place at the same level. This means that the network will automatically configure itself to go to the nearest one. And it's very easy. If you were wondering, this is how things like the Google DNS, 8.8.8.8 and 8.8.4.4, are run: there's no one central server that is used for that. And many of the root name servers now do this as well, so they're not actually one box, but there are many boxes pretending to be at a particular IP address that is configured on all the world's devices and that they can't change anymore.

And yeah, it's quite easy. If you start doing it a lot, one IP at a time, people will get mad at you because their routing tables will explode. So maybe you want to anycast full networks, or maybe you just want Karen to do it. But this is a good way to basically push services over multiple data centers and make sure that everything automatically reconfigures: if your data center A goes down, then hopefully your BGP broadcasts, or rather your BGP
advertisements will stop going out, and automatically all the traffic will go somewhere else. This will take some time, and it's of course better if you have better ways, but it's still better than having a single point of failure.

Load balancing. So we have this Linux Virtual Server project, which has, I'd say, the worst name ever for a project that actually does things at the networking level and not at the virtual server level. I guess their excuse is that virtual servers were not that popular when they started, but yes, this confuses things a bit. So it's not actually a virtual server, it's just virtual IPs. This allows you to have a central box that advertises more than one IP. You do that by basically just putting it on the network, and then on the other side you have tunneling, or just direct routing, like layer 2 MAC routing, of the data to your back-end destinations. This means that the traffic arrives at your data center, gets to the load balancer, and then gets balanced over the actual machines that serve the requests. Machines can be inserted or removed from there dynamically, and even automatically: if a machine doesn't respond, it will be taken down by your virtual server.

Or you can do it remotely, over layer 3, over GRE or some other tunnel, although the drawback of doing that is that your traffic sometimes has to go back through the tunnel and then out, while if you do it directly the machine can respond directly to the request without involving your load balancer on the return path, which of course allows you to scale more. For once, this project has amazing documentation, so you can just go there and try the things that they document in their man pages and manuals. And as I was saying, you can do either NAT load balancing, or tunnel, or direct routing.

So, what's a network namespace?
This is something that came with Linux containers, and it's basically a way to isolate a process from the network of the host. You can put a process in a separate namespace for many things indeed besides the network, for example the PIDs, the ps entries, so it won't see what other processes run on the machine and will believe itself to be init, as it's PID 1 in the new system, and things like that. In this case, it's just a flag to clone. You can then separate all or just some of the namespaces. If you clone the network one, it allows you basically to see different network interfaces.

The way you use this is: you create an interface, you put it in a different... you create a pair of interfaces, then you split them into a separate namespace, or actually you first... okay, let's go through this. You clone a process; this creates a network namespace which is empty, so it doesn't have any interface at all, or perhaps just the loopback interface. Then you can easily create this pair of virtual interfaces that just exist in your kernel, move one of them to the other side, and then you basically have a routed network that just lives in your host and can talk to your process. So your process is effectively isolated, and now you can apply to a single process on your machine all the games that we were playing before. So you can say: traffic that comes from this network goes to a separate routing table, it gets routed through these GRE tunnels or whatever, and this applies only to the traffic that is actually generated by one process that, for example, you don't trust very much, or that you want to isolate more, because then that process only sees this virtual interface and comes out to the host through this policed network.

So let's see an example with LXC, which helps you to do that. Basically, you don't need to use LXC: you can code all of this at the kernel level in C and libc. But LXC allows you to do this from the shell, which allows you to test it very
easily. So you can unshare the network and open a /bin/bash that is on a separate network. Now you can basically set up a localhost interface, and then in another shell you can create two virtual network interfaces and bring up one of them. Then you can move the other veth to the other namespace, and this allows you to add the second IP there and set it up in the second shell. So this happens in the shell... okay, this is clear now. This happens in shell one, but the second thing happens only after you've done these commands. So these are the commands that allow you to move things and set them up in that particular namespace. Then, once you have it up, these two, veth0 and veth1, can talk to each other and transfer traffic. So this bash, and indeed any process this bash spawns (so if you start a daemon in this bash), will see only that particular interface, and then talk to the machine through this virtual Ethernet.

Open vSwitch. So this is the updated, new part. What's Open vSwitch and where does it come from? Open vSwitch at the basic level is just the switch that we're used to having at the bridging level in Linux. What are the differences? One of them is trying to be a little bit less scary for the networking people. So it supports some protocols that are standard also across proprietary network devices. For example, OpenFlow is implemented also by Cisco and other devices. So basically you can give up control to your network people and say: look, my virtual machines run in this environment, but you run the network. I set up the switch, and whatever configuration you push on your routers gets automatically updated on the servers. This means that they can again manage the network centrally at the data center level, and they're less scared about what you're trying to do.
It might be important for your organization.

The other thing that it can do is this conversation between multiple Open vSwitches. So if you have this service in multiple data centers, you can easily say: well, I'll have an Open vSwitch here and an Open vSwitch there, and I link up the two Open vSwitches over GRE. Which is what I was saying before, basically Ethernet over IP over IP over Ethernet, but it allows you to create an overlay layer 2 network over layer 3. This is quite helpful if you want, for example, virtual machines that are in the same network to actually be able to talk to each other. Of course, this doesn't scale very well: if you try to put a huge Ethernet, with many, many people doing lots of broadcast traffic, over two remote locations, this won't work. But for some particular things it actually works quite a bit better than any of the technologies we were talking about before.

It's focused on mobility. So it allows you to easily bring up new Open vSwitches and make sure that the central configuration gets easily pushed to them. Then you can move your machines from one host to another without having to reconfigure things like your iptables rules and all these other things that you normally have to bring with you all the time. If they are configured at the Open vSwitch level, and the Open vSwitches all talk to each other, then you're sorted.

This is upstream in Linux 3.3. As you all know, wheezy runs with Linux 3.2. But luckily the Open vSwitch guys packaged the module as an out-of-tree module for wheezy, so this all works with 3.2 and wheezy with the kernel patch, which is just a DKMS module that will compile itself and modprobe itself by just installing a package. So, quite easy.

So how do we do that?
Well, just install a bunch of packages, as usual. And if you're running Linux 3.2 as shipped with wheezy, then also install those. If you're running 3.3, because you self-compiled it, or you're running experimental, or one day there will be backports for this, right, then there's no need for the DKMS module anymore: you can just use the upstream one.

So what can you do with Open vSwitch? Well, first of all, there is a compatibility layer for bridges, so if you have the compatibility layer installed you can use your normal bridge tools to create bridges that are actually Open vSwitch bridges. The big difference there is that a bridge is completely in kernel, so for anything that gets passed through a bridge, the kernel needs to know where to send it. With Open vSwitch, the kernel still keeps a cache of where to send things, for efficiency, but it can go to user space if it doesn't know. So it can say: well, this is new traffic, I don't know about it, I'll ask Open vSwitch. And Open vSwitch can go ask some remote switch, or look in some database, so it can do a lot more things than what the kernel can normally do. So you don't need to push your whole table of information inside your kernel. What you need to do is make sure that Open vSwitch knows where the traffic goes. Then your kernel only needs to keep entries for the traffic that's actually passing through, and all the rest can stay in user space, where it can be paged out or stored to disk and not bother your normal tables.

So we can add bridges, either with this or with the compatibility module. We can add ports to the bridges.
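Adding a bridge and a port with the native tool looks roughly like the sketch below. The bridge name br0 and the port eth1 are placeholders, not names from the talk. The `run` helper only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Run with APPLY=1 as root to actually execute the commands.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Assumptions: br0 and eth1 are placeholder names.
ovs_bridge_setup() {
    # Create an Open vSwitch bridge
    run ovs-vsctl add-br br0
    # Attach an existing interface to it as a port
    run ovs-vsctl add-port br0 eth1
    # Print the resulting bridge and port layout
    run ovs-vsctl show
}

ovs_bridge_setup
```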
This is just basically sugar over the normal thing, right, but this allows you to set up an Open vSwitch enabled bridge.

You can add a fake switch, which is basically a VLAN-tagged switch. Normally, you would do this by just creating a new switch and then adding to it an interface which has VLAN tags. Open vSwitch allows you to do it in different ways. One way is these fake switches, and one way is just to say: I'll create only one switch, but whatever traffic comes in from a particular port, tag it with this VLAN. So this is quite a game changer. We don't need to have many switches, one per VLAN, but we can use one switch and tag the VLANs on the source ports, like a real switch would do. A real physical switch will have a configuration that says all the traffic that comes from port one is actually on VLAN 3. Well, here we can actually do this without going for separate physical switches. And finally, the other way is instead to just add a tap to the fake switch, which is basically what we were doing before: creating another virtual bridge rather than tagging single ports inside the bridge. It supports both ways.

QoS. Ha ha, so this is actually cut, I should... but well, this is quite well documented, and I put these non-cut on the Internet, so you don't need to see and copy the command line now. But basically Open vSwitch has native QoS inside the switch. So it allows you to set some ports to have some particular policing: a normal rate, and some maximum burst rate, which is how much traffic they can do when there's a burst of traffic, but not at the average level.
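The port-level settings described above (tagging a VLAN on the source port, the fake-switch alternative, and ingress policing) can be sketched as follows. This is not the slide that was cut off: the names br0, tap0 and br0v3, the VLAN ID 3, and the rate and burst values are all assumptions chosen for illustration. The `run` helper only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Run with APPLY=1 as root to actually execute the commands.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi; }

# Assumptions: br0, tap0, br0v3 and all numeric values are placeholders.
ovs_port_policy() {
    # One switch, VLAN set per port: traffic entering via tap0 is on VLAN 3
    run ovs-vsctl add-port br0 tap0 tag=3
    # Alternative: a "fake" bridge that represents VLAN 3 of parent bridge br0
    run ovs-vsctl add-br br0v3 br0 3
    # Ingress policing on tap0: sustained rate, expressed in kbps
    run ovs-vsctl set interface tap0 ingress_policing_rate=10000
    # Ingress policing on tap0: allowed burst, expressed in kb
    run ovs-vsctl set interface tap0 ingress_policing_burst=1000
}

ovs_port_policy
```

Ingress policing is only one of the QoS mechanisms in Open vSwitch; it limits what a port may send into the switch by dropping the excess rather than queueing it.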
Once you configure this, again, you configure it once for your whole Open vSwitch infrastructure, hopefully, and then you don't need to think about it again when you move virtual machines around and things like that, because all the Open vSwitches will talk to a central database or central point. Now, I haven't tried the cluster mode yet. I plan to, but I was mostly experimenting on my laptop, and I didn't have all the VMs set up to do this on many of them, so I'm waiting to go back and have a data center to play with.

GRE encapsulation. It's what we were saying before. We can actually add a port, a GRE port, to the switch, and then specify that that port is actually a GRE interface with some particular remote IP. Same thing on the other side, and then the two switches will be able to talk to each other at layer 2, encapsulated over GRE. VXLAN is just a different protocol to do so; it comes from the networking people, who weren't happy with GRE. I'm still not completely sure about all the differences and similarities. They perform pretty much the same according to some benchmarks I've seen, but your network people may prefer one or the other depending on what the actual physical switches support, so you may be forced to use one or the other.

What is this? Okay, this is the OpenFlow thing. Sorry, I cut the title. I need to fix these slides. They were looking good on my laptop, I swear. So we can create a central controller. In this example the controller could be on a physical switch or on a router managed by your network people, or could actually be on an Open vSwitch box, and then on your switches you can set what the controller is, to actually have a single infrastructure. Of course, then this OVS controller becomes a single point of failure, so you start having all your problems, like: let's have two of them, let's load balance over them, let's make sure that they
keep in sync, and things like that. So this is the really, really basics, but then more is going to be needed to make an actual good network infrastructure on top of this.

User-space fun. So all of these were technologies involving both the kernel and some user space to do something. You can also do something in pure user space. For example, use OpenVPN to do encrypted IP or Ethernet tunnels. This is just a VPN, right, we all know about this, but it integrates with all that we've said before, and it can allow us to say, for example: my traffic that is non-encrypted, because the protocol doesn't support encryption, pass it over the VPN link; the rest of the traffic, which I know is already SSL, why waste my time passing it over the VPN link? Pass it over the non-encrypted link.

VDE is a user-space virtual switch. If you don't have Open vSwitch, you can use VDE to play with all of this in user space without even being root.

And socat is a very nifty tool. It's like netcat but it can do a lot more, so you can cat, for example, from a network TCP port to a Unix socket or to a pipe. It allows you to deal with any kind of possible streams and connect one to the other. We use it for example to connect from standard input to the KVM console, because the KVM people weren't as good as the Xen people at implementing this very nice console service. So you just usually connect your virtual serial console to... well, in our case we connect it to a Unix socket, and then we use socat to connect to that Unix socket. It would be cool...

Ten minutes? Wow, I'm done then. So, to-dos for the next few years, to bore you some more, or have some more fun: more Open vSwitch; more OpenFlow, so doing this traffic integration between different switches; have a look at sFlow, which allows you to do monitoring of data over Open vSwitch.
So this allows your Open vSwitches to report how much data is used by the various interfaces to a central point. Try this cluster level for Open vSwitch. And try to see if I can attach Open vSwitch to an unbound GRE tunnel, to do this in a completely decentralized way, through the IP lookup table rather than by specifying what the remote endpoint is. But these are just to-dos, some ideas I have; I haven't tried any of this.

So, Q&A: suggestions, hints, any other questions at this point? I think we have nine minutes.

So I have a question, going back to the routing daemons. Have you actually tried using a routing daemon on commodity hardware on Linux with substantial traffic, or is this just usable for experiments?

So, I know some of our teams do it for actually running load balancers: they advertise the load balancer's IP through a routing daemon. I don't know how substantial the traffic is. It's basically substantial traffic, but just for one service that needs to be load balanced; it might not be for a whole data center. But then again, the routing daemon doesn't need to point to itself. The daemon only needs to scale to the point at which it advertises the right routes; then the traffic doesn't necessarily need to pass through there. You could actually divide this traffic between many actual physical boxes, if your actual hardware box is not a very good network device, or is not a completely dedicated network device, and can't handle the whole traffic.

Yeah, it's probably more a question about the performance of the Linux networking and routing stack than about the routing.

Yeah, so I think your question was more: have you tried routing huge amounts of traffic through an actual Linux box, rather than just using the routing daemon?
Yeah, I think routing huge amounts of traffic, unless you have lots of network interfaces and lots of cores dedicated to this, as we know, will not work. But we can scale horizontally and use more boxes: basically use more load balancers, or indeed, just from the switches, use the routing daemon to allow us to, for example, do anycast over many, many boxes, so divide the traffic among many of them rather than centralize it, which we would need to do if it was one box on an Ethernet without routing.

How much traffic?

Yes, so, well, one gigabit or multiple gigabits should be fine for a modern server. Multiple 10 gigabits is what modern servers have difficulty with. So multiple gigabits is fine, multiple 10 gigabits is not.

Nobody else? Everybody is completely confused by this, or they all knew about it, or... conversation on IRC, I think.

One question about Open vSwitch: does it actually incorporate any kind of encryption if you use multiple Open vSwitch instances?

Yes, so you can either do it unencrypted, if for example you are running on a private network, or you can use SSL, and it has its own PKI, so you can point it at the certification authority. You can do an SSL tunnel, so you can actually do SSL over GRE over blah and so on. So yes, it does that, I think.

There's a question by Zobel on IRC: how does Open vSwitch work with ucarp?

With what? ucarp? No idea. He had a question, I'll look at it. Actually, I really have no idea. I know, I thought you had a question. Sorry for pointing him at you.

Well, then enjoy your 10 minutes, go for coffee, and have a good lunch. Thank you.

Thank you.