Hello everybody, and welcome to Barcelona. Sorry for the dark, rainy weather. I'm a local and I swear it's not usually like that. Although, that being said, autumn will be autumn, and you cannot plan for these things. All right. So the presentation that we're going to give today is about container networking, a bit of an introductory one. So how many of you are familiar with containers and run containers in production? That's a nice show of hands. Very well. So I hope that it will be very relevant to your current knowledge and that we will expand it a bit, maybe. But before we do that, let's introduce ourselves. Myself, I'm Antoni Segura Puimedon. I work at Red Hat. And that second logo is from the Superfluidity project, which has a lot to do with containers and microservices and so on. You can introduce yourself. I'm Neil Jerram. I'm with a company called Tigera, which is the originator of the Calico project. And yeah, so that's me. I'll be talking a bit later on. And then we'll have Flavio Castelli, who I hope will arrive very soon. Otherwise, I'll have to present his part. And well, it's going to be a bit of improvisation, but that's fine. All right. So let's continue. First of all, I want to give a word of disclaimer about the constraints we took on to tackle such a broad topic as container networking. There are many container engines. Let's ask: anybody here run LXC or LXD? You see, there are many container engines. There's also rkt and so on. But today we'll just talk about Docker, which is the most common one. As for container orchestration engines, we're also going to restrict ourselves to the two most common. There's a big missing one, Mesos, which is also quite widely used, but due to time we had to restrict it to just Swarm and Kubernetes.
The thing is, each of these orchestration engines comes with a set of assumptions, or sometimes even an interface, that kind of couples the solution to some kind of networking. Sometimes you can work around that, sometimes not. But we're going to explain a bit what that can look like. And finally, of course, what I'm saying today will probably be obsolete in two months, because somebody else will come along and innovate, and that's really a good thing. So, with the advent of containers, an even more field-changing thing happened, which was the arrival of architectures based on microservices. How many of you know what microservices are? All right, that works for me. Perfect. So due to the nature of microservices, how they are split into a lot of small containers that each serve a part of the application, and how each part can be served by multiple containers, you get a lot of endpoints that you need to connect to your network, and a very dynamic set at that. Because one of the good things about microservices is that they allow you to scale up and scale down depending on demand, which gives you a lot of agility. So let's start with the foundation, with what all of you found when you first started with containers. Like most of the developers here, I suppose you started by installing Docker and seeing what you could do with it. So let's start with the most basic mode. When you create a container and don't want any kind of separate networking, you just plug it into the networking of the host. What happens with this is that you have access to all of the interfaces that are on the host. So if I start it on my computer, I'll have the Ethernet and the wireless card. And if you list the links that you can see inside the container, you see them all.
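As a minimal sketch of what was just described, assuming Docker is installed (the container name "hostnet-demo" is made up for illustration):

```shell
# Run nginx sharing the host's network namespace.
docker run -d --name hostnet-demo --net=host nginx

# Because the namespace is shared, this lists the host's own interfaces
# (eth0, wlan0, ...), not a private container view:
docker exec hostnet-demo ip link show

# Clean up.
docker rm -f hostnet-demo
```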
Of course, being able to see all the interfaces doesn't mean that you can make changes to them, to the devices or the routing tables and so on, because that would let you wreak havoc on the host networking. So typically, when Docker starts the container, it drops some capabilities, so you cannot do things like flush the addresses or change the routing tables. You can do it, however, if you pass CAP_NET_ADMIN, which is a Linux kernel capability. All right. One thing I forgot to say about the previous slide: when would you use host networking? A very typical use case is infrastructure containers. Let's say that you are running Kolla, so your Nova, Neutron and so on are running in containers and you want to network them. The most common way to do that is to just place them in the host networking and have the host networking configured. Of course, you could do it otherwise; you could run Kolla with VLAN-based networking with a driver, but it's not so common. So this next mode is the one most of you used for the first time, since it's Docker's default, in which Docker sets up a Linux bridge for you that is not connected to anything. The only connection to the outside world is through forwarding, and I'm going to show a bit how it does that. To that Linux bridge, which in truth is more like a switch because it has a lot of ports, it just attaches a veth device, which is like a pipe: what goes in veth0 appears on veth1. And the good thing is that veth1 and veth3, in this case, are already in the kernel network namespaces that Docker sets up for the containers. So what does it look like from the outside and from the inside? From the outside, what you see is that you have a bridge with an address.
It's a private range, so it's completely isolated from the outside, but since Docker wants you to be able to serve things, and the containers to reach out, it sets up forwarding, and for the return flows it sets up masquerading rules. And from inside the container, what you see is that you have an IP on that private range, and the default gateway is the bridge address on the host side. The other important aspect, as I said before, is that you need to be able to expose the networking that you give to the containers; otherwise you cannot serve anything, so it would be quite useless. To do that, Docker uses iptables. As you can see here, it is the DOCKER chain that sets up a DNAT rule so that port 8000 on the host will go to port 80 on my NGINX container. And now I hand over to Neil. So now maybe you move on to wanting to run containers on multiple hosts instead of just one. And when you do that, two new factors come into play. One of those is that you need to start being a little bit more careful with your IP addressing, because if all of the containers were on a single host, and you're only talking between containers on that single host, then the IP addresses which those containers use never go anywhere outside that host, and so it doesn't really matter what IP addressing you choose. But if you have a setup like this, where we've got three hosts, and let's say you want the yellow containers to be able to talk to each other, one of those on host A, one of them on host B, then whatever IP address container one has is going to go outside host A. So a little bit more care is needed to avoid allocating overlapping IP addresses, at least in cases where you don't want overlapping IP addresses. And secondly, you need to address the question of how the data gets from one of those hosts to another. And of course, we're at an OpenStack conference here.
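As an aside before moving to multiple hosts, the default bridge mode and the DNAT port publishing described above can be inspected by hand. A minimal sketch, assuming Docker is installed (the container name is made up, and the 172.17.0.2 address shown in the comment is only a typical value):

```shell
# Publish host port 8000 to port 80 in an nginx container.
docker run -d --name web -p 8000:80 nginx

# Outside view: docker0 is a Linux bridge on a private range, and the
# container hangs off it through one end of a veth pair.
ip addr show docker0
ip link show type veth

# Inside view: a private IP on the docker0 subnet, default gateway on the host.
docker exec web ip addr show eth0
docker exec web ip route show

# Docker's DNAT rule in the nat table, something like
# "DNAT tcp ... dpt:8000 to:172.17.0.2:80":
sudo iptables -t nat -L DOCKER -n

# Traffic to the host's port 8000 now reaches nginx inside the container.
curl http://localhost:8000/
```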
These are things that OpenStack, of course, has analogous answers for, for VMs, in Neutron. Let's look at the transport question first: the question of how the data gets from one host to another. Just to pin this down a little bit more, imagine container 01 is sending a packet to container 02. Clearly, the first thing that happens is that the data goes from the container on host A to its host. Then that host has to do something to get it from host A to host B. And once it gets to host B, host B is able to deliver that packet to its local container, the one in yellow numbered two. Traditionally, Neutron has solved this problem using what's called an overlay network. An overlay network basically means every packet that those containers want to send to each other gets wrapped by the source host inside another IP packet. That IP packet is addressed to the destination host, host B in this case; when it gets to host B, it gets unwrapped and delivered to the destination container. This means that the addressing that the containers have is completely independent of the host infrastructure addressing. It also means that if you do that kind of wrapping such that you include the layer 2, the Ethernet headers that were sent by the original containers, then you can still simulate layer 2 adjacency between the containers if you want to, even if there were, let's say, intermediate routers in the fabric between host A and host B. That's not shown explicitly here, but there could be; if you're running on, say, GCE, there are intermediate routers between every pair of GCE nodes. So that's what overlay networking allows you to do. And on the addressing point once again: because the addressing of the containers is independent of any IP addresses that are used in the host infrastructure network, you can also use overlapping IPs.
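To make the wrapping idea concrete, here is a rough, hand-built sketch using VXLAN devices (requires root; the device names, VNI numbers, and addresses are made up). The VXLAN network identifier (VNI) is one form of the "scope" that lets two tenants reuse the same container addresses:

```shell
# Two separate overlays on the same host, distinguished by their VNIs.
ip link add vxlan-yellow type vxlan id 100 dstport 4789 dev eth0
ip link add vxlan-pink   type vxlan id 200 dstport 4789 dev eth0

# Each original Ethernet frame is wrapped in a UDP/IP packet addressed to
# the destination host; the VNI identifies which network it belongs to,
# so 10.65.0.2 can exist once per overlay without conflict.
ip addr add 10.65.0.1/24 dev vxlan-yellow
ip link set vxlan-yellow up
```

In practice an orchestrated overlay driver creates and wires these devices for you; this is only the underlying mechanism.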
You can basically allow different users of your container cluster to do what's called bring-your-own-addressing. So as long as you've got a user of certain containers over here, say the containers in pink, and another completely separate user, say of the containers in yellow, and as long as those never need to talk to each other, they can use the same IP addresses. It just means that when you work out the way of wrapping the packets, when you encapsulate them, there needs to be something in the encapsulation that carries a scope, something able to differentiate, if both containers are using 10.65.0.2, that this is 10.65.0.2 for the yellow containers and this is 10.65.0.2 for the pink containers. So that's overlay networking in a nutshell. I hope I'm not just telling you things that are already absolutely familiar to all of you, but it comes at a cost: having to encapsulate and decapsulate those packets all the time has a performance cost, a slight complexity cost, and a cost in terms of MTU, of having to know what's going on with your MTU. It's also a bit more difficult to troubleshoot. It means that in different parts of the fabric, if you're looking for a packet going from one place to another, or more typically looking for why you don't see a packet going from one place to another, you need to know that in a different part of the network you need to look for it in a slightly different form, because it will be wrapped up in some way. So is there any alternative to that? In fact, there is, and it's the thing that we're going to call routed networking. So look at this and ask: what would I need if I didn't want to have an overlay network?
So: if you had containers which didn't actually need layer 2 adjacency to each other, and if you didn't need bring-your-own-addressing. In other words, if you are happy for all of your containers to get their IPs from a managed flat IP space, managed in such a way that there are never any conflicts between what one group of containers wants and what another group wants, and also managed so there are no conflicts between those IP addresses and the host infrastructure. Then you don't need this overlay network. Then essentially you can just do IP routing to get the packets from one container to another. That packet will never be encapsulated or decapsulated anywhere; anywhere you see it in the fabric, it will look the same. And this is what we're calling a routed network. So now I'm going to talk about a load of specific projects which are implementing these approaches; that was all preamble. Let's go through some of the routed and overlay projects which provide these approaches. And actually, that's not the next slide. This one is. So this slide is showing some routed approaches. The thing on this slide which says docker0 is not quite correct for all of these projects. Where it says docker0, what you should think is that a packet is coming from one of the containers and going into the routing table of the host, and then the host is routing it to wherever it needs to go. I mention some implementations down there at the bottom: Calico, Flannel, and Romana. I'm going to talk about Flannel and Romana first, because they are similar in the way they manage the routing: they say that a certain prefix, a /24 for example, belongs to a particular host.
So imagine container A is sending to container D. The routing table on host A can know that all of the 10.0.9.x addresses are for containers on host B, so it can put an entry in its routing table which says 10.0.9.0/24 via that IP address there, which is the IP address of host B: 172.16.0.5. And that's basically how the routing elements of Flannel and Romana work. Calico takes a slightly different approach. Calico doesn't reserve a /24 for every host, because when Calico started we thought that was a little bit inflexible. We go instead with allocating /32s, and I've rearranged the IP addresses here so you can see that you can have, say, 10.0.8.x addresses, one on host A and one on host B, and similarly for 10.0.9.x. So basically, with Calico the addresses can be anywhere, and instead we use BGP to propagate those routes to the other hosts that may need to forward data to those addresses. To put a bit more flesh on that: what happens in Calico is that the orchestrator says, well, there's this container which should exist on host A. It's got an address of 10.0.8.2 and it will be connected to the host; it has its own network namespace, but it's connected to the host by a veth pair. So the first thing that happens is that on the host we create a route in the local routing table which says: to get to 10.0.8.2, please go through that veth interface. Then the BGP speaker, for which we typically use BIRD, notices that there's this new route in the kernel to 10.0.8.2 and exports it to all of its BGP peers. The result is that the BGP speaker on host B ends up with a route that says 10.0.8.2/32 via 172.16.0.4. So that's how Calico works. The details differ, but the fundamental routing idea is the same.
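The two routing schemes just described boil down to different shapes of kernel route. A sketch of what ends up in the routing tables (requires root; addresses match the examples above, and the veth name "cali0" is made up):

```shell
# Flannel/Romana style: one /24 per host, so host A needs a single route
# covering all of host B's containers.
ip route add 10.0.9.0/24 via 172.16.0.5

# Calico style: one /32 per container, learned over BGP, so a container
# address can live on any host.
ip route add 10.0.8.2/32 via 172.16.0.4

# And on the host that owns the container, a local route through its veth.
ip route add 10.0.8.2/32 dev cali0
```

In real deployments the agents (flanneld, Felix/BIRD) install these routes for you; the commands only show the resulting state.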
Now, the interesting thing about this, and how are we doing for time, by the way? Okay. The thing about these routed approaches is that you need to know where you're going at every hop. We've got all of these IP addresses associated with containers, and at every point where you have an IP routing operation in your data path, you need to know the next hop for a particular container address. Sometimes that's tricky, because sometimes, as indicated by this diagram, there are no intermediate routers between host A and host B, but sometimes there are. And if you've got intermediate routers in your fabric, then either you have the option of programming those intermediate routers, or peering with them, so you can arrange that they all have those container routes as well, or else you need to do something to skip over them. And skipping over them essentially means introducing some other form of little tunnel, another little bit of overlay networking, in that particular section of the network. So what we then move on to is a kind of big picture in which sometimes you're doing routing, but sometimes you're using a little bit of overlay or tunneling to get between two places. And that brings us on to Flannel. Flannel is a different project, originated by CoreOS. As I said, they have this mode where they do routing, and they assume that every host corresponds to a /24 prefix. They can also do the bridging as Antoni showed: they can bridge onto a device which performs a UDP encapsulation or a VXLAN encapsulation, and those are both forms of overlay networking that Flannel can do. But they also have a couple of nice options for running on AWS and GCE.
So, taking GCE first: GCE always has routers in between any two nodes, but it does allow you to program routes into the routing table which GCE uses, and you can make use of that facility in order to traverse those intermediate GCE routers. That's an option that Flannel provides. On AWS, if you're running in the same subnet in a VPC, then you won't have intermediate routers, but if you have hosts in different subnets, then you will. And again, AWS gives you the option of programming the VPC routing table, and that's another thing that Flannel can do for you. So in other words, the upshot is you can have containers on multiple hosts on AWS, and Flannel will let you connect between them. Now a brief mention; this is a bit Tigera-centric, since it's my company. My company originated Calico. Calico started off with this very flat, IP-based connectivity model that I've shown you. Later on, we realized that security was very important, so we added a quite interesting layer for describing the security that you need in a data center, and then mapping that down onto iptables. That has become, if anything, a more important part of what Calico does now than the basic connectivity. We were also speaking to CoreOS, and CoreOS said, well, sometimes the flat layer 3 connectivity doesn't completely work on its own; you need to add these methods for getting across intermediate routers, say in GCE. So we came up with this combined project called Canal. The fundamental idea of Canal is to provide a system with all of these different connectivity options for getting wherever you need to between your hosts, combined with the Calico security model for securing those connections. But fine, let's not spend too long on that. So those are the routing-based approaches.
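The cloud route programming just mentioned, which Flannel's GCE and AWS VPC backends automate, looks roughly like this by hand. A hedged sketch: the route name, instance names, and IDs are made up, and the prefixes reuse the /24-per-host example from earlier:

```shell
# GCE: steer a container /24 toward the instance that hosts it.
gcloud compute routes create containers-host-b \
    --destination-range 10.0.9.0/24 \
    --next-hop-instance host-b

# AWS: the equivalent entry in a VPC route table.
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.0.9.0/24 \
    --instance-id i-0123456789abcdef0
```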
So for completeness, let me just touch on the overlay approaches that are available for containers, and there are quite a few of these. Fundamentally, what is happening in all of them is that you're adding some encapsulation for getting between the hosts, plus, in cases where there are multiple scopes for the addressing, something in the encapsulation that says which scope it corresponds to. Flannel, as I've already said, has UDP-based and VXLAN-based ways of doing that encapsulation and providing those overlays. Weave is another important project which does that. And Docker itself provides a native overlay. I don't actually know the full details of those myself, so if you have questions about that, I hope my colleagues will be able to answer at the end of the talk. But I do want to talk about Kuryr at this point. Kuryr is the project which basically says: what if you're already really familiar with the Neutron API? Neutron provides various forms of basic connectivity, both overlay- and routed-based, as we've been discussing, and perhaps you also really like some of the things in the Neutron API which are built on top of that. Couldn't you just use that same abstraction, that same API, to connect all of your containers together? That's essentially what Kuryr does. So the first phase of Kuryr, and Antoni will correct me if I get any of this wrong, delivered that integration for Docker, basically by writing a libnetwork plugin which connects to the Neutron API. It means that any kind of connectivity you can configure in Neutron then becomes the way that your containers are connected together.
The next phase of Kuryr is looking at extending that to Kubernetes, and also to mixed connectivity, so that you can connect VMs and containers together: containers running on bare metal and containers running in VMs. What that means is that any form of either overlay networking or routed networking that you can do in Neutron, you can do for your containers via Kuryr as well. Fine, and I think I've probably covered this slide already. So yes, I mentioned VMs and containers, containers and VMs; essentially, whatever form of networking you choose, you can connect hybrid forms of workloads together using this approach. So I'm going to come to the end of my part here. This is just a quick summary, a comparison of routed and overlay networks. Routed networks perhaps have better performance than overlay. I made the point about them being a little bit easier to troubleshoot, in that wherever you look in your fabric, with a routed approach the packet will look the same. On the other hand, there are typically points within an infrastructure where you don't have complete control over every intermediate router, so either you need to tunnel that part, or, if you do have control, you need to do something extra in order to peer with those intermediate routers so that they know those routes. It also means that with overlay networking, because you're essentially always tunneling from one host to another, it's not a very big step to go from there to traversing multiple clouds, to adding inter-cloud connectivity as well. Whereas if you're starting from a routed networking implementation, where you're generally tunneling very little or not at all, then that's a bit more of a thing to add on. So fine, I think that's pretty much what that slide says.
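As a concrete sketch of the Docker native overlay driver mentioned above (assumes a Docker Swarm cluster is already initialized; the network and service names and the subnet are made up):

```shell
# Create a multi-host overlay network; Docker handles the VXLAN
# encapsulation between hosts itself.
docker network create -d overlay --subnet 10.10.0.0/24 my-overlay

# Containers of this service can reach each other across hosts
# through the overlay, regardless of the underlying fabric.
docker service create --name web --network my-overlay nginx
```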
So I will rest here and hand over to Flavio. Okay, thanks a lot. So you might think that now that we have connectivity, everything is done, everything is settled. In fact, there are still some challenges specific to these distributed environments that you have to face. First of all, there is the problem of service discovery. To define the problem of service discovery, let's introduce some concepts. You have a producer, which is a container running a service, and then you have a set of consumers, which are clients of this service. Now you need to find a way to put the consumers in contact with the producer; the consumers have to find where the producers are located. For example, here we have a web application running on host A which is looking for a Redis database running inside another container, this time on a different host, host B. So you can solve that problem and put the two of them in touch. But then what happens when, for example, the Redis database is suddenly moved from host B to another host? What happens if there is a failure on host B and all the containers running on host B are automatically relocated somewhere else? Now the web application is going to be broken, and this is something that we don't want to happen. Also, what happens if, for example, you have multiple producers? Here we have multiple containers running on different hosts, each one of them providing a Redis database. Which one of these Redis entry points should the web application use? To solve this problem, there can be different approaches. You might be tempted to do things, let's say, the old way, so use DNS. But now you have introduced different problems. First of all, containers are really fluid; they keep moving everywhere. So you cannot assume that the entry returned by your DNS server is still valid over time, and you don't want your clients to cache these results.
So you have to configure your DNS server to return replies with a really short time to live, maybe a zero time to live. And now you would think you're set, but that's not true, because unfortunately there are lots of broken clients out there that do not respect the time to live. They will keep caching the result and won't react to sudden changes in the infrastructure. So you will end up with something which is broken in an inconsistent way, and that's really hard to debug. This was the approach used in the beginning with Docker for distributed containers: initially they decided to update /etc/hosts dynamically, then they introduced an integrated DNS server. Now, with 1.12, things are a bit different, and I'm going to talk about that later on. The approach used by a lot of people is to just rely on a key-value store, something like etcd or Consul or ZooKeeper. The producer, as soon as it starts, registers itself in these yellow pages and says: hey, I'm here, with this IP address and this port number. Then the consumers either look it up manually in the yellow pages, or, in most cases, it's up to the orchestration engine to automatically inject this kind of information into the application, for example through environment variables or configuration files that are created with this information. The other problems we saw before were how to react to changes, how to handle multiple choices, and a new one, which is handling ingress traffic. To solve the problem of multiple choices and the problem of handling failures, orchestration engines like Kubernetes, or Swarm starting from version 1.12 of Docker, have this concept of a service. As soon as you create a container which is a producer, so it's offering a service, the orchestration engine will also create a virtual IP address.
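Before looking at how the virtual IP works, here is a sketch of the key-value-store "yellow pages" registration just described, using etcd with etcdctl v3 syntax (the key layout and the address values are made up for illustration):

```shell
# The producer registers itself when it starts.
etcdctl put /services/redis/host-b '{"ip": "10.0.9.3", "port": 6379}'

# A consumer, or the orchestration engine on its behalf, looks it up.
etcdctl get --prefix /services/redis/
```

In practice you would attach a lease to the key so that entries from dead producers expire automatically instead of going stale.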
So now this virtual IP address will automatically redirect all the requests to any of the containers which are providing the service. What you have to do is just point your consumers, or your clients, at this virtual IP address, and that solves the problem of figuring out which Redis database to use, like in this picture; it also covers what happens if one of the nodes goes down. The last nice thing about this is that, given that this virtual IP address is stable, you can just add DNS on top of it to make legacy applications work, because now the IP address doesn't change, so there is no risk of caching an IP address which is no longer valid. The ingress problem is about exposing an application which is running inside your containers to traffic from the outside. What you can do, and I'm going to show you a picture, is just ask Kubernetes, or ask Docker Swarm, to publish the service on each worker node of your cluster. A port number on each node of the cluster is reserved, and all the traffic directed toward this port is automatically redirected to one of the containers which is actually running the service. So now you take all the traffic from the internet, you pipe it through a load balancer, like a traditional one, you configure the load balancer to point to all the worker nodes inside your cluster at that specific port on each node, and then you're done. It's really important to notice that, like in this chart, it can happen that the load balancer redirects the traffic toward a node which doesn't have any instance of the guestbook container, like host C, but that doesn't matter, because from this public port on the node, everything is redirected to the virtual IP address of the guestbook service, which automatically finds out where the guestbook is running.
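On Kubernetes, the node-port publishing just described looks roughly like this (assumes a running cluster with a deployment named "guestbook", matching the example; the deployment name is otherwise arbitrary):

```shell
# Reserve the same port on every node and redirect it to the service's
# stable virtual IP, which in turn balances across the guestbook pods.
kubectl expose deployment guestbook --port 80 --type=NodePort

# Shows the allocated node port (by default somewhere in 30000-32767).
kubectl get service guestbook
```

An external load balancer is then pointed at that node port on every worker node; a node without a guestbook pod simply forwards the traffic on via the virtual IP.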
So, to recap and leave some space for your questions: there are several approaches when it comes to networking in the container world. You can have approaches that use either an overlay or a routing model, and that implement either the CNI or the CNM specification. But most important of all, it's not just about connectivity; it's about more than that. It's about security and, as we have seen, about handling changes and handling ingress traffic and all of that. So with that, I think we can start the Q&A session. If anybody has a question? Sorry, I didn't realize we'd finish so early. I can bring the microphone over. Any question? All right, so thank you all for joining the presentation, and if you have any question that you don't want to voice now, you can meet us later. Thank you.