There are a lot of people here in a small place, so I can ask questions, people can give suggestions, and we can take the work forward from here. So, some of the topics. With containers, what we suspected earlier, and what I personally found out, is that scale is not just about how many containers can be connected, but about how quickly. I'm going to talk about this a little bit. We have to scale the control plane. We have seen good, comprehensive solutions in OpenStack, OpenDaylight, all these projects, but we have worries about scaling the control plane. There's a control plane which figures out who is being born, who is dying, and how to get the network working through and through with them. We need to scale that. This was a challenge with containers, and I'll talk about it. And there's the case of mixed workloads. In the end, networking is about wherever the computation needs to happen, and those are nested compute units: there will be containers inside VMs, VMs inside VMs, VMs on bare metal we never get to see, probably in a cloud, and whatnot. Those cases need to be considered for networking. There's a challenge of passing the user intent from the orchestrator. When I say orchestrator, I mean the container orchestrator. Somehow there's a divide: the networking provider is a whole networking thing which supplies some software, and then there is a container orchestrator, Docker Swarm, or Kubernetes, or OpenShift, or Tectonic, or whatever, no particular one, I'm just speaking in general. The application developer wants to run the application and eventually wants to use the network. But how is the intent passed, of what I want out of the network? What are the tiers of my application? How do I pass that on to the networking vendor, which may be a different company, not provided by the orchestrator? Kubernetes at least does not have its own networking solution; it has plugins. So that's a challenge. And there's a challenge of making the networking providers work together. This is probably where I would hope that some of you can bring in solutions or ideas. When Chip was talking, there was a question about encapsulation and double encapsulation and probably triple encapsulation. We need to solve that, okay? It's not solved yet, and we need to solve it, and there are probably ideas. How do we make these networking providers, which are working on different, parallel layers, work together? I also want to discuss the marketplace. There are so many solutions out there; I have a few in mind, and we can discuss what they do and how they're different. And hopefully I can do a demo. Just five minutes ago the network connection was so slow, it was doing 400 bytes per second, so it wasn't working. But anyway, if the demo works, it's going to be a live one, like really live, so I need the internet for that; hopefully it works. So anyway, coming back to the original topic: containers come up really fast and disappear really fast, and that's entirely different from the way nodes used to be earlier. We know that they're probably just processes. You can call them containers, whether Docker or rkt or something; they're just processes in a different namespace. From the network point of view, there is a new network namespace. But from the user point of view, they just want a process.
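Just to make that "process in a new network namespace" point concrete, here is a minimal sketch in Python; it's purely an illustration (it needs root, and it is of course not how Docker or rkt actually set things up):

```python
# Purely illustrative: from the network's point of view, a "container" is just a
# process dropped into a fresh network namespace. Needs root (CAP_NET_ADMIN).
import ctypes
import os

CLONE_NEWNET = 0x40000000  # value from <sched.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)
if libc.unshare(CLONE_NEWNET) != 0:
    raise OSError(ctypes.get_errno(), "unshare(CLONE_NEWNET) failed")

# This process now sees only an unconfigured loopback interface; somebody still
# has to wire a veth pair in and assign an address, and that is the SDN's job.
os.execvp("ip", ["ip", "addr"])
```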
You could even put this presentation in a container, do a Docker container for it and everything. Containers are born and up and running in less than a second, not minutes, and the infrastructure needs a really, really fast response in providing a network endpoint. The requirements these processes have: not only do they want connectivity to the internet or an external network, they want inbound connections to start arriving from load balancers, they want other policies to be there, and they all need to be provisioned very fast. That's the challenge of containers, which is slightly different from VMs; you don't have minutes there like you would with a VM. So pretty early, at least from where I have been working on containers and networking, we came up with a solution: preallocate. With VMs, they take two or three minutes to be born, so you can say, hey, let's get that wiring up and done, see what policy it might get attached to, and while things are coming up we'll take care of it. With containers we need to preallocate. Why? Because when the container is born, there is only one thing running in it: the process that needs the network. So we need to get the network in place before the container is born, not after. You can't have eth0 or eth1 be created after the process has already started, because as soon as the process starts, the first thing it's going to do is bind on the network interface. So we need to preallocate all these things, all the requirements of the containers, on the host. The idea is to pre-create a whole bunch, which just comes out of the box, so that as soon as containers are born, everything is pre-wired. There is no IPAM fetch, no going off to get an IP address at that moment, no routing updates proliferating across the cluster. We preallocate even if things get wasted. For example, there is a host which can host 256 containers; we pre-wire everything for it even though the containers are not born yet and probably there are just two containers running. But we need preallocation so that when something is born, we just get the veth pairs. The previous talk was talking about veth pairs going into the namespaces and outside to the OVS bridge; just the veth pairs need to be created and wired in, and everything else is pre-decided. That's the difference with networking for containers: we need to preallocate. So this is the diagram. There will be certain requirements around subnets, and they will be different per tenant. Say we get a subnet A, some IP address space, for tenant A; then we keep spare subnets around. We don't even know how many containers are going to be born for that tenant, or for a tenant that isn't even there yet. So we need some handshake with the orchestrator here, and we need to start preallocating. Say this subnet had 256 containers' worth of space: by the time we reach 220 containers or so, the controller should already figure out, hey, I'm not going to wait until the end to find out I need a new set of 256 endpoints. I'm going to preallocate and keep ahead of the pace at which containers are being born, because containers will be born really fast. By the time the next batch is handed out by the IPAM manager, I should already be ready with the wiring.
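To make the preallocation idea concrete, here is a minimal sketch of a pool that hands out pre-wired endpoints and replenishes itself before it runs dry. The names and numbers (a batch of 256, a low-water mark of 32) are made up, and a real implementation would be creating veth pairs and OVS ports ahead of time rather than handing back strings:

```python
import ipaddress
from collections import deque

class EndpointPool:
    """Hand out pre-wired network endpoints faster than containers are born."""

    def __init__(self, subnet, batch=256, low_water=32):
        self.addresses = (str(ip) for ip in ipaddress.ip_network(subnet).hosts())
        self.batch = batch
        self.low_water = low_water
        self.ready = deque()
        self._preallocate()

    def _preallocate(self):
        # In a real system this is where IPAM, veth creation and OVS wiring
        # happen ahead of time, so nothing slow runs on the container-start path.
        for _ in range(self.batch):
            try:
                self.ready.append(next(self.addresses))
            except StopIteration:
                break  # subnet exhausted; a controller would request a spare subnet

    def attach(self):
        """Called when a container is born: must be O(1), no allocation work here."""
        endpoint = self.ready.popleft()
        if len(self.ready) < self.low_water:
            self._preallocate()  # stay ahead of the pace of container births
        return endpoint

pool = EndpointPool("10.1.1.0/24")
print(pool.attach())  # 10.1.1.1, already wired before the process even started
```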
And I should already have the isolation ready, with subnet A and subnet B, across node one, across node two, and even across clusters if that's the need. I'm still not talking about how it is done; this is just what is different when we do networking with containers. The other part was the control plane. Quickly: when we tried a little bit of the existing technologies, for example ODL, we figured that there's a central place where things are computed and pushed down via OVSDB, if you're using an Open vSwitch setup. To keep things scaled out, that central computation should not happen. We should not have one central place saying, now there are going to be thousands, no, 10,000, probably 100,000, maybe millions of containers being born, each with their own demands, and computing the network and how the policies get pushed down. That computation cannot happen centrally; it has to happen locally, for security reasons and also for scale reasons. However, the entire state of the network needs to be kept in a central database, which could be replicated or sharded. That's one of the learnings we figured is different when we're talking about massive scale in a container cluster. Some of the security aspects: there are several nodes hosting containers, and those nodes could be nested, by the way. What if a node gets compromised? It should never have so much power that it can spoil other nodes. It should never have so much access to the database that it can corrupt the database and thereby spoil other nodes. These are the things which become really important when we do containers and networking with them, and some of them were kind of missing when we looked at solutions which were not container-specific. This is about user intent. The container orchestrator could be Kubernetes, it could be OpenShift, it could be Docker or Swarm, something from Red Hat, maybe not from Red Hat. But this is what I was saying at the beginning: the networking provider will most likely be a separate vendor, and there will be demands from the container orchestrator saying, I need some isolation policy, I need external network access. External network access means these green containers can reach the internet, or the yellow containers need to be exposed to the outside network so they can be reached directly, while you can have an overlay for the green containers. Or there's an isolation policy: of course green can talk to green and yellow can talk to yellow, but not to each other. Then there are other things like rate limiting and throttling. It's a challenge to understand how the operator of the Kubernetes cluster wants to enforce them. There are no APIs yet, and work needs to happen there; we were working on this, we are working on this. The trouble is that if the orchestrator defines a standard, how do all the networking vendors adopt that standard? It's a little bit of a challenge. Some of the container SDN basics have been covered before, but it's a standard thing that we want to do overlays, and overlays use encapsulation: VXLAN, Geneve, MPLS are the common encapsulation standards. The diagrams have been discussed before, but in a generic fashion, this is what happens.
There are two nodes, there are containers, they attach themselves to a vSwitch, and there's an encapsulation between the nodes. That's how it goes; there's nothing specific about it. That's the recap of the basics. A few extra requirements come from the orchestrators, such as multi-tenancy. A very specific one is DNS related. On container orchestrators like Kubernetes or OpenShift there is almost always a load balancer, and there is a special deal about load balancers, because load balancers could be global while the containers have tenants. You could have two companies hosted on OpenShift Online which do not want to talk to each other; their networks should be absolutely segregated. But there is a common load balancer, just bifurcated at layer seven by DNS. That load balancer would itself be a container. On an Atomic system you would see everything running in a container, so even the load balancer is running in a container, and it scales as a container too. But it is on a special network; it's a special tenant, an admin tenant, which has access to all other tenants, while the other tenants don't have access to each other. So these are a few extra requirements which come up, and solutions have to be developed anew to meet them. There are other requirements, like admin roles for monitoring and debugging tools which need to run on hosts and nodes, and those have to bypass all the multi-tenancy restrictions. So if we're using Open vSwitch, we need special rules to say, well, this is an admin tool; we still keep the security property that if one node gets compromised the other nodes do not get compromised, but within this node, please allow the tool to reach all tenants and all containers and see what is going on, who is getting bottlenecked, who is getting what. These are the aspects which need to be looked at. You can correct me here, I'm not so mature in networking, it's new for me, but these are the things which are different, which happen with containers and apparently do not happen as much at scale with VMs. Then there is the case of nesting. There will be containers in VMs, VMs in VMs in VMs and whatnot, VMs on metal, and containers in containers. Google really likes to hype that they run containers inside containers inside containers. That's not entirely true, but they like to say it; it started from that lmctfy project, Let Me Contain That For You. But we understand that at the Linux kernel level we still don't nest network namespaces when we do Docker. The use case is there, though. You would have node one, which has spun up C1, and C1 thinks it's its own master and needs to do some more sub-jobs or something, so it will spawn C2 and C3, and all of them need network connectivity, and there's a hierarchy of who owns whom. And when the applications are running, they're running in different tiers, probably owned by different teams, and they need boundaries between them, they need policies between them. That becomes a little bit of a challenge when the entire application has to work.
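A toy sketch of the kind of isolation intent that keeps coming up here: green talks to green, yellow talks to yellow, and one admin tenant (the shared load balancer, the monitoring and debugging tools) is allowed to reach everything. The tenant names are made up, and a real provider would compile this intent into OVS flows or iptables rules rather than checking it in Python:

```python
ADMIN_TENANT = "admin"  # the load balancer and debugging tools live here

def allowed(src_tenant, dst_tenant):
    """Isolation intent: same-tenant traffic is fine, the admin tenant reaches everyone."""
    if src_tenant == ADMIN_TENANT:
        return True                   # admin tools may inspect all tenants
    return src_tenant == dst_tenant   # green<->green, yellow<->yellow, nothing else

assert allowed("green", "green")
assert not allowed("green", "yellow")
assert allowed("admin", "yellow")      # the shared load balancer can reach yellow
assert not allowed("yellow", "admin")  # but tenants cannot reach the admin plane
```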
To give an example of those tiers: there will be compute units in containers, there will be some mathematical jobs spun off by those inside other containers, and there will be some business logic, probably middleware logic, running inside VMs. But when it comes to the database, people may still prefer real metal. All of them need to be connected, yet still isolated. So those policies become a challenge, and we need to see how this nesting of things gets solved, since the advent of containers, I would say. Yeah, that was the diagram I was trying to explain in words; I'll come back to it. And any of you should feel free to suggest how this should be solved. Would there be one single solution which does it all? Probably not, because you can only do it all when you can say, hey, I own all of this. From a marketplace point of view it's a challenge, because these containers would be run by an orchestrator which is probably not the orchestrator of the VMs, and these VMs would be running on a layer, probably OpenStack, or maybe multiple layers which are not plumbed by the person who did the bare metal. So there is an interoperability challenge. This is a standard thing that has come to the OpenShift team; I work for OpenShift. They say the bare metal network was provided by, say, Cisco, the VM network was provided by a Neutron plugin, and the container network by a Kubernetes plugin, say OpenShift SDN. Now, can you bridge them all? That was the question. Can we have layers removed? Can we not have double encapsulation, one done by OpenShift SDN, then one done by the Neutron plugin, and finally who knows what was going on at the bare metal level, where there was probably nesting going on too? Nobody talks to each other, and I don't know. I've drawn the line; these are my stupid ideas from two hours ago. So how do we bridge them all? Can we make a standard? Can we say, hey, I did the VXLAN encap, why do you need to do it again? Just use the same VNID, or do something else, or use the Geneve standard and its headers, you know, they have extensible headers there, so use that. Yes, exactly. I mean, most of this is just not sorted out for anyone; it's not exactly solved. So yes, I'm in good talks with the OVN guys, Russell and Guru, and the Kuryr project is aiming at bridging libnetwork and Neutron. So here is the picture again. The containers and VMs can be handled by OVN, the OVN project. The VM itself was plugged in by Neutron, but when a container is born it says, hey, OVN, can you plug me into this network? And OVN says, okay, I will do so, but I am living inside a VM and outside me there is Neutron, so I need to tell Neutron that I am going to VXLAN-encap this, and when it goes out, don't double-encap it, can we use the same IDs? So they need to talk to each other, and we need to have an API there. So yes, this is a valid solution: the Kuryr project has taken the initiative to have one API, an actual extension of the Neutron APIs, where you can say, these are the IDs which I'm going to use, please reuse them for those tenants, because the tenant called X inside the container world is the same tenant called Y inside the VM world, and who knows what it's going to be called outside. We need to have those APIs, but as of today it still resides in Neutron.
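One concrete reason double encapsulation hurts, besides the extra encap and decap work on the data path, is that every overlay layer eats into the MTU. A quick back-of-the-envelope using the standard header sizes (VXLAN over IPv4 costs about 50 bytes per layer):

```python
ETH, IPV4, UDP, VXLAN = 14, 20, 8, 8
OVERHEAD_PER_LAYER = ETH + IPV4 + UDP + VXLAN  # 50 bytes for one VXLAN layer

physical_mtu = 1500
for layers in range(3):
    inner = physical_mtu - layers * OVERHEAD_PER_LAYER
    print(f"{layers} encapsulation layer(s): inner payload MTU = {inner}")
# 0 layers: 1500, 1 layer: 1450, 2 layers: 1400 -- and every layer also means
# another encap/decap step, which is why reusing the same VNID (or Geneve
# options) instead of re-wrapping the packet is attractive.
```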
What happens when there's more nesting going on? What happens when there is Kubernetes inside Kubernetes? I'm not joking; there's probably no use case, and it may sound laughable, but people will do it. I mean, there is already Docker in Docker, and I have a demo for Docker in Docker; hopefully it works. I have a demo of exactly this diagram, actually. So there's this machine which is bare metal, and I have a VM, oh, I didn't draw the VM there. Inside the VM there are containers running, and those containers are OpenShift nodes; they are not actual workloads. And inside those containers there are Docker containers running which are actually serving something useful, and they're still not doing the entire job; they still think they're masters and may spawn even more nested containers. So I don't know if the demo is going to work or not, but that's the exact demo that I have. And you're right, OVN would solve it, and OVN is great because it says, I will hide all the OpenFlow. We were asking, right, who is going to deal with the OpenFlow ugliness? OVN will, because to you it just says: create a switch, add a port, create a switch, add a port. The bridge you need for the switch gets created underneath. That's awesome, right? You are not speaking all that ugly language, because OVN generates it all behind your back. So we like OVN, and OVN will connect the containers and connect the VMs, because it just understands ports; if you can create a controller for it, it will understand it. But the interoperability is still a challenge. So. Honestly, if you have an explosion of this kind, OpenStack, then OpenShift, and Docker inside that, I think you did something wrong in the first place. No, unfortunately, who am I to say? This came from the PMs first, because they're hearing it from the customers. They've done OpenStack, they've done their VMs and everything, and now they want to run container workloads, so they're going to run OpenShift on it. You've called a spade a spade, and I absolutely agree, it's probably over-engineered. But then let's call it that; let's say we'll stop it and not do it. And there's the other challenge: I don't know, if people have done OpenStack and they've done VMs, they're going to say, let me have OpenShift nodes spun up by OpenStack. I don't know how far we can unsell it; it's already sold. From the networking point of view anyway, and this is not just Red Hat, I didn't realize we were going to talk so much Red Hat, look at the Mesos case: Mesos thinks it's the data center brain. I hope I'm not running out of time. Mesos thinks it's the data center brain; it is not only going to spin up VMs, it is not only going to spin up containers, it is going to spawn actual processes, Hadoop workloads and whatnot, database stuff. So now we have so much of a mix going on, and Mesos doesn't even have an account of what exactly the VMs it spins up are doing. People will run Kubernetes on Mesos, and they're actually tooting the horn about it. We can't stop it, and it will happen, I think. Sometimes I see this whole internet thing, this big data center thing, becoming a big Skynet thing. I'm not the first one, this is a borrowed term, but this is what will happen, because it will modularize itself and we can't prevent it. So anyway, without argument, the reality is that this does exist.
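To come back to the OVN point for a second, the "create a switch, add a port" model really is that small. A minimal sketch, assuming an OVN northbound database is running and ovn-nbctl is installed; the switch name, port name, MAC and IP are all made up:

```python
import subprocess

def nbctl(*args):
    # Thin wrapper over the real ovn-nbctl CLI. OVN turns these logical
    # definitions into OpenFlow behind your back, which is the whole point.
    subprocess.run(["ovn-nbctl", *args], check=True)

nbctl("ls-add", "tenant-green")                   # create a logical switch
nbctl("lsp-add", "tenant-green", "container-c1")  # add a logical port for a container
nbctl("lsp-set-addresses", "container-c1", "00:00:00:00:00:01 10.1.1.2")
```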
One of the weird ideas that came up with someone I was discussing this with was how to do interoperability with IPv6. There's NVGRE sitting somewhere around these containers or VMs, then OpenShift SDN does VXLAN, and Neutron does something else, and there are all these layers and everything. How about, and shoot me down right now if it's too stupid, how about IPv6, which everyone understands? No one is going to complain; I'm not going to do NVGRE, everyone understands IPv6. How about, at the boundary of your own overlay, translating to IPv6? Whether you were IPv4 or IPv6 or whatever inside, get to IPv6 at the edge. IPv6 has the big advantage of extension headers that can be chained; it's like a linked list. So you can offload some of the specifics of your own overlay onto the headers as a stack and then pass it on as an IPv6 packet. And if every software vendor kind of agrees on it: containers get out and speak IPv6; VMs understand IPv6 and say, we have our own overlay, by the way, that's okay, when it gets out of the VMs I will put an IPv6 header on again and push onto the stack; and when it comes back, I know how to unroll it, I am the NVGRE guy, I know how to unroll it, I am the VXLAN guy, I know how to unroll it. Okay, maybe this is getting too weird, but I was expecting people to either shoot this down or support it or dispute it. Yeah, so the isolation bits would have to go as part of that header stack. I have not thought this through, so if you think it's outright stupid, it must be. But if you think there's some flesh on it, we need more people to work on this, so bring it on. With the advent of containers and nesting, I think we need this kind of solution. Anyway, that was that, and we can take more questions or suggestions on what I've brought up. Yeah, so throttling, or what's it called, filtering, yes. It's a demand. The solutions that we have developed so far don't do anything about it. There is a current demand for external attacks being handled by the load balancer itself, and that is there; we use F5 and HAProxy in OpenShift, and in those terms you can do it. If it's isolated, then it would not be able to, because in OpenShift, at least, if you have pods in your own project, you can kill your own pods, but you could flood them. Yeah, you mean traffic shaping. Yes, it's an absolutely valid requirement and something we need to solve. We thought we would start with some tc shaping there, but we haven't done it so far; it's a very valid requirement. Yes, I as a container could just eat up the entire node's capacity and let the other tenants die, I don't care. It's a very valid demand, and some of the advanced SDN solutions we've been talking to otherwise, from Cisco and Juniper, do that; they do specific controls for rate limiting. OpenShift SDN, which is the one I have been working on, does not do it yet, but yes, it's a valid one, it needs to be taken care of, and there will be customer requirements around it. So this is the marketplace. There's Docker libnetwork, there's OpenShift SDN, Flannel, Weave, Calico, Cisco, Juniper, and OVN of course. I just wanted to share how libnetwork is a little bit different from how Calico works, and how that differs a little bit from OpenShift SDN. So libnetwork, what it does a little bit differently, is it uses a Linux bridge. It was done by the guys who were working on the OVS project, but anyway, they chose the Linux bridge. The Linux bridges live inside namespaces, and containers are attached to a Linux bridge per tenant.
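A rough sketch of that per-tenant Linux bridge plus veth wiring, driving plain iproute2 commands from Python. The namespace, bridge and interface names are invented, and real libnetwork additionally stitches the bridges on different nodes together with VXLAN, which is left out here:

```python
import subprocess

def ip(*args):
    subprocess.run(["ip", *args], check=True)  # thin wrapper over iproute2, needs root

# One Linux bridge per tenant on this node.
ip("link", "add", "br-green", "type", "bridge")
ip("link", "set", "br-green", "up")

# Wire one "container" (really just a network namespace) into the green tenant.
ip("netns", "add", "c1")
ip("link", "add", "veth-c1", "type", "veth", "peer", "name", "veth-c1-host")
ip("link", "set", "veth-c1", "netns", "c1")              # container end goes inside
ip("link", "set", "veth-c1-host", "master", "br-green")  # host end plugs into the bridge
ip("link", "set", "veth-c1-host", "up")
ip("netns", "exec", "c1", "ip", "addr", "add", "10.10.1.2/24", "dev", "veth-c1")
ip("netns", "exec", "c1", "ip", "link", "set", "veth-c1", "up")
```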
So if this is the green network, these containers attach to it. If there is a container which wants to attach to both networks, it has a veth pair going into both Linux bridges, and the Linux bridges are separately connected through their own VXLANs. This is slightly different, and I like the idea; it does very nice isolation, and you can actually see it as a network developing. It's their solution, and you should use it if you like it. Then there's a different approach from Calico, and host-gateway mode in Flannel. This does not use encapsulation, which makes it more efficient, and it's useful with containers. Unfortunately, it requires that you own the equipment. They use the BGP protocol, and node one, if there's a container, is going to say: if you need to go to subnet A, use this route. It's in the routing table; it says subnet A via node two, device such-and-such, and traffic just jumps across directly. It might cause a little bit of ARP flooding here and there, but you need to own the equipment. You would not have it working on Amazon's cloud, because you don't own that; Amazon would drop the traffic because the MAC addresses and routes are not ones it knows about. But these are different solutions. OpenShift SDN, no more talk, there's a demo. There was supposed to be a demo; I don't know if it's ready. Okay, hopefully. So this is my bare metal and I have a VM, and inside this VM I have OpenShift running. If you do docker ps, there are three containers running. These containers are not nodes, they are containers, but they will be treated as nodes; if you do oc get nodes, you will actually see three nodes, but they are containers. You can get inside these containers and create the actual containers which will do the job. Since I just created this, the images may not be there, so I'll just start it; it might fail. So it's pending, and it's probably been placed onto one of these two nodes which are containers. So there will be two containers born inside those containers, and those two containers are running inside this VM, which is running on this bare metal. We can see that the internet works, we can have external access from those two containers which are deeply embedded inside, and those two containers can talk to each other as well: traffic comes out of the container, out at the VM level, and then they talk to each other, so we can ping. That's the demo. I just don't want to run out of time if we are. So while the image is being downloaded, if you have questions, we can take them in parallel, and if it works, you can clap. If it doesn't, just believe me, it does. Yeah, any questions right away? Was there one, or was that a yawn maybe? No question. You can get a nice scarf for a question. Anyone? No questions. So if you see this docker ps here, this is the VM level, and you see these three containers running. Each of these containers, let's go inside one of them: docker exec into the openshift node one container. Of course there are two containers, I mean the pod running, but if you see this, this is the container. This is not how the container was even originally evangelized: there's a systemd running inside the container itself. This is as good as a VM. I mean, you would have Clear Containers; just don't think only of Docker containers. Once you have Clear Containers, or more examples like that, you'll see containers becoming more comprehensive. So initially what I said, it's a process.
The process itself is systemd, and now you have the whole tree running under that systemd. So yes, it's a mess, but from the networking point of view we've got to fix them all, we've got to merge them all. I mean, were you doubting when you were asking that? Yeah, I can try and do it. So we went to node one, yeah, there's a pod running, and I can do the old-fashioned thing, five, three, eight; so this is the container. Let me go back inside the VM, now the other node: docker exec into the openshift node two container and see the two containers running, then docker inspect. So 10.1.2.2 and 10.1.1.2. And I should be able to ping it: 10.1.2.2 pings, and you can curl 10.1.2.2, it answers, and you can do it in reverse. You can run tcpdump and you know it is working. So this went from one container up to the other container; these containers think they are nodes, but they're actually containers, and the traffic comes out to the VM level. You can have several VMs. This is done by OpenShift SDN, and this is Docker in Docker as an example. That's the demo I had; fortunately it worked. Yes, sir. A small question, thank you for everything. What about using mobile? Why not? Yes, I'm taking notes, and I'm going to read about it and find out whether it works. It should work; you should help and contribute. Nothing is off the table, I would say. My original idea with IPv6 was that we can use the extension headers, and then once it goes out of the domain of one provider, it can just latch onto the header, and the others are supposed to preserve it. No one drops the IP headers as a standard; you're not supposed to drop them. So we can take advantage of that. It may be a bad idea, please; it was just a two-hour-old idea. But if it's all within the same container or our own domain, then we can do it; we just can't force the others, because they might not be ready for that. For example, I can have other options in there too, so what about this? I think it's not going to work with this; let's take it offline. I agree. Yeah, yeah. I mean, IPv6 has the extensible headers, yes, but IPv6-to-IPv4 translation was a performance killer. Maybe it's improved? I am told this is how Google runs their entire backend: they do IPv6, everything in the backend is IPv6, and then this. But you know, just because Google does it, it must be true, or it must be efficient, that's not necessarily so. We'll find out; we'll see if it works and we'll stamp it when it does. Sorry. And that was all. Any more questions? Yes. Well, we don't like the double tunneling, the double everything, because you have to encap and decap at every layer, and that's one problem, and the MTU, I mean, it's the same issue, right? So that's the way out. If you just go to a customer, they don't know anything about what was done with the bare metal, and then what was done with the VMs, and then what was done with their VPN, and then they brought in the whole Kubernetes solution. That's exactly what ends up happening. And no one has done it yet with containers inside containers, but it's going to happen. Yeah. I think containers in containers was a little bit troublesome to the mind, but now you've seen a container running systemd; you can do anything now, it's become a machine. And see Clear Containers, and there's more stuff, Hyper-V and whatnot. Okay. Thank you. Please copy your presentation to this USB; before you leave, I need that back, give it back to Jeremy. Thank you all.