Good morning, everybody. Let me just introduce ourselves. I'm Anand, this is Rahul, and we work for Cisco. There are a couple more folks who contributed to this idea but unfortunately couldn't make it, Satya and Meenakshi. We work on a product in Cisco that provides VPN as a service. It's a cloud VPN product, and there we use NFVs day in and day out. What we felt was missing there paved the way for this idea. This is something we did in our spare time, so what better forum to bring it up and discuss with you than this one? The idea we came up with is running NFV service chains on Docker containers, as opposed to running them on virtual machines.

Before we get into details, I'll give a quick intro for people who are new to this. Docker is a piece of software that helps you wrap your software, along with its dependencies, binaries, and runtimes, into a single file system, so that it runs the same anywhere you run it. The dependency problems and version problems are gotten rid of. That's what Docker solves. And what is the difference between a virtual machine and a Docker instance? The Docker engine runs directly on the host operating system and shares its kernel and libraries; your application is separated out into its own namespaces inside a container, and all the containers share the kernel with the operating system. The advantage Docker, or any container technology for that matter, brings is that the extra layer of a hypervisor is gotten rid of, so the extra hypercall overhead is not there. At the same time, unlike a virtualized environment, you don't have to replicate that many operating system instances: if your purpose is to run only a dedicated service, you can just run it in a container instead of replicating a full operating system for it. So those are the main advantages: it reduces latency, it's easily shippable, and it's very easy to replicate and share. Some of the challenges when containers first came up: multi-host networking was initially a bit of a problem, which was solved much later, and similarly there were other challenges with respect to network functions, which were also overcome over a period of time.

Before we get into what SDN and NFV are, let's look at what middleboxes are. When a packet travels through a network, especially in the case of VPN, you want to add multiple filters, so in between, as it gets routed, it goes through a set of filters. It could be a firewall or a load balancer. Those are called middleboxes, anything other than the routing and switching. They are usually used for providing SLAs, service assurance, and quality of service; those kinds of capabilities are assured by running the traffic through these middleboxes. Traditionally they are hardware boxes, so it becomes difficult to move them around, and if you want to change the path or the sequence, it becomes difficult; you have to hardwire them. That paved the way for virtual middleboxes, which are called VNFs, virtual network functions. Typically they are very useful in a cloud environment, where you can spawn them on demand, so they are easy to scale. Some of the drawbacks of middleboxes are overcome by NFV.

Now what is SDN? SDN allows you to control your network devices irrespective of what kind of device it is.
You could have a Cisco device or a Juniper device, and the CLIs might be different, but SDN allows you to centrally control them irrespective of what kind of device it is. Now, NFV is where those physical devices are converted into virtual machines: hardware converted into software with the same capabilities, so you can move them around and provision them wherever you want. What is service function chaining? It's the sequence of VNFs that you put together to achieve a certain assurance. It could be that you want a firewall, or some packet filtering, or some WAN acceleration; there are different kinds of functions. Depending on the requirement, you put the traffic through a sequence of functions, and that is your service function chain.

Many people are confused about whether to use SDN or NFV, though they actually solve completely different sets of problems. SDN solves the question of who controls what: you centrally control your devices. NFV answers what runs where: whatever was a physical component, where do you want to run it, in which network, and for which packets. So these are two different problems that they solve, and when you use them together the power is immense; you can achieve much more by using them together. So there's no question of SDN or NFV; it's SDN and NFV together that give you more to achieve.

These are some of the terms you would have heard of; I'm just giving some clarity on them. As SDN and NFV started growing, there was a need to make things standard: how devices are controlled and configured required a standard. Hence the ONF, the Open Networking Foundation, was formed. Its first standard was OpenFlow, which defines how your flows should be controlled and how you configure different flows. OpenDaylight is an open source project by the Linux Foundation which implements multiple protocols, including the OpenFlow protocol; the point is to popularize SDN and NFV and also standardize them. OPNFV is an open platform for NFV, a consortium that brings together a lot of companies, enterprises, service providers, customers, and developers into a single forum where you can standardize things and, of course, accelerate innovation.

These are some sample network functions, like I said: firewalls, packet filters, virtual routers, load balancers, WAN optimizers, and intrusion detection. Whether the customer wants to put their packets through these functions depends on the requirement, so it helps if you can spin them up on demand, depending on the type of application, or if you want to provide one of these as a service: firewall as a service, or just intrusion detection as a service. You can turn it on on demand, depending on the type of application as well. In the case of video streaming, you would not want to put the traffic through all of them; maybe a WAN optimizer would be good, because that would accelerate your packet transfer.

These are some of the advantages we get by running NFVs as containers. They are very easy to scale, and on demand. And we know Docker runs the same anywhere, so it would help if you could run your NFVs anywhere too: just pick one up from here, put it there, and it runs the same. That's a boon.
And of course your latency is reduced, and as we said the hypervisor's overhead is gone, because of which you get better performance and also better utilization of your hardware resources, better than what you can do with virtual machines. And Docker, or any other container technology, already has a big set of established tools to manage, configure, and deploy containers; it helps that you can use those directly to deploy your NFVs as well. There's a lot of research work and experimentation going on in this field, and there are a lot of positive results in running NFVs as containers.

Thank you, Anand, for going over the terms and explaining them so that we are on the same page. Now let's go forward and see the design and what we're trying to achieve here. The first thing is that we're trying to do the service chaining locally on the nodes where your VMs are, so that would reduce the latency. Second, having faster, reusable, dynamic network function deployments, so that the network function infrastructure itself has low overhead and doesn't add a lot of weight to your OpenStack infrastructure. Third, avoiding the loss of performance of network functions due to virtualization overhead. Those are the main three things we had in mind when we went ahead with the design.

This is the solution design that we want to propose, but let me first give you a brief idea. Think of it as having access to a public cloud, where you're running a couple of VMs performing some kind of business logic and generating a lot of network traffic. Now you want that network traffic to go through some network functions. They could be multiple: the traffic could go through one function, get processed, go to another function, get processed, and as Anand explained, that would be your service chain. In a public cloud, you won't really have a lot of control over placement. I mean, you can control in OpenStack which region you want to be in, but a region is still huge, and you won't have control over which host your VM is on and which host your VNFs are running on. So we thought, why not utilize Docker? Since we're seeing so many positive results from running VNFs on top of Docker, this is the design we propose. We would have a controller, which could run anywhere, preferably on the OpenStack controller itself. It has admin privileges, so it can talk to different components and find out where exactly the tenant VMs are running. Let's say host one runs tenant one and tenant two VMs; the controller could spawn the network functions, in the form of a service chain, on top of Docker on that host itself. And if tenant one doesn't exist on host three, it doesn't go and spawn anything over there. All the VMs of tenant one that are on host one can then be serviced by the network functions running on top of Docker on that host. The agent over there takes care of this dynamic nature. We'll go next into the node design itself.

Okay, so this is the design per node. I have my controller, which could talk through its northbound APIs to the southbound APIs of ODL or any other SDN controller. And my controller would talk to the agent. There is a single controller, and one agent per host. The agent actually configures the OVS; this OVS is shared between the Docker daemon and KVM. And the agent would also manage the Docker daemon.
I mean, when I say manage, I mean it can spawn network functions on top of Docker. You would have different network functions, and you could chain them using OpenFlow rules here, and then your traffic could just go out. So this is the basic design per node.

Now, how am I expecting my packets to flow through the node? Instead of the packet going out and getting service chained somewhere else, which would introduce a lot of latency, I want my packets to come out here and then go to, I mean, these are just examples that I have taken. We could have a router instance running on the Docker daemon, and my traffic would go and get serviced over there. Take a VPN use case, where you are adding a VPN header and all the other required info; then my packets come out, go to the firewall, get processed over there, and then they could go to any other VM on the network or go out externally.

So how exactly are we trying to service chain? This is the thought process. We have OVS, and without changing the network header itself, we add OVS flows to the flow table. This is a small example; there are many ways you could route the traffic. It could be on the basis of source IP, input port, or VLAN. I'm sure most of us are aware of OpenFlow here, so you can do a lot more than just this. Based upon this, you can route your packets to different network functions without changing headers, and this is how we are thinking of service chaining. Also, depending on what kind of VNFs we are using, we could take different kinds of routing policies. It could be exhaustive routing, where you force all your traffic through all the network functions, but that's a costly affair. Or, say you want your network functions to act upon HTTP traffic only: you could have a flow where only the traffic to port 80 goes through the service chain, and the rest of the traffic doesn't. The third case is monitoring, where you don't really want your live packets going through the network functions at runtime, so you could have replica-based routing, where replicated packets come out, go to the monitoring and similar kinds of processes, get processed, and go out. That is something that can be taken care of by my agent itself; a rough sketch of both the agent piece and the flow rules follows after this part.

The advantages of this design are that we get high density and better utilization of the resources we have in the cloud. The performance would be near-metal performance for network functions, because we would be using SR-IOV or DPDK. There would be no hypercalls, due to the usage of containers rather than VMs as network functions, so we avoid the additional hypercall indirection that gets introduced in the case of VMs. And there would be low latency, since all the service chaining happens locally on that particular node; once your packet goes out, it's not going somewhere else to get service chained. You get all the Docker-native advantages, like the quicker build-ship model; that all gets carried forward as is. And it really works out well in a public cloud model, because in a private cloud you still have a lot of control: you could have admin privileges and get to know which host your VMs are running on.
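To make the agent piece described above a bit more concrete, here is a minimal sketch of how a per-host agent might ask the local Docker daemon to spawn a VNF container on demand, using the Docker SDK for Python. The image name, tenant tag, and function name are illustrative assumptions, not part of the actual implementation discussed in the talk.

```python
# Minimal sketch of the per-host agent spawning a VNF container on demand,
# using the Docker SDK for Python ("pip install docker").
# Assumptions: the agent runs on the host next to the Docker daemon, and
# "example/vrouter:latest" is a hypothetical VNF image, not a real published one.
import docker


def spawn_vnf(tenant_id: str, function: str, image: str) -> str:
    """Start a VNF container for a tenant and return its container id."""
    client = docker.from_env()  # connects to the local Docker daemon

    container = client.containers.run(
        image,
        name=f"vnf-{tenant_id}-{function}",
        detach=True,
        cap_add=["NET_ADMIN"],   # most VNFs need to manipulate interfaces and routes
        network_mode="none",     # the agent wires the container into OVS separately
        labels={"tenant": tenant_id, "vnf": function},  # tenant-based segregation
    )
    return container.id


if __name__ == "__main__":
    cid = spawn_vnf("tenant1", "router", "example/vrouter:latest")
    print("spawned VNF container", cid)
```

In the design as described, the agent would then attach the container's interface to the shared OVS bridge, for example with the ovs-docker helper script that ships with Open vSwitch, before installing the steering flows.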
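And here is a rough sketch of the selective, port-80-only steering and the replica-based routing mentioned above, expressed as ovs-ofctl flow additions issued from Python. The bridge name, port numbers, and priorities are illustrative assumptions; a real agent would look the ports up from OVS rather than hard-coding them.

```python
# Rough sketch of selective service chaining with OpenFlow rules on OVS.
# Assumptions: "br-int" is the integration bridge, port 5 is the tenant VM's
# port, port 7 is the VNF container's port, and port 9 is a monitoring VNF.
# These numbers are purely illustrative.
import subprocess


def add_flow(bridge: str, flow: str) -> None:
    """Install one OpenFlow rule via ovs-ofctl."""
    subprocess.run(["ovs-ofctl", "add-flow", bridge, flow], check=True)


BRIDGE = "br-int"
VM_PORT, VNF_PORT, MONITOR_PORT = 5, 7, 9

# Steer only HTTP traffic from the VM into the VNF; no header rewrite needed.
add_flow(BRIDGE, f"priority=100,tcp,in_port={VM_PORT},tp_dst=80,actions=output:{VNF_PORT}")

# Whatever comes back out of the VNF continues on the normal L2 path.
add_flow(BRIDGE, f"priority=100,in_port={VNF_PORT},actions=normal")

# Replica-based routing: copy remaining IP packets to a monitoring VNF while
# still forwarding them normally.
add_flow(BRIDGE, f"priority=90,ip,in_port={VM_PORT},actions=output:{MONITOR_PORT},normal")

# Everything else bypasses the chain entirely.
add_flow(BRIDGE, f"priority=10,in_port={VM_PORT},actions=normal")
```

The point of the sketch is only that traffic can be routed to different network functions purely through flow priorities and matches, without changing any packet headers.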
And still, you get a lot of advantages there, but in the case of a public cloud model you would see the real advantage of running this system.

So we figured out these are the areas of work that need to be done. The first is running Docker and KVM on the same host machine; there are some challenges to that. We think we need to change the compute scheduler, because right now, if I run a nova-compute instance which manages Docker and another which manages KVM through libvirt, then when the resource tracker runs, both of them report their info and that information gets duplicated. We need to solve that, and the scheduler needs to be aware that both are running on the same host. It could be on the basis of a UUID or a host name; there could be different ways of identifying the host and figuring out that both are running together, and we need to add that logic. There are changes required on the OVS side as well, because it would be shared between your Docker and KVM workloads. Then there would be the component that configures OVS, creating service chains using OVS OpenFlow rule modifications. Since OVS gives you a lot of other info as well, like the throughput through a port and many such things, we could also add performance, HA, and load balancing using those OVS features. Then we choose the best kind of routing for the packets based on the type of VNF, the same thing we discussed a couple of moments ago. Then we need a module that interacts with the Docker daemon and creates network functions as containers on demand. Then tenant-based visibility and segregation of Docker containers needs to happen: my tenant one shouldn't be able to see the network functions that are there for tenant two, and similarly for all the other tenants. Then we can take advantage of the Docker registry and store stateful Docker images in it as well, to spawn on demand; no such changes are needed for stateless functions, but it would definitely be very helpful in the case of stateful network functions. And then, obviously, the implementation of the controller and agent.

We have started the implementation. This is the design we have zeroed in on, and we have implemented it to some level. This is side work for us, and we would like to work with the community and get your inputs on the design if you have any. And if you have a similar thought process and agree with what we're doing, it would be great if we could join together and do this. Thank you very much for your time. Do we have any questions?

This is Lushan from Comcast. Very interested in your work. We also tried similar things because of the advantages you mentioned, but there were two things that really limited us. One is that there are not that many VNFs you can actually run inside Docker, at least production-grade ones. I don't know if you can provide some options.

So that is a challenge we also faced. There aren't that many Docker images which are directly VNF images that you can run on Docker. But we are also working on standardizing that: a way of using a Dockerfile to just spin up Docker images from our existing software. That is a challenge, I agree, but it is something we are trying to overcome.
The second one is really, I think, one of the unique aspects of running a VNF inside Docker: the network I/O, the throughput. If I understand it correctly, when you run a Docker container, you put its network in a namespace, so the packet actually goes through the kernel. And you mentioned DPDK, so I just want to get your take on how, theoretically, you can make the kernel and DPDK work together.

So actually, this is something we were also reading about: where to do the routing, whether to do it in user space or kernel space. It can be done on both sides. Kernel space of course gives you better performance, but it's less flexible, so that's a trade-off you'll have to take. It's a design choice at the end of the day.

Yeah, and on one of the slides you mentioned you can achieve line rate with SR-IOV and DPDK. Is that something you verified in the lab with this setup?

That is something, yes, we haven't verified personally, but I have seen a paper from Intel where they have verified that. When I share the presentation, I'll give a reference to it, so you can check it from there.

Hey, thanks for the presentation; that will definitely come in handy. With the recent talks we have been attending, there are so many talks about implementing yet another controller-agent, especially in the Magnum, Docker, Kubernetes space. So why haven't you thought about implementing this with the Magnum conductor and Heat instead of implementing yet another controller-agent architecture?

So yeah, we were looking at that, but we thought, I mean, that is always an open option. None of us had a lot of expertise on Magnum itself, so we kind of held that back, but if it can be integrated, we can definitely go ahead and do that. This is more at a concept level; when we say a controller, it could as well be a controller which provisions through Magnum, because Magnum allows you to provision on VMs as well as bare metal. So at a concept level it would fit in there as well; since we have not gone into the implementation details, we have only thought about it. Also, we read a little on Magnum, and Magnum would give you a full-or-no-control kind of scenario. We wanted proper multi-tenancy, so that your VNFs are not visible across tenants. That was actually one of the reasons why, as we took this forward, we left Magnum out.

Okay, so there is Kuryr coming as well. Yeah, yeah, I think once that is there, we can definitely look into it. Thank you. Thank you.

I'm from Intel. You mentioned the hypercall overhead; can you give me a specific hypercall? I'm not sure we are really using hypercalls in this case.

In this case, we are not using hypercalls, because there is no hypervisor.

No, no, even when you run the VM, I don't think we're gonna...

No, those VMs are not actually running your network functions; your network functions are running in the Docker instances. So we...

No, no, I think even if you use the VMs, I don't think those VMs are going to generate hypercalls. I mean, did you try to run the VM and see whether you saw some hypercalls? That's my question.

No, we haven't experimented with that part with respect to this work as such, but we have worked with VMs and hypercalls previously.
Yes, definitely, when you run any disk-intensive or network-intensive software or operation on a VM, it goes through an indirection through the hypervisor. That is what we want to say here: that indirection is not there, because Docker here runs on bare metal; there is no extra hypervisor layer per se. I know there are hardware-assisted virtualizations, KVM itself is one such, and yes, there are optimizations there, but it has been shown that containers are definitely more optimized, performance-wise, than VMs running on a hypervisor. That was the point we were trying to clarify. But personally, we haven't gone into the implementation here, so we haven't verified whether, in this case, a VNF gives better performance running in a container than in a VM. We haven't verified it; again, this is just an idea. We've gone through some papers from Intel itself, and they point to a very similar thing, proposing it with proper data. Maybe once we complete the implementation and the prototype, we could give you the data ourselves, but for now it comes from your own company.

Yeah, I think when you try to get high performance from Docker, even with Docker, you need, for example, SR-IOV or huge pages or NUMA; lots of optimizations are required to get to line rate. There's a lot of scope for innovation there. And also the context switch between OVS and the Docker containers themselves can be significant.

Correct, yeah. Thank you.

We're very interested in this specific use case too, of a container as a service, kind of a degenerate service chain right in front of the port to the network. Have you thought about how you're going to orchestrate that? It would be really nice if this was backed by Neutron, a call that the tenant itself could make to set this up. I didn't see anything about the API.

Yes, so when I said that my controller, through its northbound API, would talk to the southbound API of the SDN controller, it could talk to Neutron as well. So yes, we are thinking of doing that.

Yeah. And someone asked about Docker images and the scarcity of them. What we're thinking about for kind of a reference implementation would be an image of iptables, an image of Snort; it's all public domain stuff, we could make them.

So, I tried out a couple of images that are available, and I think they work just fine. I don't know about the performance, because I didn't actually put them through a lot of network traffic or do performance analysis on top of that; right now the whole orchestration piece was the problem I was trying to solve. Once the orchestration is complete, we'll go forward and look at the performance, what kind of performance we're getting and what kind of optimizations we want to do to increase it. So, definitely, yes, I did look at a lot of network functions; they actually do the functions they're supposed to do. I'm just not too sure about the performance numbers. Thank you. Thank you.

Any more questions, guys? Okay, then we'll wrap up. We'll share the presentation along with our mail IDs, so if you want to connect later on, it would be awesome to have you. Thank you. Thank you.