Hi, everybody. Come on in, grab a seat. We're going to get started with our third session. This is going to be the second part of our CVIM saga this afternoon: deploying cloud-native apps with the Cisco Virtualized Infrastructure Manager. We're going to have to talk about a shorter product name. This will be presented by Abhishek and Nikolai, and I'm going to let them take it away so we give them all the time they need.

All right, so yes, this is part two of our CVIM talk. But just full disclosure, this is more of a use case of CVIM. All three of us who worked on this are in development, so this is coming at it from a developer point of view, and it was done as proof-of-concept work. We will be taking some of this into our actual final product, but we want to share the findings of what we did and what we took away from this experience. I thought it would be useful for the community to learn from it, too.

So with that, this is the agenda I have for today. I'm going to give a quick overview of what PCRF is; it's the cloud-native app that we tried to deploy on CVIM. I'll give you a quick overview of why they decided to go cloud native, how CVIM actually helped them out, and how the two meshed well together. They wanted to deploy it on bare metal, and CVIM, as Chandra had mentioned, was running mainly VMs until this point. We now have bare metal support through Ironic, so I'll give you an overview of that. And then just a quick overview of how we deployed Kubernetes. CCP support for it is not there yet; that is in the works, or in the planning stage anyway. So while that was being done, and again, this was proof-of-concept level work, we manually deployed Kubernetes. We'll quickly give you an overview of how we did that and then how we deployed the actual app.

So PCRF, that is the Cisco Policy and Charging Rules Function. It's part of Cisco's policy suite for the mobile industry, essentially. It provides policy, charging, and subscriber data management. It's a virtualized solution, and it works in conjunction with policy enforcement in the network to provide real-time management of your subscribers, your applications, and your network resources, all running in service-provider environments, whether that's 3G, 4G, or 5G. PCRF itself is a 4G application, but they're using the findings from this work for their 5G solutions, too. PCRF is owned by another team within Cisco; I represent the CVIM team, but today I'll be representing both the PCRF and the CVIM teams, since they are unfortunately not able to be here.

The main reason they decided to go cloud native was that they wanted to move away from their current monolithic state: they wanted standalone apps that can be loosely coupled and managed individually, and they wanted to be stateless in terms of both the application and the database. And the main reason they decided to deploy on bare metal was that it's mainly a control plane app, so they wanted to try and take full advantage of running directly on bare metal. Having said that, they have run on VMs, too. Even on their side this is still proof-of-concept level work, so they're comparing performance both on VMs and on bare metal. For our case, we decided to start out with bare metal, since we wanted to bring bare metal into CVIM anyway. So what were their requirements, apart from the cloud infrastructure and bare metal?
It was pretty simple. The instances that they were deploying needed just one external, routable interface so that they could SSH to it, and all the different instances could talk to each other through that same interface. They also needed shared storage between the instances. This is easy to do when you have VMs; you can just use Cinder. But with Ironic, the community itself doesn't have such a solution for shared storage, so we'll quickly tell you how we tried to solve that. And again, it's something where we're looking at more solutions as we go to a product state; Ajay will come on stage and give you an overview of the different things that we're considering.

So how did CVIM with Ironic meet all these needs? This is the actual architecture. Based on the previous talk, you can probably figure out that everything on the left side is the existing CVIM setup: there's a management node, there are controller nodes, there are storage nodes, and dedicated compute nodes. For Ironic, what we now have are additional bare metal nodes that are used as Ironic computes. As you can see, we only configure one single interface on each one of those, and they go through the same top-of-rack switch. Ironic will take care of dynamically configuring that switch and, with the other additional OpenStack services, deploy these bare metal nodes as your instances.

So again, coming back to our CVIM architecture, this is where we have control nodes with all our different OpenStack services running. For Ironic, what we did was, of course, add in the Ironic services: essentially ironic-conductor, the Ironic PXE service, and ironic-api. Ironic also needs a compute agent where the driver is the Ironic driver instead of the libvirt driver. So we had to bring in nova-compute, which we normally had running only on our CVIM compute nodes; we are now deploying it on the controller nodes as well. The main reason for doing this was that we did not want to use up one compute node just to run the Ironic compute service, in which case that entire server is lost and you're not doing anything else with it. So we repurposed the controller nodes to also run the nova-compute agent, so that all our compute nodes are still available for VMs. That way you can have a dynamic hybrid of VMs and Ironic bare metal servers running if you want to.

So as part of CVIM with Ironic, these three things happen. First, the placement of the Ironic services that I just mentioned. Then, when you actually want to deploy the bare metal, Ironic will program the ToR that I showed you in the previous picture: it'll go to the actual interface on the ToR that the bare metal is connected to and program the VLAN that the bare metal is going to get assigned to. Ironic makes use of Neutron to do this, and I'll go over the details of how Ironic uses this feature called multi-tenancy with Neutron to make that possible. And finally, again through this multi-tenancy feature, we switch from using the management network for the actual provisioning over to the tenant network, because at the end of the day, when you provision your bare metal, you finally want it running on your tenant network.

So yes, this is what I touched upon. The way Ironic works, you have this multi-tenancy feature turned on so that the initial provisioning, the discovery of the bare metal properties and such, takes place on an Ironic management network.
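To make that handoff concrete, here is a rough sketch of the kind of ironic.conf settings involved when the Neutron network interface is enabled; the network name below is an illustrative placeholder, not the exact value the CVIM installer writes:

    # ironic.conf (sketch only; network names are illustrative)
    [DEFAULT]
    enabled_network_interfaces = flat,neutron
    default_network_interface = neutron

    [neutron]
    # Both point at the pool carved out of the management network used for provisioning
    cleaning_network = ironic-management-net
    provisioning_network = ironic-management-net

With the neutron network interface enabled, Ironic can flip the node's port off this provisioning and cleaning network and onto the tenant network once the instance is deployed.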
You can have a dedicated bare metal network to do this. We didn't want to create a separate network; what we did instead was repurpose our Mercury management network itself. We create a small pool within that Mercury management network that is dedicated only to Ironic. Through DHCP, one of the IPs from that pool is taken and temporarily assigned to the bare metal node, and that is then used for the actual Ironic cleaning. If you've seen the Ironic config, you know that you need to have a cleaning network and a provisioning network, so the Mercury management network plays that role. The way we then switch over to the tenant network is through the configuration shown on the slide. This is, again, a feature that's available in the Ironic community, so you enable it through your Ironic config file.

Now, at this point, Ironic makes use of Neutron, and with Neutron you need a mechanism driver to actually communicate with and program the ToR. The mechanism driver we chose was the generic switch driver. The generic switch driver didn't actually have support for Cisco NX-OS, so Nikolai was the one who added the NX-OS support to the generic switch driver. That is enabled through, if you're familiar with Neutron config files, the ML2 config file. You can see that we have the generic switch mechanism driver added in there, and then you have a generic switch section within that same config file which has the details of the switch: the login credentials and the IP. What CVIM does is automate all of this for you so that you don't have to do it manually; when you run our Cisco VIM installer, these details are automatically gathered and dropped into the config file. So at the end of the install, all of this is already there for you and you're good to go.

This gives you the high-level network topology. Again, that is the overall CVIM architecture: the controllers, compute, storage, and the management node. If you look at the CVIM documentation online, you'll see this, and all the existing networks are as they are today. What's new for Ironic is, as you can see, the bare metal node there, again with just the one single interface going to your ToR. And as you can see, we didn't create any new network segment as such; what we're essentially doing is initially using the Mercury management network and then switching over to the tenant network once the instance is actually deployed.

On the control plane side of things, when you're trying to launch the bare metal instance, it is still, at the end of the day, a Nova instance that you're booting up. So Nova needs to know, through the driver, whether it's provisioning a bare metal instance or a virtual instance. As I mentioned, we have the nova-compute agent running, and the compute agent has, in this case, the Ironic driver instead of the libvirt driver. This again is automated: you can run CVIM with or without Ironic, and when you run it with Ironic, we automatically deploy the compute agent on the controller nodes and the Nova config file will have the Ironic driver enabled in there.

So in the actual workflow, as I've mentioned, our OpenStack services run first. We create the Ironic management network, which takes care of the provisioning. The next step is the creation of Ironic nodes and Ironic ports.
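For reference, here is a rough sketch of what those two config pieces can look like: the ML2 generic switch section and the Nova compute driver setting. The switch name, IP, credentials, and NX-OS device type are illustrative placeholders rather than the exact values the installer generates:

    # ml2_conf.ini (sketch; switch details are illustrative)
    [ml2]
    mechanism_drivers = openvswitch,genericswitch

    [genericswitch:tor-switch-1]
    device_type = netmiko_cisco_nxos   # assumes the NX-OS support added to the generic switch driver
    ip = 10.10.10.2
    username = admin
    password = secret

    # nova.conf for the nova-compute agent on the controller (sketch)
    [DEFAULT]
    compute_driver = ironic.IronicDriver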
At this POC level, we didn't get around to finishing the automation of that node and port creation, but we are working on automating it in the next release. So what you do is essentially use the OpenStack CLI, the same CLI that's available to any Ironic user who knows how to use it. You create the actual node and provision it. Ironic is a two-stage install process where you first run a deploy image; we used the TinyCore image as our deploy image. You deploy it so that the node then becomes manageable. You also have to create the Ironic port so that Ironic can, of course, provision that bare metal. Then once the node actually becomes available, at that point it's ready for you to create a tenant network and launch an instance. So those are a bunch of manual steps that we had to do, but as I mentioned, we are going to be automating all of that.

And on this final slide on Ironic, I just want to show that this is similar to the slide that Ajay presented. We still have all the same CVIM features where you can do an online or an offline install. You can download the images; the Ironic images will come with this, including the TinyCore deploy image that we use. We also have Ubuntu and CentOS as the user images. For PCRF, they specifically wanted Ubuntu Xenial images, so we built Ironic Ubuntu Xenial images that we then actually deploy on the bare metal nodes. It also comes with update, rollback, and upgrade support.

So once you have Ironic up and running, and once you've actually created your tenant network, you launch the actual instance. This gives you the architecture of what PCRF wanted. Again, this is proof-of-concept work, so they did it on the CVIM MicroPod, which is essentially control, compute, and storage all in one node. On that pod, we have one VM that we create that is going to be your master, and we've provisioned two bare metal nodes, as I mentioned, running Ubuntu, and those are going to be your workers.

So in addition to manually deploying the actual Ironic nodes with Ubuntu, we then went about deploying Kubernetes manually. It's a custom script that the PCRF team had already written that does the Kubernetes install. They also, as I mentioned, needed shared storage. There is Rook available, so we used Rook, as suggested in the community, with Ceph. We didn't do anything new there; we just used the configurations that were already documented on the website. We had Rook with Ceph running, and that actually worked fine at this proof-of-concept level, but we're considering other solutions for actual production.

Once this was done, we had a final script, which is the actual PCRF script. I can't share that script because it's Cisco's, but at that point this infrastructure was essentially available for their cloud-native app to be deployed. It's a two-stage deployment that they have: one where they deploy the platform, and then the actual PCRF GUI that runs on top of it to get all your mobility data. So that was it for the infrastructure side of things, making sure that you can have cloud-native Kubernetes on OpenStack. As part of this, we figured out, as I said, that Rook was the only storage solution we could come up with, but we wanted something more production-grade, so we're actually still discussing that.
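To give a feel for the manual Ironic steps described above, this is roughly the kind of OpenStack CLI sequence involved; the driver, credentials, MAC addresses, image references, and switch details are illustrative placeholders, not the exact commands used in the POC:

    # Enroll the bare metal node (IPMI details and deploy image references are placeholders)
    openstack baremetal node create --name bm-worker-1 --driver ipmi \
      --driver-info ipmi_address=192.0.2.10 \
      --driver-info ipmi_username=admin --driver-info ipmi_password=secret \
      --driver-info deploy_kernel=<tinycore-kernel-image-uuid> \
      --driver-info deploy_ramdisk=<tinycore-ramdisk-image-uuid>

    # Create the Ironic port, telling Ironic which ToR interface the node's single NIC is wired to
    openstack baremetal port create 00:11:22:33:44:55 --node <node-uuid> \
      --local-link-connection switch_info=tor-switch-1 \
      --local-link-connection switch_id=aa:bb:cc:dd:ee:ff \
      --local-link-connection port_id=Ethernet1/10

    # Move the node to manageable, then to available (cleaning runs on the provisioning network here)
    openstack baremetal node manage bm-worker-1
    openstack baremetal node provide bm-worker-1

    # Create the tenant network and boot the bare metal instance on it
    openstack network create tenant-net
    openstack subnet create tenant-subnet --network tenant-net --subnet-range 192.168.10.0/24
    openstack server create --image ubuntu-xenial --flavor baremetal \
      --network tenant-net --key-name my-key pcrf-worker-1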
There are a couple more things we took away, which I'll share in the summary when we get to it. But I'll let Ajay talk about the different things we're evaluating right now.

Thanks, Abhi. So if you remember the previous presentation, you looked at the Cisco Container Platform, and what that enabled you to do was have a single cloud which could host both virtual machine workloads and container workloads. But what we also showed is that the Kubernetes containers were all running on virtual machines, right? They were not running on bare metal. So this is the next step. Now that Cisco VIM is able to deploy bare metal nodes, the next logical step is to allow Kubernetes clusters to be deployed on those bare metal nodes. But this throws a few options our way; there are lots of things that come into consideration at this point.

So if you look at this slide, Neelima actually presented a demo in the last presentation of how we deployed the CCP control plane on OpenStack and how it deployed one to n Kubernetes clusters on the Cisco VIM OpenStack cloud. We will continue to run the CCP control plane in virtual machines, because that's just a control plane application; it doesn't need bare metal hosts. For the Kubernetes clusters themselves, when you deploy a cluster you can run the masters on virtual machines and the workers on bare metal. So we will basically use CCP to deploy a Kubernetes cluster with the Kubernetes masters on virtual machines and the workers on bare metal.

Now, when the workers are on bare metal, you have important considerations as to what you are going to use as storage for that bare metal Kubernetes cluster, and we are evaluating a couple of options for that. When you deploy a Kubernetes cluster on bare metal, you need to evaluate what dedicated storage you're going to provide for that cluster. Kubernetes, as you know, supports different kinds of dedicated storage. We are planning to evaluate Portworx, GlusterFS, OpenEBS, and Abhishek already talked about Rook. We will evaluate these against various conditions. One of the things we look at is ReadWriteOnce versus ReadWriteMany: can you have multiple applications writing at the same time? The other is the HA constraints: suppose one node goes down, can the volume be detached and migrated to another node? Each of these storage solutions performs very differently depending on these criteria, so that's the evaluation we will be going through.

So one option is dedicated storage per tenant cluster: every tenant cluster that CCP deploys comes built in with its persistent storage, and CCP manages the cluster storage along with the Kubernetes cluster, so both come as one unit. The other option, which some talks today also covered, is using Manila as shared storage for the different Kubernetes clusters. So you have two options: either each Kubernetes cluster that CCP deploys brings its own persistent storage, or you use OpenStack's shared file system service and the Kubernetes clusters can then talk to that.
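As a quick illustration of the ReadWriteOnce versus ReadWriteMany criterion just mentioned, here is a minimal Kubernetes PersistentVolumeClaim sketch; the claim name, size, and storage class are hypothetical:

    # pvc.yaml (sketch): the access mode is what the storage backends get evaluated against
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pcrf-shared-data          # hypothetical claim name
    spec:
      storageClassName: shared-fs     # hypothetical class backed by a shared filesystem (e.g. CephFS via Rook, or Manila)
      accessModes:
        - ReadWriteMany               # many pods on different nodes can write at the same time
      resources:
        requests:
          storage: 50Gi
    # With ReadWriteOnce instead, only one node can mount the volume read-write, so on a
    # node failure the volume has to be detached and re-attached to another node.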
So these are the two options that we're going to evaluate on our roadmap, from the perspective of various constraints like ReadWriteOnce, ReadWriteMany, and the HA constraints: things like how persistent volumes get detached and moved around during HA failures. Once we've done that evaluation, CCP will be able to support bare metal workloads. So the single Cisco VIM cloud, which today supports virtual machines and container workloads on virtual machines, will then expand to container workloads on bare metal. With that, I'll hand it back to Abhishek to conclude.

So yeah, in summary, we essentially just want to share our findings and our experience with this. We managed to deploy CVIM with Ironic and make the OpenStack infrastructure available, with the external network on the bare metal instances, which CVIM does for you. On top of that, we were able to successfully deploy Ubuntu, albeit manually; that will also be automated, hopefully, in the future. We had our master running as a VM, and the Ironic nodes were our actual workers. We did run into issues with storage, as I mentioned, which we solved for now with Rook. We also feel that the manual Ironic steps themselves can be automated with the inspector, so we're working on bringing in Ironic Inspector support so that that part of Ironic is also automated. And yeah, that's essentially how we managed to deploy a cloud-native app on our CVIM infrastructure.