I think we're good to go. Okay, good afternoon everyone, welcome, and thank you for coming and joining us.

Over the past few days we have seen quite a few presentations and demos around telco NFV use cases using OpenStack, SDN controllers and so on. If you look at all of those presentations and use cases, it becomes clear that there is no single, classic cookie-cutter deployment model, no cookie-cutter case of VNF deployment. Today we are going to touch upon one particular class of deployment models which seems to be quite popular in the telco use cases: deploying the infrastructure in multiple tiers. Specifically, we'll talk about what we call micro DCs, or headless NFVI if you will. These micro data centers serve the purpose of a point of presence, a small data center, a central office, or a mobile-switching central office.

I'm Vinod Shegu, and I'm joined by my esteemed colleague Rushikesh Ganguru. Both of us are part of the NFV business unit inside Hewlett Packard Enterprise.

Here are the topics we're going to cover today, as a high-level agenda. I'll start off with the basic requirements, the problem statement, and the capabilities expected of these micro data centers, and I'll talk about whether OpenStack in its current form actually helps in meeting those requirements or not; I understand there are also projects in progress here. Then I'll turn it over to my colleague Rushikesh, who will talk about some of the alternative mitigation plans as well as some of the open items and topics, and we'll open it up for a more interactive discussion at that point.

As I mentioned, there are quite a few telco use cases which require this so-called multi-tier NFVI, or NFV infrastructure. On the left-hand side I have a fairly simplified version of the multi-tier architecture. At the top we have a main data center, typically deployed in a very large footprint: a full-fledged control plane with HA, a full-fledged SDN control plane with HA, a lot of storage, top-of-rack switches, spine switches, back-end switches and so on, and a compute footprint somewhere in the order of thousands of computes.

Then, in the second tier, we have what we call micro data centers. These micro data centers, as I mentioned, are the ones which serve the purpose of points of presence, central offices, mobile switching offices and so on. They have pretty serious constraints with respect to space, power and cooling, and typically the size of these micro data centers is anywhere from half a rack or a single rack to maybe a couple of racks of servers. When you have such small-footprint cloud deployments, there is a discussion about the upfront cost of setting up the minimal micro DC, and in particular about the ideal ratio of
control plane servers versus compute servers at that particular micro DC. For example, there's no point in having seven or eleven servers just serving your OpenStack and Ceph cluster, or for that matter the SDN control plane, in order to serve, let's say, three or four compute servers. So the upfront cost becomes critical.

Then of course the main data center is separated from the micro DCs over the WAN. That means they are geographically distant, and you could have up to a few dozen or maybe a few hundred of these micro data centers driven from the main data center. I've also shown a few customer-edge-related devices there, which are typically driven from the point of presence; those could be customer or provider edge equipment, pretty small form-factor boxes, and there could be thousands of those. Or the micro DCs could be driving base stations on mobile towers, and so on. Now, between the main data center and the micro DC you need to guarantee a certain amount of bandwidth and latency, and typically what we've seen is that service providers and telco customers are okay with providing that level of guarantees and QoS.

The other problem we need to address, once we have this multi-tier architecture, is how we actually go about provisioning, maintaining and managing this entire array of micro DCs. How do you do upgrades? We heard in other talks earlier that upgrades are one of the very serious issues in this kind of large deployment.

Now, in addition to that set of problem statements or challenges, just because VNFs are getting deployed in these micro data centers doesn't mean that all the typical things the VNFs expect are going away. We do need predictable performance. Thanks to the community, over the past year and a half there has been considerable progress in enhancing what we call the EPA (Enhanced Platform Awareness) related features, which ensure you get predictable performance: things like NUMA affinity, IO affinity, huge pages, CPU pinning and so on. There has also been a lot of progress, as mentioned in the previous talk, towards having a real-time stack: essentially a real-time hypervisor and of course a real-time host kernel, and setting all of that up to ensure low latency and low jitter, while at the same time using DPDK-enabled vSwitches or SR-IOV to get maximum line-rate throughput.
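For reference, the EPA knobs mentioned here are typically expressed as Nova flavor extra specs; a minimal sketch (the flavor name and sizes are made-up examples, not from any particular deployment):

    # create a flavor for a data-plane VNF and pin it for predictable performance
    openstack flavor create --vcpus 4 --ram 8192 --disk 40 vnf.dataplane.small
    openstack flavor set vnf.dataplane.small \
      --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1 \
      --property hw:mem_page_size=1GB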
Another topic which is very near and dear to us is high availability. There were a lot of presentations during the summit on high availability, and we know that the OpenStack control plane solves its own HA problem through various mechanisms. The problem then shifts to how we ensure high availability of the VNF applications. In that context, we have seen that the problem boils down to having very low-latency detection of a VNF failure, a vSwitch failure, a NIC failure or a compute node failure, making sure the appropriate entities in the stack get notified very quickly, and then letting the VNF managers make the appropriate decisions: restarting, failing over to a standby, and things like that. Optionally you can also have auto-remediation or auto-recovery features, if that is something the VNF managers request.

Of course, when we have a multi-tier architecture, we want the VNFs deployed in the micro DCs to avoid round trips back to the main data center just to satisfy networking-related needs: how do I get a floating IP address, how do I get L3 and L2 connectivity, and so on. In the previous slide I described more of a vCPE kind of use case; other examples of real-world telco use cases are vEPC, vRouter and vRAN, and I'm sure there are a lot more like that.

So now let's go over and see how OpenStack tries to meet these requirements. I'm going to look at two different use cases, or rather two different possible approaches.

The first one is to have each micro DC be a VIM of its own. Think of it as, say, a hundred different VIMs being controlled by a main data center which is itself a VIM. What are the challenges if we do it that way? Well, you end up with the issue of control plane footprint: how do you manage the control plane footprint at each of these micro DCs? There have been quite a few projects and interesting progress on that front; there are cases where people have virtualized the control plane, containerized the control plane and so on, and that is one area which will definitely evolve and definitely help this particular use case. The other issue is how we do centralized management. It's not just about upgrades, which is, by the way, a very critical topic. Each of these VIMs is also generating logging and monitoring data, so how do you coalesce that across all the VIMs and try to make sense of it? You also have SDN controllers giving out information and various logs, and those need to be coalesced too. Then, if you go one level higher than the VIM, in the MANO layer you have the NFV orchestrator, and it now has to manage multiple endpoints. That raises various issues: how do we globally do quota management and resource management, and how do you present a single dashboard? Some of these topics have solutions, some of them don't yet and are still evolving. And of course, once you have these VIMs separated, how do you ensure you have the appropriate overlay networks across them so that VNFs can communicate properly, and so that you have proper L2 extensions, broadcast domains and so on?

The other approach is the headless micro DC, where each micro DC has no control plane of its own, and the main control plane sits in the main data center, otherwise known as the mothership data center. What are the challenges if you did it that way?
Well, think of it as extending the VIM that lives in the main data center: you are stretching the compute nodes farther away. What are the challenges in that particular approach? I already talked about bandwidth guarantees and latency guarantees. The other thing is that you want to make sure there is proper L2 connectivity between the two, so that you can actually do things like basic PXE booting, downloading images and so on.

If we do end up with this kind of architecture, or approach, then you'll have to limit the data transfers between the main data center and the micro data center. What are the typical kinds of actions that trigger such transfers? Things like copying the image while launching an instance: you don't want to be going to Glance in the main data center and trying to download it over the WAN. Then there is chattiness: you have RabbitMQ, the MySQL cluster and so on, and you don't want too much chattiness between the control plane in the main data center and the compute nodes.

Of course, some VNFs are happy to use local storage; they are stateless. We are also hearing about other use cases where the VNFs do need to be stateful, which means they need block storage for booting and mounting. In those scenarios you would not want your Cinder-backed volume sitting in the main data center while you are trying to boot off it over the WAN. These VNFs also need basic L3 and floating IP connectivity, and they should avoid multiple trips to the main data center. And of course, we already talked about logging and monitoring: if we want to do low-latency monitoring of the VNFs and of the health of the compute nodes, that is of course a challenge if you have a lot of latency.

Now I'm going to turn it over to my colleague Rushikesh. He will talk about some of the mitigation plans.

Hey, thank you, Vinod. So when this problem statement came to us, we started by thinking, let's push for an independent VIM in each micro DC location; that's the simplest approach, so let's do it. We went back to the operators to ask what they would say about this, and they were very much interested in having headless computes: let's start with that, and if there are issues then we can talk about the alternative solution of a separate VIM per micro DC. So what we did, for experimental purposes, is simulate this environment: we started stretching the compute node out to the micro data center in simulation mode, and then we started figuring out what the hot spots are, the areas where things would start breaking.
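For reference, the kind of WAN impairment used in such a simulation can be added with standard Linux tc/netem tooling on the link toward the stretched computes; a rough sketch (the interface name and the numbers are illustrative assumptions, not the actual test setup):

    # emulate roughly 30 ms of WAN delay, plus a little jitter, toward the simulated micro DC
    tc qdisc add dev eth1 root netem delay 30ms 5ms
    # remove the impairment again once the test run is over
    tc qdisc del dev eth1 root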
So therefore we started talking with the operators and said: okay, we can go with this approach, but there are certain prerequisites you have to meet, and if those prerequisites are met, then we can go ahead with this particular project, understanding the complications involved.

The first mitigation plan we thought about, and came up with a solution for, is to go with an SDN-based solution. As Vinod was pointing out, if you take stock OpenStack there is a certain set of limitations and challenges in doing these communications over the WAN, so we took an SDN solution, in this case our own HPE DCN solution, and we started evaluating whether, with prerequisites like the networking requirements we'll discuss, we could meet this headless compute node requirement.

Here is what we came up with as prerequisites. The first one is that we need an L2 extension from the mothership DC to the micro DC; a typical example is VPLS. When we discussed this with the operators they said, yes, we already have such infrastructure between the mothership DC and the micro DC, so you don't have to worry about this; it will be a given on day one. Then we talked about the bandwidth requirements between the two sites. The bandwidth requirement, based on our experiments, is a minimum of 2 Gbit/s, and if it's 10 Gbit/s, great; ideally we need a LAN-type environment. We went through some experiments on what types of transfers happen between the mothership DC and the compute node. What we found is Glance image traffic, logs going back and forth, and certain workloads such as VM monitoring, which is more of a chattiness issue; we'll come to where the latency matters a lot in the next topic.

Then we simulated the latency between these two endpoints, and we stretched it from 10, 20, 30 and up to 100 milliseconds, and we saw that at around 30 milliseconds things started breaking. What exactly started breaking is the inter-service communication: all of that communication happens through RabbitMQ, database updates are going on, and there are default timeout values; there is also the communication between the SDN directory services and the SDN controller. So we ended up at the point where, once these hot spots show up on the control interface or the data interface, we can't stretch further than this. So let's set this as a prerequisite for our telco operators: if they can give us this latency, we can really stretch our computes out to the micro DC.
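To give a flavor of the knobs involved, here is a rough sketch of relaxing the messaging defaults on a stretched compute node; the option names are standard Nova/oslo.messaging settings, but the values are illustrative assumptions, and the right numbers depend on the measured WAN latency:

    # give RPC calls more headroom than the default 60 seconds
    crudini --set /etc/nova/nova.conf DEFAULT rpc_response_timeout 180
    # tolerate brief WAN hiccups before declaring the RabbitMQ connection dead
    crudini --set /etc/nova/nova.conf oslo_messaging_rabbit heartbeat_timeout_threshold 120
    systemctl restart openstack-nova-compute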
The next prerequisite was: okay, we can do this, but how do we install the compute nodes? You can do it over the WAN, but we typically observed that a compute node in such a situation can take up to an hour to get deployed, because it is doing a PXE boot over the WAN and getting its images from the deployer nodes running in the mothership data center. So we said, for now let's assume that a pre-installed compute node is shipped to the micro DC; yes, we can do that. So a set of compute nodes was shipped to those micro data centers, and they already had some cached images on them. That way you are not doing any image copy on the fly whenever you boot a VM. These cached images are mostly the most frequently used images, which will typically be the VNFs, and that way we reduce the data transfer between the two endpoints.

Then, as was mentioned a little earlier, there is the requirement for persistent data. I can't always write over the WAN to some mothership storage area. So if we can localize the storage, and make it a prerequisite that a storage array is running in the micro DC, then when I create a volume, the volume orchestration still happens through Cinder, but the volume is created local to that compute node. That way the VMs are writing locally while still storing their data persistently.

Needless to say, we have addressed the compute problem and the storage problem, and then it comes to networking. The networking requirement here was that if we could use a hardware VTEP, that again saves a certain number of nodes for your gateways. So you use a hardware VTEP, and the top-of-rack switch also acts as a floating IP gateway, or a VXLAN-to-VLAN gateway.

If these few prerequisites are met, we can come up with a logical diagram of how it will look. Here, on the right-hand side, there is a mothership data center, and this mothership data center is running all the typical components: the OpenStack controllers, the SDN service directory, the SDN controller, monitoring and logging. There is a floating IP and L3/L2 gateway and a bunch of compute nodes, which a data center will typically have; there is local storage on each of those compute nodes, plus a storage area for persistent data. The orchestrator is running in the mothership data center, and it is orchestrating the n number of micro DCs, and at the same time it is sending its requests to the micro DC through the mothership data center.

On the left-hand side, you will see that we have used a minimal footprint for compute, storage and network. This is just a set of compute nodes, either enabled with SR-IOV interfaces for high throughput and line-rate performance, or with vSwitches to provide local routing and switching. So it's not that every networking request has to go back to the mothership DC to get a DHCP address, and getting to the external network doesn't mean going to the mothership data center first and then out; we tried confining everything within the micro DC. The traffic goes out directly at the PoP, towards the CPEs, and the bunch of CPEs connected to this are at a much closer proximity.

So now let me play out some use cases. Okay, I'm going to launch a VM, so where exactly will it go? The tag which we have used here is the availability zone: an availability zone indicates that it's a micro DC, so the micro DC availability zone is the set of compute nodes sitting in that micro DC. That way, when I launch a VNF through the orchestration, I ensure that the scheduler picks the availability zone called, say, micro DC, and the VM gets launched there. When we are attaching persistent storage, we create a volume with a volume type of micro DC; that means that whenever the volume type is micro DC, the Cinder services running in the mothership data center are going to create the volume in the micro DC. This way your VM and its persistent data are both local to that particular micro DC, and at the same time your SDN controller is programming the vSwitches, and also programming the top-of-rack switch, to provide you external connectivity to the world.
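Put together, that placement flow might look roughly like this from the mothership's API side (the aggregate, image, network, volume-type and backend names are all illustrative assumptions, not taken from the actual deployment):

    # group the stretched compute nodes into a micro DC availability zone
    openstack aggregate create --zone micro-dc-1 micro-dc-1-agg
    openstack aggregate add host micro-dc-1-agg compute-mdc1-01
    # land the VNF in that zone, using an image already cached on the node
    openstack server create --flavor vnf.dataplane.small --image vnf-base-image \
      --network tenant-net --availability-zone micro-dc-1 vnf-01
    # persistent data stays local: a volume type tied to the micro DC backend
    openstack volume type create micro-dc-1-storage
    openstack volume type set micro-dc-1-storage --property volume_backend_name=micro-dc-1-backend
    openstack volume create --type micro-dc-1-storage --size 100 vnf-01-data
    openstack server add volume vnf-01 vnf-01-data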
Looking at these aspects, we examined all the combinations, and this came out to be a good one. Then, when we started thinking about scale and the other requirements, we started evaluating and comparing this with another mitigation plan, in which we come up with a reduced footprint for the control plane instead of having a full-fledged one. So this was one mitigation plan, where we ensured that there is no control plane in the micro DC at all; but there are still open items, which we'll discuss on the last slides, and which we want to highlight.

Before I go to the next mitigation plan, I will just show you the physical implementation of this one. This is a typical physical implementation of how a micro DC and a mothership DC are laid out, and you will see that there is a VPLS or L2 extension provided by the service provider, the telco customer. On the left-hand side you will see the part for the micro DC, and in that part there is a minimal footprint: some storage nodes and some compute nodes. The storage nodes could be just a single 4U enclosure for a storage array, like a 3PAR or any other storage subsystem, but it could be commodity hardware as well, running software-defined storage. On the right-hand side you will see a typical data center where the leaf switches, spine switches and central routers are in use, along with certain components that have a minimal footprint.

So, as I was saying about the earlier slide: this looks great, but at the same time we have a set of prerequisites, right?
Not all of those prerequisites can be met by every telco customer. So how can we mitigate those prerequisites and come up with another mitigation plan, with a reduced control plane? That is where the idea of co-locating a control plane with the computes came from. If you take a look at this logical diagram, what we are emphasizing here is that the control plane and the compute node are on the same hardware. On the left-hand side you will see that the micro DC is hosting two nodes; the controller is in active/standby mode, and those nodes also act as compute nodes. We segregated this through resources: we dedicated certain resources to the platform and certain resources to the virtual machines running on the compute side. You can keep adding compute nodes, but your control plane is still confined to those first two nodes.

So this is a complete VIM, but in a reduced footprint. You will see that there is an SDN controller also sitting here. We have an SDN solution which can federate across data centers: there is an SDN service directory on the right-hand side, and there are two SDN controllers, one sitting in the micro DC and the other sitting in the mothership data center. That provides the L2 connectivity even though these are two different, separate clouds. My colleague Nana presented something the day before about multi-DC communication; this is exactly the same communication model, but applied to a micro DC.

The other areas where we can reduce the footprint are to virtualize the storage array or the SDN controller. That is something we are looking at: virtualizing this entire control plane rather than just co-locating it. The next slide talks about virtualizing both the control plane and the storage. Certain telco customers may not be interested in buying an expensive storage array for their persistent data, so why not use software-defined storage here, and then virtualize the control plane and the storage requirement and put them onto a common set of nodes? This is the logical diagram for that. What we have done here is that the micro data center has a set of KVM nodes, three of them for HA purposes, and they co-host the control plane as well as the SDS and the SDN controller. That way you confine your control plane to just three KVM nodes, and all the other nodes are your compute nodes. The remaining pieces, like the typical communication for L2 connectivity, stay the same: there are two separate VIMs, so you have the same multi-DC solution for L2 connectivity, for networking between VMs running in the micro DC and VMs running in the mothership data center.
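As a side note on these plans where the control plane and the computes share hardware, the compute side typically has to leave headroom for the platform services; a minimal sketch of the usual Nova knobs for that (the values here are purely illustrative assumptions):

    # keep memory and disk back for the co-resident control plane services
    crudini --set /etc/nova/nova.conf DEFAULT reserved_host_memory_mb 16384
    crudini --set /etc/nova/nova.conf DEFAULT reserved_host_disk_mb 102400
    # pin guest vCPUs away from the cores the platform uses (core range is an assumption)
    crudini --set /etc/nova/nova.conf DEFAULT vcpu_pin_set 8-31
    systemctl restart openstack-nova-compute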
So that is still a way to do this, but it's not solving the entire problem, right? In each of the mitigation plans we have open items; we have assumed some things which may not be applicable, or may not be of interest, to all telco customers. What we are discussing now is that there are some open items, and there are some projects going on, and it's good that we are seeing traction on those projects. So let's see how we can address each open item.

First, deployment of compute nodes in the micro DC over the WAN. There is no open-source or community solution for this, but, being on HPE hardware, we can use certain HPE features whereby the images are shipped to those micro DCs and attached through iLO, if you know about iLO, and the node simply boots and installs from that image. That way we reduce the traffic going over the WAN. The other option is to go with a PXE boot of the compute node, if you're okay with an hour of provisioning per compute node; that's also fine, but otherwise we need to come up with some way of having a smaller deployer sitting in these micro DCs.

The second one is low-latency monitoring of the VNFs. This is a real problem statement, because we are doing sub-second detection of VM failures in our solution, and if that has to stretch out to the micro data center, it becomes another headache to solve. The question is how we can have some remote detector, or remote sensor, which can do this fast detection locally and send the notification to the mothership data center. Today we do this through multicast, so we are looking at how we can use remote clusters to address the issue; you have seen the typical talks on Pacemaker clusters with a main cluster and remote nodes, so that can help here, and there are some other options being considered on the OpenSAF side which would also address some of this low-latency monitoring of the VNFs.

The third thing is image offload. As I said, we cache the images, which is okay for the first boot or for some duration, but eventually you will start upgrading the VNFs. In that case you have to upload a new image and ship it, or send it, to the micro data center, and that again goes over the WAN. So is there a way we can do some sort of offline caching, shipping the images some other way, so that they are already cached on the Nova compute nodes?

The fourth thing is iSCSI traffic over the WAN. We did say that you boot from local storage, from the local root file system, so that you don't have any iSCSI traffic going over the WAN. But some VNF requirements are that they always boot from volume, and in those cases you can't avoid iSCSI traffic. That implies we need some other way to address this; currently no solution addresses this problem, but we still have to think about how to solve it.

The second-to-last bullet is about the underlay L2/L3 VPN connectivity between DCs. This was a requirement because we chose a deployment model which does PXE deployment, and we chose a high-availability model which does multicast and requires a single L2 domain. It can be mitigated: we could have purely L3 connectivity, and then we don't really need an L2 extension going from the mothership data center to the micro DC. That is one way to solve it.

Then the last one is the limit on the number of micro DCs: how many micro DCs can be attached to a single mothership data center? It's not really the micro DCs themselves; it ultimately comes down to the number of compute nodes per control plane. That problem can typically be solved by partitioning your cloud into regions, where each region has its own control plane, its own Nova and Neutron, so that each region addresses a set of micro DCs. You add another region, and it addresses another set of micro DCs. So that particular limit can be mitigated through the region concept, or by having a dedicated control plane.
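For reference, that partitioning uses the standard Keystone region mechanism; a rough sketch of registering a second control plane under its own region (the region names and URLs are made up for illustration):

    openstack region create micro-dc-region-1
    # register the second control plane's endpoints under that region
    openstack endpoint create --region micro-dc-region-1 compute public http://mdc1-ctrl.example.net:8774/v2.1
    openstack endpoint create --region micro-dc-region-1 network public http://mdc1-ctrl.example.net:9696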
So now let me hand it over to Vinod to discuss some of the open items related to computes with a control plane. Thank you.

Thank you. So Rushikesh talked about the case of headless computes and all the issues around that, and earlier I highlighted some of the issues and challenges around having computes with a control plane, which basically means multiple little VIMs sitting in each of the micro DCs; we talked about the footprints and so on.

The first thing to talk about is how we come up with a way to manage these multiple clouds. This is still an unsolved problem, because we have the question of logging and monitoring: how do you consolidate all of that information across these various little VIMs, bring it to the main data center, and show it in a common dashboard? That is a problem which has still not been addressed that well, and it's something we are looking at; if there are any ideas out there, we'd like to hear them, understand what approaches exist, and participate in those activities.

Of course, there is the classic and very important problem of how we do upgrades across these VIMs. We have to make sure that these upgrades are hitless in each of the VIMs, and also make sure that the upper-layer orchestrator can deal with different versions of these particular VIMs.

And then there is the generic orchestration overhead: how do you actually manage quotas across these various VIMs?
How do we discover the resources across the VIMs, what is available, what is being used, and so on? And of course, we now have multiple service endpoints, so how do you manage those endpoints from the NFV orchestrator? Some of these problems have been talked about even today; there were presentations on combining Tacker with, I think, Kingbird and Tricircle, and how they evolve, and we'd like to participate in some of those activities and see where these things go. I understand Tacker is a little ahead on the VNF management side and is still evolving on the multi-site management side, and we understand that projects like Kingbird and Tricircle are still new projects in the process of evolving. We would definitely like to participate in some of those activities to try to solve some of these issues.

Yeah, and there are still things like the distributed quota management Vinod was talking about: you have to manage the quotas, come up with consolidated quotas and show them to the orchestration layer, and also handle image management across those VIMs. People are thinking at this point about the bigger pain points, provisioning first, because we can already see some examples: Tacker can now support multiple VIMs, but there are still areas where user management is a big issue, quota management is another issue, and then total image consolidation across those VIMs. So with this, we don't have a complete solution to the problem, but there are a lot of mitigation plans, and there are some areas where we have to focus and continue working to come up with the right solution for some of the telco customers. So we are now opening it up for questions.

Thanks for the good presentation. Actually, I have one question regarding the mapping of VNFs onto the data centers. Do you have any idea about this? Do you suggest that, let's say, control plane VNFs like the MME should be in the mothership data center, while media or forwarding plane VNFs like the PGW and so on should be pushed to the edge? That's the first thing. The second thing is how this maps to the 5G roadmap, where we have MEC, mobile edge computing, where we will push the compute to the edge. At the end of the day, will we have a hierarchy of data centers: a data center at the edge, a data center in the region, and a data center in the core, something like that?

So let me take the first one. I think we're talking about control-plane-related VNFs and data-plane-intensive VNFs. Obviously, in those situations, you would have the NFV orchestrator deploy your control plane VNFs, which don't necessarily have high latency requirements and don't have to be close to the point of presence, in the main data centers, and of course the data-plane-intensive ones would be deployed near the point of presence, the central offices and so on. Your second question was about 5G; could you please elaborate? Are you asking about how many tiers there are?

Yeah, do you have any suggestions? What we understood is that for 5G
there is an initiative called MEC, mobile edge computing, where we will push the compute to the edge, maybe up to the RAN, to achieve end-to-end latency of less than 1 millisecond for vehicle-to-vehicle communication or something like that. So how does that map? Do you have any recommendations or insights for this?

Not beyond what we have been discussing with some telco operators: typically the packet core, or those kinds of VNFs, would reside in a PoP, which is closer to the customer edge. So those types of things would be residing there.

Yeah, but okay, that is the general understanding. However, the question is still there: in terms of the hierarchy, how will you decide, and based on what, how many data centers will be needed for that purpose? Is there any working group?

No, not that we are aware of, at least not one we have plugged into yet, but we'd like to talk offline and learn more about that. And going back to the point about control plane VNFs versus data plane VNFs: one of the things we have seen in Tacker is that you can put in descriptors, so while you are deploying the complete VNF you can specify that these are my data plane components, and it will use a flag like an availability zone called micro DC, so that they go there and get deployed. That way the orchestration can apply some level of intelligence based on the type of VNF it has.

Thanks. Hey, thanks for the question. Thank you. Okay, no more questions? Yeah, thank you very much for attending. Thank you.