Good afternoon. You can all move up, there's plenty of space if you want. Thank you for making it to the session. My name is Hazar Saeed. I am one of the chief architects at Red Hat in the telco team. My job is to work with lots of service providers, understand their requirements, understand their deployment models, and I hope to share some experiences with you based on those deployment models today in this session on multi-site OpenStack deployment options and challenges for telcos.

A quick disclaimer: I'm not a product manager, so don't ask me for roadmaps, please, and don't ask me product support questions. Those are owned by the product team and the product managers. What I do want to do through this presentation is start a dialogue and a conversation with all of you about what those multi-site deployment options are, what the challenges are, why we need them, what the use cases are, and how we actually make things happen.

So what we'll do is very quickly zoom in on the OpenStack architecture, just to identify some components that are critical to us for multi-site deployment. I'm not going to go into the details of OpenStack history or architecture or anything of that sort. We'll talk about the telco deployment use case in quite a bit of depth. We'll look at a couple of different use cases, see where they're applicable and how they work. Then we'll look at the distributed deployment models or options that are available to us, especially some of the newer projects that are going on, some code that's actually being committed now, including code that's coming in the next release, Pike. Then we'll look at what some of the issues with those options are, and we'll see if we can find a solution that's relevant in the context of what we do.

How many of you, by the way, before I start, attended the Verizon session on the micro-CPE? I think it was also shown as part of the keynote. One, two, three, four, five. Quite a few. That's good. So it was shown as part of the keynote, the micro-CPE, and there was a session that described what they did inside that micro-CPE. That's actually a very interesting use case, and we'll dive down into that particular use case here and see what Verizon did, why they did it, what the options were, and what some possibilities are in terms of how we do the deployment.

We all know this picture, right? Everybody has seen this; it's in the OpenStack documentation. Pretty complex picture, but two things are important for you to note. One is Keystone, right at the bottom, right here. Another one is Nova. Those are the two important components that we will see can address some of the multi-site deployment conversations. Of course, there's also storage, OpenStack Block Storage, right here, which connects into AMQP alongside Keystone. So if you don't want to listen to the rest of the presentation, you've got the three hints that I gave you in my opening: AMQP, Keystone, Nova. This is the same picture, just redrawn in a slightly different way, with the same three important points to focus on in terms of their connectivity and the models.

Now let's look at the use case a little bit. Let's understand why we need to deploy a truly multi-site OpenStack. What are those use cases? Of course, no telco operates on a single data center.
If there is a telco that operates on a single data center, I would like to meet that telco. Every telco has multiple data centers: primary and backup, two, three, four, one in each region. Some have tens, some have hundreds, some even have a thousand. Now, especially with some of the architectures that are going around like the virtualized central office, where every central office is kind of becoming a data center, think about it: how many central offices are there in the US? More than 10,000. You want to have 10,000 data centers, each data center with an OpenStack install? Awesome. How are you going to manage that many? That's an interesting question.

Mobile edge compute: a lot of telcos, mobile providers, are deploying compute much closer to the user. Why are they doing it? Because they need that compute power to deal with latency and round-trip times, so that they can process traffic locally and transmit it back. Virtual reality, smart cars or self-driving cars: those impose a set of requirements in terms of the amount of compute that needs to be available as close as possible to the consumer or to the user, in the IoT context as well as in this entire mobile edge space.

Virtualized RAN, virtualized C-RAN, vRAN; many people use different names for it. Again, the idea is to remove the active components from antenna locations and put them into a compute location that's virtualized and much closer to those antenna locations, using some sort of fronthaul for a mobile carrier, so that you can process that information right there and make those antennas passive elements that can be reprogrammed and redirected based on software.

IoT gateways and the fog computing model: Cisco has been pushing this fog computing model for a little while, again trying to put those compute capabilities much, much closer to the data acquisition points for sensors, so that you can provide information, or much faster decisions based on the data you've acquired, back to those endpoints that can then process it and act accordingly. So again, speed is of the essence, latency is of the essence. That's why you need that compute sitting in those remote sites.

Of course, everybody knows the collapsed branch application, the collapsed branch deployment model that has been around for a very long time, where you have multiple sets of different capabilities within a branch that are all being collapsed, thanks to the power of virtualization, into some number of virtual functions that sit on standard hardware or standard servers, and they can be virtualized and managed remotely.

So all of those requirements push the telcos into looking at options for deploying compute capability much, much closer to the edge of their network, or closer to the consumer. And when you deploy compute capability much closer to the edge of the network and closer to the consumer, the challenge you will have is: how are you going to manage those compute nodes? How are you going to instantiate those compute nodes? How are you going to orchestrate those compute nodes?

So going back, just illustrating that in a big-picture environment: you have all these remote sites. You have multiple data centers that sit out there. Each one runs an OpenStack instance. You may have a hierarchical connectivity model, but that doesn't mean anything; each one is an independent island, and each one manages its own local set of resources. And every telco has this particular deployment model today.
In fact, most people, if you ask them how they are deploying OpenStack, that's how they say they do it. Each one is a separate instance. The HA functionality, or the service resiliency that they bring to the table through these multiple deployments, is not at the infrastructure layer but at the VNF layer, or at the network applications layer in that particular context. So this is the standard deployment model that you typically see at every single telco.

But the thing is that a service almost always spans multiple data centers. What does that mean? Suppose I'm a residential customer of my service provider, or I'm a business customer of my service provider. Some functions may be hosted in one data center, other functions may be hosted in other data centers. What the service provider really has to do is stitch that functionality together to provide the overall service for me, which means the service provider is looking at those multiple different OpenStack deployments as separate zones or separate models. The service provider has to ensure that the policies are appropriate, the authentication models are appropriate, and that when you go provision a service, when you have to provision some VMs in one data center and another set of VMs in another data center and then do the network stitching across them, that all gets done in order to offer a typical service to me as a user, as a consumer. That's not as easy as it sounds.

Now, OpenStack is great within a single data center. OpenStack can manage a whole bunch of resources in a single data center, whether it's Neutron, whether it's Nova, whether it's Glance, Cinder, whatever. Within those limits, within that operational boundary, within that administration domain, it's very, very good at operating that, very good at controlling that, very good at providing those sets of resources to you on demand. You can do elasticity, you can do all of that. But across sites, you have to go to each location, pick a template for the deployment model, worry about autoscaling in a different way in those different contexts, and then actually deliver the service. So you need some level of service orchestration over and above this OpenStack island deployment.

Let's take a real case. This is a real case, by the way; I've hidden the name of the service provider. This particular service provider came to us and said: well, we have 25 locations. Not too many, not hundreds, not thousands, only 25 locations. We really need two to five VNFs in each one of those locations to provide services. Two to five VNFs can easily be hosted in one or two compute blades, maximum. Now, you want a redundant model: two compute nodes per site, 25 sites, you get to 50 compute nodes total. 50 compute nodes total for a large telco is a two-rack, three-rack deployment in one site, simple. But here it's a distributed model, right? You have these 25 locations. It has to be closer to the user. If you want to deploy a redundant configuration, the minimum supported redundant configuration is, what, three controllers. So you're burning three controllers for two compute nodes, or maybe even one compute node, per site. What's your overhead? If you just do the simple math, you have 75 storage nodes, you have 75 control nodes, but you have 50 compute nodes that are sitting there. That doesn't make any economic sense. Why should I go deploy OpenStack in this particular situation? I'm going to try to find something else.
And what is that something else? I don't know the answer to what that something else is. But it's an interesting problem, and this is a real problem. It's a real service provider, with a real number of sites, with a real requirement, that came to us and said: help us, how are we going to do this? So you have literally 75% overhead in terms of compute power that's going into controllers and storage nodes, for instantiating literally three VNFs per site.

Here's another example. Take a telco that has about 1,000 central offices. You want to go CORD. Anybody heard of the word CORD? C-O-R-D, Central Office Re-architected as a Datacenter? A few. Good. What that architecture is all about is creating or building that central office like you would build a data center, with a fabric, with compute nodes, with controller nodes, with VNFs that are hosted there, with appropriate service chaining, orchestration, and so on. And, by the way, for those of you who know CORD, we are also interested in building a CORD-like architecture with ODL instead of the ONOS controller, and so on, but that's a separate topic. Regardless, whether you adopt the VCO architecture that's being discussed in the ODL working group, or whether you adopt the CORD architecture as defined by the Linux Foundation and ON.Lab, you still have to deploy OpenStack. And when you deploy OpenStack, you have 1,000 central offices, and in each central office you're now going to go deploy full OpenStack. How many subscribers a central office serves depends on where that central office is located. If the central office is in a city like Boston, then you're probably serving lots of customers, no issues. If your central office is in rural Pennsylvania, well, it's serving a few hundred houses. That's about it. So do you then go and deploy the same model? Do you put the entire Keystone, the entire authentication model, on those sites? What do you do? That's, again, a real challenge and a real question.

Now, if you ask Verizon what they did in their micro-CPE: what they did was put the entire OpenStack on the micro-CPE. I didn't get a chance to ask the question, but if anybody from Verizon is in the audience, maybe they can answer it: what's the compute power you put on that particular micro-CPE? I know they containerized it, I know they made it lightweight, I know they made it run, but it runs the full OpenStack suite. Now, the question I did ask after the presentation was done is: how do you manage the authentication model in such a situation, where you have thousands of those micro-CPEs scattered around your network? Well, they said, look, we don't do that. We don't worry about trying to build a common central authentication model in this particular case. We use TACACS from a user's perspective to manage that, and we still let the standard Keystone setup reside locally, with its seeded users, to authenticate the local resources on the micro-CPE. The advantage that gives them is that nobody can interfere with the local containers that are sitting on that particular micro-CPE. OK, that's fine. Is that an option for everybody? I don't know.

Now let's go look into those deployment options and models. I kind of described them a little bit for you before I got into the slide. Multiple independent islands: we've seen that model, it's being done today; that's how people are doing it today, in fact.
Common authentication model: I hinted at that a bit, and we'll dig a little deeper into it. Then we'll also dig into the stretch deployment model. What does that stretch deployment model mean? Can we peel node functionality off a single OpenStack and then start to deploy it in different places? That's what we'll look into in a little more detail. Then another model: maybe I still want independent OpenStack islands, but what I need is availability of resources, to be able to orchestrate within those OpenStack islands. That's what Tricircle kind of tries to do, and we'll dig into Tricircle a little, into what that approach is. Proxying some of the APIs, creating an API gateway model, those are things we need to look at, and this is the work that's happening in the OpenStack projects and groups these days. Maybe we look at a complete agent-based model; I don't know what that will look like. I have an idea in my head, but I want to be able to code it before I come back to you and say this idea works. And of course, is there anything else? Any other wild ideas? I don't know. There was a very interesting presentation done, I think, yesterday or today, I don't remember now, I'm drawing a blank, because I did one presentation on Tuesday and this is my second one, so everything's lost in between, about something called the Kubernetes sandwich. We'll take a look at that Kubernetes sandwich as well.

So in the first one, and this is, by the way, what is being done by most people, you go deploy OpenStack in different locations. You put some sort of a load balancer in front of them. You authenticate your users through the directory, through LDAP, or through TACACS, or through some other means, outside, out of band, nothing to do with the OpenStack environment. You authenticate the user through that, and then once you hit the load balancer, the load balancer is the one that's responsible for pushing your request to A or B, depending on how you divide it; I'll show a small sketch of that in a moment. Of course, within OpenStack, too, you already have those capabilities of regions and projects and all of that, but here you're masking some of the deployment capabilities behind load balancers and making things transparent for the user. Of course, when you have a load balancer type of model, everything in one OpenStack instance or one data center has to be identical to another data center, because only then does the load balancing work. If you want specific workloads in specific locations, obviously putting a load balancer in front is not going to be helpful. Then you want to go to a specific instance, configure that, pick a project, pick a template, go do your Heat deployment, your HOT template deployment models, and so on. Disaster recovery is an external problem; it's probably an application issue, not an infrastructure issue, so you use some of the disaster recovery tools to recover data from one data center to another.

Now, in each of those data centers, you have a fully redundant configuration with three controllers and three storage nodes. By the way, I use three-and-three a lot, because three is kind of the minimal HA deployment. You need an odd number of controllers, 2N+1, to create a quorum and to ensure that HA operates appropriately. So we use a three-controller model here.
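Going back to that load-balancer fronting model for a moment, here is a minimal HAProxy sketch of what such a front end might look like. It assumes just two identical sites whose Identity API endpoints sit at the placeholder addresses 10.1.0.10 and 10.2.0.10; in practice you would run one listener per API service, and the addresses, port, and balancing policy here are illustrative, not a tested reference.

```
# Minimal sketch: one TCP listener fronting the Identity API of two
# identical OpenStack sites. Addresses and names are placeholders.
frontend keystone_public
    bind *:5000
    mode tcp
    default_backend openstack_sites

backend openstack_sites
    mode tcp
    balance roundrobin
    server site_a 10.1.0.10:5000 check
    server site_b 10.2.0.10:5000 check
```

As noted above, this only makes sense when the sites really are interchangeable; the moment you want a specific workload pinned to a specific site, the load balancer stops helping and you are back to talking to each instance directly.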
So in that island model, each data center has to have those three controllers. You roll with the idea.

The next one is to have a common Keystone. This is an interesting model, and it is of interest to a lot of people: you pull Keystone out, or you point to a central Keystone. This is sometimes referred to as a shared Keystone; in the previous model it was more of a distributed Keystone with a common backend integration, right? Keystone integrates with LDAP for the authentication capabilities. Here, you have one central Keystone, and you take all of those remote locations and point them back to that Keystone. Each region or each area is still a fully redundant, independent system, except that the Keystone is common. So what do you do in Keystone? Keystone's service catalog has tables associated with each of those capabilities, Nova, Neutron, and so on, right? And in there, there is something called the endpoint table. If you have access to your OpenStack, you can go check that. In the endpoint table, you define the endpoint type, you define the service, you define a region, and you define an IP address or URL. So what you do is go modify that endpoint table to point to the same Keystone in each one of those locations. It's a manual hack, but it works to create a central model, and if somebody wants a centralized Keystone, that's how they do it. The advantage of having a centralized Keystone is the ability to have one central location for all of the authentication models, for all of the users, resources, and so on, and then the rest of the services can continue to operate in the same way in those distributed OpenStack instances.

Now, you can also have a central controller. This is another common requirement that we get: how about if I just do storage and compute in those different remote locations, like HCI, right, hyperconverged infrastructure? I have controllers in one site, and on another site I put storage in, I put compute in, and I have hyperconverged infrastructure there, and now I'm going to operate that remote site as if it were part of my master site, with the same set of controllers. This is doable; this works if you want to do it today. The key question is how far away you can put that remote site. The key question is how much workload crosses the WAN boundary between those two sites, and that's what we'll dig into a little later in the presentation. In this particular model, when you have these remote hyperconverged clusters or hyperconverged nodes, you can potentially replicate some of the databases and some of the storage capabilities from one site to another, and you can do a manual restore later on. So it's almost like a stub site; it just doesn't have full control of everything. You've replicated some information, you're pointing to it locally on that remote site for faster access, and you're doing some RBD mirroring or some synchronization from the master to the stub site. So this is like a stretch deployment model for storage and compute, but it is limited to a small number of sites. Obviously, you can't take this model and say I'm going to deploy this across a thousand sites, or 500 sites, or even 100 sites, or even 20 sites. It becomes really complicated.
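Back on the shared Keystone model for a second, here is roughly what that endpoint-table hack looks like if you do it through the CLI rather than editing the database directly. It's a minimal sketch: the region name edge-01 and the hostname keystone.central.example.com are placeholders of mine, not anything from the talk.

```
# Point a remote region's Identity endpoints at the one central Keystone.
openstack endpoint create --region edge-01 identity public \
    https://keystone.central.example.com:5000/v3
openstack endpoint create --region edge-01 identity internal \
    https://keystone.central.example.com:5000/v3
openstack endpoint create --region edge-01 identity admin \
    https://keystone.central.example.com:35357/v3

# The local services in edge-01 (Nova, Neutron, Glance, ...) keep their own
# endpoint rows, but their tokens all validate against that shared Keystone.
openstack endpoint list --region edge-01
```

The effect is exactly what was described above: every region's catalog says "identity lives over there", so there is one place for users, projects, and policy, while the rest of the services stay local to each island.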
For two sites, five sites, maybe you can do that stretch model. Now let's revisit that thick CPE branch office use case. I showed you a little picture of those telco use cases and I said, here is a deployment model of a converged branch, where you're converging a whole bunch of functionality onto a couple of servers, one, two, three, whatever those servers are. If you start to do that and start to deploy these types of servers in those branch offices, there is a deployment model that people have coined called thick CPE. The reason they call it thick CPE is that instead of putting a router, a firewall, a switch, and something else, they put a server there. Take those functions, virtualize them, run them on that server, and that server has a vRouter, a virtual router function, that tunnels traffic back into the data center, where you do the traffic manipulation and provide the rest of the services. Now that, again, is an interesting requirement in terms of a thick CPE, but think about it: how many customers do you have as a telco? 1,000, 2,000, 10,000, 5,000, lots of numbers. Each customer has how many sites? An average customer has, what, five to eight sites. How many are we looking at? 35,000 sites, 50,000 sites. Now you start putting servers in all of those sites, 30,000 sites, 20,000 sites. And you want to put compute there, and you need to be able to manage that compute, that bare metal, in some way. Can you start to put OpenStack there? That's exactly what the micro-CPE is for Verizon. If you remember the Verizon picture, you'd have three VNFs and a full OpenStack embedded in that particular box that they showed in the keynote. And it's thousands of sites that you have to go and deploy that in. Is it going to work?

Now, one may say: well, why can't I use the flexibility that has been provided since OSP10 for splitting some functionality off the controllers, and then place that functionality appropriately, so that I can architect my infrastructure, architect my OpenStack capability, to actually distribute it? OSP10 allows you to do what are called composable roles, so you can build controllers with a certain set of capabilities. What that has done is split the controller services into Pacemaker-managed and systemd-managed functionality, and you can split those across nodes and create custom roles for controllers. This has been available since OSP10, and by the way, some more enhancements came in with OSP11 as well. By doing that, you can separate Keystone out, you can potentially separate Ceilometer out, you can separate Neutron out, you can separate RabbitMQ out; I'll show a rough sketch of such a role split in a moment. Why am I talking about separating these components out? These are the components that we will need to play around with in order to build a multi-site OSP infrastructure, and these are the components that potentially have some bottlenecks in them that we will need to tune. So then, going back to the central site, you could do the same thing: you could create a hierarchical model by separating those components out and placing them appropriately in your infrastructure.

Now, looking at this picture, it in no way represents a real service provider. But I did have a real service provider network from LATAM. They had 220 locations in which they wanted to put OpenStack, 220.
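Here is that rough sketch of what a custom role split can look like, in the roles-data format that TripleO-based deployments use. The role names and the exact split of services are illustrative assumptions on my part, and the service lists are deliberately abbreviated; this is not a tested reference architecture.

```
# Sketch of a custom roles data file: a slimmed-down controller role plus a
# separate telemetry role, so Keystone/RabbitMQ and Ceilometer can land on
# different nodes.
- name: ControllerLite
  ServicesDefault:
    - OS::TripleO::Services::Keystone
    - OS::TripleO::Services::MySQL
    - OS::TripleO::Services::RabbitMQ
    - OS::TripleO::Services::NeutronApi
    - OS::TripleO::Services::NovaApi
    - OS::TripleO::Services::NovaScheduler
    - OS::TripleO::Services::NovaConductor

- name: TelemetryNode
  ServicesDefault:
    - OS::TripleO::Services::CeilometerApi
    - OS::TripleO::Services::CeilometerCollector
```

You would then pass a file like this to the overcloud deploy command via its roles-file option. The point is simply that once the services are composable, the Keystone, messaging, and telemetry pieces we keep coming back to can be placed wherever the topology needs them.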
Back to that LATAM provider and its 220 locations. They had about 18 top-level locations where it was full service, and then they had about 1,000-plus lower-level locations where they would have compute elements. And they really were after us, saying: hey, Red Hat, can you help us get to this particular deployment model? We've tested it, it works; it seemed to work when we put remote compute nodes out there like this. We had a very long conversation with them, going back and forth, to understand the deployment model, how they were doing it, and why they were doing it. But it creates an interesting situation. What they wanted was the flexibility to say: I'm going to deploy controller capability here and then manage this region, because some of those sites sit on a fiber path. If there is a big fiber loop, I'm going to put a controller here, and all those nodes, all those central offices sitting on that big fiber loop, can just be compute extensions instead of putting a separate OpenStack in each one of them. It's a real deployment, a real question, a real challenge.

So what are some considerations in that case? What do we need to look at? As we started talking to these service providers and having these detailed conversations, some things became obvious. One was: what's the latency between those sites, from where you put the compute to where you put the controllers? What's the latency? What's the round-trip time delay that you can expect on average? I was talking to one of the global providers: they had a location in France, and they had another location in Miami, and they wanted to put some compute in Miami and manage it from France with all of the controllers. Well, that's a transcontinental link; there's a latency challenge. What's the maximum you can get away with? And it's not just the latency of that particular link, it's also the outage time. What happens when the node disappears? When the link goes down, the node is still running, but the controller thinks the node is down. The controller tries to reschedule, and then the link comes back up. How do you deal with that? So it's not just saying: my latency is within this boundary, I can manage it, and I can probably tune the conf file to say, expect half a second of delay, or expect two seconds of delay. That's a lot easier if that's the case and if it's always consistent. But if it's not consistent and you have an outage, what happens when you come back up? So you need a headless operational model and a recovery option for that.

Also, when you have a link that fails, maybe there are 10 sites hiding behind that particular link. Now the link comes back up, and you have 10 sites that suddenly wake up and say, I'm here. That causes another thing called a startup storm. These are common issues you find in telcos. We are not seeing these issues today because what we always focus on is deploying OpenStack in a data center, managing compute infrastructure that's local. Suddenly you want to take the same OpenStack that's managing local compute infrastructure and start trying to do all these types of funky things, and everybody throws up their hands and says, ah, no, it doesn't work.
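On that conf-file tuning point, this is roughly the kind of knob being talked about for remote compute nodes behind a WAN link. A minimal sketch of a nova.conf fragment: the option names are real Nova and oslo.messaging options, but the values are placeholders meant to show the direction of the tuning, not recommendations.

```
[DEFAULT]
# How often nova-compute checks in over RPC (default is 10 seconds);
# stretching it reduces periodic chatter across the WAN.
report_interval = 60

# How long before a compute service that has not reported in is declared
# down (default 60 seconds); it has to comfortably exceed report_interval
# plus the link outages you expect, or you get the flapping described here.
service_down_time = 300

# How long an RPC call waits for a reply before timing out.
rpc_response_timeout = 120
```

None of this solves the headless-operation problem by itself; it just widens the window before the control plane starts reacting to a remote site that is merely unreachable rather than actually dead.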
Okay, so where does the other bottleneck show up? The other bottleneck shows up in Oslo messaging. I know this has been a source of a lot of pain for people just scaling OpenStack within a data center, in terms of scaling message queues, understanding Oslo messaging, understanding RabbitMQ. Why? Because things like Ceilometer, of course with Aodh and Gnocchi, are the biggest consumers of RabbitMQ. Ceilometer is the biggest consumer of RabbitMQ. But Nova also uses RabbitMQ: to provision, to make changes, Nova uses the same RabbitMQ. Now what happens? You have a remote site, you have Ceilometer agents collecting information and sending it back to the collector, and you are also trying to make changes via the Nova API to that compute. And now you have latency, and you have round-trip time issues. That's what creates the biggest challenge.

Now, one of the conversations we ended up having with at least two different providers is to come up with a latency bound and say: what is my latency boundary? Can I test within that boundary, say yes, this works, and then go ahead and deploy within that boundary? So what is that round-trip time? Based on that round-trip time, you can adjust the RabbitMQ queues. There's something called the bandwidth-delay product, BDP. It's a standard term when you do any kind of queue tuning or buffer tuning for routers and switches, and for people who have deployed WANs it's a standard term: the bottleneck link speed times the round-trip time. So you do that. What is the outage time? You do the queue tuning based on that. And when you have problems with it, you can get things like Nova flapping, Neutron timeouts, headless operation and recovery issues, and restart storms. So what you do is bandwidth-delay-product-based tuning of buffer sizes, and you look at the number of messages that you're passing through the system from those remote nodes.

One of the conversations is: why don't you split the message queues? If you split the message queues into a queue that handles Ceilometer agents, a queue that handles Nova, and a queue that handles Neutron, perhaps then you can apply some external QoS to tune the environment. It's complicated, it's not easy, but it is possible. There was actually a very interesting presentation done at OpenStack Austin, and there's a video available online, about splitting message queues and what the performance gain was, even inside a single data center, forget about the wide-area deployment. Take a look at that presentation; it was quite impressive.

Now, recently there have been some AMQP enhancements. With the new enhancements, you eliminate the broker model, where each time a message needs to go through a broker, you have a tree-like structure and the message travels from broker to broker and out to the actual agent. Instead of that, you create a mesh router model that allows you to pass messages directly and route those messages to the endpoints. If you have to go between exchanges or between domains, then you can potentially use a broker to go between domains, but otherwise you can send a message directly to the message router and it gets delivered directly to the endpoint, without the broker latency, re-queuing, and so on. So those are some new enhancements that have been done.
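Going back to the bandwidth-delay product for a second, here is a tiny sketch of that arithmetic with assumed numbers: a 100 Mbit/s bottleneck link and an 80 ms round-trip time, both placeholders rather than figures from any of the providers mentioned.

```
# Bandwidth-delay product: bottleneck link speed times round-trip time.
link_speed_bps = 100 * 10**6   # assumed bottleneck: 100 Mbit/s
rtt_seconds = 0.080            # assumed round-trip time: 80 ms

bdp_bytes = link_speed_bps * rtt_seconds / 8
print(f"BDP ~ {bdp_bytes / 1024:.0f} KiB")   # ~977 KiB for these numbers

# The TCP socket buffers carrying the RabbitMQ (or AMQP 1.0) traffic need
# to be at least this large, or the messaging traffic from the remote site
# can never fill the link, no matter how the queues themselves are tuned.
```

That is the whole trick behind the BDP-based tuning mentioned above: measure the round-trip time you are committing to, size the buffers and queues from it, and then test within that boundary before promising it will work.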
That mesh-router approach, by the way, is available as a new driver in Newton, I believe, for AMQP in Oslo messaging. You should consider it. If you must use RabbitMQ, then tune the hell out of RabbitMQ: use exchanges, use the shovel plugin, split it into multiple queues, and try to scale it as large as possible.

Here's another interesting project, the OpenStack Cascading project. This is another option; it was submitted, I think, a few years ago, and there was an interesting presentation done on it. The idea was to create a set of proxy APIs at the central site that corresponded to each remote site. What you see here in yellow on top is a set of proxies for this site, and this down here is a full OpenStack deployment, with the control functions listed out. Similarly, this set is for this site, and likewise you'll have one set for each site on which you deploy full OpenStack. The remote side is called the cascaded node or region, and the central side is called the cascading region. You can create an interesting hierarchy with a parent-child relationship that way. This was obviously quite complicated, so since then they've actually split it into two different projects called Tricircle and Trio2o. Tricircle is the ability to extend the network from one OpenStack site into another OpenStack site, so you can create stretched Galera clusters, for example, or you can create stretched networking. In other words, from one data center you can provision resources over into the other data center, from one to the other, and you do this through Neutron, through the Neutron API and Neutron extensions; you can actually go provision workloads in the other data center. That's what Tricircle is, and in fact the Verizon team stated as part of their presentation that they're very much interested in Tricircle to be able to manage these remote micro-CPEs.

Then the other portion (Tricircle is all on the Neutron side) is to create the API gateway. So instead of creating a proxy API set for every site, you have a common API gateway, and you position that in front of the user as your central management plane or central location, and that in turn goes and provisions down into each data center, which is modeled as an availability zone. So now you can create AZ1 through AZn, and you can treat each availability zone as a pod. The way it works is that this API gateway creates a unique ID from a tenant ID and a pod ID. Because there is a finite number of pods that are unique, and a finite number of tenants that are unique, you can now create a unique ID to provision any given workload through this particular API gateway into any one of these OpenStack regions. It's an interesting way of dealing with this; still, I don't know, for example, how far it will scale toward that thick CPE model we just discussed.

So what is the alternative? We are almost getting to the end of the presentation. Well, the thick CPE model was the all-in-one OpenStack model. Now the question is, should we abandon the idea of OpenStack on those remote nodes? If you have to run one or two VNFs, do you really want to do that? Maybe we can run just a hypervisor and some workloads, do some level of automation outside OpenStack, and go deploy them that way. That might be an option for that thick CPE model.
It might not be an option if you want full-fledged data centers but still want to be able to extend those compute nodes across. Another alternative: use Kubernetes, Kubernetes as a master orchestrator for those remote sites. I mean, as a master orchestrator that then goes and provisions OpenStack, which then goes and provisions workloads further, whether they are VMs or even containers. And this is where the phrase container sandwich, the Kubernetes sandwich, comes into the picture: you have Kubernetes as a master orchestrator installing the OpenStack control services, which then create the compute on which you install further, either containers or VMs. That's an interesting concept, and there was an interesting conversation just in the next room over, where Robert Starmer was talking about exactly that particular topic. So that's what it is: take those control services, run them as containers on those remote nodes, manage those capabilities across sites, and use Kubernetes as the master orchestrator to drive it.

So, just in summary. Today, a lot of people are deploying independent OpenStack islands. It is tough for them to manage all of those islands together, and they are looking at external tools such as cloud management platforms to make that happen. Tricircle and Trio2o are of significant interest to multiple people; they offer good promise. This is where I need your help, your input, your feedback, because this is where we are working to see whether that really addresses the problem, or whether we need to fundamentally look at a different approach, where we have some sort of an agent that sits out on the remote side and helps us manage that, maybe just in a plain Kubernetes environment. Crafting that availability zone model, within the bounds of how many data centers you have, will definitely help. The Nova agent proxy that we just saw in Trio2o is useful; perhaps we can partition the Nova agent out of that particular API gateway and look at it differently. Deploying bare metal at remote sites: then how do you actually power on that bare metal? How do you make that happen? That's not really workable by itself; you probably need to ship those remote sites with some image that boots and then calls back home to make it happen. And if you want to go the Verizon route, then yes, you have the option of doing a full OpenStack there. And then I briefly spoke about this Kubernetes sandwich, which is of interest to a lot of people, where you're running Kubernetes as a master orchestrator, running the OpenStack control services as containers, which in turn deploy Nova, and then you use that compute to deploy the workloads into.

No good answer, unfortunately. There's a lot of work happening in the Trio2o space, a lot of work that we are involved in. We'd be happy to work with anybody who's interested in this particular context, and happy to learn from you how you're deploying it or how you're solving that particular problem. That's about it. Thank you very much. One or two questions, please, and then we'll wrap it up, because I think we're out of time. Take two questions, one here, one here.

Adam Young, Keystone core. So of course I want to know: what features or what enhancements would you like to see from Keystone to better support multi-site?

Thank you, Adam. So Adam, if you don't know, actually works in the next aisle over in the same company, so he's also from Red Hat.
Thank you for asking that question. That's an important question in terms of what you want to see.

You're welcome.

What I would want to see from Keystone is the ability to really handle thousands of sites from a scale perspective, with these different requirements with respect to compute and storage, and the ability to manage those from a central site.

So when you say that, there's a consistency issue with the assignment database. Even if you use federation for the users and groups, you have to have somebody in charge of doing assignments and propagating that information around. How would you like to see that work, even in kind of general terms?

Perhaps that's a longer conversation, but we can discuss it. I think that unique-ID approach is actually an interesting one, in terms of what's being discussed in Trio2o. But we can talk offline, yes.

This is a slightly open-ended question. There is a model by which, when two entities are disconnected, you can have one entity, instead of declaring failure, expect that the other entity will come back up, and the other entity can buffer any state changes, and when they get reconnected the buffered state can be applied. I know it's a harder problem to solve, buffering is needed, it's much more of a research topic, but is any of that thought process going on in the community?

So yes, it is a harder problem to solve, and we don't have answers to that problem, because one of the things that Nova does extremely well is that when you know a node is down, you can auto-orchestrate, kick off a new one. So what should you do? The decision is: should you shut down that process and wait for it to recover, should you make that timeout long enough that you can recover state that way, or should you say, no, something else happened, maybe the machine died, not the link, so I'm going to try to put a new node in that location?

Do you ever get a sense of which way the telcos are leaning? I mean, are they saying, declare failure, I'll stand up another VNF?

That's what they're doing, exactly right, because they want to be able to get the service back up as quickly as possible. All right, thank you very much; I'm sorry, we're out of time.