Hello. Hello. Good morning. Welcome to this session on multi-site OpenStack for NFV in OPNFV. Let me introduce our team first. This is Sripriya from Brocade. This is Dimitri from Ericsson. And I am Chaoyi Huang from Huawei. In this session, we will talk about the gaps and challenges in multi-site OpenStack for NFV. We will also talk about projects like Tacker, Kingbird, and Tricircle that address these gaps and challenges. And at last, we will talk about how these projects can collaborate with each other in different scenarios.

Let's first talk about the gaps and challenges for multi-site OpenStack for NFV. We identified that there are some features missing in OpenStack to support disaster recovery in the multi-site scenario. For example, Nova currently only supports single virtual machine consistency snapshots, which is not good enough to support application-level consistency snapshots for disaster recovery. So we filed a spec in Nova to support application-level snapshots, so that the snapshot can be used for disaster recovery. The code will hopefully be merged in Newton. We also identified that a feature for volume-level replication is lacking in replication v2; we will discuss it with the community after replication v2 is finished.

In NFV, we have to deploy multiple VNFs in different sites, but there are some gaps in this scenario. For example, if we deploy vRouters in different OpenStack instances in different sites, then the tenant's vRouters should be connected with each other, so there is east-west traffic between the sites. But currently there is no L2/L3 networking automation to set up this connectivity. We also identified that quota control for multi-site is missing, and that other things, like IP/MAC address space management and synchronizing other resources among multiple sites, are also missing. Not only do different VNFs in different sites have these issues; some VNFs are also designed to be deployed across different sites to achieve higher reliability. For this kind of VNF, L2/L3 networking is also required, for example for heartbeat synchronization or state replication between components in different sites. So L2/L3 networking to carry this tenant-level traffic is a common feature lacking in OpenStack, and the same goes, for example, for quota control and resource replication between different OpenStack instances, which are also missing. So in OPNFV, the Multisite project studied these use cases, identified some features missing in OpenStack, and started some projects to address these issues: for example, identity service in the multi-site scenario, how to achieve high-availability VNFs across different sites, geographic disaster recovery, and resource synchronization and quota control. We have projects like Kingbird and Tricircle, and the Multisite project, to address these issues. So next, welcome Sripriya for Tacker.

Thank you. Thanks, Joe. So these are some of the gaps and challenges we have identified in multi-site OpenStack for handling the NFV use cases. The last challenge that needs to be handled in multi-site OpenStack is end-to-end service orchestration. An NFV orchestrator needs to be able to manage multiple VNFs that are deployed across multiple OpenStack instances and chain these VNFs across these sites to provide end-to-end service orchestration. Now, this service function chaining needs to be done both for the east-west traffic across data centers and also for the customer-facing traffic, which is the north-south traffic.
In order for us to handle this kind of requirement, the challenge does not end there. Once we deploy the service function chain, we need to modify, monitor, or heal the service chain. In case of VNF failures, when a VNF goes down, we need to be able to update the service function chain to move the VNF to a different data center. And based on customer requirements or policies, we will need to modify or update the service chain, like scaling the service chain for VNF high availability or any new high-availability requirements that come into the OpenStack multi-site environment. And finally, it needs to be resilient to the WAN bandwidth, network delay, and throughput, and modify the service chain accordingly to provide this end-to-end service orchestration.

So before we talk about some of the existing projects that are trying to solve the OpenStack multi-site challenges, let me give a quick overview of the Tacker project. Tacker is an NFV orchestration project focused on VNF lifecycle management. It also has a monitoring and management framework for deploying and managing VNFs. We also started working on some new features in Mitaka, like supporting the TOSCA template in VNF catalogs. We have also worked on providing EPA support for VNFs with high performance requirements, such as CPU pinning, huge pages, and NUMA awareness. And finally, we support auto resource creation, where the operator can just specify resources such as flavor, network, and image, and then Tacker can automatically go and instantiate these resources for the operator in OpenStack.

So now let's zoom into the multi-site VIM feature that we have started to work on in Tacker to handle some of the gaps that Joe pointed to. A quick overview of the feature: it provides a single pane of glass for the operator to manage multiple OpenStack sites using the VIM management feature. The feature itself is easy to deploy in existing OpenStack installations; the operator can just go ahead and install Tacker on one single controller node and be able to talk to multiple OpenStack instances. Tacker itself heavily uses the Heat and Keystone services in the background, so Tacker automatically talks to the other OpenStack services such as Heat and Keystone on the remote sites to perform the resource orchestration functionality.

So, a bit of a deep dive into the multi-site feature itself. In Liberty, Tacker was able to deploy VNFs in a single OpenStack site and was agnostic to remote sites within the telco infrastructure. In Mitaka, we started supporting this multi-site VIM management, where operators can deploy VNFs using a single Tacker controller in multiple OpenStack sites. So the local VIM just becomes another VIM; the operator can register and use that VIM as an OpenStack instance to deploy VNFs. We also provide explicit region support, so in case there are regions available in the OpenStack site, the feature can auto-detect these regions and allow the operator to place a VNF on a specified region within that OpenStack site. Of course, in a telco infrastructure, the OpenStack sites may not all be running the same versions. The versions can go all the way back to Kilo, and Tacker's multi-site VIM feature is able to support each of these releases starting from Kilo. Once a resource request comes into the Tacker server, it can gracefully downgrade or upgrade the resource request and provision these VNFs on each of these remote OpenStack sites.
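As a rough illustration of this multi-VIM idea (not Tacker's actual code or client API), here is a minimal sketch of a registry that keeps per-site VIM records, including the release each site runs, so a placement request can be routed to a named site and region. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class VimRecord:
    """One registered OpenStack site (hypothetical fields for illustration)."""
    name: str
    auth_url: str   # Keystone endpoint of the remote site
    region: str     # default region within that site
    release: str    # e.g. "kilo", "liberty", "mitaka"

class VimRegistry:
    """Toy multi-VIM registry: register sites, then pick one for a VNF."""

    def __init__(self) -> None:
        self._vims: Dict[str, VimRecord] = {}

    def register(self, vim: VimRecord) -> None:
        self._vims[vim.name] = vim

    def select(self, vim_name: str, region: Optional[str] = None) -> VimRecord:
        vim = self._vims[vim_name]
        # Explicit region support: override the site's default region if asked.
        return VimRecord(vim.name, vim.auth_url, region or vim.region, vim.release)

# Usage: register two sites and place a VNF on a specific region of site-2.
registry = VimRegistry()
registry.register(VimRecord("site-1", "http://site1:5000/v3", "RegionOne", "kilo"))
registry.register(VimRecord("site-2", "http://site2:5000/v3", "RegionOne", "mitaka"))
target = registry.select("site-2", region="RegionTwo")
print(f"deploy VNF via {target.auth_url}, region {target.region} ({target.release})")
```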
So here's a bit of a zoom in on the Tacker multi-site architecture. Here we are focused mostly on the NFVO component, the green block. The multi-site feature sits outside of the VNFM component, so Tacker can still be used as a standalone VNFM component that the operator can consume, and there's no dependency on the multi-site feature itself. And this is a pluggable driver framework: in case the operator wants to bring a custom VIM into Tacker and deploy VNFs on their custom VIMs, we have a pluggable driver framework where we can write our own custom VIM driver and integrate that into the NFVO component, and it should be able to talk to this custom VIM the same way it talks to the OpenStack VIM. And the VIMs are shareable across tenants, so as an NFV admin operator, I could go ahead and register a VIM in the VIM dashboard and allow my users to use this VIM to deploy VNFs in their tenants. And finally, we do support Horizon and the command line. So please check out the multi-site feature in Tacker. These are some of the Tacker resources we have; I would encourage you to install Tacker and try out this feature, and any comments or feedback are appreciated. Thanks.

Right, so we'll continue with the second project, which is called Kingbird. It's a new OpenStack project that provides resource synchronization management for multi-region deployments. Basically, it represents a reference implementation of some of the use cases we have identified in OPNFV Multisite. Our intention is to provide, first, an aggregated view on distributed resources: if I have VMs in region one and VMs in region two, I want to be able to get an aggregated list of those VMs with one single call. Second is resource synchronization, such as security groups, images, and flavors, to address the use cases where a user wants to boot a VM in region two, let's say, with an image from region one — so how do I address that seamlessly? And last but not least is centralized quota management; that part we have already implemented in Kingbird in our first release, and I will expand on this topic.

So currently, quotas in OpenStack are defined on a per-region basis, and further, these quotas are also spread across the different services, right? Nova is responsible for the compute-related quota limits, Neutron for networking, and Cinder for storage. So when it comes to multi-region deployments, it basically means that you have to go in and set quota limits in each region, in each service, and that doesn't scale. And when a tenant reaches the maximum capacity in region one while still having enough capacity in region two, he's not able to reuse this capacity. So there is no process for automatically or dynamically synchronizing the allocated quotas across the regions. In Kingbird we have implemented this centralized quota management function, which allows you to dynamically adjust the quota limits, and on top of that we also provide what we call global quota limits across multiple regions. So now you don't have to set your quotas in each region and in each service; you can define everything in Kingbird directly, and then the automatic adjustment will happen in the background. So one of the most important requirements for us was that we shouldn't touch Nova, and we shouldn't touch Neutron and Cinder.
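To make the dynamic rebalancing idea concrete, here is a minimal sketch — not Kingbird's actual algorithm — of how a global quota limit could be redistributed across regions based on each region's current usage. The proportional policy and all names are assumptions for illustration only.

```python
def rebalance_quota(global_limit, usage_by_region):
    """Split a global quota limit across regions.

    Each region keeps at least what it already uses; the remaining headroom
    is shared in proportion to current usage (evenly if all regions are idle).
    Illustrative policy only, not Kingbird's implementation.
    """
    total_used = sum(usage_by_region.values())
    headroom = max(global_limit - total_used, 0)
    limits = {}
    for region, used in usage_by_region.items():
        share = used / total_used if total_used else 1 / len(usage_by_region)
        limits[region] = used + int(headroom * share)
    return limits

# Example: 100 instances allowed globally; usage would be fetched per region
# via each region's existing Nova quota/usage APIs.
print(rebalance_quota(100, {"RegionOne": 30, "RegionTwo": 10}))
# -> {'RegionOne': 75, 'RegionTwo': 25}
```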
So we want to have minimal or zero impact on the existing OpenStack services, and that was our main motivation for the design. Kingbird itself has a requirement on the underlying infrastructure that Keystone has to be deployed centrally, but that goes in line with our use cases in OPNFV Multisite, so we basically take it for granted that we have a centralized authentication system. Right, like I said, minimal or zero impact on the existing OpenStack services means that we are basically using the existing APIs to dynamically balance quota values, to calculate in real time the actual resource usage of tenants, and to use these values for synchronization. On top of that, we also implemented APIs to set, delete, and update the global quota limits, and these APIs are basically similar to the ones Nova, Neutron, or Cinder already have, so if you're familiar with the structure of those APIs, there shouldn't be any problem for you to use Kingbird quota management.

When it comes to architecture, Kingbird is pretty much like any other OpenStack service. The main process here is the Kingbird API, which provides APIs for managing global quotas and on-demand quota synchronization. For example, if you perform an action and this action requires triggering a rebalancing of quotas, you can just invoke this API and it will work. The second process is the Kingbird Engine, which is responsible for all the dirty work, basically: communicating with the OpenStack services in each region, fetching tenant resource usage, and dynamically adjusting those values. So like I said, we had our first release last week. It's a minor release; in that release we have the baseline for the project as well as quota management, so we already have APIs for setting global quotas and also dynamically rebalancing quotas for Nova. In the future, we will proceed with covering the Neutron and Cinder parts and then move on to the use cases related to resource synchronization: images, security groups, and so on. You can follow the progress of the project on Launchpad, where you can check out our status and blueprints, and you can download the source code from GitHub. And yeah, we invite you to participate — if you have any use cases or any feedback, please let us know; it would be good to hear.

Hello, let me talk about Tricircle. Tricircle is a new project in OpenStack. Tricircle is an OpenStack API gateway: it can accept all OpenStack API requests, so it can work together with applications already built on the OpenStack API — for example Tacker, Magnum, Murano, the CLI, the SDKs, and so on. Tricircle's most important feature is to offer L2/L3 networking automation across different OpenStack instances in one site or in multiple sites. Besides the L2/L3 networking automation, it also provides functionality to support data movement from one site to another site, image replication, and so on. So Tricircle provides the OpenStack API and manages multiple OpenStack instances to work like one interconnected OpenStack. Tricircle uses the L2 Gateway to build the cross-OpenStack L2/L3 networking. L2 Gateway is a project under Neutron, an API extension in Neutron to provide L2 connections, and currently L2 Gateway only supports connecting the overlay network to the physical network.
But a new feature has been delivered in L2 Gateway to support connecting the overlay networks in different OpenStack instances through L2 gateways, so that an overlay network can extend to different OpenStack instances. However, an L2 gateway only works inside one OpenStack instance, so the automation between different OpenStack instances needs another layer of software, and Tricircle can finish this work. Tricircle is an OpenStack API gateway: we have a Nova API gateway, a Cinder API gateway, and a Neutron API gateway. We need the Nova API gateway because whenever you provision a new virtual machine, Tricircle can be aware that a new virtual machine will be provisioned, so we can do the networking automatically at the time the new virtual machine is provisioned in a bottom OpenStack instance. The Tricircle plug-in in Neutron, together with the Nova API gateway, will create the L2 network in the corresponding bottom OpenStack accordingly and then add the network segment to the Neutron network. So in the Neutron network, the network will consist of different segments. If the tenant has resources in different OpenStack instances, that means their virtual machines in different OpenStack instances are connected to different network segments — for example, in this picture, network 1.1 in the left OpenStack instance and network 1.2 in the right OpenStack instance. When the L2 gateway driver in the Tricircle plug-in detects that the network has multiple segments distributed across multiple different OpenStack instances, the L2 gateway driver will start a synchronization job. This job is for the networking automation: it will call the APIs in the different OpenStack instances to configure the remote L2 gateway connections and populate the IPs and MACs in the different L2 gateways. After the population and the remote connection establishment, each L2 gateway can understand where traffic should be forwarded for virtual machines that are in the same network but distributed across different OpenStack instances, in different sites.

So, leveraging the L2 networking and the L3 networking across the different OpenStack instances, we can move data from one OpenStack instance to another. We just need to create a virtual machine in each OpenStack instance with a transferring tool installed, and attach these virtual machines to the same network, or connect them through L3. Then the transferring virtual machines in the different OpenStack instances can communicate with each other. After that, we can attach the volume with the data to be transferred to the virtual machine, and the transferring virtual machines can talk to each other to move the data from one site to another. A project called Conveyor is being established to provide the data movement functionality based on Tricircle. So Tricircle is an OpenStack project, and we are planning to apply to become a big-tent project — please come to contribute and join us. You can find all the resources on the web, from the Launchpad blueprints to the source code on GitHub.

We just talked about several projects that aim to address the multi-site issues and challenges. All these projects can work together in different scenarios. For example, if we have multiple OpenStack instances deployed in different sites running in multi-region mode — meaning no L2/L3 networking automation is needed across these instances — then we can use Tacker and Kingbird to orchestrate the multiple OpenStack instances.
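As a conceptual illustration of the L2 gateway synchronization job described above (not Tricircle's actual implementation; all names and structures are hypothetical), the sketch below shows the basic idea: when one logical network has segments in several bottom OpenStack instances, each site's gateway is given the MAC/IP entries of the ports that live in the other sites, so it knows where to forward L2 traffic for the stretched network.

```python
# Toy model of cross-site L2 gateway population (hypothetical, for illustration).
# remote_tables[site] ends up holding the MAC/IP entries of every *other* site.

segments = {
    "site-left":  {"segment": "net-1.1", "ports": [("fa:16:3e:00:00:01", "10.0.0.11")]},
    "site-right": {"segment": "net-1.2", "ports": [("fa:16:3e:00:00:02", "10.0.0.12")]},
}

def sync_l2_gateways(segments):
    remote_tables = {site: [] for site in segments}
    for site, info in segments.items():
        for other_site in segments:
            if other_site == site:
                continue
            # "Call the API" of the other site: here we only record the entries
            # that would be pushed to that site's L2 gateway.
            remote_tables[other_site].extend(
                {"remote_segment": info["segment"], "mac": mac, "ip": ip}
                for mac, ip in info["ports"]
            )
    return remote_tables

for site, entries in sync_l2_gateways(segments).items():
    print(site, entries)
```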
And if we need the cross-OpenStack L2/L3 networking functionality, including the automation, then we can use Tricircle to manage multiple interconnected OpenStack instances so that they work like one OpenStack. Because Tricircle can accept and process all OpenStack APIs, Tacker and Kingbird can seamlessly work together with Tricircle. These are the different scenarios for the different combinations of this software. Okay, thank you. Questions?

Regarding this multi-site support, my first question is: last time I checked, it looks like there's no standard definition of the site concept in OpenStack. One way to view it is that for each site we will have one Keystone installation, and across the sites there'll be multiple instances of Keystone, right? Different sites have different Neutrons. So how about Keystone? Should Keystone be isolated within each site? For Keystone, will you be using federated Keystone or a shared Keystone? Shared Keystone means all the OpenStack services will directly talk to this shared Keystone.

Hello? So one of the use cases we have seen in NFV infrastructure is that operators want to use these kinds of multi-site projects in their existing installations. When you talk about a federated Keystone service, that's a day-one configuration where you want to bring up your whole OpenStack cloud configured with a shared identity service. That does not scale when you already have OpenStack instances deployed, and some of the operators don't want to bring down all of their sites just to implement a shared Keystone service to solve this problem. So here the requirement we are trying to address is focused on the existing installations, where they can have dedicated Keystone services running in their sites. So that means if there's an existing Keystone installation, it will just keep it? Yes, and a shared Keystone service can be configured as well, where we can talk to that single Keystone service and it can internally process the request on behalf of the other sites that are listening on that Keystone service. Okay, thanks.

Okay, let me see. So in this diagram right here, I see the Nova and Cinder API gateways. Let's say I wanted to extend that and actually distribute Heat stacks across multiple OpenStack instances. How would I go about that? Do I have to build something called a Heat API gateway, or how would that work in Tricircle? The Tricircle API gateway is a very slim web service that just receives the API request and forwards it to the bottom OpenStack, but the API gateway will cache the routing information, so that later, for a volume operation or a virtual machine operation, it can redirect to the correct bottom OpenStack. Okay, but does that mean basically these are special cases of API gateways, and that if I want to distribute calls for another subsystem, is there any special work I have to do in Tricircle? Heat is an example. Yes, the API should be processed by Tricircle first, so that the networking automation can be aware of when the virtual machine is provisioned; otherwise, the networking automation would not be possible. You need to do the networking immediately after the virtual machine is provisioned. It's very important for the Nova API gateway to know about the virtual machine boot request and then create the corresponding network 1.1 in the left OpenStack.
Or, if the virtual machine is provisioned in the right OpenStack, it creates network 1.2 in that other OpenStack and then asks the Tricircle plug-in to connect these two segments into one L2 network. Okay, so what I'm actually thinking of may be a different layer than you're thinking. Let's say for a minute I wanted to instantiate a Heat stack on one of your lower — what are they called, bottom? — your bottom instances, right? The only thing I want to do is a stack create up at the top, and have Tricircle direct that to a bottom OpenStack instance. And then within that instance, Heat operates autonomously; it does its entire thing autonomously. Yes — there are some slides, I'm not sure, here, so maybe you have a question about that. The process is like this: if you create a network first, then it is only one logical object in the Tricircle plug-in. Then, if you create a virtual machine attached to the network, it will create a network 1.1 in the corresponding OpenStack, update network 1 with the segment for that part, create a new port for the virtual machine, and create the virtual machine in the corresponding bottom OpenStack with the port's IP and MAC allocated in Tricircle. Yeah, okay. So in my particular case, let's say I wanted you to create a Heat stack in the bottom OpenStack, your left bottom OpenStack instance. The only thing that's going to come to Tricircle is a stack create. It distributes that, and every one of your steps one through six is going to go on in that bottom OpenStack. Is that an easy thing to support in Tricircle? Okay, so it's pretty straightforward. Okay, okay, all right, thanks. That's exactly, exactly what we want to be able to do. We want to be able to basically just say: execute this Heat template out in region A, which actually is a bottom OpenStack instance. Okay, right, right, okay. Okay, all right, thank you.

Oh, I had two questions. One on Kingbird: is the support for Kingbird done declaratively or programmatically? In other words, in your slide, you showed its integration explicitly with some of the other OpenStack services, such as Nova, Neutron, whatever. So do they have to explicitly program to that interface, or does it happen declaratively? No, that was the main requirement, right — to keep the services intact and just use the APIs for Nova, Cinder, and Neutron. So the diagram that I showed basically represents an API connection from Kingbird to the OpenStack services. Yeah, this one. So we don't require any extra configuration on the Neutron or Nova side. Was that your question? Yeah — for example, if a quota is exceeded and somebody is trying to create a VM using other services, it would be automatically honored, right? Yeah, well, we have a periodic function that dynamically rebalances quotas. So if the timeout is, let's say, two seconds or five seconds, depending on how dynamic your cloud is, then if you boot a new VM, this will be calculated as a usage value and then we adjust the quota values accordingly. Okay, okay, thank you.

The other question was on Tricircle. If you go to the final slide that you showed during the presentation — the final slide showed Tricircle almost providing an abstraction of multiple OpenStack instances to Tacker, right? You showed VIM 1 and VIM 2 and then 3, 4, 5. Yes. Providing an abstraction to Tacker, right? No, not this one — the final slide that you presented, the other one, where they're put together. Okay, okay. The last one.
Yeah, that one. So in this particular case, if Tacker has to configure VNFs in four, five, and six, does it have to explicitly be aware of that, or is Tricircle providing a level of abstraction, so Tacker is only talking to the Tricircle gateway and Tricircle can internally talk to the underlying sites? Tacker sees Tricircle as yet another OpenStack instance. Yeah, it just treats Tricircle as another OpenStack. Yeah, so the orchestration beneath Tricircle is happening on Tricircle's behalf. So basically, Tacker doesn't see the underlying OpenStack instances; it just sees Tricircle as one OpenStack. Okay, but if it wants to place something, how does it decide where to place the VNF — in which VIM among four, five, and six? So this is something we have to think through thoroughly, because we want to integrate with Tricircle as a plugin, just like another VIM driver. Yeah. So if we have Tricircle enabled, what happens in Tacker is: the default VIM driver you can specify is OpenStack, and probably Tricircle will become another VIM type which Tacker can talk to, and you can provide Tricircle as a VIM type to deploy your VNFs on sites four, five, and six when you want to talk to the three sites below Tricircle. So is there an option in the API interface that Tricircle provides to say, okay, by default I will decide which VIM to place it on, or does it say, I manage these VIMs, so I give you the option to choose which VNF you want to place into which VIM? Well, I think — and Joe can correct me if I'm wrong — each OpenStack, like four, five, and six, is represented in Tricircle as an availability zone.

Is this on? Okay. The reason I was asking my question is about Heat. So let's say I wanted to go to these different sites and instantiate VNFs on each one of them, as — what do you call it — an autonomous instance, right? And so one way you could do this in Tacker is: Tacker goes from, what is it called, the top-level template, the TOSCA template, down to VNF templates, right? And there'll be a VNF template for each VNF, and then you take Tricircle and you do a stack create with that VNF template, and that will create your VNF in a particular site. Yes, but even if you want to explicitly say, I want to put a VM on, let's say, site four — site four is represented in Tricircle as an availability zone. Yes. So you can even explicitly say, put that VM in that availability zone, make it happen, and Tricircle will do it. Absolutely, I agree, I agree. But that's all even passed up through Heat, though, right? All that stuff is passed up through Heat. Yeah, Heat — I think Heat resides at the Tricircle level, so it will basically invoke Tricircle APIs for talking to Neutron. No, that's not what I'm thinking. I'm thinking that the Heat client is at the layer of Tricircle: the Heat client does a stack create, but the actual engines, the Heat engines, are in sites four, five, and six. Uh-huh, okay. So that's your deployment scenario. That's what I'm thinking.

Okay, so my next question, sorry. Is it all in Mitaka? Or is it a future roadmap, or is it all there? So the three projects are already working on the multi-site use cases, and we saw the project overview of how they're trying to solve the multi-site use cases.
So the thing we will be focusing on in Newton is how these three projects can collaborate together to fulfill this picture, what we see here. These projects are separately solving the multi-site use cases, but it's now important for the Tacker project to interact with Kingbird or a project like Tricircle to provide the end-to-end service orchestration. Okay, that will happen in Newton. Thank you so much. Yeah. Appreciate it, good presentation. Okay.

So just a couple of clarifications. This driver that you mentioned for Tricircle — is it ML2? There is a plug-in that you mentioned, right, in Tricircle? Yes. Your plug-in — is that an ML2 driver? So can it coexist with other ML2 drivers? It's not a mechanism driver; we provide a new interface for... So she's asking about the Neutron API, the Tricircle plug-in. Yeah. So can it coexist? The Neutron API is the same, yeah. Okay — for example, if we have an SR-IOV driver, will it be able to handle both? Another question was: since these are two different independent OpenStack instances, and you are trying to extend your L2 or L3, how are security groups handled? For the L2 network between OpenStack sites, how can we handle the security policies? Yeah, because they are independent, so how do you handle it when you say you extend the L2? That's a good question. Well, firstly you can do it in Kingbird. Oh, okay. Okay, you are talking about security groups, okay. For security groups, we deal with it like this. You know that if you're using a remote security group, then all ports attached to this security group should be allowed by the rule. But in another site, you have no port information. So in Tricircle, we have some trade-offs: we only provide IP-prefix-based security group rules, not rules based on the remote group ID, because when you have a rule with a remote security group ID, you have to look up all the ports belonging to this group ID. And currently, in production clouds, the remote security group ID in a rule is mostly used for the default security group, so that all machines added to the same security group can communicate with each other — so the rule opens the ports. So we have some trade-offs, because when you deal with security groups in different sites, you have no port information, so we only accept IP-prefix-based rules. I think it should be acceptable, because security group control should be based on an IP prefix scope, not on a single IP and port. Yeah, otherwise it would be very difficult to manage so many ports. Yeah. So many IPs and ports, yeah. So another question — sorry, I have a little problem with this; maybe we can discuss it after the meeting, individually. Okay, thank you.

Just one quick question — I know we're very, very close to lunchtime. Okay, the other thing I wanted to ask you about, for Tacker and Tricircle, is VNF monitoring, because the VNF manager is not just for creating or deleting; it has to do ongoing monitoring for scaling or recovery. So what features do we have today in Mitaka for doing that? And particularly in that multi-site case, where you are creating the VNFs and services through different projects, who will do the monitoring, and how can the recovery or scaling be done? So right now, Tacker does support a monitoring framework, and the way it works is that, with the L2 networking between the Tacker controller node and the site, there is connectivity between the two instances, the Tacker node and the OpenStack site.
So Tacker could just trigger the monitoring for the VNF on the remote instance in its own VNFM component, based on the driver you provide. But in multi-site, there can be more complex monitoring use cases, so we could use Ceilometer-based alarms, and we are looking into how we can integrate that as a plugin within Tacker to manage monitoring across different sites. So that is something we are looking forward to in the Newton cycle. Yeah, I know it's not an easy answer, but the other thing is: in the future, if the customer can deploy any VM, we can't dictate that this is the VNF you need to use. So are we going to define any standards for the VNF so that Tacker or Tricircle can monitor it — for example SNMP, or other standards, so that you can pick up the messages and monitor? So would that be... So we are having a developer session in Tacker today for some of these questions. I think these are very valid questions, and we plan to talk about some of the standards for how we can improve the monitoring policies, especially for the different types of VNFs that are out there. And since we have integrated the TOSCA template in Mitaka, that's something we want to work on, to improve our monitoring for some of the more complex VNFs. So you should attend the Tacker session today; you're welcome to join and provide your inputs there. Thank you. Okay, thank you.