OK, thank you everyone. Thank you for joining our presentation. This presentation is about how OpenStack can collaborate with the NFV orchestrator. Let me introduce myself first. I'm Xinhui, from VMware. I'm working on NFV and OpenStack integration, and I'm also a TSC member of the ONAP community. On my left is Qiming. Qiming, would you like to introduce yourself and Yixing?

My name is Qiming Teng. I'm with IBM. I'm the creator of the Senlin project.

Hi, I'm Yixing. I'm a software engineer from VMware. I'm also a core contributor to the Heat and Senlin projects. Thanks.

OK, thank you. Now we can jump to our motivation. Virtual network functions are very popular nowadays, but from the very beginning, all these virtual functions need to be well managed. That's why we have NFV MANO, a working group under ETSI, the European Telecommunications Standards Institute, which gives us a reference framework for how to manage and orchestrate the different components and resources in a cloud environment. On the right side of this diagram, you can see it breaks down into three components.

The first one, the topmost, is the NFV orchestrator. This component is responsible for onboarding new network functions, and it orchestrates network services. One network service can consist of several virtual network functions. That's why we have an upper-layer orchestrator: to schedule and manage them all, and to handle the orchestration work of accessing and controlling the infrastructure resources.

The second one is the VNF manager, the VNFM. This component works under the NFV orchestrator and is used to manage the lifecycle of the VNFs. As I just mentioned, one network service consists of different virtual functions, so the VNFM is supposed to help the orchestrator do the network-function-level orchestration, such as lifecycle management, configuration, and even KPI monitoring.

The last one, but also a very important one, is the virtualized infrastructure manager, the VIM. That's where we communicate with different backend cloud or infrastructure providers. Those can be different kinds of things: VMware, Microsoft, or any OpenStack version. So this is where OpenStack cooperates with the whole NFV world. OpenStack is a de facto industry standard for managing and operating different resources, so it is very natural today to adopt OpenStack in the NFV world as a standard way to communicate with different infrastructures.

Now let us take a closer look at the VNFM, because it is the bridge between the NFV orchestrator and the underlying infrastructure. As a key component working under the NFV orchestrator, it needs to handle several things around the lifecycle management of different VNFs: how to create, provision, and terminate all these virtual functions; how to upgrade and downgrade them; and how to handle elasticity and health management.
All of these are very important, because the virtual function is what provides the business value of NFV, but the VNFM is what maintains the health of, and manages, all the virtual functions, without being allowed to interfere with the internal logic of the virtual function itself. So we see this component is very important. And in practice, we always see that things should be model-driven and policy-driven. We are not modeling people, we are just IT people here, but I want to show a more complete loop of how everything is used. In practice, we always use modeling languages such as TOSCA or YANG to describe the network service; the orchestrator parses these descriptions into a more concrete template as the input to the VNFM, and the VNFM talks to the infrastructure layer to do the real work of creating and managing all the resources. There are different models; that's not our focus. But if the VNFM talks directly to OpenStack Nova, Neutron, Cinder and so on, that is very tedious, because we need different clients and separate processing for the different resources. That easily causes errors, it's not convenient, and it lacks a straight mapping to the NFV model.

Let us take a quick look at what a resource description looks like in the modeling world. This is a VDU, a virtual deployment unit, as defined by TOSCA NFV (a minimal sketch follows at the end of this segment). Again, we are not modeling people; I'm just trying to give an example of what a resource description in the NFV world looks like. You can see it is not a flat list: for each unit we have the compute, the network, the storage, and the dependencies and relationships between them. So we need to consider how to provide tight collaboration and better support from the OpenStack world to help the orchestrator and the VNFM do their job.

Let us come back to this page. This is what Senlin can provide. From its very beginning, Senlin has been a clustering service. That means it provides very fine-grained operations on different resources: you can create, delete, resize, and reboot a cluster or group of VMs. It provides very rich operational support, such as elasticity, auto-scaling, auto-healing, and load balancing. Last but not least, we provide an abstraction, a consolidated model profile, for creating and operating resources. That's what we call the VDU profile. It allows us to communicate directly with all the backend resource services, including Nova, Neutron, and Cinder. That's much easier to control, and we can handle all the cluster management. We also provide infrastructure-level models and policies that allow users to conveniently control and customize the health checks, the auto-scaling, and so on.

Now I would like to hand over to Qiming, the PTL of Senlin, to give more of an introduction to how Senlin fits into this space. Thank you.

Thank you. Thanks, Xinhui, for the problem statement. Next, I will give you a quick introduction to the Senlin project.
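As a reference for the VDU description mentioned above, here is a minimal sketch of what a TOSCA NFV virtual deployment unit can look like. The node type names follow the tosca-nfv profile; the template and property values are illustrative, not taken from the talk:

```yaml
tosca_definitions_version: tosca_simple_profile_for_nfv_1_0_0

topology_template:
  node_templates:
    VDU1:
      type: tosca.nodes.nfv.VDU
      capabilities:
        nfv_compute:
          properties:        # compute resources for this unit
            num_cpus: 2
            mem_size: 4 GB
            disk_size: 40 GB
    CP1:
      type: tosca.nodes.nfv.CP   # connection point (port) of the VDU
      requirements:
        - virtualLink: VL1       # relationship to the network
        - virtualBinding: VDU1   # relationship to the compute unit
    VL1:
      type: tosca.nodes.nfv.VL   # the virtual link (network)
      properties:
        network_name: mgmt-net
```

Note how compute, network, and the dependencies between them are expressed together in one description, which is the point Xinhui makes about the model not being a flat list.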
The Senlin service was created to be the clustering service on OpenStack. It was created two years ago, and it's pretty stable and pretty robust today. It was designed to help you collectively manage resources or objects exposed by other OpenStack services. On this page, I'm showing you a very high-level architecture of the service. As you can see, in the middle of this diagram we have multi-engine support, so that you can handle large-scale clusters. On the left-hand side, you can see we support the command-line interface as OpenStack client plugins, and we have Horizon plugins as the web UI. Some other developers have contributed Java bindings as well, so you can interact with the Senlin service using Java or Python.

When managing resources or objects on OpenStack, we asked ourselves: what are the objects we are going to manage? The first idea we came up with was the Heat stack. As many of you may know, Heat is the orchestration service on OpenStack. It already provides a lot of resource-type abstractions so that you can interact with those services easily, and we didn't want to reinvent the wheel. So the first thing we designed was managing collections of Heat stacks; that's the first profile we implemented. But later on, when we applied the service in our production environment, we felt that we needed to operate Nova servers directly. Sometimes we want to look deeper into the Nova server to find out its properties, and going through a stack becomes a constraint on how you write your Heat template. That is not a reasonable constraint to put on users; it impacts the service usability. So we came up with a Nova server profile as well.

The Senlin engine talks to OpenStack services through the OpenStack SDK, which is also maturing. We don't rely on this-client and that-client libraries. All those client projects evolve very quickly, and there are many dependency and compatibility issues. So we decided the OpenStack SDK was the right way to go. That was the design. Later on, we also got some requirements to manage a cluster of containers on OpenStack. That was done as well, in an experimental status: we talk to the Docker API directly so that we can create and manage container clusters too.

The other area where we spent a lot of energy is policies. Most of the time, we want a cluster management service to be a little bit smarter. On this page, the right-hand side is a not-so-complete list of the policy types we have implemented. For example, with the scaling policy, you can specify how you want your clusters to be scaled when something interesting happens inside or outside your VMs or your nodes. The health policy targets the auto-healing use cases, which are a very important requirement in the NFV world. The deletion policy helps you decide, when you want to scale in your cluster, which nodes to delete first: the oldest one, maybe the youngest one, maybe the ones with the oldest profile; we allow you to make that choice easily. We also support placement policies and affinity policies. These policies are checked and enforced when you are creating new nodes in the cluster; basically, they can help you decide where the new nodes should be located, in a new availability zone or in another region. We support that. We also have a load-balancing policy that makes your use of load balancing much easier: you don't have to know how the load balancing is implemented, you just specify the properties you want to enforce.
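To make the policy idea concrete, here is a hedged sketch of two such policy specs in the small YAML format Senlin uses; the values are illustrative:

```yaml
# Scaling policy: add one node each time CLUSTER_SCALE_OUT is triggered.
type: senlin.policy.scaling
version: 1.0
properties:
  event: CLUSTER_SCALE_OUT
  adjustment:
    type: CHANGE_IN_CAPACITY
    number: 1
    best_effort: true     # scale as far as allowed if max_size would be exceeded
---
# Deletion policy: when scaling in, remove the oldest node first.
type: senlin.policy.deletion
version: 1.0
properties:
  criteria: OLDEST_FIRST  # alternatives include YOUNGEST_FIRST, OLDEST_PROFILE_FIRST
  grace_period: 60        # seconds to wait before the node is destroyed
```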
Senlin — one of the interesting parts of its design is that it provides a lot of primitives, a lot of operations, so that you can interact with and operate your clusters easily. This is different from, for example, Heat. Heat is designed to provide a lot of resource types, a lot of concepts, but on the operations side you can only create, delete, and update. Senlin is designed from a different perspective: we provide a lot of primitives so that you can manage the membership of your cluster and manage its scale manually. For example, we provide scale-in and scale-out operations. We also provide a resize operation, and that operation has many options, many arguments, for you to customize how you want your cluster to be resized.

Speaking of policy management, these are all designed for some special use cases. I have to mention that on the last page I showed you quite a few built-in policy types. All those policy types can be used individually and in any combination. For example, you may want to attach a scaling policy, but you don't want load balancing; that's fine, you can do that. You only want to customize how your cluster will be scaled in? OK, just attach a deletion policy. And you can use all those policies together in any combination.

On this page, you can see a sample command for creating a cluster. You specify the desired capacity, the minimum size, the maximum size, and the profile you want to use (a sketch follows at the end of this segment). My personal experience is that with this service I'm no longer using the Nova command line. Even if I'm creating only one Nova server, I would like to create it as a cluster, because maybe tomorrow, maybe this afternoon, I want to scale it, I want to replicate it, I want to create two instances. That is how a cluster really helps you. Most of the time, I don't believe we operate a cloud by touching each and every Nova server by hand, or with just a for loop in a shell script or something like that.

There are many ways to use the Senlin service. On this page, I'm showing you two different ways. On the left-hand side, you create Senlin policy-type specifications; it's a pretty simple YAML file, and I'm showing you an example of the affinity policy. You can attach this policy to your cluster and detach it from your cluster, and while the policy is attached to a cluster, you can dynamically enable or disable it. These are all doable. If you don't want to do all these things the Senlin way, you can do them the Heat way. That means we have already implemented all the Senlin concepts — clusters, nodes, policies, receivers, all those things — as Heat resource types. You can write a Heat template, and inside that template you can specify Senlin resources as Heat resources. On the right-hand side, you can see an example of how you would specify a health policy in a Heat template.

With that, I'm handing over to Ethan to see how we are doing this in the NFV user scenario, to make a VDU work. Thanks.

Hi, I'm Ethan. I'm going to introduce the VDU profile in Senlin. As you may or may not know, there are many profile types in Senlin. One of the most commonly used is the Nova server profile. We can use this profile to create a cluster of virtual machines and then use Senlin to manage the lifecycle and the health of this cluster. So when we came to the NFV world, I thought we could take advantage of these Senlin features to create and manage a cluster of VDUs, because a VDU is essentially a virtual machine that has many ports connected to it and runs some network services in it.
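As a reference for the cluster creation and the Heat way that Qiming described, here is a hedged sketch. The OS::Senlin::* resource types do exist in Heat; the property values and resource names below are illustrative:

```yaml
heat_template_version: 2016-10-14

resources:
  profile:
    type: OS::Senlin::Profile
    properties:
      type: os.nova.server-1.0      # profile type name with version suffix
      properties:
        flavor: m1.small
        image: ubuntu-16.04
        networks:
          - network: private

  cluster:
    type: OS::Senlin::Cluster
    properties:
      profile: {get_resource: profile}
      desired_capacity: 2           # the same knobs as the CLI cluster-create command
      min_size: 1
      max_size: 5

  health_policy:
    type: OS::Senlin::Policy
    properties:
      type: senlin.policy.health-1.0
      bindings:
        - cluster: {get_resource: cluster}
      properties:                   # illustrative health-policy spec
        detection:
          type: NODE_STATUS_POLLING
          options:
            interval: 60
        recovery:
          actions:
            - name: REBOOT
```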
So I tried to use the Nova profile to create a cluster of VDUs, and it failed, because the current design of the Nova server profile cannot meet the requirements of the NFV world. Basically, there are a few problems with it. The first is that the Nova server profile cannot take advantage of the network features. For example, if you want to assign security groups to a specific port, you cannot do it in the Nova server profile. And if you want to create a floating IP and associate it with a specific port, you cannot do that in the Nova server profile either. Also, an important part of a VDU is that it uses the user-data script to initialize the network functions in the virtual machine, and this script needs to take some variables when it runs, because those variables may differ according to the cloud environment. We needed to solve all these problems to take advantage of Senlin for managing the VDU cluster. That's why we created the VDU profile.

The right part of this slide is actually a piece of YAML for creating a VDU profile in Senlin. In the networks part, we create a network port, assign security groups to it, and also allocate a floating IP to this port. In the user-data part, we define a variable, the DNS management IP, so that we can change it when we create a cluster from this profile. So that's why we invented the VDU profile. It is an extended and enriched version of the os.nova.server profile. For now it is a plugin outside the Senlin code, so you need to install it manually and then restart the Senlin engine, but I plan to submit a patch to put all those features into the os.nova.server profile, so that in the Pike release you won't need to install it manually.

Like I said, an important part of a VDU is the user-data script, so let's see how we interact with the user data and the Senlin cluster (a sketch follows at the end of this segment). When you create a profile, you cannot modify it, so if you want to pass some variables to the profile, we need some framework to replace those variables in the profile. Here I use the Jinja2 library to do this job. There are two kinds of variables in the user-data part. One is the internal variables: attributes that only become known when a VDU is created, so you can retrieve things like the fixed IP or the floating IP from these internal variables and use them directly in your user-data script. The other kind is the external variables. For example, here I define a DNS zone in the user-data script, so that we can create a cluster and pass in or change these variables using the metadata. There is a config key in the metadata: we put all those variables there when we create a cluster, and they are substituted into the user-data script, so we don't need to define different kinds of profiles. One profile can adapt to different cloud environments. So that's how a VDU works in Senlin. Later, Xinhui will give you more details about how to create a cluster with the VDU profile.

OK. Thank you, Yixing. Now I would like to conclude this presentation with a demo. Here we use Clearwater as an example to show how Senlin's capabilities can be used by the VNFM to provide these VDU capabilities. We will save time by not explaining what Clearwater is and the details of each component.
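Before the Clearwater walkthrough, here is a hedged sketch of the user-data templating Ethan described. The VDU profile is a plugin whose exact schema is not shown in the talk, so this spec is illustrative: external variables come from the config key in the metadata, while internal ones (such as the floating IP) would be filled in from node attributes after creation:

```yaml
# Illustrative VDU-style profile spec; the real plugin extends os.nova.server.
type: os.nova.server
version: 1.0
properties:
  flavor: m1.small
  image: clearwater-base        # hypothetical image name
  networks:
    - port: mgmt-port           # port carrying security groups and a floating IP
  metadata:
    config:
      dns_mgmt_ip: 192.168.1.10 # external variable, overridable at cluster-creation time
  user_data: |
    #!/bin/bash
    # {{ config.dns_mgmt_ip }} is rendered by Jinja2 from metadata.config
    # (external variable); {{ floating_ip }} would be an internal variable
    # resolved from the node's attributes once it exists.
    echo "nameserver {{ config.dns_mgmt_ip }}" >> /etc/resolv.conf
```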
Clearwater is an open-source IMS-type VNF. We just choose three of its components as examples to show how Senlin's capabilities can be used in this scenario.

The first one is Ellis. Ellis is the component that provides a sample portal where you can register your own password and user information. We will attach a health policy to this component to reboot it when it goes down, because we want it to keep running healthily over the long run.

The second component is Ralf. Ralf exposes the basic HTTP API that both Bono and Sprout use to report their events for billing purposes. After collecting all these events, Ralf forwards them to the backend CDF for charging. Ralf contains an in-memory cache and is always writing logs, so here we want it to always have enough disk space for the logs. We will attach a health policy to this component to do the resize work.

The third one is Sprout. Sprout is the component that provides the routing proxy function. It is designed to be horizontally scalable, so we attach an auto-scaling policy to this component to keep it elastic: if the workload is high, we scale it out; if the workload goes down, we scale it in. That's what we want to do.

Before the real video comes, I will show the templates we used for this demo. The first one is the VDU. There are so many components that I cannot show all of them; here I just use Ellis as an example. As Yixing and Qiming introduced, we can include every resource description in one profile: you can define the flavor, the image (belonging to the Glance and Nova side), the network, and the user data. You can define what kind of network, the security group, and the floating IP. That maps directly to the VDU model of the NFV world.

This is the scaling and the reboot policy. Here I only want to explain a little bit about the event that triggers the auto-scaling: we just use CLUSTER_SCALE_OUT to trigger the scale-in and scale-out actions. And we use the recover policy to register the cluster with the Senlin engine; Senlin will poll the status of the registered members, and if anything goes wrong — that means any node fails or stops — it will do the reboot recovery for this cluster.

Then we jump to the resize health policy. I will just skip to the definition of the recover action. Here we actually use Mistral as the backend recovery workflow to do the resize work, because a resize is really a process rather than just a single REST API call.

The last one is the Senlin receivers. As Qiming just mentioned, Senlin provides a very important abstraction for receiving events and calls from third-party monitors or event triggers. Here we have two kinds of triggers: the first one is scale-out, the second one is resize. They map, respectively, to the scale-out action and the recover action. The type we use here for both is a webhook, but Senlin is not limited to webhooks; it can also support Zaqar messages and the like, so it's easy to extend based on the Senlin framework.
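For reference, a hedged sketch of what such receivers can look like when declared the Heat way; OS::Senlin::Receiver is a real Heat resource type, while the cluster names here are illustrative:

```yaml
resources:
  scale_out_receiver:
    type: OS::Senlin::Receiver
    properties:
      cluster: {get_resource: sprout_cluster}   # illustrative cluster name
      action: CLUSTER_SCALE_OUT
      type: webhook

  recover_receiver:
    type: OS::Senlin::Receiver
    properties:
      cluster: {get_resource: ralf_cluster}     # illustrative cluster name
      action: CLUSTER_RECOVER
      type: webhook

outputs:
  scale_out_url:
    # the webhook URL an external monitor (here, vROps) will POST to
    value: {get_attr: [scale_out_receiver, channel, alert_url]}
```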
How do we trigger everything? We use the VMware vRealize Operations (vROps) monitor. vROps is a very reputable enterprise monitor, so here we use it to emit the health, risk, and efficiency information: the metrics, events, and alerts. And vROps has an adapter framework, which means a lot of different data sources can consolidate their data into one data store. That's very important: it means we can collect metrics from different resources — CPU, memory, network (including port-level metrics), and storage — combine them into one body of data, and trigger alerts. That's very important for infrastructure-level management.

This is a graph showing the loop. Here we use a Heat template containing the vROps plugin and the Senlin plugin. As we already introduced, all the Senlin resources have been implemented in Heat, and in the past cycle we also wrote a plugin to define all the resources needed by vROps, which allows Heat to generate the vROps-related things. Then we use the Heat template to create a Senlin cluster, and the Senlin cluster includes the Mistral plugin. That means it's not only the simple reboot, rebuild, and migrate — the kind of single REST API calls that Nova and Heat can do — we can also support a Mistral workflow as a backend for the recovery purpose (sketched below). Once vROps collects any failure information, the alert is triggered and the notification is sent to Senlin, via the webhook we just mentioned. Then the Senlin engine performs the recovery predefined by the health policy. That's the whole loop we want to show here.

The following slides give the details of the vROps alert resources. This page shows how to do the resize. Firstly, the right side is a symptom used to define when to trigger a resize. Here we use disk space: if the free space of the VM is less than five gigabytes — two gigabytes, sorry — we will trigger an alert. The alert is concretely defined by the alert definition here: we feed the symptom into the alert definition to let vROps generate the proper alerts.

There are two things here that are very important for connecting the vROps world with the OpenStack world. The lower part is about notification. Here we have two very important inputs. One is which REST API, or target, we need to notify: here we input the webhook we just generated from the Senlin receiver. The other is the condition under which to notify, which should be the alert definition I just showed. The upper part is about a custom group. That's a very interesting thing, because vROps is an enterprise monitor: we need to let it know which objects are really related to the OpenStack world, and which vROps targets we need to monitor and generate alerts for. So we use a custom group as the bridge; that means we use a filter to select the real targets we need to monitor.

I will go very quickly over the alert part for scaling. It's similar: we use CPU utilization, and if it is greater than some threshold, we trigger the alert.
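As referenced above, here is a hedged sketch of what such a Mistral-based resize recovery can look like. The workflow name, inputs, and task wiring are illustrative, not the demo's actual workflow; the nova.* actions are in the style Mistral auto-generates from the Nova client:

```yaml
version: '2.0'

resize_vdu:
  description: >
    Illustrative sketch of the recovery flow described in the talk:
    check capacity, resize the server to a bigger flavor, then confirm.
  input:
    - server_id
    - new_flavor
  tasks:
    check_capacity:
      # Placeholder check; a real workflow would verify the original flavor
      # and that the target cluster has room for the bigger one.
      action: nova.hypervisors_list
      on-success: do_resize
    do_resize:
      action: nova.servers_resize server=<% $.server_id %> flavor=<% $.new_flavor %>
      on-success: confirm_resize
    confirm_resize:
      action: nova.servers_confirm_resize server=<% $.server_id %>
```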
Then, again similarly, we need to use a custom group to select which set of VMs we need to monitor and notify about, and we need to connect the alert with the underlying Senlin receiver to close the notification loop. And this one may look very similar, because it's the flow of the resize that is used for the recovery purpose: if the disk space is below some threshold, we need to do the resize, which means enlarging the VM disk to some bigger flavor. This is the flow, because often the resize is not a single operation: we first need to check whether the original flavor is right, then check whether enough capacity is available on the target cluster to do the resize, and then do the real resize.

Yeah, so I will show the demo very quickly. A little bit of a story about it: we just use a stack create, with the template I just showed, to create a stack, and the template creates the Clearwater instance, that's Ellis. Of course the DNS has to be created first, and then comes Ellis; you can see they are all created following the dependencies. After this, we can show the policies we attached, as I just showed: the health policy for the reboot function; the workflow used to resize the Ralf node; and the Senlin receiver list, with the reboot and scale-out actions we use to react to the scale-out and recover events. I'll skip through this list very quickly.

This is the vSphere UI here. I'll jump very quickly through the different kinds of resources related to vROps: you can see the different notifications, and all the notifications and alerts have been created. And this is the Horizon Senlin dashboard showing the receivers we used.

Now we simulate a failure by just stopping Ellis, to see whether the node can be rebooted properly. I will go through this very quickly because we are already running out of time. This just shows that after the reboot it works well: even after stopping the node, the user can still log into it and register a new user. You can see that even though I simulated the failure by stopping the node, everything recovers to active very quickly.

Then we simulate the scale-out by going into Sprout and increasing the CPU utilization. And you can see vROps has already detected that the scaling notification should be triggered. Let us jump into Horizon again to see whether a new node has been added to the Sprout group, because this node is designed to work horizontally scalably.

Lastly, let us jump into the Ralf node and use dd to simulate the disk filling up. Let me skip through very quickly. That's our simulation, and you can see that after two cycles, the resize notification is triggered. After the alert is generated, the workflow is triggered to change the flavor, to resize to a bigger one. The original was the Clearwater flavor; after the resize, the flavor has changed to small.

So that's our very quick demo. Thanks for your attention. We are now open for questions; you can use the two mics, or, because we have already run out of time, you can talk with us offline.
Thank you very much.