Hi everyone. I think this should be the last session in this room today. Hopefully you enjoyed the first day of this huge event. We are going to talk about QoS, and it will be fun stuff. Okay, let's get started.

The term quality of service refers to the reservation, or guarantee, of a certain level of service quality. A cloud runs under the assumption that there are unlimited resources for applications to consume, and the cloud admin is constantly looking for ways to observe the user-perceived quality and to dynamically tune the service level. Both of these should be based on real-time data generated by the cloud itself. To address QoS in OpenStack, we look well beyond the traditional QoS of the networking area: in any resource where contention exists, there is a need for QoS, for example CPU, disk I/O, and network I/O. In this talk we will explore the current status of QoS in OpenStack, looking both at what is possible now and at what is happening in each project. We will also share what we did in the WebEx OpenStack deployment: how we built a native QoS-as-a-service framework inside OpenStack and how we tackled this problem systematically.

Here is the team; we come from China. My name is Ian. I am a cloud platform architect working for Cisco inside the WebEx group. This is Yiping, a senior software engineer, also working for Cisco and an active Neutron developer. This is Xcela, a technologist at the EMC lab in China and a very active developer on the Cinder project. There is also Simon, who is not here right now, from the Cisco WebEx cloud services team in China. Now I will hand over to Xcela to talk about the motivation and the current project status of QoS.
Thank you all for attending our presentation. I know it is close to dinner; you are hungry, I guess. So why do we need quality of service? There are quite a lot of different use cases if we think about the enterprise context. There are many applications, and applications are the heart of all the services in an enterprise, whether internal, external customer-facing, or IT-oriented. There can be very critical customer-facing applications; those websites are really important to the business. There can be real-time applications that plan their resources very carefully within each quantum or time frame. And there can be a lot of batch jobs, which are much less critical; some do not even have a deadline. They should run on less important resources, or on whatever resources are left over on the cloud.

So what do we need in such a context? The first thing is to make sure that the high-priority applications always get what they need and always serve our customers best. This question becomes even more complex and important in a cloud environment. On the cloud, all the applications share the same underlying infrastructure: the compute hosts, the storage, the network. Because they share so much, we have to work hard to make sure that the high-priority applications are untouched when something else goes wrong. For example, virtualization is very widely used; it consolidates all kinds of resources and increases resource utilization. But overcommitting still carries the risk that when a sudden workload spike comes, our application may fail to handle it, particularly when there are a lot of virtual machines on one physical host. We also have application owners worrying about their noisy neighbors; we always hear complaints about that. So why do we need quality of service?
We need it to manage all kinds of cloud resources: compute, network, and storage. We need QoS to make sure that the critical applications are always running well, always serving their best. And we need QoS to give us confidence that the cloud runs things well, that it is trustworthy and reliable, and that it is totally okay to run mission-critical applications on it. That is why we need QoS.

Thinking about the cloud, we always come back to the question: how can we improve total resource utilization? There are many ways. Given an application, we want to give it only as much resource as it needs, but in fact we always give it more. Think about how you allocate a virtual machine on OpenStack: you have to consider the peak workload, so you always allocate a virtual machine much larger than the application usually needs. The normal workload changes over time, but the virtual machine size does not, so there is a gap between the two, and that gap creates potential waste of cloud resources. With good QoS, we have a way to control how applications consume cloud resources, and with more granularity, for example per individual virtual machine or per tenant, we can control things better. If we can also adjust the QoS settings in real time while the virtual machine is up and running, we gain flexibility: the gap can be lowered because we can control it better. The whole system can be even more powerful if we read real-time feedback from the cloud, for example how the cloud is running, how the virtual machines are scheduled, and how resources are allocated, and analyze that feedback with some intelligent algorithms and data-mining techniques.
Then we can always choose the best strategy dynamically while the cloud is up and running, to ensure everything is healthy and all resources are well leveraged.

The next vision is an application-aware infrastructure. What is that? Say we have a private cloud with long-running enterprise applications on it. We have plenty of time to observe how the applications run and to collect data, both at the infrastructure level and at the application level, and we can use all kinds of mining and analytics techniques to find the hidden patterns. A private cloud is quite different from a public cloud: a private cloud usually hosts enterprise applications and can be optimized specifically for them, while a public cloud faces the open space, with too many different applications running on it to optimize for any one specialty the way a private cloud can. The patterns learned from the long-running applications can then be used to improve how the cloud performs, and quality of service becomes a way for the cloud to interact with the applications dwelling on it. The cloud and the applications can interact and cooperate with each other and make the hosting environment more optimized. Next page, thanks.

Now, thinking about the current community status, there are a number of existing QoS implementations in the current version of OpenStack. (Well, thank you for taking the photos; I feel a lot happier.) Many components support QoS, for example Cinder. Cinder supports QoS at the front end and at the back end. The front-end QoS makes use of the hypervisor, for example QEMU, to limit how much resource the virtual machine consumes. If you trace the API workflow, you will see that first the Nova API is invoked.
Then it enters the RPC API, goes through the message queue to the compute node and the Nova manager, then to libvirt, and eventually down to the cgroup. Thanks to the kernel, cgroups are very powerful. Oh, I almost forgot one thing on the Cinder side: the back-end QoS support. How is the back end provided by Cinder? The back-end QoS is backed by the various volume vendors and their drivers, for example VMAX. VMAX provides service-level objectives such as the diamond, gold, silver, or bronze level. You can specify different QoS specifications from the command line or from Horizon, and Cinder will pass those settings down to the back-end drivers and the back-end storage. If you search the code for the keyword QoS, or QoS support, you will find a lot of QoS-related code in the Cinder volume driver folder.

On the Nova side, Nova provides instance resource quotas. With instance resource quotas, the user can adjust QoS settings in the virtual machine flavors to configure, say, how much CPU may be used, how much disk I/O to throttle, and so on. Currently there are some problems with memory support; these are related to the underlying kernel support.

We also have the Neutron QoS API, which landed recently in the Liberty release. It works at the per-network and per-port level: you can send API requests to Neutron and specify QoS specifications, and the QoS is applied and operated from the network perspective. It would also be very interesting if we could adjust or limit QoS at the aggregated-bandwidth level. Next page.

So why did we implement a QoS framework ourselves? That is a good question. The first thing we want is QoS at more granularities, for example at the per-individual-virtual-machine level, or for a tenant as a whole.
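As a concrete illustration of the instance resource quota mechanism just described, here is a rough sketch of how Nova's documented `quota:*` flavor extra specs group by resource family before being handed to libvirt. The extra-spec key names are the real Nova ones; the grouping helper and the sample values are purely illustrative, not Nova's actual code.

```python
# Sketch: Nova "instance resource quota" flavor extra specs (quota:cpu_*,
# quota:disk_*, quota:vif_*) grouped by resource family. Each family ends up
# in a different libvirt tuning section (cputune, per-disk iotune, and
# per-vif bandwidth, respectively). Illustrative helper, not Nova code.

def group_quota_specs(extra_specs):
    """Split 'quota:*' flavor extra specs by resource family (cpu/disk/vif)."""
    grouped = {}
    for key, value in extra_specs.items():
        if not key.startswith("quota:"):
            continue  # unrelated extra spec, e.g. hw:cpu_policy
        name = key[len("quota:"):]          # e.g. "cpu_quota"
        family = name.split("_", 1)[0]      # "cpu" / "disk" / "vif"
        grouped.setdefault(family, {})[name] = int(value)
    return grouped

flavor_extra_specs = {
    "quota:cpu_quota": "10000",     # CFS quota in microseconds...
    "quota:cpu_period": "20000",    # ...per 20 ms period -> 50% of one core
    "quota:disk_read_bytes_sec": "10240000",   # ~10 MB/s read cap
    "quota:vif_outbound_average": "10240",     # outbound average, in kB/s
}

print(group_quota_specs(flavor_extra_specs))
```

In a real deployment these keys are set on a flavor with `nova flavor-key <flavor> set quota:cpu_quota=10000` and take effect on instances booted from that flavor.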
Tenant-level QoS can be very useful when we want to map applications of different priorities to different tenants and limit the resource usage of those tenants. This is quite a common use case in the enterprise. Say we have a lot of application owners with a lot of different applications; some are very critical to the business and some are less critical. We can create different tenants for those owners and set a different QoS policy on each tenant. Those granularities of QoS help us a lot. We also want to be able to adjust the QoS settings in real time while the virtual machine is already up and running; we love that flexibility. And we want to watch the real-time feedback from the virtual machines. For example, if we say the CPU usage should be cut down by half, we can watch the result from a centralized web console to make sure that it actually takes effect. There can also be severe resource competition on certain hosts. For example, if a program goes crazy because of a bug, its resource usage will go crazily high. We need a way to cage those crazy resource consumers and make sure our high-priority applications are untouched. That is where QoS comes in, and we need it in real time.

We can also use QoS in other ways. Imagine a highway where a lot of cars are trying to take more resource; in this case the resource is the road itself. If there is no order, the cars will collide with each other and the overall throughput will not be high. So we need some order; we need lanes on the road to make the applications cooperate with each other, to make the cars respect each other. Even though, seen from a single car, the available resource is less, the overall performance can be improved. It works the same way for QoS.
With a proper QoS setting, if we use QoS to limit the resource usage of each application and make them cooperate better, the overall performance can be improved. I have seen a test case where several virtual machines competed severely for resources on one host; after QoS was applied, the overall performance came out a little higher. There are other ways to leverage QoS as well. We may want to pin vCPUs to different physical cores to isolate the resource competitors and gain some performance: when the competitors are isolated, their cache contention is reduced and they cooperate better with the others, so the overall performance can improve a little, or even greatly. To explore how we can address these needs around QoS, we implemented our own framework: QoS as a service. Next come the design, implementation, and discussion, and I invite Yiping to give us a deep dive.

Okay, thank you. Here is our functional decomposition. As Xcela said, QoS should not only be at the VM level; we think tenant-level QoS is even more important. So we coordinate QoS group rules, which include CPU QoS rules, disk QoS rules, and network QoS rules, group them, and assign them to a tenant, and this should be done dynamically. Alongside this, the OpenStack admin also needs a web UI to monitor the real-time resource usage, including CPU, disk, and network usage, at both the tenant level and the VM level. There are three roles that use this feature. First, the tenant admin: these people only care about the VMs in their own tenant, so they need monitoring to know the real-time resource usage. The OpenStack admin, on the other hand, may not care about per-VM usage; he cares more about tenant-level usage. So we aggregate the resource usage by project, so the admin knows which project is using which resources.
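The per-project roll-up just mentioned can be sketched very simply. The sample record format below is hypothetical, standing in for whatever the telemetry pipeline actually emits.

```python
# Sketch of the admin view's roll-up: per-VM usage samples aggregated by
# project/tenant. Record fields (project, vm, cpu_pct, disk_mbps) are
# illustrative placeholders for real telemetry data.
from collections import defaultdict

def aggregate_by_project(samples):
    """Sum per-VM usage samples into per-project totals."""
    totals = defaultdict(lambda: {"cpu_pct": 0.0, "disk_mbps": 0.0, "vms": 0})
    for s in samples:
        t = totals[s["project"]]
        t["cpu_pct"] += s["cpu_pct"]
        t["disk_mbps"] += s["disk_mbps"]
        t["vms"] += 1
    return dict(totals)

samples = [
    {"project": "media-engine", "vm": "vm-1", "cpu_pct": 80.0, "disk_mbps": 40.0},
    {"project": "media-engine", "vm": "vm-2", "cpu_pct": 60.0, "disk_mbps": 10.0},
    {"project": "batch-jobs",   "vm": "vm-3", "cpu_pct": 20.0, "disk_mbps": 5.0},
]
print(aggregate_by_project(samples))
```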
We also need to make sure our core applications get more resources. There is a panel for the OpenStack admin to adjust the QoS rules for tenants and for VMs. Then there is a third role, the tenant DevOps engineer. These people are the application owners: they develop the applications and they operate them. We give them permission to call the VM-level QoS APIs to control the QoS in their own tenant, because in some cases they may get an alert that one of their VMs is behaving abnormally, and they can use these APIs to rein in its resource usage.

Here is an example where you can see the running status of the VMs in a project. From this view the tenant admin can get the resource usage percentage of the VMs in his tenant, in real time. And this is the UI for the OpenStack admin to create tenant-level QoS groups. He can define the different QoS levels and assign them to projects; once a group is assigned to a project, it is applied to all the VMs in that tenant. The OpenStack admin can also modify the QoS settings of one specific virtual machine, and those per-VM settings override the rules set in the QoS group.

The implementation is very straightforward, I think: we added extension APIs in Nova, wrote a QoS manager in the Nova compute service, and built a driver which uses cgroups to control CPU, disk, and network. Are there any questions so far? Feel free to ask if there are.

So, I think the idea is quite straightforward. As the cloud admin, you essentially run the road infrastructure, and every VM can be treated as a car: by default, every VM has an equal chance to get resources from your roads. But the point is that not every car should be treated equally. There are cars like ambulances.
When a traffic accident happens, that car needs to move very fast, faster than any other car, and we should provide a way for it to tell the cloud admin: I am an ambulance, please let me run faster than any other VM. So the idea here is quite straightforward. Inside WebEx we run real-time media engine workloads, which are really the king of our business, so it is fair to say those VMs should get better performance than the web tier. As the cloud admin, and as the group who runs the tenants, we have to provide these technologies to let the tenant admin build that kind of closed feedback loop.

The rest may be a little boring. As Xcela and Yiping mentioned, we have a command-line interface inside the Nova client. For example, you can list the tenant QoS groups across all tenants, show the QoS settings of a QoS group, create a project-based QoS group, get the QoS status of a particular project, update that status, and assign a QoS group to a tenant. That is the tenant side of the command-line interface. Beyond the tenant-based QoS there is hypervisor-based QoS: you can do the same things per VM on any of your running hypervisors. For example, you can list all the hypervisor-based QoS statuses, show the QoS status of one particular VM on one particular hypervisor, and update its settings. Those are the command-line interfaces, and there are RESTful interfaces as well, for the Horizon side: if you want to build portals, these are nice RESTful APIs to consume. Feature-wise they are all the same, and the Nova client's debug output can help you see the detailed request and response formats.
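Underneath all of these interfaces, CPU enforcement ultimately lands in the kernel cgroups mentioned earlier. Here is a minimal sketch of that translation, assuming cgroup v1 CFS bandwidth control; the helper name and the paths are illustrative, not the deployment's actual code.

```python
# Sketch: translate a per-VM "CPU percent" QoS setting into the
# cpu.cfs_quota_us / cpu.cfs_period_us values a cgroup-based driver
# would write for the VM. Illustrative helper, assuming cgroup v1 CFS.

CFS_PERIOD_US = 100000  # default CFS scheduling period: 100 ms

def cpu_qos_to_cgroup(cpu_percent, ncores=1, period_us=CFS_PERIOD_US):
    """Return the cgroup writes needed to cap a VM at cpu_percent of ncores."""
    quota_us = int(period_us * ncores * cpu_percent / 100)
    return {
        "cpu.cfs_period_us": period_us,
        "cpu.cfs_quota_us": quota_us,   # -1 would mean "unlimited"
    }

# Cap a 2-vCPU VM at 50% of its allotted CPU time.
settings = cpu_qos_to_cgroup(50, ncores=2)
print(settings)  # {'cpu.cfs_period_us': 100000, 'cpu.cfs_quota_us': 100000}

# A real driver would then write each value into the VM's cgroup directory,
# e.g. somewhere under /sys/fs/cgroup/cpu/ for the qemu process.
```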
The hypervisor-based QoS settings are exposed the same way. As I mentioned, whether you are the cloud admin or a tenant admin, there is a desire to get real-time status about your project, so we provide some diagnostic APIs for the tenant to consume. This is just to close the feedback loop: first you get your running status; second, you get authorized to hook into the QoS APIs; next, you call the QoS APIs, and the cloud gives you what you want. Along the way we really learned a great deal from the OpenStack community, for example how to write a Nova extension and how to add a method to the OpenStack API. All the API references and the code base are a great way for you to learn how to make improvements around these projects. With that, that is all we wanted to share. Are there any questions on this topic?

Question from the audience: Sorry, I was told I have to talk into the microphone first. I am wondering, on the networking QoS, how much have you been leveraging the existing Neutron QoS APIs?

Not yet, actually; that is just about to come. There is a lot we may have to do in the future. Right now we directly hook into the back-end infrastructure we built to update the QoS settings. Moving forward, we need a mechanism to merge all the requests and to decide which ones to commit to the QoS back end. And as you mentioned, Neutron QoS is just arriving, so we are still evaluating how it performs and how to integrate it into our platform. Thanks for the question. If there are no more questions: I am glad you were here, and thanks for listening. That is all for today. Thank you.