Let me start. Thanks, everyone, for coming and listening to the talk. I'm Yin Zhang; I work for Cisco WebEx as a cloud platform architect. Today we are going to talk about putting application performance optimization intelligence into your cloud. Let's get started. We believe that not every cloud has to be the same. As a private cloud vendor — I work for Cisco WebEx — we are not targeting a public cloud type of setup. Having an infrastructure flexible enough to configure into different models is very interesting and quite important for a private cloud setup, because in the end what we want to enable is our applications — how our applications behave — and through them our business. That gives us the flexibility to fine-tune the overall platform. Let's take a look at today's agenda. First, understand the characteristics — that is, how your guests behave under their projected workload. Second, find the bottleneck: after they run their projected workload, is there any limitation in the overall system that keeps them from going as far as they want? Third, tune the system, using the techniques Linux offers. And finally, integrate those fine-tuning technologies into the cloud itself. That's today's agenda. Let's start with understanding our guest virtual machines' characteristics. We believe not every cloud has to behave the same, and when we offer shared resources, we cannot predict what kind of applications will run here.
When we offer a shared platform, we cannot predict what kind of application will come here. There might be front-end applications, back-end applications, applications built on Java, or native applications. Some deal with voice, some with video. A Hadoop-type workload does big-data loading and transformation services. There are also applications doing real-time collaboration. And people talk about NFV — workloads that heavily handle packets and consume your resources that way. So different applications have different natures, and that's really true. I put two pictures here. One is voice: from a traffic perspective, voice traffic is very smooth, and it is very jitter-sensitive. We cannot let our ears notice the jitter inside the application — if I can hear it, that's really bad for any voice application. It is also delay-sensitive, so it is usually UDP with priority. Now look at the data traffic that most applications dealing with data produce: it is smooth most of the time, but sometimes really bursty — when a workload comes in, you cannot predict how much traffic it is going to handle. From the jitter perspective, data traffic is insensitive to packet jitter — TCP can handle that — and it is also delay-insensitive, since retransmission is the nature of TCP. So basically, understanding the application's nature is very important for a private cloud like Cisco WebEx.
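The jitter-sensitivity just described for voice can be made concrete. The following is a minimal sketch, not anything from the talk: it estimates smoothed interarrival jitter from per-packet transit delays, in the style of the RFC 3550 estimator (1/16 exponential smoothing); the function name and sample delays are illustrative.

```python
def interarrival_jitter(transit_times):
    """Estimate RFC 3550-style smoothed interarrival jitter.

    transit_times: per-packet transit delays in milliseconds
    (arrival time minus send timestamp). Returns the running
    jitter estimate after the last packet.
    """
    jitter = 0.0
    for prev, cur in zip(transit_times, transit_times[1:]):
        d = abs(cur - prev)            # transit-time variation
        jitter += (d - jitter) / 16.0  # 1/16 smoothing per RFC 3550
    return jitter

# A smooth voice-like stream shows low jitter; a bursty
# data-like stream shows much higher jitter.
print(interarrival_jitter([20.0, 20.1, 19.9, 20.0, 20.1]))  # small
print(interarrival_jitter([20.0, 80.0, 25.0, 90.0, 30.0]))  # large
```

A voice platform would alarm on the first number creeping up; a bulk-data service would not care.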
Let's take a look at this. These two pictures capture the application team's perspective. If you are lucky, you can always find a team that is really happy to do performance evaluation on top of your platform — eager to find out what the bottleneck is in their world and ready to handle any performance shortfall. If you find such a team, you are very lucky to have a good partner. The two pictures show, first, CPU and memory consumption: when the application team runs their workload here, they conduct performance testing, gather the data on their side, and build a picture of how the application behaves. Second, traffic: they also want to know about the traffic consumption of their workload on your platform. If you have such a good application partner, there is a chance to work together on better performance data and better performance optimization from the overall system perspective. At the top are the command lines you can use to gather information about how your guest virtual machine is running and how your host machine behaves. ethtool helps you understand the network interface — for instance, the NUMA information for your physical NIC. libvirt offers APIs to help you understand your guest VM's behavior. And there is the user-space Linux tooling: perf is a very important tool — it can give you a holistic view of guest VM behavior from user space down to kernel space, and its events help you understand the application's nature.
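As a small illustration of the host-side characterization described above — not the speaker's actual tooling — here is a sketch that computes CPU busy time between two `/proc/stat` snapshots, which is essentially what tools like `top` do. Field layout assumed is the standard `user nice system idle iowait irq softirq ...` order, with idle + iowait counted as idle time.

```python
def cpu_busy_pct(sample_a, sample_b, cpu="cpu"):
    """Percent of time a CPU line in /proc/stat was busy between
    two snapshots. idle + iowait count as idle; the rest as busy."""
    def fields(sample):
        for line in sample.splitlines():
            parts = line.split()
            if parts and parts[0] == cpu:
                return [int(x) for x in parts[1:]]
        raise ValueError(f"no line for {cpu!r}")
    a, b = fields(sample_a), fields(sample_b)
    delta = [y - x for x, y in zip(a, b)]
    idle = delta[3] + (delta[4] if len(delta) > 4 else 0)
    total = sum(delta)
    return 100.0 * (total - idle) / total

# Synthetic snapshots: 150 ticks elapsed, 80 of them busy.
before = "cpu  100 0 50 800 50 0 0 0 0 0"
after  = "cpu  160 0 70 850 70 0 0 0 0 0"
print(cpu_busy_pct(before, after))  # ~53.3% busy (80 of 150 ticks)
```

On a live host you would read `/proc/stat` twice with a sleep in between; per-core lines (`cpu0`, `cpu1`, ...) work with the same function via the `cpu` argument.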
NetFlow is another tool you can use to understand more about the application's traffic pattern. Sometimes it is not just one application running here; the whole system interacts, and with NetFlow you can understand how the applications interact with each other from the traffic-pattern perspective. Now, after you understand all those application and guest VM characteristics, the next step is to find the bottleneck in the overall system. Maybe it's packet loss. Maybe hypervisor overhead. Maybe low routing throughput. Maybe CPU load or IRQ distribution. Or, bigger than any particular guest VM, a resource allocation and scheduling problem. But remember, sometimes it is not that straightforward to find the bottleneck. Think about this scenario: your application partner complains, "Hi, my guest VM is running very slowly — can you tell me why?" So you, as the provider or the administrator, jump onto the host. First you check the CPU: pretty normal. How about disk I/O? Not very high. How about network performance? Normal. Memory? Not many page faults. But as I mentioned, it is not that straightforward to find what the real bottleneck might be. Think about the CPU: yes, the CPU load is normal, but sometimes the guest CPUs are just waiting for logical resources, not physical ones — maybe a lock on an inode, maybe an IPC semaphore waiting to change value; the guest VM may just be waiting on that. And think about disk I/O: yes, it's normal, not very high.
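One of the bottleneck candidates just listed is IRQ distribution. A quick way to see where interrupts are landing is to diff two `/proc/interrupts` snapshots; the sketch below (illustrative, assuming the usual layout of a CPU header row followed by per-IRQ count rows) turns that into interrupts per second per IRQ.

```python
def irq_rate(snap_a, snap_b, seconds):
    """Per-IRQ interrupts/second between two /proc/interrupts
    snapshots. Assumes the usual layout: a header row of CPU
    names, then 'IRQ: count0 count1 ... description' rows."""
    def totals(snap):
        lines = snap.strip().splitlines()
        ncpu = len(lines[0].split())  # header: CPU0 CPU1 ...
        out = {}
        for line in lines[1:]:
            parts = line.split()
            irq = parts[0].rstrip(":")
            counts = [int(x) for x in parts[1:1 + ncpu] if x.isdigit()]
            out[irq] = sum(counts)
        return out
    a, b = totals(snap_a), totals(snap_b)
    return {irq: (b[irq] - a[irq]) / seconds for irq in a if irq in b}

# Synthetic 10-second window: the NIC RX queue is hot.
before = "      CPU0  CPU1\n 24:  1000  2000  eth0-rx\n 25:   500   600  eth0-tx\n"
after  = "      CPU0  CPU1\n 24:  4000  9000  eth0-rx\n 25:   700   800  eth0-tx\n"
print(irq_rate(before, after, seconds=10.0))  # {'24': 1000.0, '25': 40.0}
```

A hot IRQ pinned to the same core that runs a guest's vCPU thread is exactly the kind of hidden contention discussed next.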
But every time user data is written to the real disk, it has to go through the kernel buffer, and there is a certain chance the kernel cannot allocate a buffer — the allocation simply fails. Now let's take a look at memory. Yes, there are not that many page faults in your system, but the guest always uses its virtual memory space, and when the kernel cannot allocate another chunk of virtual memory, that allocation fails too. The application logic might simply wait: it detects the failure and decides, "How about I wait a couple of minutes and then try again?" That might be what slows down the overall guest system. And the network: yes, the network traffic is pretty normal, nothing we obviously need to spend time on. But maybe your guest VM is not the only thing consuming network bandwidth — there might be another system inside your compute host consuming the network heavily. Every time a packet goes in and out of the system, it introduces a hardware interrupt, and those interrupts introduce overhead in the overall processing, which eventually slows down your guest VM. That might be the reason. So remember, it is not straightforward to find the root cause; you have to apply a lot of techniques to find out what the real bottleneck might be. After you find the bottleneck, it is time to fine-tune your system. Let's start with hyper-threading. Yes, hyper-threading offers capabilities to increase your overall system throughput. But the performance gain — the throughput gain — is really not that obvious for CPU-intensive applications or CPU-intensive jobs, for example NFV workloads. So: does your application mainly care about throughput?
Then it is probably good to turn hyper-threading on. But if your application really cares about job latency — the time from job start to job finish, how long it takes to complete the job — it is not a good idea to turn on hyper-threading; you should give the application, the guest VM, full physical cores. That's hyper-threading. The next one is CPU affinity and I/O NUMA. If you want to offer more capability to the guest VM, a good way to think about it is how to give the application — the guest VM — predictable behavior. That is where CPU affinity and I/O NUMA come into the picture. If you find there are applications that really care how the guest VM's CPUs behave, it is probably a good idea to let those workloads run dedicated on particular CPUs. And I/O NUMA: if some workload is I/O-intensive, it is probably a good idea to find out which NUMA node your physical NIC is attached to; if the VM sticks to the NUMA node with that physical NIC, you will get the performance. Then task scheduling: yes, Linux offers tunables for task scheduling. If you understand how your applications will behave and you want to reduce context switches between the different scheduled tasks, you probably want to set better values for those scheduling configurations. And IRQ affinity: by default the system distributes interrupts among the different CPU cores, especially on a multi-core system.
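Two of the knobs above — CPU pinning and IRQ steering — can be sketched with Linux-only stdlib calls. This is an illustration of the mechanism, not the speaker's tooling: in practice you would pin a vCPU thread via `taskset` or libvirt's vcpupin, and write the mask to `/proc/irq/<n>/smp_affinity` as root; the function names here are made up for the example.

```python
import os

def pin_to_first_allowed_cpu():
    """Pin the current process to the first CPU it may run on,
    similar in spirit to pinning a vCPU thread with taskset.
    Linux-only (sched_setaffinity). Returns the chosen CPU."""
    cpu = min(os.sched_getaffinity(0))  # first allowed core
    os.sched_setaffinity(0, {cpu})      # restrict to that core
    return cpu

def smp_affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<n>/smp_affinity: one bit per
    CPU allowed to service the interrupt."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

print(pin_to_first_allowed_cpu())
print(smp_affinity_mask([0, 2, 3]))  # bits 0,2,3 -> "d"
```

Steering a busy NIC IRQ away from the cores that run a latency-sensitive guest's vCPUs is the concrete payoff of the "don't leave IRQ balancing on by default" advice that follows.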
But remember, if the virtual CPU core running your VM has a certain chance of being interrupted by those software IRQs, it is probably not a good idea to leave that on by default — not if you really want to reach the performance level you are after. Also, the number of queues and the queue length are parameters you can tune for applications that really care about network throughput. Then transparent huge pages. This is interesting: the Linux system really wants to do a lot of things for you, collecting free pages and allocating huge pages on your behalf. But if we want to give the application more predictable behavior, it is probably not a good idea to leave transparent huge pages on by default. And finally control groups: something very commonly used to give an application certain limits to run within. That is tuning your system. Now, you have gone through the journey: you found the application characteristics for the workload, you built a good relationship with the application DevOps owner and said, "How about we work together to achieve better performance?" After that, you found some bottlenecks for them — some of them may even be tricky bugs inside the application itself. If you find those, they will probably give you some extra credit, because not every cloud provider helps applications find their bugs. Then it is time to integrate all of those technologies into the cloud you are offering today. So how did we start? We started from Horizon.
We offer some functionality in Horizon so that my application partner can evaluate their performance — not only on their side, but also to find out whether there is any way to help them run better in the cloud. We embedded the functionality into the Nova client and the Nova API using the Nova extension framework, and we keep the persistent data in the database. In detail, there is a performance evaluation — performance analysis — API, together with a self-tuning API, inside the hypervisor layer itself. This screenshot may give you some idea of how it looks. There is a drop-down action there called "start performance analysis." Say I am an application DevOps owner, ready to test my performance; I really care how my application behaves under a certain projected workload, and my product manager tells me it needs to run better. After they kick off the performance analysis, they run their performance tests as usual, but there is also an API for them to call that says: not only me — you, the cloud provider, how about you evaluate the system and come back with tuning suggestions? Some of them I can apply myself, but some only you can apply. The idea is that after they have run the evaluation for a certain period of time, we pop up a report on how the guest VM behaved during the performance-testing timeframe. And there is a tool we have open-sourced, called PerfVis, that gives you a way to visualize all those perf events. From those events you can understand, for example, the KVM exits — how the guest transitions between guest user mode and guest kernel mode, and for what reason it does the context switch. All of that information is available.
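To give a feel for the kind of analysis the report builds on — this is a simplified stand-in, not PerfVis itself — the sketch below tallies guest exit reasons from trace lines in a made-up `kvm_exit reason <NAME>` format, loosely modeled on the `kvm:kvm_exit` events perf can record. A guest dominated by external-interrupt exits points at interrupt pressure; heavy HLT exits suggest an idle or waiting guest.

```python
from collections import Counter

def summarize_kvm_exits(trace_lines):
    """Tally exit reasons from lines containing
    '... kvm_exit reason <NAME> ...' (hypothetical format)."""
    reasons = Counter()
    for line in trace_lines:
        parts = line.split()
        if "kvm_exit" in parts:
            i = parts.index("kvm_exit")
            if i + 2 < len(parts) and parts[i + 1] == "reason":
                reasons[parts[i + 2]] += 1
    return reasons.most_common()

trace = [
    "1200.001 kvm_exit reason EXTERNAL_INTERRUPT rip 0xffff",
    "1200.002 kvm_exit reason HLT rip 0xffff",
    "1200.003 kvm_exit reason EXTERNAL_INTERRUPT rip 0xffff",
]
print(summarize_kvm_exits(trace))
# [('EXTERNAL_INTERRUPT', 2), ('HLT', 1)]
```

A real pipeline would feed this from `perf` output rather than hand-written strings; the point is only that exit-reason distributions are what turn raw events into tuning suggestions.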
After you get all that information, it is really the cloud administrator's job to come up with fine-tuning suggestions for the overall system, to let the application run better here than on any other cloud. Imagine any public cloud: it cannot do these things for just your application. A lot of the time we are just worrying about whether my application will be impacted by the other tenants, right? Yes, a public cloud does offer me some flavors, but I am not 100% sure whether the instance I got — the VM spec I got — really reflects what I get while my instance is running. With all these tuning technologies, we can actually offer the guest VM a way of dynamically tuning the system. What we want to achieve is the right balance between investment and outcome. We only have a certain amount of resources, but how do we enable more for the application? That is the thing we can think about as a cloud provider. The report also lets you figure out what happened to this guest VM. So in the end: we started from a very messy room. We just put everything together, like a public cloud — even a public cloud has flavors and availability zones that, to some extent, give you some ability to organize your stuff, but it is still messy, still no guarantee on resources. And people love to be organized; people love to be tidy. I love to live in a tidy room. After we did the performance evaluation, not only in the guest VM space but also from the cloud provider perspective, we made things run in a more organized way. A VM is still a VM, and yes, the VM placement filters inside the OpenStack platform can offer some level of a well-organized cloud, but I think that should not be enough.
We can actually do more beyond just the VM placement filters. So, from a messy room to a tidy room — that is what we are achieving. I think that's it for today's talk. Are there any questions? No? Good thing or bad thing, I don't know — but there is a booth on the first floor, so people can find me there. All right, if that's it, then thanks — thanks for coming. Thank you.