Hi, sorry for the delay: technical problems at a tech conference, who would believe it? So, without further ado, we now have Stephen Finucane, who will talk to us about enabling real-time computing in OpenStack.

Thank you. Good afternoon. So, yeah, this is a talk about real-time, and we're not running out of time, so I'm going to take this a little slowly. There's a laser pointer at the end of this, so stop me if I happen to point the laser at somebody's eyes by mistake.

A little bit of background first. This is a pretty broad presentation that covers a good few technologies, so I'm going to give a quick run-through of the technologies I'll be talking about, and then I'll move into a deeper dive on this functionality, how it's implemented and how it works.

In case you haven't heard of it, OpenStack is a big deal. It's an open source cloud computing system made up of a load of individual projects that focus on various aspects of a functioning cloud. You have services for networking, services for compute, services for block storage, and so forth. The one I'm going to be talking about today is Nova, which is the compute service. I'm an OpenStack Nova core developer, basically one of the maintainers, and most of my focus on Nova is on enabling NFV functionality: high-performance compute, huge pages, CPU pinning, that kind of thing, which I'm going to go into here.

From a high-level perspective, OpenStack tries to be as much of a cloud as possible. You should realistically never have to know anything about the underlying hardware capabilities of the cloud you're running on. You're able to create servers, create networks and images, attach things together; it's all nice and abstract and cloudy, and that's what 90% of people are looking for. The problem with those cloud abstractions is that when you get down into the nitty-gritty of the kind of features that telco or scientific users are looking for, the abstractions tend to hamper your performance, and that's not something you want. It basically means OpenStack isn't useful for your use cases.

So we've been working on fixing a few of these issues since before the Juno release, which was three or four years ago. We've implemented a couple of features during that time to close the performance gap with normal bare metal: things like allowing you to attach SR-IOV devices to instances, allowing you to pin the CPUs of your instances to host cores, huge pages, and representing your host's and your guest's topology in a NUMA-aware fashion. Then in the last two releases we've worked on two particular features: one of which I'm going to talk about today, the real-time policy, and another which I'm not really going to focus on, the emulator thread policy.

For anyone that isn't aware what real-time is: real-time means that if you have a real-time application, you're guaranteed that it will meet certain timing constraints. If you're delivering messages over a network, those messages will be delivered within a certain amount of time, every single time. It doesn't mean that it's fast or anything; it just means that it's consistent. This is important from the telco perspective, because if you're sending voice packets or something over your network, you need to make sure that they actually get where they're supposed to be going.
So there are a couple of prerequisites for what I'm going to be talking about here. The first is that you need suitable hardware. This is a bit of a tricky area, because if you go onto the real-time Linux kernel wikis and so on, they'll list hardware and say, well, we tested on this hardware and the performance wasn't that great, and it was better on this particular chip than that chip, and they very rarely give a reason for why it's better. There's just something in the underlying architecture of those chips that makes them better for real-time use cases. All the testing that I did here was on a standard Intel Xeon-based server.

Obviously, you're going to need OpenStack Pike or newer, because, as mentioned before, this functionality wasn't available before Pike. You're going to need a recent enough version of libvirt, because this all hooks down into libvirt deep down, and then you're going to need a kernel that has the real-time preemption (PREEMPT_RT) patches applied. CentOS packages this, so you can get real versions of it. I don't know about other distributions, but I did all my testing for this on a CentOS host.

For the host configuration, at the hardware level, the same kinds of things come up as when you're benchmarking a system: you want to disable the fancy power management features, things like turbo boost. You don't want any magic going on in your hardware; you don't want it throttling the CPU because it decided that, you know, it was going to save you a little bit of power. You want it running at one hundred percent, one hundred percent of the time. So basically disable hyper-threading, power management, turbo boost, that kind of thing.

Then, from the software perspective: the real-time kernel that I mentioned before, the real-time KVM module, and a utility called tuned, which will configure your GRUB kernel boot parameters for you. tuned comes with profiles for real-time that you can install from the repositories, and those will pass through the right kernel boot parameters. So: install the kernel, install the tuned profiles, configure those tuned profiles, enable huge pages, and then there's this last option. This last option is a Nova-specific thing which lets you determine which cores on your system Nova is allowed to touch. When you're talking about real-time Linux, you should usually leave a couple of cores for non-preemptible processes. The preemption patches make most things in the Linux kernel preemptible, but not everything, so the general best practice is to leave a couple of cores that the scheduler can keep for itself. So, yeah, basically isolating a couple of cores from Nova.
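Roughly, the host-side setup described above might look like the following on a CentOS box. The package and profile names (kernel-rt, kernel-rt-kvm, realtime-virtual-host), the isolated core range and the vcpu_pin_set value are illustrative of the Pike-era tooling rather than a definitive recipe; adjust them for your own distribution and core layout.

    # Real-time kernel, real-time KVM module and the tuned real-time profiles
    # (package names as found in the CentOS/RHEL real-time repositories).
    yum install -y kernel-rt kernel-rt-kvm tuned-profiles-realtime tuned-profiles-nfv

    # Tell tuned which cores to isolate, then enable the host profile; this
    # rewrites the kernel boot parameters (isolcpus, nohz_full, hugepages, ...)
    # for you, so a reboot is needed afterwards.
    echo "isolated_cores=2-7" > /etc/tuned/realtime-virtual-host-variables.conf
    tuned-adm profile realtime-virtual-host

    # Finally, tell Nova which cores it may hand out to instances, keeping a
    # couple of housekeeping cores (0-1 here) back for the host itself:
    #
    #   /etc/nova/nova.conf
    #   [DEFAULT]
    #   vcpu_pin_set = 2-7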
What guest you actually run is, from an application perspective, entirely dependent on what you're trying to do. For the sake of testing, I just went and used the exact same configuration for my guest as I had on my host. That meant I was using the real-time kernel again and the tuned profile, only this time I went for the guest profile, which, again, is packaged and available. If you already have a real-time application, whether that's something you've bought in from Ericsson or somewhere, a VNF, that kind of thing, you should obviously use that. Again: install all of that stuff, enable the profile, enable huge pages.

And then we finally get to the actual OpenStack stuff. OpenStack Nova has two ways of configuring a guest. The first of those is via image properties: when you create an image, you can set certain properties saying "I need this specific CPU feature for this image to work." The other way is what we call flavors. Flavors dictate how much memory your VM is going to have and how many CPUs it's going to have, and they also let you use a thing called extra specs, which are like scheduler hints. They'll say: I want CPU pinning, I want real-time, I want real-time to use these cores, that kind of thing.

There are three things we're going to want to configure here. The first is the CPU policy, which dictates whether your guest cores are pinned to the host; we want them pinned. The second is whether you're enabling real-time; since this is a real-time demo, obviously you're going to be enabling real-time. The last is huge pages, which you usually want from a performance standpoint. You've also got a couple of other knobs, but, again, I'm not going to talk about those today.

So, going back to the OpenStack command line client, we create a flavor. We've got four CPUs, four gigs of RAM, twenty gigs of disk space, and we're giving it a name, rt1.small. Then we set these extra specs, which, again, are like scheduler hints; there are four of them here. The first, the CPU policy, is the one that determines whether your guest cores are pinned or not. "Dedicated" means that we want our guest cores pinned to host cores, so there'll be a one-to-one mapping between guest core processes and host cores. The second enables real-time. We also want to tell Nova that some of those cores shouldn't be marked as real-time cores: because we're using the Linux kernel in our guest here, we want some of those cores left non-real-time, for housekeeping and scheduler processes. And the last one is the one for huge pages. Then we create a server with this flavor and with some sample image; the image doesn't really matter.
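The exact commands from the slides aren't reproduced here, but the flavor just described would look roughly like this. The flavor name, the core mask, and the image and network names are illustrative; the extra spec keys (hw:cpu_policy, hw:cpu_realtime, hw:cpu_realtime_mask, hw:mem_page_size) are the ones Nova recognises for these features.

    # A flavor with 4 vCPUs, 4 GB of RAM and 20 GB of disk.
    openstack flavor create --vcpus 4 --ram 4096 --disk 20 rt1.small

    # Pin every vCPU to a dedicated host core, mark the guest as real-time but
    # exclude vCPUs 0-1 from the real-time policy (they stay available for the
    # guest kernel's own housekeeping), and back the guest with huge pages.
    openstack flavor set rt1.small \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_realtime=yes \
        --property hw:cpu_realtime_mask=^0-1 \
        --property hw:mem_page_size=1GB

    # Boot a server from that flavor; the image doesn't matter much here.
    openstack server create --flavor rt1.small --image centos-rt \
        --network demo-net rt-guest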
The interesting thing is how this actually works. Nova supports a couple of hypervisor drivers, but this only works with libvirt/KVM, so if you're using Xen or something, sorry, you're out of luck. What's interesting to look at is the libvirt XML that Nova generates for you, and how that reflects, deep down, the kind of calls being made to the host. If you do virsh dumpxml for your libvirt domain, you'll get a big XML blob with all the information about your guest in it.

The first piece of that is the cputune element. cputune specifies which of your host's physical CPUs each guest vCPU is allowed to run on, so it's what implements the CPU pinning I was talking about. There are two important children to note here: vcpupin and vcpusched.

How vcpupin works under the hood is that libvirt calls a Linux function, sched_setaffinity. If you look at your QEMU process and at the threads QEMU has spawned, there's an individual thread for every single one of your guest cores, so each vCPU is basically a host thread of the QEMU/KVM process. By configuring this property, what we're saying is that we want the first vCPU in the instance pinned to core 2 on the host, the second vCPU pinned to core 3 on the host, and so forth, and libvirt calls sched_setaffinity on those threads to make that happen. If we then go and look, using the taskset command, at the pinning information for those threads, we see that they are in fact affined to those host cores and nothing else. Assuming you have your host configured correctly, nothing else should be running on those cores, which means you're getting essentially one hundred percent of the performance of each of your guest cores.

The other one is the vcpusched element. This is another optional element, and it determines the scheduler policy that your vCPU threads are going to use. Under normal circumstances this would just be the standard scheduling policy, whatever your host is using. Naturally we don't want that; we want FIFO, or one of the other real-time policies, to be applied. How that works under the hood is, again, another kernel function: the previous one was sched_setaffinity, this one is sched_setscheduler. As the name suggests, you pass in a process ID and tell it what scheduler policy you want applied to that process, various magic happens, and it gets applied. To validate this there's a chrt utility: you pass it a process ID and it gives you the scheduling policy that's in effect and the scheduling priority. The priority is a value from 0 to 99, and it determines, if you've got two things in conflict, which of them takes priority over the other.

The interesting thing to note here is the first two vCPUs: like I said, we had masked those, because we didn't want every core in our guest running with a real-time policy, since the guest kernel isn't necessarily guaranteed to work that way. So those are using the standard SCHED_OTHER policy. The other two cores, which are the guest cores we'd actually run our application on, are scheduled with the FIFO policy.

We also talked about huge pages. As far as that's implemented, there's a memoryBacking element where we configure what page size we want, and there's some other stuff in there that doesn't really matter; Nova does all of this for you.
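As an illustration, the relevant fragments of the generated domain XML look something like the following. This is a hand-written sketch based on the description above rather than the literal output from my test host; the core numbers and the one-gigabyte page size are example values.

    <vcpu placement='static'>4</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='2'/>
      <vcpupin vcpu='1' cpuset='3'/>
      <vcpupin vcpu='2' cpuset='4'/>
      <vcpupin vcpu='3' cpuset='5'/>
      <!-- only the unmasked vCPUs (2 and 3) get the real-time FIFO policy -->
      <vcpusched vcpus='2-3' scheduler='fifo' priority='1'/>
    </cputune>
    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
    </memoryBacking>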
If we go and look at the QEMU process, we get its process ID, and if we look through the huge page mappings for that process we'll see that huge pages are indeed mapped into the QEMU process, as we'd expect.

Because we created the server, we can go and log into it, and there you can use an application called cyclictest, which will evaluate the real-time readiness of your system and give you a measurement of your latencies. From what I've been told, you want to be looking for something below 20 microseconds of latency on average, I think; anything below 20 and you're usually pretty good. You wouldn't expect this to ever hit zero using the PREEMPT_RT patches in the kernel. You might if you had your own real-time application, a VNF or something from a telco, but with just the kernel patches you're not going to see that kind of thing.
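The verification steps just described boil down to a handful of commands on the host and one inside the guest. This is only a sketch: the process ID, core numbers and cyclictest options are illustrative, and the per-thread checks would be repeated for each vCPU thread listed under /proc/<pid>/task/.

    # On the host: find the QEMU process for the instance, then check the
    # affinity and scheduling policy of its vCPU threads.
    pgrep -f qemu          # gives e.g. 12345
    taskset -cp 12345      # which host cores the thread is allowed to run on
    chrt -p 12345          # SCHED_FIFO or SCHED_OTHER, plus the priority

    # Check that the guest memory really is backed by huge pages.
    grep -i huge /proc/12345/numa_maps

    # Inside the guest: measure latency on the real-time cores.
    cyclictest -m -n -p 95 -a 2-3 -t 2 -D 10m
    # average and maximum latencies consistently below ~20 microseconds are
    # what you're aiming for with the PREEMPT_RT kernel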
So, yeah, to wrap up: from a usability perspective today, you go and install OpenStack, you configure your hosts correctly, and, provided you're using a recent enough release, you create your flavor, you set your various flavor attributes, and then you boot your instance, and you now have real-time compute within your cloud, wherever that may be. So, yeah, that's keeping it real-time. Thanks for listening. If anyone has any questions, shout.

Can you say that again? So, if I heard you correctly, you're saying that if you boot multiple instances and they're pinned, they'll use the same host CPUs, they'll be pinned to the same host CPUs, and do we do anything to prevent that from happening? Yes. So, from a Nova perspective, the recommendation we have is this: if you have pinned instances and non-pinned instances, the non-pinned instances aren't aware of the pinning information and will just stomp all over those cores; they'll float across cores and use anything they can. The recommendation we have for that is to use a Nova feature called host aggregates, where you essentially divide your cloud into multiple parts and say: on this part of my cloud, servers are allowed to run real-time or pinned workloads, and on these other parts only non-pinned workloads are allowed. That just stops them from overlapping. If you have two pinned instances, Nova will make sure the two servers don't overlap; there'll be no overlap. So you'll give four cores or whatever to one server, and then if you give four to another guest, they'll be four different cores, and if Nova isn't able to do that, it will fail to schedule. So Nova makes sure that that overlap doesn't happen.
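To give a rough flavor of that host-aggregate recommendation, the separation is usually done with aggregate metadata plus a matching flavor extra spec, along the lines of the sketch below. The aggregate names, host names and the "pinned" metadata key are arbitrary choices, and the scheduler needs the AggregateInstanceExtraSpecsFilter enabled for the extra spec to be honoured.

    # Group the hosts set up for pinned/real-time workloads into one
    # aggregate, and everything else into another.
    openstack aggregate create --property pinned=true realtime-hosts
    openstack aggregate add host realtime-hosts compute-rt-0

    openstack aggregate create --property pinned=false general-hosts
    openstack aggregate add host general-hosts compute-0

    # The real-time flavor asks for hosts from the first aggregate...
    openstack flavor set rt1.small \
        --property aggregate_instance_extra_specs:pinned=true

    # ...while ordinary flavors ask for the other, so floating guests never
    # land on the real-time hosts.
    openstack flavor set m1.small \
        --property aggregate_instance_extra_specs:pinned=false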
So the question was: if you have CPU pinning already, what are the drawbacks of adding real-time on top of that? I know that there are performance impacts to using real-time within the kernel. Like I think I said at the beginning, real-time doesn't necessarily mean faster; it just means more deterministic. So the general recommendation would be: use real-time if your application requires real-time, knowing that it's going to require more resources, or that you're going to get less throughput. If you don't require real-time, you shouldn't need it; you're better off increasing your resource utilisation where possible.

Yes. So the question was: in the benchmarks, when I stressed the guest, did I also stress the host and the other guests? As part of the deployment process for this, the expectation is that you stress the host first to make sure that the host is actually configured correctly. There's a real-time evaluation tool, rteval, which you're supposed to use to make sure it's configured correctly, and then you use something like cyclictest to make sure your latency is what you'd expect. So yes, that was done before anything else was deployed on top, just to make sure the host kernel was configured. I didn't stress several guests at the same time, though; that would actually be a nice test to do, because theoretically there should be no impact. Theoretically, yeah, but I imagine there probably would be some impact. You'd be trying to keep as much as possible isolated. Even the way that I deployed this: this still isn't supported in TripleO, Red Hat's deployment tool, so I was using DevStack, and it was an all-in-one deployment, so I had Neutron and other services running on that compute node. I shouldn't have had any of that there. They're all isolated, but realistically I should only have had the compute service running on that node, with everything else on a different machine. That's actually something I'll probably go and do when I get home. There was also a question: did I try running two instances? I did, but I didn't benchmark it; it was only to make sure that it worked and there was no funky stuff going on.

So the question was: if the VM is pinned to a CPU, is it possible to live migrate it to a different host? No; well, sort of. This is a long-standing bug with Nova. Live migration, for anyone that's not aware, means that your instance is running and you move it onto a different host, and the instance stays running with minimal downtime, like microseconds; it's usually a little more than that, but almost no noticeable downtime. The problem with how we do live migrations is that we don't pass sufficient information as part of the live migration request to recalculate the CPU pinning mapping on the destination side. So what ends up happening is that the instance uses the exact same cores on the destination host that it was using on the source; we don't regenerate that. Which means that if you already had an instance running there, say the migrated instance was using cores five to ten and the existing one was using cores three to seven, there would be a two or three core overlap. So yeah, it's a long-standing bug, and the reason it probably hasn't been fixed is because the fix is really, really difficult.

So you're saying the scheduler isn't properly evaluating the destination? Kind of; how it does it is a little bit daft. All it's looking at is "do I have enough free CPUs", and it will attempt to find the pinning information. It's not a scheduler issue per se: you will have enough free CPUs, and the scheduler is able to come up with an appropriate mapping, but when the instance actually gets there, we don't recalculate the pinning. So it's more... it's not the conductor, I can't think of the name of the specific service that would do it, but the fact is that we should be regenerating this XML and we don't. It's a long-standing bug and it's annoying, but the recommendation usually is: don't live migrate instances with pinning. Realistically, if you have CPU pinning enabled and you're worried about determinism, you probably don't want to be live migrating in most cases anyway; it's a bit of a niche use case.

Any other questions? I think we're good. Thanks, guys.