Thank you for your kind attention through these talks today. Without any further delay I'd like to introduce Martin Sivak from Red Hat Czech, who will be talking about affinity rules for VMs. So, good afternoon everybody. Thanks, Brian, for the introduction. My name is Martin Sivak. I work for Red Hat; I've been there since 2007, which is quite a long time, and for the past four and a half years or so I've been working on the oVirt project. There is one assumption about the oVirt project that is fundamental to all the presentations you've heard today, and that is that our VMs are actually pets, not cattle, which means we'll do whatever we can to keep them running. That's why we care about live migration, and that's why some of our assumptions and design decisions are the way they are. So, what will I be talking about today? VM affinity. First I'll tell you what we think affinity is and what use cases we see for it. Then we'll go over the affinity types we decided to implement, I'll show you something about affinity conflicts, we'll talk about the actual management of affinity in oVirt, and then I'll discuss some future ideas and we'll have time for questions. So what is affinity? Affinity, as we see it, is basically an attraction factor between VMs: virtual machines might want to run together, or might not want to run together. The same applies between virtual machines and hosts. Certain hosts may be better suited for a VM than others, but based on something other than pure hardware considerations; it's a logical attraction. The difference between affinity, which I'm going to describe, and VM pinning, which you might know, is that when you pin a virtual machine to a specific host, it will stay there, at least in an oVirt environment, no matter what you do. If you want to pin it to some other host, you have to kill it and restart it.
Affinity, on the other hand, lets you specify logical rules: these VMs like to run together, they're good neighbors, and when their host becomes unsuitable, both of them will migrate together to a different host. So with affinity we support migration; with pin-to-host functionality we don't. Affinity will also automatically adapt to the situation on the cluster. If you create a new affinity rule, we'll do whatever we can to satisfy it, migrating VMs to where they belong. If you delete a rule, we'll stop enforcing it. That's also something different: when you change a pin-to-host relationship and the VM is already running and pinned, it will not migrate. So why might you want to use affinity in the first place? I listed a couple of topics and I'll go through them one by one. For example, there are licensing requirements that might lead you to use affinity, and there are security considerations. You might get better performance when certain VMs run together, or apart. You might want high availability, or you might want to use affinity to prevent high availability from being compromised. For planning, that's purely management functionality: you can use affinity so your data center behaves the way you want it to behave. Or customer locality, which has to do with latency. Let's start with the specific use cases. Here you can see a situation where you might want to use affinity. Let's assume you have some kind of software license. You have software that runs on your servers inside virtual machines, and its licensing model is pretty benevolent: it only limits the number of physical machines you're allowed to run the software on. So when you run virtual machines on top of those physical machines, you can actually run multiple copies on each physical machine.
But you still have to make sure that all those virtual machines stay within the number of physical machines you are allowed to use. Here you see a virtual machine running the software, and it runs on only two physical machines. If you want to run it on two other physical machines, you will have to pay for that. So if it's a virtual machine, you can set affinity, in this case hard positive affinity, to those licensed machines. That's actually a use case that was requested; it was one of the bugs and one of the reasons we created this feature in the first place. Next, high availability considerations. The first presentation today, by Piotr, was about OpenShift running on top of oVirt. In that case you are running containers inside virtual machines, and Kubernetes already provides high availability through replication controllers. But what happens when the hosting virtual machine runs on the same host as another hosting virtual machine, and Kubernetes places your container replicas in those two co-located virtual machines? The physical host can go down, and both replicas will die at the same instant; you have compromised high availability that way. But if you define negative affinity between those two VMs, they will never be on the same host, which maintains high availability in this case. You might also want to define affinity for security reasons, basically creating sub-clusters. Here is one example. If you follow security conferences, or watch recordings from them, you will sometimes see almost black-magic attacks on virtual machines using cache-based timing attacks. They are able to dump bits of data from the CPU itself, and that way they can cross virtual machine boundaries.
So there are ways, although they are very strange and mathematically involved, and I can't replicate them, to read data from another machine that's running on the same CPU. If you have sensitive virtual machines, like the one I have here that processes my card payments, you don't want to run them on the same physical host where you run some public service where anybody can start their own workloads. So you might want to separate them, and again, affinity is the tool for that. You can define either negative affinity between those two VMs, or positive affinity from each of those VMs to its respective sub-cluster. Now another case: storage locality, performance, network overhead. All of that can be solved, or at least improved, by using affinity. What happens here is that you again have a virtual machine that provides some web service, and it needs to talk to its database; it needs to retrieve the data you put there. Some user uploads a document and you need to save it somewhere. If the other service, your database or storage service, runs on the same node, as shown here, the latency is basically just a matter of a memcpy in the kernel somewhere, because you are talking to another virtual machine on the same node, so everything is memory based. If, on the other hand, the virtual machine with the database is running on a different node, you are crossing the network, which means you go through a firewall, through the network itself, maybe a couple of switches, and that increases the latency. It might not matter much, but if you get slashdotted, you will want every millisecond in your favor. There are other things with regard to locality; I have an example of continent locality slightly later. But you also have hardware considerations and what I call dynamic sub-clusters.
What that means is that in our world, not all hosts are equal. You can have a host that's very powerful, or a host that has special hardware that accelerates something. On the other hand, you can have a host that has lots of memory and lots of CPU power but lacks those devices, and they are all connected to the same cluster. So when we decide where a VM is going to run, you can specify what we call soft or weak affinity. That means: if the preferred host is available and has enough resources for the VM, we will try to start the VM there; if the host is too loaded, we'll just use whatever host is available, because in this case the service is more important than the performance. But there is no reason to sacrifice performance if you can have it as well. So you can prefer faster CPUs, or faster NICs, faster networking in general. You can have hosts with 100-megabit, gigabit, and 10-gigabit networking at different levels of your infrastructure, and you can place the VM exactly where it's supposed to be. But if that host goes down, the VM will start somewhere else, so the service is not disrupted; it might be slower, but it will be up. Now, dynamic sub-clusters, that's something I'll explain again slightly later, but for now let's look at the current oVirt environment. How many of you are actually familiar with oVirt? How many of you have used it? Okay, so not that many; let me explain it simply. What you have when you install oVirt and deploy the environment is basically a set of physical hosts that run your VMs, and you have those virtual machines. The hosts can be grouped into clusters. A cluster is a logical entity in oVirt, and it limits where a VM can migrate: you can only migrate VMs within their cluster. That's how we currently organize the data center.
But what happens when you decide to move a host from one cluster to another, because one department doesn't need that many hosts anymore and another department is overloaded and needs one more? You have to put that host into maintenance, because the other cluster might use different storage or have a different configuration. That means we need to logically decommission the host, move it to the other cluster, and start it up again, and during that transition it can't run any VMs. That's not great; it works for us, but with affinity you can create something like LVM on top of a cluster. Basically, you put all your hosts into a single cluster and then you limit the migration domains using affinity. When the need arises and you have to move one host to another department, you just redefine the affinity, and since the hosts are all in the same cluster, you don't need to shut anything down. The host will automatically become available for new VMs, and we will even migrate away the VMs that no longer belong there. If you have virtual machines that don't obey any affinity rules and you just need to start them for some simple tasks that don't load anything, they can start anywhere. In that case your whole cluster is available for those VMs and you can easily use spare resources. If there are a couple of spare cycles or a couple of hundred megabytes of memory that the department owning the host isn't utilizing right now, we can start a VM there, and it will be migrated later, or just killed when the task is done. So that's another way to use affinity instead of a hard-coded, fixed layout. Now, this is another case where affinity can be helpful: if you have an application or a VM that can opportunistically use some hardware resource, using PCI passthrough or device passthrough, for example, but can fall back to software emulation when the device is not available.
Then you get better performance when the VM uses the right host, and better high availability when that host is not available, because your VM can still start on a host that doesn't have the necessary hardware. When the right host becomes available, we might migrate the VM there, and then it will connect to the device and start utilizing the accelerator, whatever it may be: a NIC, a GPU. Unfortunately, this is still theoretical. We are working on migration support for SR-IOV, for example; that's a virtual network function device, a network card with many virtual network cards inside, from which you can allocate whatever you want. So that's one of the cases, but we still don't support migration there, and honestly I have no idea if anybody does. That's still somewhat in the future. But with affinity you can already declare that the VMs are supposed to run on the better host. You can even use affinity for operations and planning. What that means is: imagine you have a host and you know that in half a year you will have to decommission it. It's too old, you'll be replacing it, or you'll be moving it to a different data center. You don't want to kill it right now; it's still running VMs and it's still good enough for them. But if you can avoid it, you would choose not to start any new VM there, just so it's fairly empty when the time comes. So you define soft negative affinity to that host. We will still start a VM there if we have no other choice, or if the other hosts are too loaded, but otherwise we'll start VMs elsewhere. Gradually, over time, your host will basically become empty, and then putting it into maintenance and decommissioning it will be much easier, because you won't trigger a migration storm. Maybe a couple of migrations of small and unimportant VMs, and the host is free and can be disconnected.
What you might also want to do is keep some services together, not just for performance reasons, but for management reasons: because you know that this rack in the data center runs all your web services, so if you need to plan a power outage, you know they are all there. If you have all the customer databases in the same rack, you know you need to double the power to that rack. That's all management; performance is one possible reason, but you might have different, purely logical reasons, just so you remember where stuff is. That's another way to use affinity. I said I'd explain the locality aspect later, so now is the time. One of the goals you all have when you are exposing services, on the web for example, is to put your services as close to the customers as possible. A customer can be internal, it can be another service, it can be somebody on the internet; it doesn't really matter. In this picture I have a service and a client, and the client is connecting to the closest server there is. You also have some kind of slow network in between, the internet: say one data center in Europe and one in the US, and currently the user is using the closest data center. Latency is better, performance is better. But what happens when your data center goes down? If it goes down abruptly and your virtual machines are highly available, what we would do in oVirt is start the VM again, but the closest available host is now the remote one. Your client will still connect to the service; he will probably notice that the latency is higher and the traceroute is longer, but he will still be able to reach the service. Now, you don't want to stay in this state for long, so you will work with your sysadmins and fix your data center so it comes back online.
In this case, we won't stop your service and start it again in the closest data center; that would disrupt the service, maybe only briefly, but it would disrupt it. Our goal here, since as I said our VMs are pets, is to utilize live migration, maybe even post-copy migration, as Andrea explained before he left the stage. The client keeps talking to the VM while the VM is slowly migrating, and once it has arrived, the client is talking to the closest server again, latency improves, and your system is ready for full load again. This is the most complicated scenario for us, because live migration over a slow network can be tricky, and we need to decide when to move the VM, because we won't do it immediately. The VM is running on one host; suddenly you bring up a new host, and if we started moving all the VMs at once, we would basically saturate the network with migration traffic, so your clients wouldn't be able to get through, because all the memory being copied would occupy all the pipes. So the algorithm is somewhat lazy: it decides whether the current host is good enough or not, and it starts roughly one migration per minute. It migrates slowly, not immediately, but the VMs will get there eventually. It might even happen that a VM dies before it's migrated, and then we start it directly on the preferred host; that depends on your load. So, I've talked about a couple of different affinity scenarios, so let's sum up the types before I continue. We basically see four different affinity types. You can see the hearts on the slide; the red heart represents hard positive affinity. Hard positive affinity for a virtual machine means it will only run together with some other virtual machine, or on a specific host.
If you create a bigger rule, let's say one VM has hard positive affinity to five hosts, then we'll choose one of those hosts, and we will allow migration within the affinity group. That's the difference from pinning. Currently in oVirt, if you define multi-host pinning, which is a rather new feature as well, we will start the VM on any of those hosts, but we will not allow migration afterwards. With affinity, we'll pick one host and then allow migration, so if that host gets too loaded, we'll move the VM to some other host from the same group. We also have hard negative affinity. You might remember I talked about that in the high-availability case, where you need to keep two VMs apart from each other so you're not compromising your high availability, or in the case of secure versus unsecured hosts, where some VMs should not run on unsecured hosts because they might be compromised. And we have the soft variants of both. Those are useful for the slow migrations, where you don't care about the situation right now, but you would prefer if the VM slowly ended up on the right host, or stayed there; yet if that host is not ready, or is too loaded, any host will do. So to sum it up: we have hard and soft affinity, positive and negative, and we allow applying those rules to VM-to-VM relationships and to VM-to-host relationships. Host-to-host obviously doesn't make any sense, because hosts don't move. Now, when you create new affinity rules, what you can get is an affinity conflict. Look at my example here; I hope you like my formulas. In the first case there's some object A with hard positive affinity to B. Let's say those are VMs. So there's a VM A that has hard affinity to B, and a VM C that has hard affinity to A, and then you create a negative affinity between B and C. That's obviously a conflict.
There is no way for us to solve it, because in different rules you declared that B and C should run together and, at the same time, should not run together. That's something we should warn you about, and maybe prevent, or maybe not; it actually depends. We will definitely warn you, because what might happen is that virtual machine A is not running. It's currently stopped, and you start B and C, and in that case you want B and C to follow the negative rule. Once you start A, you get the conflict, but while only B and C are running, you might actually want the rule obeyed. So currently we won't block you from creating such a cycle, but we'll warn you. The second rule on the slide is basically the same thing, except H1 and H2 are hosts. The two VMs always want to run together, but virtual machine A always wants to run on host 1 and virtual machine B always wants to run on host 2, and all of that can't be satisfied while both are running. If only virtual machine A is running, the rules are fine. If only virtual machine B is running, the rules are fine as well. But if both are running, you again have a conflict, and we will warn you again. It's up to you as administrators, or designers of the cluster rules, to make sure this doesn't happen. We'll do whatever we can to help you with that, but we can't fix it for you. There are certain violations that we can fix: if it's simple, if it's not a cycle like this. Let's say you already have virtual machine A running on host 1, and you create a new rule that says the virtual machine shouldn't be running on host 1. We'll detect that. That's a solvable conflict, and part of the balancing algorithm in oVirt will slowly try to fix it up. It might not be immediate, there might be more important rules to solve first, but we'll see it and we'll try to migrate the VM to the right host. The issue here is that this might conflict with balancing.
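The kind of check involved can be sketched in a few lines of Python. This is not oVirt's actual implementation, just an illustration of the idea (the function names are mine): hard positive rules are transitive, so you can merge VMs into groups with a union-find structure, and any hard negative rule whose two VMs end up in the same group is unsatisfiable.

```python
def find_conflicts(hard_positive, hard_negative):
    """Detect unsolvable affinity conflicts.

    hard_positive / hard_negative are lists of (a, b) pairs of VM names.
    Every hard-positive pair must end up on the same host, so we merge
    them into groups with a union-find structure; a hard-negative pair
    whose two VMs land in the same group can never be satisfied.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in hard_positive:
        union(a, b)

    return [(a, b) for a, b in hard_negative if find(a) == find(b)]

# The example from the talk: A with B, C with A, but B apart from C.
conflicts = find_conflicts(
    hard_positive=[("A", "B"), ("C", "A")],
    hard_negative=[("B", "C")],
)
print(conflicts)  # [('B', 'C')] -- B and C are transitively forced together
```

Note that, exactly as described above, the conflict only materializes at runtime when all three VMs are up; a static check like this can warn, but blocking the rule outright would be wrong.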
Imagine balancing based on CPU usage or free memory on the host, combined with rules like soft affinity. You need to somehow specify what's more important to you, because it might be that you care more about soft affinity than about memory load: you just want the VM to be on that host if at all possible. If the host is down, the VM should start anywhere, but if the host is up, you want the VM on that host, whatever it takes. On the other hand, you might prefer to run the VM on a host whose CPU is not overloaded, with affinity just as a bonus: if it's possible, great, but if it isn't, you'd rather have the better performance than obey some artificial rule. And again, that's up to you to decide, because we can't do it for you. Currently, for us, soft affinity is pretty important: the default configuration, which you can change, says that affinity is ten times as important as CPU. What you should be aware of is that if you make affinity exactly as important as CPU, you might get a migration cycle. The VM will decide: oh, I want to migrate here, because soft affinity is telling me to. Once it's there, it will decide: oh, but the CPU is suddenly slightly more loaded than expected, and there is a nice empty host over there, so it will migrate somewhere else. We have certain logic that tries to prevent this from happening, but it's really just guessing; we can't tell what will happen in five minutes. So if soft affinity and balancing have the same priority, you might get migration cycles, and you always need to decide what's more important. With hard affinity, that's easy; nothing like that can happen. But with soft affinity, you need to be aware of it. We have something we call the affinity rule enforcement manager, and that runs alongside the balancing.
They talk to each other to a certain extent, but they can't do everything; they're not as smart as you are. So far I've described affinity in general. Other projects might want affinity or already have it; I know Kubernetes supports affinity. Now, what do we have in oVirt? In oVirt we have two different kinds of affinity: affinity labels and affinity groups. We decided to do it this way for one simple reason. Sometimes you run a data center and you want rules that are very simple. You just want to declare a hard affinity: these two VMs are supposed to be always together, or this VM is always supposed to run on this host or on this group of hosts. You don't want anything complicated, and you can use affinity labels for that. They are simple labels: you apply the label to a virtual machine, you apply the label to a host, and you have effectively declared the affinity. On the other hand, you might want to declare that a virtual machine has soft positive affinity to another virtual machine, and that both of them together have soft negative affinity to a group of hosts. That you can declare using affinity groups. An affinity group, and I don't have the dialogue screenshot here, is basically a list of virtual machines with a rule that declares the relationship between those virtual machines, plus a list of hosts with a rule for the relationship between the declared virtual machines and the declared hosts. It supports all the different types you saw in my icons, and as I said, watch out for the conflicts: they are entirely yours to make, and most of the time yours to fix; we'll do what we can. So how do our affinity labels work? Here we have a virtual machine that's supposed to start, and it has two labels: finance and db. We have two physical nodes that could run the VM. The first one, at the top, has the strong and finance labels, and the other one has the weak, finance, and db labels.
Our algorithm checks whether the host has the same labels, or more. We don't care if the host has more labels than necessary, but we check that all the labels on the VM are also present on the host. As you can see, db is missing on the first host, so we will never consider that host. We require all the labels of the VM to be present on the host; extra labels don't matter. The VM specifies the requirement, and the host has to match it. Affinity groups, as I said, define a group of VMs and their relationship. They can also express one special value: no relationship at all. In an affinity group, you might want to say that the listed VMs have no relationship to each other; they only care about the hosts. With affinity labels it's only VM to host, but with affinity groups you can define all the combinations, so we need a way to say that the VM-to-VM part is not important, only VM to host, because a group is basically one document with two lists and two rules. And you can also define a group of hosts and their relationship to the VMs. As for current support in oVirt: we've supported VM-to-VM affinity since, I think, oVirt 3.5, which is about a year and a half ago, and it's supported both in the web admin, the administration interface, and through the REST API and SDKs. We support VM-to-host affinity since 4.1, which was released just yesterday if I remember correctly, and that currently only has API and SDK support; there is no UI for it yet. Affinity labels have been supported since 4.0, about a year ago, and there is a link to a blog post I wrote when it was published that shows all the REST API calls you need to make it happen.
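The label-matching rule described above is just a subset check. A minimal sketch, with my own naming rather than oVirt code:

```python
def host_matches(vm_labels, host_labels):
    """A host is eligible only if it carries every label the VM
    requires; extra labels on the host are ignored."""
    return set(vm_labels) <= set(host_labels)

# The example from the slide: the VM requires finance and db.
vm = {"finance", "db"}
print(host_matches(vm, {"strong", "finance"}))      # False: db is missing
print(host_matches(vm, {"weak", "finance", "db"}))  # True: extra label ignored
```

So only the second node from the slide is ever considered for this VM.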
Now, future ideas. We want to let you use an affinity label inside the affinity group dialogue, although that's complicated to do in the current infrastructure. Basically, you would say that all VMs with this label are part of the affinity group, so you don't have to select them one by one, and if you add a VM to the label, the whole group rule automatically applies to it. That's something we are working on, but we don't have it yet. I was also thinking about inversion of a rule: you might want to say, run no VMs on this host except this group. I really want to decommission it, so only this group of important VMs may run on the host if necessary, but otherwise keep it empty. Summary: affinity basically allows you to define complex relationships between virtual machines, and between a virtual machine and a host. It's enforced dynamically, depending on the current cluster situation, and it doesn't force you onto one specific host: when you define strong positive VM-to-VM affinity, we don't care whether the VMs are running on host A or host B, they will run together on one of those hosts. So that's it, thank you. If you have any questions, and they are good questions, I think I have a USB key for you. So yeah, you don't count. Okay. Question from the audience: The way I guess you could solve the problem with the conflicting rules is to put them in a sequence: I want this rule, and this rule is more important than that one. You talked about the weights, but if you actually relate the rules in importance to one another, then you should be able to resolve the conflicts. Is that something you're considering? Yeah, let me sum that up. Adam is asking whether we have some kind of rule priority, where you would be able to declare that if there is a conflict between two rules, this rule wins. Currently we don't have any rule priority. It's actually a good question.
We might consider it in the future, but we wanted to have at least the basics in place before we dive into that difficult area. Any other questions? Okay, then let me invite you to our booth. We are in the main building, close to the info desk where all the kiosks are. We have oVirt running there, so you can take a look at what we actually support and talk to the developers. Thanks for watching.