Hi, I'm Simone Tiraboschi. Today I want to share some lessons we learned developing Kubernetes operators: how we started, what we got wrong along the way, and what we would do differently. Maybe something should be changed, and for sure something should be improved. We are going to have bugs, we are going to have things that should be changed, but we have a common goal when we develop an operator: we want to provide something so robust that all of our users are going to trust it to automate their strategy.

Let me start from how we built the KubeVirt operator, and first spend a moment on what KubeVirt is, for who is not aware. KubeVirt is a cloud-native API, and its runtime, to define and manage virtual machines as if they were cloud-native objects. The virtual machines are executed in containers using KVM, QEMU, libvirt and so on, but they are scheduled, executed and managed as fully cloud-native objects. Why do we think that is a good idea? Because this follows all the Kubernetes paradigms: you are going to use CNI for networking, CSI for the storage, and you are going to schedule and consume virtual machines exactly as cloud-native resources. You can access and expose them via Services, Routes, Ingresses and so on, exactly as if they were pods in your application.

Now, what the Hyperconverged Cluster Operator (HCO) is. KubeVirt is the part that manages the virtualization world on top of Kubernetes, but in order to execute virtual machines we also need sibling operators: something to handle the storage, to import images, to handle the network configuration, and so on. So we have more than one operator here, and potentially a lot of complexity, because our user would have to configure several different operators. We want to have a single entry point; that's why we developed HCO. HCO is an operator for operators: the operands of HCO are KubeVirt and the other sibling operators that you need in order to execute your virtual machines. Here we have a graph; it's really complex, just to give you an idea of what is managed by HCO.

HCO provides an opinionated deployment of KubeVirt. The idea is that we want a single entry point: a single configuration object that you can use to configure KubeVirt on your cluster. This single entry point should be a self-explanatory and well-defined object with schema validation. It should be declarative: we are more interested in what we want to achieve, because the how is going to be done by the operator, and we want a clear separation between our desired state, what we as cluster admins tell HCO, and the observed state.
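To make that split concrete, here is a minimal Go sketch of such a single-entry-point CRD type; the names are illustrative, not the actual HyperConverged API.

```go
// Illustrative sketch only: a single-entry-point CRD type with a clear
// split between desired state (spec) and observed state (status).
// The field names are made up for this example, not the real HyperConverged API.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// HyperConvergedSpec is the desired state: what the cluster admin asks for.
type HyperConvergedSpec struct {
	// FeatureGates is the well-defined set of tunables we chose to expose.
	FeatureGates FeatureGates `json:"featureGates,omitempty"`
}

// HyperConvergedStatus is the observed state: what the operator reports back.
type HyperConvergedStatus struct {
	// Conditions describe what the operator actually observed on the cluster.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

type HyperConverged struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   HyperConvergedSpec   `json:"spec,omitempty"`
	Status HyperConvergedStatus `json:"status,omitempty"`
}

// FeatureGates is expanded in a later sketch.
type FeatureGates struct{}
```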
So now let's start from the mistakes. As everybody, we also made mistakes. The first one is abusing annotations. In the past it happened that we exposed some features through annotations. It's absolutely a bad practice, but it can be really convenient because it's really fast: you just say, OK, put that annotation there, and if I find that annotation I'm going to have a different behavior. But is it going to have a schema? Can we validate it? How can we report the status of an annotation? Are we going to add a second annotation that reflects what was in the first annotation? Are we really advertising that we are offering a new feature to our users? In general, an annotation is the wrong way to expose a feature. As developers we know that nothing lasts longer than something we introduced as a temporary but dirty hack, so we had to pay for that. As a reference, I want to point out the Kubernetes API conventions: it's a really good document with a lot of interesting material, and I strongly advise you to take a look and spend a bit of time learning from it.

Now, a counter-argument: we are still using annotations, but with a different scope. HCO provides an opinionated deployment, meaning that you get only a well-defined set of exposed features; everything else is automatically reconciled and enforced by HCO. But if I want a hack, an experimental POC, if I want to touch something that is not exposed by HCO because it's an experimental feature, we have a special annotation that lets the cluster admin use a JSON patch to override the HCO opinionated configuration on one of the HCO managed operators. This is a kind of override hack, an escape hatch to let you inject something over HCO. HCO is still going to detect it, and it's going to raise an alert saying that your configuration is tainted. We are also doing it recursively: one of the operators managed by HCO is virt-operator, which is part of KubeVirt, and there we have the same mechanism, so we can use a JSON patch annotation to set another JSON patch annotation that amends the configuration of one of the objects managed by a sub-operator of HCO.

So now we saw what we did wrong; what's the proper way to expose a feature? For sure it's a feature gate. In HCO feature gates are simple booleans to enable or disable a feature. It's pretty simple; on the other side, it can become an anti-pattern if we abuse it. If we end up with a huge list of booleans, where the value of one contradicts the value of another and there are relations between different booleans, it becomes an anti-pattern. So we have to ensure that we expose only what our cluster admin is really supposed to tune.

Now let's see some implementation details. In our case the feature gate booleans are optional values. A good reason for that is that we can use pointers to boolean, so the value can be true, false or even nil; the user doesn't care, it's up to us to decide which is the best value. Later on we will see another good reason why we chose to make them optional. We also have defaults, and we are using two different mechanisms to specify the default value. The first one is the kubebuilder marker, to let kubebuilder generate the OpenAPI v3 schema. The second one is another marker for the Kubernetes code generator, which generates code for us to perform the defaulting. We need two separate mechanisms because they do different things: with the static code that we generate with the Kubernetes code generators, we can easily have initialization code that we can use in unit tests.
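As a minimal sketch of what I just described (the gate name here is hypothetical, while the markers are the standard kubebuilder and defaulter-gen syntax):

```go
package v1beta1

import "k8s.io/utils/ptr"

// FeatureGates exposes only the booleans a cluster admin is supposed to tune.
// Pointer booleans let us distinguish "not set" (nil) from an explicit value.
type FeatureGates struct {
	// someFeature is a hypothetical gate, defaulted through the generated
	// OpenAPI v3 schema thanks to the +kubebuilder:default marker.
	// +optional
	// +kubebuilder:default=false
	SomeFeature *bool `json:"someFeature,omitempty"`
}

// SetDefaults_FeatureGates follows the naming convention consumed by the
// Kubernetes defaulter-gen code generator; unit tests can call it directly
// to get the same defaults the API server would serve.
func SetDefaults_FeatureGates(fg *FeatureGates) {
	if fg.SomeFeature == nil {
		fg.SomeFeature = ptr.To(false)
	}
}
```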
We can simply call that function to get all the defaults that we need. It's good, but not perfect. The more sophisticated mechanism is the OpenAPI v3 schema defaulting, but that one is enforced by the API server: when you run unit tests without a real API server, you are not going to get those values. This is why we need both of them. Then in Kubernetes we have a third way to get defaults, which is a mutating admission controller, where you can have really sophisticated logic. Please notice that in this case the admission controller changes the value before the object is persisted, so the value is always going to be stored in the API server; while when you use the OpenAPI v3 schema, the defaulting happens when you read the value back from the API server: if you have a nil on the stored object, the API server performs the defaulting when serving it and you get a value back, but what you stored is nil. Then there is a further pattern, which is having your controller change the values in the spec stanza. We call it late initialization, because it's something that can happen at any time, and you can have really complex logic there; but please notice that you can also introduce loops.

Now, the golden rule. The golden rule is that with defaults we never want to change a value that was provided by the user: we want to validate the values, and we want to set a default only if a value is missing. But the key is validation. How can we do validation on Kubernetes? Also here we have three alternatives. The first one is the OpenAPI v3 schema validation: we can validate values, but just on single fields, and we can also enforce string formats, for instance date, date-time and so on. Since Kubernetes 1.25 we also have a second option, the Common Expression Language (CEL), where you can write more complex expressions: for instance you can validate the value of a field against the value of a different field, you can do it also for non-scalar fields, and you can say that something must be immutable once it has been set, which was not possible with the OpenAPI v3 schema alone. Please notice that this has been in beta since Kubernetes 1.25, and since it is still beta it can change in the future. The third option is using a validating admission webhook: you have a webhook where you can put your validating logic. In the past the only available options were the first and the third, while now the Common Expression Language sits in the middle, and in probably more than 90% of the cases it can let you avoid writing a custom admission webhook that you would also have to manage on your cluster: you have to provide certificates, and you have to be sure that it keeps running on your cluster.

On our side we are still using a validating webhook. Why? Because we have complex logic. HCO is an operator for operators: it knows what the other operators are going to do, but we don't want to replicate all the logic of the sub-operators in HCO itself. So we work with a delegation pattern: HCO gets an API request for a change, computes the expected configuration for all the other operators, and tries it in dry-run mode, as in the sketch below. HCO really propagates the change to its components if and only if the calculated configuration is accepted by all of them.
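A minimal sketch of that dry-run check, assuming a controller-runtime client, where desired stands for one of the computed per-operator configurations:

```go
package controller

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// validateViaDryRun asks the API server to run the full admission chain
// (including the sub-operator's own webhooks and schema validation) for
// the computed object, without persisting anything.
func validateViaDryRun(ctx context.Context, c client.Client, desired client.Object) error {
	return c.Update(ctx, desired, client.DryRunAll)
}
```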
It's not really a transaction, but it's more than enough to prevent a lot of possible conflicts between the configurations of different operators.

Another topic is API deprecations and changes. This is a really broad topic; I'm going to give you some hints, but please take a bit of time to reflect on it. There are a couple of really good documents about changing the APIs in the Kubernetes architecture repository. In general, compatibility is hard. We don't want to introduce too many non-backward-compatible changes; sometimes we have to. An API change is considered compatible only under certain assumptions, and in general we have assumptions about the syntax and others about the semantics. The takeaway that you can bring home is that introducing a new optional field with a sane and opinionated default is usually safe; everything else should be carefully evaluated. If you are going to introduce a non-backward-compatible change, you need to bump the API version, you need to define a proper conversion mechanism, and eventually you also need to define a proper deprecation mechanism. This is true for metrics as well: for metrics there is KEP-1209 about the metrics stability framework, so also metrics should eventually be deprecated, and so on. Please notice that OLM, if you are using it, supports conversion webhooks only if the operator is deployed in the all-namespaces install mode: this is because the CRD is a cluster-wide object, while an operator could possibly be installed in different namespaces with different versions. Another hint that I found really useful is to use fuzzers to randomize the inputs of your tests and detect conversion glitches in a round trip.

Another lesson that we learned is about scaling. When you develop an operator, probably you are going to try it first on a small development cluster, and everything is going to work pretty well: the operator is correctly working. Then you move it to a huge cluster, and you discover that it's not that happy. In general, watches in controller-runtime are expensive: if our operator is watching everything, it's going to become a problem. We can avoid that with two strategies. The first one is using predicates on the controller-runtime reconciliation loop: with predicates we can filter the reconciliation requests and process only a few of them, but we are still going to get triggered by a huge amount of events. We can also do something else, which is configuring the controller-runtime cache to optimize it with selectors. Selectors mean that we want the controller-runtime cache to explicitly ignore some objects: if our operator watches ConfigMaps on the cluster, we want to get an event only when a specific ConfigMap is touched by either the cluster admin or another controller (see the sketch below). Please notice that there is also a big drawback: whatever is not in the controller-runtime cache cannot be read with the controller-runtime client. A last hint is about trying to limit the API server burden: for instance, when you compute a complex status, try to update it only once, at the end. And of course, try to use metrics.
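As a minimal sketch of the cache-selector idea, assuming a recent controller-runtime (the ByObject cache options appeared around v0.15); the ConfigMap name is illustrative:

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				// Cache (and therefore watch) only the single ConfigMap
				// we actually care about, instead of all of them.
				&corev1.ConfigMap{}: {
					Field: fields.OneTermEqualSelector("metadata.name", "my-operator-config"),
				},
			},
		},
	})
}
```

Remember the drawback mentioned above: any other ConfigMap is now invisible to the cached client.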
Then, one of my favorite topics: upgrades. In KubeVirt we have different stories about upgrades, because KubeVirt is a tool to manage virtual machines, and virtual machines are strange beasts: a virtual machine is not a generic pod that can simply be killed and restarted on a different node. We want a proper lifecycle for our virtual machines, so we have different themes here.

If we take a look at the internal architecture of KubeVirt, we notice that we have a control plane, with cluster-level components like virt-api and virt-controller that we need to perform our operations; then on each node we also have a node-specific agent, which we call virt-handler, and which is still part of our control plane. Then each virtual machine is a pod, and in the virt-launcher pod we have a libvirt and a QEMU; eventually we want to upgrade those as well, because we can have bug fixes and new features in libvirt. Then our virtual machines contain a guest operating system, and eventually we want to be able to get upgrades there too. So we have a complex story about upgrades; let's try to discuss the pieces one by one. Sorry, it's a long list, but maybe you can bring home some details.

The first story is about platform, or node operating system, upgrades. You can imagine that if you are upgrading your platform, your cluster, or just the operating system of the nodes, you are probably going to require a reboot of the node, and the virtual machines that are running on that node are not going to survive it; but we want to preserve our virtual machines. That's why we use a PodDisruptionBudget to protect them: with a PodDisruptionBudget we are sure that our virtual machines are not going to be killed by a node drain. On the other side, we need to detect the node drain and try to live-migrate the virtual machines, or eventually perform a clean shutdown. The cluster admin can configure the strategies here, because not every cluster can do this: in order to have live migration your cluster has to support some specific features, so it's up to you to configure that. In case we use live migration as an upgrade strategy, we also protect the migration itself with a second PodDisruptionBudget with minAvailable equal to 2, because in that case we have, at the same time, two pods for each virtual machine: a source pod, which contains the source QEMU, and a target pod, hopefully on a different node, which contains the target QEMU. Please notice that if you configure your cluster to live-migrate virtual machines on platform upgrades, you can potentially block the upgrade if a virtual machine fails to be migrated: in that case the node drain is going to be blocked by our operators and you are going to get an alert, but you have to manually do something with your virtual machine to let the platform upgrade complete.

The second step is the KubeVirt and HCO control planes. Here we have a few control plane pods (in reality we have more than these), and we want to be able to upgrade them as well. HCO is an operator for operators, so it's managing different operators with different versions. HCO can be deployed with a Helm chart or by OLM, the Operator Lifecycle Manager. If you use OLM, you deploy a bundle; a bundle is composed of the manifests for the operator plus all the CRDs, and in that bundle we embed, at build time, a configuration with the versions of all the operands managed by HCO. Then each operator reports, in the observed status of its own CR, which version is currently consumed by that operator, and this is reported back to HCO. We are also using conditions: in particular, we are using three conditions, Available, Progressing and Degraded, to report what is happening during the upgrade process.
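A minimal sketch of how such conditions can be maintained with the apimachinery helpers; the reasons and messages here are illustrative:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markUpgrading flags an in-progress upgrade on a CR's status conditions.
// meta.SetStatusCondition updates LastTransitionTime only when the
// condition status actually changes.
func markUpgrading(conditions *[]metav1.Condition, generation int64) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Progressing",
		Status:             metav1.ConditionTrue,
		Reason:             "Upgrading",
		Message:            "upgrading the managed operators",
		ObservedGeneration: generation,
	})
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   "Degraded",
		Status: metav1.ConditionFalse,
		Reason: "AsExpected",
	})
}
```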
Combining these three pieces of information, plus a bit of logic (because it's not such a simple logic), HCO is able to monitor and track the progress of the upgrade for its managed operators.

Then another interesting step is how we handle the upgrades of virt-handler. virt-handler is a DaemonSet, which means we have a pod on each node that is managing the VMI pods there. virt-handler is a critical component for us: if we don't have a virt-handler on a node, the virtual machines are going to continue running, but we are not able to do anything with them. When you create a DaemonSet there is a parameter called maxUnavailable that configures how many instances of the DaemonSet you tolerate to have down. In OpenShift the general convention is that for a generic DaemonSet we can tolerate up to one third of the nodes being down, and for critical components we tolerate up to 10%. But in this specific case we want to be even stricter, and we use a canary deployment strategy: we start patching the DaemonSet with maxUnavailable equal to 1, so that only one node gets its virt-handler replaced; if and only if virt-handler correctly comes back on that node, we switch back to the default strategy with maxUnavailable equal to 10%. At that point we become faster and we work on 10% of the nodes at a time, but on the first step we try with just a single node, to minimize the potential disruption.

Another interesting thing that we learned is to try to be declarative as much as possible, even when we write operators. An operator is the bridge between the declarative world, your APIs, and what should be done, which at the end is an imperative world. But instead of putting a lot of complex code into the operator, we can be declarative there too, preparing some files with the configuration of the upgrade: we defined a few structs where we have a set of JSON patches and a semver range that identifies which versions are affected by that change set, as in the sketch below. Here too we have a dry-run strategy: we try the changes first, and only if we are sure they are going to work do we really apply them.
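A minimal sketch of such a change set; the struct shape and the semver library (blang/semver here) are illustrative choices, not necessarily what HCO ships:

```go
package main

import (
	"encoding/json"

	"github.com/blang/semver/v4"
)

// upgradePatch is a declarative description of an upgrade-time amendment:
// a JSON patch plus the range of installed versions it applies to.
type upgradePatch struct {
	// SemverRange selects which installed versions this change set targets.
	SemverRange string `json:"semverRange"`
	// JSONPatch is applied (dry-run first) to the matching object.
	JSONPatch json.RawMessage `json:"jsonPatch"`
}

// applies reports whether the patch targets the currently installed version.
func applies(p upgradePatch, installed string) (bool, error) {
	rng, err := semver.ParseRange(p.SemverRange)
	if err != nil {
		return false, err
	}
	ver, err := semver.Parse(installed)
	if err != nil {
		return false, err
	}
	return rng(ver), nil
}
```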
The last step on this side is how we communicate back with OLM. A few years ago the only option was to act on the operator pod itself; now we have operator conditions. With an operator condition we can report that we are still upgrading, so that we don't want to be interrupted, or eventually we can report a failure. If we have a failure, we can rely on fail-forward upgrades, which is a newer feature in OLM: an intermediate version that is marked as failed can be skipped, and we can try to jump to the next one.

The next step is the upgrade of what we call the workload. The workload, in our case, is libvirt and QEMU in the pods that we use for our virtual machines; we want to upgrade them as well as part of our upgrade. Also here we can configure our strategies: we can configure the virtual machines to be live-migrated, if your cluster supports live migration, or we can configure them to be evicted. If a virtual machine fails to be live-migrated, it's going to stay with an old version of QEMU; we are going to annotate that virtual machine, and there is an alert to signal that you still have some virtual machines running with an older pod.

Another step is the guest operating system of your virtual machines. This is not strictly part of what HCO or KubeVirt are managing; you can use many different tools, because at the end they are virtual machines: you can access them and run whatever tool you normally use to manage your virtual machines. But we want to be cloud-native, so we extended KubeVirt introducing a set of Tekton tasks. Tekton is a CI/CD pipeline engine for Kubernetes that lets you create pipelines, and we created a set of tasks, the building blocks for your pipelines, to perform operations on your virtual machines. For instance, you can create a pipeline to upgrade the guest operating system of your virtual machines.

Another step: we also have other objects managed by HCO, for instance golden images. Golden images are a set of images that you can use to build your virtual machines. Let's imagine that you want to build a virtual machine that is always going to consume the latest version of Fedora: we have a feature that uses a cron-like mechanism to continuously import the fresh version of Fedora as a cloud image on your cluster. Please notice that this is completely decoupled from the upgrades of HCO or KubeVirt: you are continuously trying to get the next version of the golden images, and you can also use custom images with this feature.

Last slide: the most important thing is testing. As developers we know that we are going to have bugs; we cannot prevent all of them, but we can test more. And even more important: what works for you on your development cluster is not always going to work for your customers. But if you inject the relevant metrics in your operator and you collect them, you can learn what your users are doing, how often they are upgrading, how long an upgrade takes, and you can eventually also think about complex strategies like canary releases, where you ship the new version only to selected users, and so on. But this is meaningful only if you have the capability to get some insight into what is happening with the canary version; otherwise it does not make sense to ship to a single cluster if you are not able to get useful information from it. Here we also have a really good reference document, which is the observability best practices in the Operator Framework SDK.

So, thank you very much for your time. Here I have a QR code pointing back to the conference schedule; please send me some feedback. Positive feedback is appreciated, but negative feedback even more, because as a developer I am always going to learn from my mistakes. So thank you very much.