Our first speaker is Milan; he's going to talk about extending Kubernetes, and he will start soon.

Thank you for the introduction. Nice to see that so many people are joining. My name is Juan, I'm working for Red Hat as a software engineer, and I want to talk to you today a little bit about controllers and imperative versus declarative systems, and especially about how you can create declarative extensions for Kubernetes.

Looking at the contents first: I want to talk with you a little bit about imperative systems and what is typical for them, then I want to show you the same flow twice, once in the imperative style and once in the declarative style. Then I want to tell you how controllers in Kubernetes are built in general, so that they can work in such declarative systems, and then we'll have a look, in a little more detail, at the main constructs with which you can create controllers in Kubernetes. At the end there's a short list of best practices I want to share with you.

So let's start with imperative systems and the actions of the user. I have a little example that I want to go through with you. Let's suppose we have a virtual machine management application, and we have a pretty simple task to do: we just want to create a virtual machine with a disk attached to it. Here we have a little state machine where you can see all the steps you have to go through. First there is no disk and no virtual machine in the system, so the first thing you do is create a disk through the REST API. Once you've created the disk, there is normally an asynchronous process going on in the system which provisions the disk; it's not immediately available, so you wait a little bit for it to show up. Then you create the virtual machine; now you can actually create it, because the disk exists.
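The flow just described, including the final start step, can be condensed into a toy program. Everything here is made up for illustration: `vmManager` and its methods stand in for REST calls against the hypothetical VM management API, and the asynchronous disk provisioning is simulated with a short in-memory delay.

```go
package main

import (
	"fmt"
	"time"
)

// A toy, in-memory stand-in for the VM manager's REST API.
// In reality each of these methods would be an HTTP call that can fail.
type vmManager struct {
	diskRequested time.Time
	vmState       string
}

func (m *vmManager) createDisk() { m.diskRequested = time.Now() }

// Disk provisioning is asynchronous: it only "shows up" a bit later.
func (m *vmManager) diskReady() bool {
	return !m.diskRequested.IsZero() && time.Since(m.diskRequested) > 20*time.Millisecond
}

func (m *vmManager) createVM() error {
	if !m.diskReady() {
		return fmt.Errorf("disk does not exist yet")
	}
	m.vmState = "Created"
	return nil
}

func (m *vmManager) startVM() { m.vmState = "Running" }

// ProvisionVM is the imperative script the user has to write:
// create the disk, poll with backoff until it shows up, then
// create and start the machine.
func ProvisionVM(m *vmManager) string {
	m.createDisk()
	backoff := 5 * time.Millisecond
	for !m.diskReady() {
		time.Sleep(backoff)
		backoff *= 2 // back off so we don't hammer the cluster while waiting
	}
	for m.createVM() != nil {
		time.Sleep(backoff)
	}
	m.startVM()
	return m.vmState
}

func main() {
	fmt.Println(ProvisionVM(&vmManager{})) // prints "Running"
}
```

Even in this tiny sketch, the retry and backoff plumbing already outweighs the actual intent, which is exactly the point the talk makes next.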
The system only allows you to create the machine once the disk exists, since the machine references it, and finally you can start the virtual machine. So far so good.

Now let's look at the person who has to do all the work, and that's the user. The user has in her head the final state she wants to achieve, but instead has to tell the system step by step what to do: that is, to go through all the transitions I've just shown you, marked in red here. The user first creates the disk, then the user waits for it to show up, then the user creates the VM, and then starts it. I guess in the heads of many people, shell or batch scripts to do all that are already showing up. At first, yeah, it's not very nice, but still, you just have the REST calls and everything is there. But when we look into more detail, we have to do more: we have to add retry logic; we have to make sure that when we can't connect to the cluster, we don't overload it by just quickly firing the REST calls again, so you need some backoff mechanisms in there; and the script grows and grows. And all that work does not, by itself, tell other people what the final state was that you wanted to reach.

So you can start thinking about how you can express the state you want to reach. With imperative systems you can use, for instance, Ansible to help you there. You can create an Ansible playbook and put all the tasks you have to do in it; Ansible already has helpers doing the retries and so on. So you can put the final state you have in your mind into an Ansible playbook, and then just do something like this.
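For instance, a minimal playbook for the "I want Python on my node" example the talk uses might look like this (illustrative; `package` is the standard Ansible module, the file name is made up):

```yaml
# state.yml -- declares the desired state; Ansible figures out the steps.
- hosts: all
  become: true
  tasks:
    - name: Ensure Python is installed
      package:
        name: python3
        state: present
```

You would run it with `ansible-playbook state.yml`; whether anything actually needs to be done is Ansible's problem, not the caller's.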
You run ansible-playbook with that playbook against your hosts, and this says, in other words: I want Python on my node, and how it gets there is the playbook's concern; someone else can even write the playbook for you. So that already goes in the declarative direction, but only the description is declarative; the system underneath is still driven imperatively.

Now we can look at the same example and how it would look if the system itself were declarative. For that we have to introduce an operator, and the operator is the component which performs the steps for the user. The user now has the final state she wants to reach not only in her mind; she can really express it, store it somewhere, and everyone can more or less easily understand what that final state is. The operator then does all the work for the user. Here again you see the user, and in red what the user now has to do in the system: she just posts the state she wants into the system, and she doesn't care about any of the intermediate states.

And here's an example of how such state can be expressed in Kubernetes. I'm working on KubeVirt, an extension for Kubernetes which allows you to manage virtual machines, and here you see the description of a virtual machine. You see the kind, VirtualMachine; you can specify how much memory it has; you can attach a disk to the machine by referencing it. And then, in the typical Kubernetes YAML format, there is a second object describing the disk we want. You post those two YAMLs to the cluster, and then the operator starts doing all the work for you.
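The two YAMLs might look roughly like this. This is a sketch from memory, not the exact KubeVirt schema of any particular version; field names and API versions have changed across releases, and the disk is rendered here as a plain PersistentVolumeClaim for illustration:

```yaml
# Illustrative only -- the real KubeVirt schema differs between versions.
apiVersion: kubevirt.io/v1alpha1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  domain:
    memory:
      value: 64
      unit: MiB
    devices:
      disks:
        - name: mydisk
          volumeName: mydisk      # references the volume below
  volumes:
    - name: mydisk
      persistentVolumeClaim:
        claimName: mydisk         # the disk object posted alongside
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mydisk
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

The user posts both objects and is done; creating the disk, waiting for it, creating and starting the machine is now the operator's job.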
You post the YAMLs, the operator creates the disk, waits for it to be there, creates the machine and starts it. It gets much easier for you, but now we engineers have to deal with the complexity; it's not going away, it's just somewhere else. And that brings me finally to the part where we can talk about how Kubernetes solves that problem.

Kubernetes has some basic considerations when it comes to writing controllers. One is: there is a cluster state, and we are trying to converge to the desired state. We are not trying to follow every single state transition of the cluster; we are just trying to converge to the final state we want to reach. So when you get, say, virtual machine update events, you have to react to them and try to adapt the cluster, but there is no guarantee that you see every mutation of the object. The guarantee is only that in the end you will see the desired state. This has advantages when you think about how Kubernetes controllers react to the state of the cluster, because they keep an in-memory cached view of the cluster, and under the assumption that you don't have to process every state, they can make some very nice simplifications on the way to the final state.

The first thing you have to do when you want to create a controller is get access to the data and create that in-memory view, and the first construct for that is the so-called list watcher. That's an actually pretty simple construct. You can see the interface; it has two methods. First you have List, which runs as a REST call against the API server and lists all the pods, and then you have Watch. After you've listed all the pods, you start watching for changes on the pods. Kubernetes has a very nice watch API, where you just subscribe and you get all the updates on the pods.
And since it uses etcd in the back, it can leverage etcd's resource version feature. Whenever you put an object into the cluster, it gets a specific resource version, and you can start watching for changes beginning at a specific resource version. That's what the Watch function uses.

When you look at an implementation of such a list watcher, it is therefore pretty simple. You see here a function which creates a ListWatch for a REST client. First we create a list function, which needs to return... oh yeah, that's something I should mention: the constructs I'm talking about are not just used internally by Kubernetes. The Kubernetes folks really did a great job of creating a client SDK, client-go, which is directly synchronized out of the core repository. All the code that you see here uses the same code that core Kubernetes uses; we can just fetch the SDK and write controllers the same way core Kubernetes does.

And here's the implementation. I'm just using a default REST client from the Kubernetes code: build a REST path and fetch all resources from the endpoint. For the watch functionality it's pretty much the same: we build a REST path to watch, we specify the namespace and the resource, pods or virtual machines or whatever your extension is, and add some extra parameters; you don't have to care about the details, just add them. You can also specify field selectors, which help you say: my controller just cares about a specific subset, not all objects in the cluster. Then you start watching. That's it. You return this watcher, which lists and watches and emits the add, update and delete events for your objects. That's the first step: now I get all the events coming in for my controller.
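The list-then-watch contract can be modeled in a few lines. This is a self-contained sketch of the idea only, not client-go's actual `cache.ListerWatcher` interface, which works with REST clients, options and resource versions; the fake implementation stands in for the API server:

```go
package main

import "fmt"

// Event mirrors the watch API: an event type plus the affected object.
type Event struct {
	Type string // "Added", "Modified" or "Deleted"
	Name string
}

// ListerWatcher is a stripped-down model of the contract:
// List returns the current state, Watch streams changes from then on.
type ListerWatcher interface {
	List() []string
	Watch() <-chan Event
}

// fakeLW serves a fixed initial state and a scripted stream of events,
// standing in for the API server backed by etcd resource versions.
type fakeLW struct {
	initial []string
	events  []Event
}

func (f *fakeLW) List() []string { return f.initial }

func (f *fakeLW) Watch() <-chan Event {
	ch := make(chan Event)
	go func() {
		for _, e := range f.events {
			ch <- e
		}
		close(ch)
	}()
	return ch
}

// Sync builds a view of the cluster: list everything once,
// then apply the watch events on top.
func Sync(lw ListerWatcher) map[string]bool {
	view := map[string]bool{}
	for _, name := range lw.List() {
		view[name] = true
	}
	for e := range lw.Watch() {
		switch e.Type {
		case "Added", "Modified":
			view[e.Name] = true
		case "Deleted":
			delete(view, e.Name)
		}
	}
	return view
}

func main() {
	lw := &fakeLW{
		initial: []string{"pod-1", "pod-2"},
		events:  []Event{{"Added", "pod-3"}, {"Deleted", "pod-1"}},
	}
	fmt.Println(len(Sync(lw))) // pod-2 and pod-3 remain
}
```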
So why not just take that event channel and run my business logic directly on those events? Well, that's not a very good idea. First of all, controllers can watch more than one resource, and then it's hard to combine all the events properly: from resource A you get an event for object X, from resource B you get an event for object Y, and you don't really have a place to coordinate them, to look at them together. The second thing is that list watchers don't provide any retry mechanisms or anything like that. They're really just good at one thing, and they're very good at it: emitting events about changes in the cluster and showing you which resources are there.

The next step on the way to a nice little controller is introducing a store. Stores help you by taking the events from the list watchers, possibly from different sources, and filling a store with the objects delivered by the watcher, so that you have an in-memory view of the cluster. A store is a fairly simple construct, internally backed by a thread-safe map, and it has a few simple methods for adding, updating, deleting and accessing objects. There are different ways of getting objects out; the most common key consists of the namespace the object is in and its name. So here, for instance, I'm trying to get the pod pod-1 in a given namespace, and I get the object out of the store.
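A minimal sketch of such a store, assuming the usual "namespace/name" key scheme. This models the shape of client-go's `cache.Store` rather than reproducing it; `Obj` is a made-up stand-in for a Kubernetes object:

```go
package main

import (
	"fmt"
	"sync"
)

// Obj is a minimal stand-in for a Kubernetes object.
type Obj struct {
	Namespace, Name string
	Data            string
}

// Key builds the usual "namespace/name" cache key.
func Key(o Obj) string { return o.Namespace + "/" + o.Name }

// Store is a sketch of the in-memory cache: a thread-safe map
// with add/update/delete and key-based access.
type Store struct {
	mu    sync.RWMutex
	items map[string]Obj
}

func NewStore() *Store { return &Store{items: map[string]Obj{}} }

func (s *Store) Add(o Obj)    { s.mu.Lock(); s.items[Key(o)] = o; s.mu.Unlock() }
func (s *Store) Update(o Obj) { s.Add(o) }
func (s *Store) Delete(o Obj) { s.mu.Lock(); delete(s.items, Key(o)); s.mu.Unlock() }

// GetByKey looks an object up by its "namespace/name" key.
func (s *Store) GetByKey(key string) (Obj, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	o, ok := s.items[key]
	return o, ok
}

// ListKeys returns the keys of everything currently cached.
func (s *Store) ListKeys() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	keys := make([]string, 0, len(s.items))
	for k := range s.items {
		keys = append(keys, k)
	}
	return keys
}

func main() {
	store := NewStore()
	store.Add(Obj{Namespace: "default", Name: "pod-1", Data: "v1"})
	o, ok := store.GetByKey("default/pod-1")
	fmt.Println(ok, o.Data) // true v1
}
```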
There are more methods for working with the content of the cache; here you can see the full interface: List to get all the objects, methods to get just the keys, and Get and GetByKey to fetch an object and check whether it is in the cache. Very simple. Now we're a little bit further: we can fetch the state and the changes from the cluster, and we can use the store to build an in-memory view of the cluster. We need one thing more, a shared informer, which takes the list watcher and the store and makes sure that the list watcher updates the store for you. It also allows you to register callbacks, and these callbacks are very simple too: add, update and delete callbacks, and whenever an event comes in from the list watcher, it will invoke them.

There is one important subtlety: a shared informer does not guarantee that, when an event comes in and your callbacks are triggered, you see exactly the object state which triggered them. When the list watcher updates the store and the shared informer invokes the callbacks, for instance your update callback, it will provide you with the object which triggered the update, but you are not guaranteed to see the object state that was delivered by the list watcher when it triggered your callback. Let me try to explain that once more, to make sure that you really get it; it's not that hard, but it is a little bit hard to express. There can be two or three very fast updates on the same object, and you can have many callbacks on the shared informer. So it can happen that before the callbacks were invoked for the first update of the object, two new updates for the same object came in and the cache was updated, while the callbacks were not even invoked for the first event yet. What the shared informer does in that case is provide
the latest data from the cache for the object. And that's a very neat idea: you can reuse the shared informer for as many callbacks as you like, without the overhead of delivering every intermediate state, and you steer towards the goal of reaching the final state; you are explicitly not processing every object mutation. Here's an example of how such callbacks can be implemented for a shared informer: you create your shared informer, you register a handler with an add function, an update function and a delete function, and in there you can take the object and operate on it.

Now you might think: hey, now we're there, right? We have the shared informer with the callbacks, we have the store with the in-memory view, so now we can put our business logic into these add, update and delete functions. But no, not a good idea. There is something very interesting internally: if a goroutine holds a lock for a very long time, and business logic normally takes a lot of time, then another goroutine which needs that lock just for a very short amount of time doesn't get it. You would end up running your callback code over and over but never updating your cache, which also needs the lock. That's why these callbacks need to be very short; it comes down to the lock implementation and Go architecture decisions, and I don't want to go deeper into that here. The next thing is, as you can see, the callbacks just take the object and return nothing, so there is again no place for error handling, no retry mechanism, nothing. What you can do instead is use the callbacks to push the key of the object which changed onto a work queue, and the work queue is the final missing piece. A work queue at first glance looks like a completely normal queue implementation: you put keys in and you get them out in the same way as with a normal
FIFO queue, but it has a few optimizations which make writing the processing loop very convenient. The first one, and I'll show you the details of all these cases in a moment, is that if you get multiple updates for the same object, they all collapse into one key in the queue. So you don't end up processing every mutation, and you don't wake up a worker five times, because it's just not necessary. The work queue also knows internally when a key is being processed and when you're done processing it, and that allows the work queue to make sure that you never process the same key, the same object, in parallel in different workers, without you having to care about locking on keys. It provides mechanisms to re-enqueue the key you're currently processing in case of errors, and it even lets you specify retry and backoff policies; for that purpose it also keeps a history of how often a key was re-enqueued because of errors.

Let's now go through this in more detail. First, the part where it looks like a normal queue: say your queue contains the keys pod-1 and pod-2, and you add pod-3. The result is as expected; it's just appended at the back of the queue, and you have three keys in there. But you already see the difference when there is already an update for a key in the queue and you try to add the same key again. That's perfectly valid, but the result is that the key is still in the queue just once. And that's all you need, because, since you have a cache with the latest in-memory view of the cluster, you don't even want to process every object mutation. You're just interested in the fact: OK, this object changed and I need to process it. If I didn't process it yet, it's good enough to have it marked once as needing processing; by the time I get to it, I'll see the latest state anyway. And the queue also makes sure for you that you don't have to care about processing
the same key in parallel multiple times, which could otherwise lead to race conditions. Just consider you have a replica set which should create five pods, and you have two updates on the replica set, so its key is enqueued twice; one worker dequeues it, and another worker dequeues it again. Both are working on the changes, and both are trying to create five pods for you; now you have ten pods, and over time the controllers would probably adjust and delete them again, and in the end you would have five, but you would have to take care of all that yourself by tracking which keys you got out of the queue. With the work queue: when you do a Get and get pod-1 out, and you add pod-1 again and do another Get in another worker, you get nothing out of the queue. The key becomes available again only after you are done with processing and have called Done with it; then another Get would give you the key again.

And now we get to the very important error handling capabilities. If you get a key out of the queue, you're processing it and you hit an error, you want to re-process it, but you don't want to re-process it immediately. Or maybe the first time you do want to retry immediately, but not after it has failed five times; then the delay should rather be three or five minutes. So what you want to do is add the key rate limited, and not have it come out again immediately. And here you see it: we are processing the key, we hit an error, and we add it rate limited, so it gets enqueued again. When I immediately do a Get afterwards, it gives me nothing; only after a specific delay, which can for instance grow exponentially with every rate-limited add of that key, do I get it back. And there is a final piece of information you need to take into account: the rate limiter internally also tracks the previous failures. So if you add a key
rate limited five times, the rate limiter has tracked five errors for that key, and that's an important input for calculating the backoff. And when you finally manage to do a successful processing of the key, you have to call Forget on the key, to make sure that the error history for the key is cleared and it gets added normally again; and you still need to call Done, just as in all the cases we talked about before.

Now we're pretty far, because with all those components we can create a controller. The controller uses the list watcher, the informer and the store to keep an in-memory view of the cluster; it uses the callbacks of the informer to add keys to the work queue; and the controller has a worker run loop which gets the keys from the queue. If there are errors, you use a rate-limited add to enqueue the key again for later processing; on success you call Forget, to make sure the error history is cleared if there was one; and you have to call Done, so other workers can work on the key again. And there is a nice picture which sums it all up. On the left side you see the list watcher updating the store, and the callbacks of the informer feeding keys into the work queue. Here you have the controller: the workers use Get to take the keys out of the work queue, and based on the key they fetch the objects from the store, from the informer. Now they have everything; they have access to the objects and to the key they need to process, and they can react to changes in the cluster. And depending on whether something was a success or an error, you call the right methods on the work queue: the rate-limited add, Forget and Done.

Then it's time to go through the notes again which we collected during the talk. First: list watchers are really just there for updating stores. Don't even try to react directly to the events they provide; there is just no way
there to properly handle errors. Use stores to get the latest state of a known object, and don't try to manipulate the cache directly. The stores conform to the informer's view: it's a single-writer, multiple-reader implementation, and you don't even have access to the locks, so don't try to do that. Then: copy objects from the store before you manipulate them. When you get an object from the store, you just get a reference to the object in the cache. If you, for instance, take a pod out, manipulate it and want to update it on the cluster, remember that the cache should be updated based on the state of the cluster and not based on your mutation directly, so you first need to deep-copy the object; there are for instance generated DeepCopy helpers for that. And when you use shared informers, don't use the callbacks for doing business logic; just use them to notify your work queue about changes.

And finally, that's a very interesting one: in Kubernetes you normally have the object with the specification and the status in one place. Don't, for instance with a replica set which should create five pods, implement it in a way that says: OK, now I created five pods, so I immediately also update my status to five pods. What you would see is that your status starts to go up and down pretty much randomly, because the in-memory view of your cluster has not yet recognized that you created those pods; it does not yet know that there are five new pods, because the next updates have not yet arrived, so your status will just behave very strangely.

And yeah, that's pretty much it. There are more interesting things to see in the controller implementations in Kubernetes, and there is even more, for instance reference management and ownership of objects, garbage collection and all that. OK, thank you.
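The pieces described in the talk, a store filled by the informer, a deduplicating work queue, rate-limited retries, Forget and Done, fit together roughly like this self-contained toy sketch. It is in-memory only, with no client-go and all names made up; a real controller would use client-go's informer and workqueue packages instead:

```go
package main

import "fmt"

// A condensed, in-memory toy of the controller pattern: informer
// callbacks update the cache and enqueue keys; a worker dequeues,
// reads the latest cached object, syncs, and either Forgets the key
// (success) or re-enqueues it (error).
type Controller struct {
	cache    map[string]string // latest object per key, filled by the informer
	queue    []string          // FIFO of keys
	dirty    map[string]bool   // dedup: key already queued
	failures map[string]int    // error history per key (drives the backoff)
	Synced   []string
}

func NewController() *Controller {
	return &Controller{
		cache:    map[string]string{},
		dirty:    map[string]bool{},
		failures: map[string]int{},
	}
}

// OnEvent is all an informer callback should do: cache + enqueue.
func (c *Controller) OnEvent(key, obj string) {
	c.cache[key] = obj
	if !c.dirty[key] { // multiple fast updates collapse into one key
		c.dirty[key] = true
		c.queue = append(c.queue, key)
	}
}

// sync is the business logic; objects in state "flaky" fail once,
// so the retry path is visible.
func (c *Controller) sync(key string) error {
	if c.cache[key] == "flaky" && c.failures[key] == 0 {
		return fmt.Errorf("transient error on %s", key)
	}
	c.Synced = append(c.Synced, key)
	return nil
}

// RunWorker drains the queue, the toy version of a worker run loop.
func (c *Controller) RunWorker() {
	for len(c.queue) > 0 {
		key := c.queue[0]
		c.queue, c.dirty[key] = c.queue[1:], false
		if err := c.sync(key); err != nil {
			c.failures[key]++            // a real rate limiter would also delay by ~2^failures
			c.OnEvent(key, c.cache[key]) // the rate-limited re-add
			continue
		}
		delete(c.failures, key) // Forget: clear the error history
	}
}

func main() {
	c := NewController()
	c.OnEvent("default/pod-1", "ok")
	c.OnEvent("default/pod-1", "ok") // collapses with the previous add
	c.OnEvent("default/pod-2", "flaky")
	c.RunWorker()
	fmt.Println(c.Synced) // [default/pod-1 default/pod-2]
}
```

Note what is deliberately missing compared to the real thing: locking around the queue, the actual backoff delays, and Done bookkeeping for parallel workers; those are exactly the conveniences client-go's workqueue gives you for free.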