[Inaudible.] Come on. Sorry for that. You need to plug it in. Okay. So my name is Viktor. That's how I look. I work for CloudBees. I'm associated with the Docker folks, with Google. I have podcasts, I have books, I have blah blah blah. Not important. Twitter account. This is important, because after the talk I will give away some stuff for free, so you might want to go there, probably like 15 minutes after this talk. Anyway, I'm going to skip the boring part and we're going to jump straight into what I'm going to talk about. Here you go. Okay, so deployment. There are quite a few ways you can deploy your stuff. I'm not going to go through all of them, but I will go through a hands-on demo of those that are most commonly used today, which would be the recreate, or big bang, strategy, and rolling updates. I might talk a bit about blue-green and canary deployments, and serverless. That covers most of the strategies that we are using today, one way or another. And when we are talking about them, there are usually a couple of things that are important, that you're trying to take into account when you're making the decision how you're going to deploy your software. Even though, actually, in most cases you're not making that decision, because how your software will be deployed does not depend on you. It depends, more often than not, on how your application is designed, how mature your processes are, and so on and so forth. But anyway, what we are trying to accomplish, up to some level, and trying to figure out the pros and cons of, is whether it is fault tolerant.
That's the only thing I will not be talking about today, simply because everything I will be doing today is in Kubernetes, and everything you do in Kubernetes, unless you really mess it up, is fault tolerant. If it goes down, Kubernetes is going to bring it up one way or another, unless you don't have enough capacity, or in a few other cases. But what is important is whether your application is highly available. Unless you're running a hobby site or something like that, you do want your application to be highly available, and that is especially important if you are deploying very frequently. We are all moving away from monthly or quarterly releases, and you're probably all going in the direction of releasing once a week, once a day, many times a day, and so on and so forth. And the more frequently you release and deploy, the more important that part is, because deployment of a new release might create downtime, and downtime, as you know, is quite the opposite of highly available. Then: whether your application is responsive; whether it rolls out progressively, and you will see later on what that means if you're not familiar with it; and can we roll back in case of failure? With rollbacks, just like with everything else, what I really mean is whether all those things happen without you being involved, right? I know that you can do anything you want yourself, or by having 50 people staring at monitors all the time, but all the things I'm talking about assume that this is fully automated. If it's not, then I count it as not happening, even though you might be doing it through some other means, manually. Is it cost effective? Will it cost us too much money, or will it save us money if we do it one way or another? So the first one I will talk about is the serverless type of deployment, right?
I will not be using Lambdas and Azure Functions and all those things, because this is focused on Kubernetes and because I don't like building lock-in into any of those things. So let's take a look first at the serverless strategy and how that looks. In a very simplified way, an extremely simplified way, what we are doing is that we have an external load balancer, like for anything else basically, and we have some sort of API gateway, no matter the technology, that is accepting the requests coming into the cluster and queuing them, depending on whether there is something running or not, and then it is notifying the cluster to do something with your application, which might or might not even be running at any given moment. So let's take a look at the first example and how that looks. So now we are looking at the serverless strategy, and this is what I have in my cluster. No, what am I doing here? Wrong slide. Sorry, that's what I wanted. This is what I have in my cluster. It's a very simple definition, even though, if you're not working with Kubernetes, it might look scary. It's a single resource that defines a Knative service. Who is familiar with Knative? No? Few of you, shame. Anyway, a simple definition, I will not go through it. What really matters in this case is this guy over here that says that nothing is currently running in my cluster, right? If I list all the pods in my cluster that are called knative, because that's what I called the application that I deployed, there is absolutely nothing running, and that's normal, because I created the cluster and deployed the application a few hours ago, right? And between then and now, nobody was using that application, so the system decided to scale it down to zero replicas, because why would you run something if nobody is using it? Unless you have too much money to spend, but then you can come to CloudBees, for example, and spend money there.
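For reference, a Knative Service along the lines of what is on the slide might look something like the following. This is a sketch, not the exact manifest from the demo; the name, image, and annotation values are assumptions.

```yaml
# Hypothetical Knative Service; name, image, and values are placeholders.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: knative           # the app is called "knative" in the demo
spec:
  template:
    metadata:
      annotations:
        # Allow scaling all the way down to zero replicas when idle.
        autoscaling.knative.dev/minScale: "0"
        # Target roughly 100 concurrent requests per replica.
        autoscaling.knative.dev/target: "100"
    spec:
      containers:
        - image: ghcr.io/example/app:0.0.1
```

With a definition like this, Knative creates the underlying Deployment, Service, and routing on its own, and its autoscaler adds or removes pods based on observed concurrency.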
So, let's take a look at what happens if I do send a request to my application. I'm going to run Siege. Are any of you familiar with Siege, do you know what Siege is? It allows you to send a bunch of concurrent requests to somewhere. So I will send to my application, which is not running at all right now, 300 concurrent requests over 30 seconds. So, for 30 seconds it will be bombarding the system, the application that does not even exist right now in the cluster, and then we're going to see how many pods we have running over there. I mean, that is happening. It will take a minute or two. I have another pretty graph. People get dizzy when they see only a terminal, so I tend to mix it with different colors. Anyway, what was happening is that nothing was running over there when I showed you before, because nobody was using the application, so Knative in this case, and this could be OpenFaaS, it can be many different frameworks, so just think of Knative as an example, not a recommendation, shut it down. And now, while I'm sending requests over a period of time, it will go to the system, figure out that somebody wants to use the application and that there are 300 different requests happening at any given moment, so it should not only start running my application, but should probably scale it to some number. I'm not sure what I configured right now, but I think it should be approximately 100 requests per replica, so since I'm sending 300, it should spin up probably three pods, give or take. So let's take a look at whether that's happening, and there you go. I sent 5,800 blah, blah, blah requests. 100% of them were working, so that's a good thing. And the system, immediately after it finished sending those thousands of requests, had five pods running inside the cluster, doing whatever they need to do.
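The Siege invocation described above would look roughly like this (the URL is a placeholder, not the one from the demo):

```shell
# Send 300 concurrent requests for 30 seconds to the (placeholder) app URL,
# then print availability and throughput statistics.
siege --concurrent 300 --time 30S http://knative.example.com
```

Siege's summary at the end is where the transaction count and the 100% availability figure mentioned above come from.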
And if I look now, since I was already talking for a couple of minutes after this finished running, you will see that it already dropped to two, and now it is already terminating one more, so soon there will be only one replica running, and after that, if still nobody uses it, it will go to zero, right? So, taking into account the things I mentioned before, excluding fault tolerance because that's already kind of a Kubernetes thing, this is very highly available, right? Because it will always provide as many replicas of my application as are needed, which can be nothing, or it can be a thousand of them, depending on how you configured it, what your load is, and so on and so forth. So your users, ignoring the problems in your application, but if your application is designed well and all those things, users should always be happy, because your application will always be... responsive? Sorry, highly available. Responsive, not so much, because what you didn't see is that it may take a second, it may take two seconds, until it jumps from zero to something, right? So the first request took a bit of time until it responded. That's a downside. It is highly available, but not really responsive all the time. Progressive rollout? Yeah, kind of. You will see later what that really means, but if I deployed a new release, it would be replacing existing pods while maintaining those from before, so that existing requests terminate, and all those things. So it would kind of progressively roll out, but without real mechanisms to decide whether that should really be happening, and when it should continue progressing or roll back, and all those things. So somehow it does, but not fully. Rollback: there is no way to roll back automatically out of the box. I know that you can script everything, I know that you can do it, I know that you can stare at the monitor and click the button, but out of the box it doesn't come. And it is very, very cost effective.
This is the cheapest thing you can do, because you're literally using what you need and not more. That assumes your cluster is scaling up and down and that there are some other things you need to take care of, but it is a very cheap option, even though it doesn't really match everything that we need. So let me run a few other commands, just so that they're happening while I'm talking. What I'm going to do is go to a different application now, an application deployed in a different way. I'm going to change the source code of that application, push that to my Git repository, and while I'm talking, Jenkins X, which I'm using in the background, is going to deploy a new release of my application. Now, that strategy, I like to call it big bang; officially, at least in Kubernetes, it would be recreate. And what that really means is that when I'm deploying a new release, it will shut down my existing release, it will put the new release in its place, and then something will happen, or it will not happen, and so on and so forth. I'm going to take a look at that immediately. So while I'm talking, I will be sending requests to my application. It says hello from something, something, example, and when the new release is deployed, it will start saying recreate. So we will see how that works in a minute or two. And until then, I have another graph. So what we are doing here, and probably 98% of applications today are doing exactly this, because 99% of them are legacy code: we have a request coming to an external load balancer, going through Ingress, in this case, because it's Kubernetes, and everything goes to version one, or whatever the version is. And then we shut down that version, and then we put the new version in its place. I'm not going to ask who's doing this, because you would not admit it publicly, but this is what most of you are doing.
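In Kubernetes terms, the big bang approach is a one-line setting on the Deployment. A minimal sketch, with placeholder names and image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  strategy:
    type: Recreate        # shut everything down, then start the new release
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: ghcr.io/example/app:2.0.0
```

With `type: Recreate`, Kubernetes deletes all old pods before creating any new ones, which is exactly where the downtime window comes from.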
Unless you work in a company that did not exist three years ago, or five, or something like that. Okay, some older ones are doing it anyway. But that's what most of us are doing. And now I'm out of things to talk about. Let me see whether my new release is deploying. It will be deployed soon. Yeah, it will be deployed in a minute or two. So I need to figure out what to talk about. Anyway, when I said everybody is doing this, you might be wondering, especially if you're not doing it: why would anybody do this? Can you predict what will happen here? It's pretty obvious what will happen. And the reason why most people are still doing this is because most applications cannot do anything else. For example, if you cannot scale your application to multiple replicas, then you have no other option than to do this. If you have a stateful application that keeps state inside itself, and that state is not replicated across multiple replicas, then you have to do this, and so on and so forth. There are many, many reasons why you have to do that. And, come on. Okay, I'm going to skip showing you what will happen, because you know what will happen, and I'm pressed for time here, and it will take half a minute more to deploy. Anyway, what you would see is that this is a horrible, horrible thing to do, even though you are doing it. It's not highly available, because there will be a period of time between the old release being shut down and the new release running. If it's not highly available, it cannot be responsive; that's simply impossible. There is no way to progressively roll out, because if you don't have multiple replicas of your application, what will you progressively do? There is no progressiveness in any form or way.
You cannot roll back, at least not that easily, out of the box, and even if you can, you're most likely going to be messed up for the same reasons why you cannot move forward with multiple replicas. And it is not cost effective at all. It is very, very expensive, simply because to run that, you need to beef up your hardware beyond the need: your application most likely needs much, much more memory and CPU than you really need, simply to handle the peak load, if you cannot scale or you cannot do whatever you cannot do. Come on, it's going to happen. Any questions while waiting? Ten seconds. What is Knative? Knative is one possible solution for doing serverless deployments in Kubernetes. There is also OpenFaaS and a few others. And what it does is monitor the traffic coming in and out, and depending on the traffic and some internal configuration, it decides how many replicas of the application it should have: nothing, 5,000, whatever the number is. And it also queues those incoming requests when there is nothing running. You saw that there was no downtime in my case, because when there are no replicas, it queues those requests and waits. Did you create a new one? Did you... yes, okay, and sends them down there. Jenkins X is slow today. It's Tekton underneath, it's taking so long. Yeah. No, it's one of those things, you know. It's one of those things where, a few years ago, I mean, 10 years ago, if I only needed to wait three days until my release was in production, I would be so happy, and now I get really, really disappointed because I need to wait more than four minutes. Anyway, it will fail, it's going to be horrible. I'm going to save you from watching that embarrassment on the screen, and we're going to go to the next one. So, what else do we have in Kubernetes? Let me run another set of commands and show you. This one is the default.
If you work with Kubernetes, by default you will be using the rolling update strategy, which unfortunately many people keep by default, because they don't know that there is anything else, and then they realize later on how terribly inappropriate... eh, there you go. This is what most of your applications are doing most of the time whenever you deploy. This is why most people are deploying once every three months: because of that thing happening over here. Anyway, going back to the third one, rolling updates. Who doesn't know what rolling updates are? Okay, shame. So, what rolling updates mean, and actually I have a graph here, is that when you have multiple replicas of your application, they are most of the time stateless, or they share the state between them, and so on and so forth. When we deploy a new release, we shut down one of those replicas and put a new one in its place. And then we shut down the second one and put a second new one in its place, and so on and so forth. So, we are rolling out the new release one replica at a time, or 10% at a time, or whatever the criterion is. And in that sense, there is no moment in that process when your application is not running. And the problematic part that I didn't mention before, and the reason why this is hard, apart from all the reasons I mentioned before, is that for this to work, and for almost everything else to work, you need to version your APIs. You need to be backwards compatible. You need to have database schemas that are compatible with the previous version, and so on and so forth. You need to ensure that every single time you make a change to your code, that change is not only doing whatever it should be doing, but is compatible with the previous version of your application.
If you don't do that, this will never work, because you will have a period of time, which can be a second or a full day, during which multiple releases of your application will be running in parallel. And finally, at one moment, that will stop and only the new release will be running. I'm going to show you that right now, and we're going to see the same thing I showed you before: a loop of sending requests, which at one moment, instead of the message HTTP example, will start showing something else. I forgot what that something else is. But it will take a bit. And how much time do I have? Oh, I'm very good. Excellent. Anyway. So what happens with a rolling update is that it is fully available all the time, because there is no millisecond, no moment in that whole process, when your application is not up and running. Even though two, sometimes even more, versions of your application will be running in parallel and serving requests to random users. It will be very responsive, and it is very responsive. Progressive rollout still doesn't really work, because it is doing this, right? One there, done, up and down, up and down, up and down, but it doesn't really use any criteria to decide whether it should continue doing that or not. Should it go back? Should it go forward? What is that based on? Actually, it's wrong when I said no criteria: Kubernetes is using the health checks of your application to confirm, is this really running? Is this okay? But health checks are ridiculously simplified, right? You cannot really make decisions about whether you're doing the right thing or not based on health checks, unless you actually include all your Prometheus queries in a health check, but that would be a bit silly. But it is relatively cheap to do this, because you are basically running the same workload all the time, right?
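A minimal sketch of the rolling update settings being discussed, including the health check Kubernetes uses to decide whether to keep rolling. All names, paths, and numbers here are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most one extra pod during the rollout
      maxUnavailable: 0     # never drop below the desired replica count
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: ghcr.io/example/app:2.0.0
          readinessProbe:   # the "ridiculously simplified" health check
            httpGet:
              path: /healthz
              port: 8080
```

Kubernetes only continues replacing pods while new ones pass their probes, which is the limited go/no-go criterion described above.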
You might have a moment when one additional replica is running, but that's more or less irrelevant, and it will not increase the cost, especially if you run the horizontal pod autoscaler, which will also manage the workload and the number of replicas of your application. Oh, there we go. You can see that now, like, this guy over there says rolling update, and then a bit of example. So there was a moment before when we were receiving responses from one version and the other in parallel, and then everything goes to the new one, and there is no downtime, and we live happily ever after. And then, this is everybody's favorite strategy. Who is using blue-green deployments? Who is planning to use them? Why? It's a horrible, horrible thing to do. Blue-green deployment is horrifyingly bad. It was great a long time ago, and by long time ago, I mean 10 years, you know, that's a long, long time ago. The problem with blue-green deployment is that it assumes that you have too much money to spend. The idea behind blue-green deployment is that I'm going to keep the old release running, I'm going to deploy the new release, I'm going to redirect traffic to the new release, and I'm going to keep the old release running just in case I ever want to roll back, right? That means that if you need 100 servers for your workload, you effectively need 200, right? And you need to be filthy rich to be able to do that, and you will still not be able to justify it, because it doesn't make sense. It made sense before, when deploying a release took a lot of time, so rolling back would be very expensive from a time perspective, and when we didn't have virtual machines, and especially when we didn't have Kubernetes and containers and all those things. It makes absolutely no sense today, and I know that I'm going to be trashed on Twitter and whatever for saying this, but blue-green deployment is silly. So we're going to skip it. Why would I show you something silly?
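For completeness, since there is no demo: in Kubernetes, blue-green is often approximated by running two Deployments, labelled, say, `version: blue` and `version: green`, and flipping a Service selector between them. A sketch, with placeholder names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app
    version: green        # was "blue"; the cut-over is this one-line change
  ports:
    - port: 80
      targetPort: 8080
```

Rolling back means pointing the selector at `blue` again, for example with `kubectl patch service app -p '{"spec":{"selector":{"app":"app","version":"blue"}}}'`. The cost problem described above is that both Deployments run at full capacity the whole time.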
It makes absolutely no sense. Did you expect a demo? Is that what you wanted? Okay, you're not getting a demo. So I'm going to show you canary deployments now, and while doing that, I'm going to push another change to my application, just so that while I'm talking about it, a new release is being deployed, and what I'm going to do here is again output some message from my application, and we're going to see what it's doing. So what are we going to do with canary? We're going to do something similar to rolling updates: we will be increasing the number of pods, potentially, and the percentage of the traffic going to the new release, and decreasing what goes to the old release. So in that aspect, it is very similar to a rolling update. The first major difference is that we are not controlling how much goes where by the number of replicas, but rather through networking itself. In this case I'm using Istio, but it applies to any other service mesh, and probably to things that are not service meshes too, what do I know. Through networking we control traffic. But what really makes this very, very different is that we are not just rolling forward, rolling forward, rolling forward. We are checking all the time whether the results, the experience of our users, is what we expect. Is the error rate above a certain threshold? Is the average duration of our requests above this amount of milliseconds? And so on and so forth. Normally you would have anything between two and 200 different metrics that you are continuously evaluating over a period of time, and then after some period of time saying: it looks okay, it looks good, let me increase the reach of my new release by another 10% or 15% or 20% or whatever it is. And that whole process can take anything between minutes and even days, depending on what you're really measuring. Now, what you are probably saying is: this is awesome. And there are tools that can help you do that. In my case I'm using Istio and Flagger. There are others.
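The metric-driven progression described above is what Flagger's Canary resource encodes. A sketch with assumed names and thresholds (the built-in `request-success-rate` and `request-duration` checks come from Flagger; the specific numbers here are illustrative, not the demo's configuration):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  service:
    port: 8080
  analysis:
    interval: 1m          # evaluate metrics every minute
    threshold: 5          # roll back after 5 failed checks
    maxWeight: 50         # stop shifting at 50% canary traffic
    stepWeight: 10        # shift traffic 10% at a time
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99         # % of successful requests
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500        # milliseconds
        interval: 1m
```

Flagger adjusts the service mesh routing weights on each interval and rolls back automatically when the checks fail, which is the automated rollback the rolling update strategy lacks.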
This is so awesome, why don't I do this? Well, most likely because you can't. One of the big reasons why you cannot do this is that you need to have a really, really firm grasp on the metrics that you're using to monitor your system. If you're not really confident in being able to predict when something is going wrong based on metrics, actually not even metrics, alerts: if today you don't have a system that is sending you alerts whenever something is wrong, and if you're not capable, based on those alerts, of performing certain actions, this is too soon for you. That's not necessarily bad, but you do need a really, really firm grasp on the whole monitoring, alerting and metrics area, and so on and so forth. So you will see Flagger in action soon. It will take another minute. Tekton is slow, as somebody already said. Anyway, what do we get here? Just like with the others, I mean, this is kind of awesome, just like Knative. It is highly available, it is responsive, you finally have a really progressive rollout. It rolls back when any of those metrics or thresholds are not met within a given period of time, or a given number of times, and so on and so forth. It is not really the cheapest solution, because you're likely going to run additional pods, more replicas than the bare minimum that you need, but outside of that, it's all peachy. How much time do I have? 10 minutes, probably, right? Anyway: which one are you going to choose? You're most likely going to use recreate for most things, especially those that were not designed recently. If your application is stateful, and by stateful I mean it is not replicating state across all the replicas so that they are all exactly the same, you're going to use recreate. If you cannot scale, you're going to use recreate. If you work with mainframes and COBOL, you're going to use recreate. Rolling update is the first logical step when you start creating better applications, and especially when you start using Kubernetes.
Don't think that, because I told you that the others are cool, you should jump directly there. Get a firm grip here first, with rolling updates: make sure that you're backwards compatible always, that your APIs are versioned, and so on and so forth. That's the first step toward something better. Canaries are usually the next step, once you really master your Prometheus, or whatever, metrics, and your alerts, and your confidence. Actually, let's put it this way. This is great when you're confident that you're not needed anymore. When you can push a change to your application and initiate a process that will deploy it to production, and you feel confident going to the movies and watching Star Trek or whatever you like watching, and I'm saying movies because you need to put your phone on silent, so you cannot get the notification. That's why that part is important. Then you're ready for this. And finally, serverless deployments are just as good, depending on the architecture of your application. We will see a lot of serverless this year, and this will probably be when it takes off beyond being vendor specific. Let me see, my canary is still not running. It will run soon. This is so slow, but I have no more time. Thank you so much. 10 minutes, excellent, for questions. Check out Technology Conversations, the blog, the podcast, listen, buy books, expense them to your managers, all that stuff. 10 minutes for questions while you're waiting for the canary deployment. Did I scare you? Is that the thing? No, no, Amsterdam, no. Okay, okay, okay. You cannot do it with every kind of application. So, if you don't have traffic, you can simulate traffic. You can be sending traffic to your application if you don't have guaranteed traffic. However, I don't think that's the best idea, because what we really want for canaries is to measure the real usage of the application. Not: I know from tests that simulate the environment that it works, before I got to production, right?
So I want real use cases, real traffic, to be measured. And if your application doesn't have real traffic, I mean, doesn't have consistent traffic, let's say, either you don't do it, or you do it over a prolonged period of time. You can say, okay, I can deploy it over a full week. Let's say that you have long deployment cycles as well, for some reason, right? In that case, you can say, I'm going to measure over five hours. Heck, over five hours there will be a request or a few, right? So if you prolong it, it could work. But nevertheless, if you have no traffic, maybe you should not bother with it anyway. Oh, by the way, you see the old canary, it's running away, running away, and the new ones are coming. And the percentage of those will be increasing now over time. I'm not sure what the period of time is. Anyway, more questions. Yes, so the question is, is it feasible, can we mix different strategies? And the short answer is definitely yes, right? And this is a common theme, actually: teams, when they're doing a really good job, maintain backwards compatibility most of the time, but sometimes it's really kind of like, oh, I can do this in 15 minutes, or I can spend a month doing the same thing in a backwards compatible way, and then you just change the strategy. Now, depending on... Huh? Exactly. So, yes. I actually hate the whole idea that, oh, we need to do this and only this. Our goal is to understand different aspects of different life cycles, and so on and so forth, and to apply what makes sense in any given situation. I don't think that there should be a process that must be followed always, no matter what, no matter what the process is. Anybody else? No, no. How do you do canary with a database? Huh? With databases. With databases? Yes. If they're replicated, then it should be okay. If they're not replicated, like Oracle, then you'd better change the company where you work.
If you update the database during the update and the replica is not yet updated, the replica will be broken. Say that again? You will update the master and not the slave, and the replica will be broken. For example, if you have updated from 5.7 to 8.0 in MySQL. Yes, so, it works when you can have... let me put it this way. If your database, or anything else, can be updated without downtime, then you could potentially do canary or any other strategy. Now, if your application cannot be updated in any normal way without downtime, then you are messed up. Now, there are all sorts of gray areas; for example, with MySQL, I haven't worked with it in a while, so I don't really have the answer for MySQL, but if you cannot do it, then you do something else. You shouldn't be doing canaries only because they're amazing, or make a database serverless because it's in, and all those things. For databases and cache implementations like Redis, Flagger implements traffic mirroring. By the way, this is the author of the Flagger that I'm showing you. And you will see a lot. The next talk is basically the same as what I said, but in a different... with a Spanish accent. Anybody else? Yes, I hope so. It's just that I don't know. So, most of the terms... serverless, which is silly because there are servers, by the way. Now, you can say it's called serverless because most people do not care about servers that are managed by somebody else. Functions is an even worse term, functions as a service or whatever they're calling it, because it's unrealistic that all applications are going to be deployed as single functions. So yes, I mean, you can come up with a new term. I'm not sure whether you're going to convince anybody, but yes, I don't see why not. Five minutes, so, I'm so fast at talking. I have more questions, excellent. Canary, 100% over there. Anybody else? You want to go to the toilet? Pee and poo? I have a daughter, so I'm allowed to say pee and poo without anybody getting offended. Going once, going twice. Hey, yes.
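The traffic mirroring mentioned in the answer can also be expressed directly in Istio. A sketch of a VirtualService that routes all traffic to the primary while copying it to the canary (the hostnames are placeholders):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app
spec:
  hosts:
    - app
  http:
    - route:
        - destination:
            host: app-primary
          weight: 100
      mirror:
        host: app-canary     # receives a fire-and-forget copy of each request
      mirrorPercentage:
        value: 100.0
```

Responses from the mirrored destination are discarded, so the canary sees real traffic without its answers, or its writes' failures, ever reaching users.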
To mirror everything, yes. I mean, that kind of runs in parallel with everything else. So these days you would probably use a service mesh of one type or another that can mirror your traffic wherever it's going, and then basically it's just doing the same thing twice. Yeah, anybody else? Okay, I give you three minutes of your life back.