Okay, so let's begin another talk. The topic of this talk is Colorful Deployments. In this talk we will learn about different deployment strategies, and how to introduce a fully automated CI/CD flow by leveraging containers combined with 21st-century cluster management, as found in the latest version of OpenShift. The talk is presented by Maciej Szulik from Red Hat. Don't forget to post your feedback and rate this session.

Hi, welcome. I'm sorry for my voice; apparently the weather isn't being kind to me, and it's happening for the second time in a row. My name is Maciej Szulik, but generally you can find me under "soltysh" everywhere on the internet. Today's talk will be about deployments.

So let me start with a simple question. Actually, there will be three. How many of you have actually deployed some application, be it a production application or your pet project or whatever, to a server? Okay, there's a few. How many of you are from an operations team? Okay. How many are developers? The rest, I assume? Okay, I'm not going to be covering the rest; I'm going to focus on the interchange between those two groups.

Okay, so let's start with some groundwork: continuous integration and continuous deployment, because that's what we will be talking about. Continuous integration. We all know, and we obviously all do, continuous integration, right? Right. Okay, that's a good start.
I'm going to toss you a coffee just for that. See? You can get more of these if you interact with me more; that's just to encourage you to talk to me. I don't want to be the only one talking.

So continuous integration is about building our application, running the entire test suite, and making sure that part of the process is working as expected. And we're using either Jenkins, Bamboo, Travis, or some other tool that is available out there, right? Right. Yeah, there's a few more heads nodding.

But the second part of the process is the continuous "D", and the D stands for delivery or deployment. How many of you think it's for delivery? How many of you think it's about deployment? Okay, and the rest don't know? Okay, so it's a common misconception between the two. Delivery is about producing an artifact, be that an RPM, zip, tar, Docker image, whatever, that you can then take and deploy into your production. Whereas the continuous deployment process goes all the way to the final step, meaning not just producing the artifact, but actually deploying the artifact to your test environment, production environment, or whichever environment you have.

Just to show you that I'm not lying, this is a quote from Martin Fowler. If you don't know him, you should totally check him out; he writes really good stuff about deployments and generally about application architectures. So this is not something that I came up with; this is the proper definition of continuous deployment versus delivery. And since the title of my talk is Colorful Deployments, we will obviously be talking about the full automation, starting from the build all the way to deploying the application to a production or test environment. Of course, I'm not going to focus too much on the building part, because that's pretty simple and you already know how to do it.

So let me start with a simple story that happened a while ago, when Flickr was bought by Yahoo.
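[Editor's note: a minimal sketch of the delivery-vs-deployment distinction just described, in illustrative GitLab-CI-style YAML. The stage names, image name, and final deploy command are assumptions for illustration, not from the talk.]

```yaml
# Illustrative only: continuous *delivery* stops after publishing an
# artifact; continuous *deployment* adds an automated deploy stage.
stages:
  - build
  - test
  - deliver   # produce the artifact (RPM, tarball, container image, ...)
  - deploy    # continuous deployment goes this one step further

build:
  stage: build
  script: make build

test:
  stage: test
  script: make test

deliver:
  stage: deliver
  script: docker build -t registry.example.com/myapp:$CI_COMMIT_SHA .

deploy:
  stage: deploy
  script: oc rollout latest dc/myapp   # assumed OpenShift 3.x deploy step
```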
Yahoo didn't like the approach that the Flickr folks took of releasing their software very often, like several times a day. So the CTO of Flickr was talking with some management from Yahoo, and they told him: listen, this is not how it works at Yahoo, you've got to stop that. So they made a bet: Flickr would continue their strategy, and at the same time they would monitor how things were going. After about a week, they checked the results, and apparently the response times and availability for the Flickr services were much better than for the core Yahoo services. That means the process they were using, continuous delivery and continuous deployment, was actually working much more efficiently than the stable, very slow, very heavily tested processes that Yahoo already had in place.

So what do we actually want to get from the continuous deployment approach? First of all, we obviously want to deliver more features. More features means more happy customers, and more happy customers means more money, obviously, and this is one of the things we're aiming for. Repeatability: we obviously want to make sure that our application always works. We will move in the direction of making sure that every deployment works as expected, because we prefer to deploy more often, deploy smaller bits of our application, smaller features, and respond to what's happening with our application. So suppose there's a problem: we introduce a small feature, for some reason the entire test suite passes, we release it to production, and then we find a small bug.
It's easier with a small deployment to make a quick fix and deploy once again, instead of rolling back the feature, because it's faster, and because the deployment process has been tested so many times (we're doing it over and over, possibly several times a day) it's easier to deploy a newer version with that fix in it than to roll back. Rollbacks, as you probably know, are always painful. If something is going to go wrong, it is usually going to go wrong during the rollback process; it's not as stable as the actual deployment. That's why we'd rather go in the direction of continuous deployment.

Obviously, we also want to minimize the downtime when we're deploying the application. With bigger changes there will be changes to data structures and to different components that will force some downtime, and obviously the lower the downtime, the better for us. The best approach is when the downtime is close to zero, and I'm going to talk about that in a minute. And we obviously want to reduce the risk. There's nothing better than not having the risk at all, and this is what I was talking about: it's better to deploy over and over, because the process becomes so well tested by being performed so many times, than to rely on rollbacks, which are just a safety belt that you use from time to time.

Let's start with the core part of the talk, which is the deployment strategies. I'm going to start with blue-green. How many of you have heard about blue-green deployment? That's cool. Basically, blue-green deployment is all about having two identical environments; that is one of the requirements. Then we need some layer that allows us to switch from one environment to the other. That would be a load balancer or some routing layer, whatever you're comfortable with. But there's also one additional requirement, and it basically applies to all the deployment strategies that I'll be talking about throughout this talk: N-1 compatibility.
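[Editor's note: a minimal sketch of that switching layer in OpenShift Route terms; the route name, host, and service names are assumptions for illustration.]

```yaml
# Illustrative OpenShift Route: all traffic goes to the "blue" service.
# The blue-green switch is a one-line change in to.name.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: app
spec:
  host: app.example.com
  to:
    kind: Service
    name: blue      # change to "green" to flip to the other environment
    weight: 100
```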
Just the fact that we're running two versions of the application at the same time requires us to work with both the newer schema and the older one. That means that when we're changing the database schema in any way, we need to do it gradually, in such a way that we're only adding new fields. The old application has to work properly with the limited set of fields, because it won't know about the new ones, but the new application will properly set the new fields. This is very important: you gradually modify the structure of the data kept within the application. This is very hard to achieve in a simple way. Generally we want to make our API stable, and that's what we do for our customers; this is the same thing we will have to do for ourselves when we are performing any type of continuous deployment.

To illustrate the process: we have nine replicas of the blue application that is currently running, serving requests perfectly fine. The green one is where we're deploying; it is a one-to-one identical environment where we upload the new version of the application. During the time that the entire traffic is served through the blue deployment, we can run the entire test suite and whatever else you need to feel comfortable that the newer version of the application is working as expected. Once, for example, our quality assurance department says this is good, we just make the simple switch, and then we're actually serving the newer version of the application. Any questions so far?

Let me show you a simple demo of how this works in OpenShift. It's going to be tough because I need to keep switching. I'm very lazy, but the examples I'm showing you are on my GitHub account. There is a step-by-step description of how to run this, and it's actually not that hard, because the setup script just creates a project, creates two deployments, and by default I'm serving the entire traffic to the blue deployment. I need to turn my back.
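[Editor's note: what that setup boils down to, sketched as illustrative Kubernetes-style manifests; the names, labels, and image tags are assumptions, not taken from the demo repository.]

```yaml
# Illustrative sketch: two parallel deployments of the same app,
# differing only in name/labels and image version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blue
spec:
  replicas: 3
  selector:
    matchLabels: {app: demo, color: blue}
  template:
    metadata:
      labels: {app: demo, color: blue}
    spec:
      containers:
      - name: app
        image: registry.example.com/demo:v1   # current version
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: green
spec:
  replicas: 3
  selector:
    matchLabels: {app: demo, color: green}
  template:
    metadata:
      labels: {app: demo, color: green}
    spec:
      containers:
      - name: app
        image: registry.example.com/demo:v2   # candidate version
```

Each deployment would then have a matching Service (`blue`, `green`) that the routing layer selects between.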
It's hard. Give it some time. I do hope it will work. We have three pods running in each of the deployments; one is called blue and the other one is called green. I can do a simple GET. I'm very lazy. Basically, this is blue serving for us. Probably this is because it's not full screen. See this traffic? There's a nice indication that the entire traffic is currently served by the blue deployment. Unfortunately there's no simple way yet of changing that from here; I need to go to routes and modify that one. I already filed a request to actually make that switch possible. This is OpenShift Origin master from the day before yesterday or something like that. I'm usually a very hardcore person, and since I work on a daily basis on OpenShift and Kubernetes, for me it's simple: I just compile the binary, and this is what you basically see.

I'm going to show you that in a minute. See? That's how simple it is to switch from blue to green. And I have two of them running. I'm going to go full screen to show again that the entire traffic has now switched from left to right. Right? We're going to reuse that traffic control a little bit more once again in a few minutes, but for now that's what we want. So, to show you which version I'm running: this is 1.5.0-alpha-whatever; this is a commit from probably two or three days ago, more or less. I don't mind whether it's old or new.

Okay, we need to go back to the presentation. Where's my mouse? Good. That was the demo. Did you like it? Who nodded the most? Raise your hand. Okay. Yeah, just because you raised your hand. Enjoy.

Okay, let's keep moving forward. To summarize blue-green deployments, let's talk about the pros and cons of this approach. Obviously the best feature of it is the complete zero downtime; it's almost instantaneous.
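[Editor's note: the console switch shown in the demo gained a CLI equivalent in later OpenShift 3.x releases; an illustrative fragment, with the route and service names assumed.]

```shell
# Send all traffic to green (assumed route "app", services "blue"/"green"):
oc set route-backends app blue=0 green=100

# Inspect which backends the route currently points at:
oc get route app -o yaml

# Rolling back is the same one-liner, reversed:
oc set route-backends app blue=100 green=0
```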
The users will not even notice when I switch from one environment to the other, because a user's previous request might be served by the old environment, but their next request will be immediately forwarded to the new one. This is very nice for us; it's almost the ideal world, because the downtime is close to zero. Obviously there will be some problems, some lost transactions due to the switch: if somebody is in the middle of a transaction, that transaction may be lost, so you need to be aware of that, unfortunately. But the good side is that, just because we have twin environments, we can actually run the entire test suite on production. How many of you would love to do your entire testing in production? This is not something you usually get to do, and at the same time it gives you a lot of confidence in the stability and healthiness of the new application. Besides, although I mentioned before that rollbacks are not recommended, here it's very easy to roll back. If something is messed up so badly that I don't want to continue with the newer version, switching back to the older environment is easy, because I'm just changing my router or load balancer: go back, go back, there's something wrong going on.
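[Editor's note: jumping ahead slightly, the weighted canary split discussed in the next section maps onto the same routing layer; an illustrative sketch, with the service names and host assumed, not taken from the talk.]

```yaml
# Illustrative weighted Route: roughly 90% of requests hit "prod",
# roughly 10% hit the "canary" version.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: app
spec:
  host: app.example.com
  to:
    kind: Service
    name: prod
    weight: 90
  alternateBackends:
  - kind: Service
    name: canary
    weight: 10
```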
Now the downsides. Aside from the lost transactions, there are the data migration problems, the thing I was talking about before: the N-1 compatibility problem. When you want to do migrations, you need to be aware that there are two applications running at the same time, both of them operating on the same data, which is a very problematic thing. And unfortunately, the cost: having two identical environments obviously means twice the cost of keeping those environments in place. With cloud solutions such as GCE, Amazon, or the like, the cost will be limited, because we will only stand up the second environment for a short period of time, maybe a day or two or three, for the time of the actual deployment, and then we will keep it for some time after the switch has happened, just to make sure that, in case something breaks terribly, there is a path back.

[Audience question about how a transaction gets lost during the switch.] Your transaction is currently being served by one application instance, and at some moment you are switching from one environment to a totally different environment. There are certain transactions, I am not saying all of them, but when somebody is in the middle of one at the moment of the switch, that's the problem; websockets or something like that, for example. That one's just for being great and raising your hand; maybe that encourages others to ask more questions. I still have a few coupons, and my talk is going to run for another 30 minutes or so, so keep asking questions, I like talking with you.

[Audience question about whether, after shifting traffic to the new environment, it would be wise to scale the idle one down to conserve resources.] The pure blue-green deployment is what I showed you just now, but there are different variations. In fact, the other deployment strategies that I will be talking about, starting now with canary, are variations of blue-green deployment where you define how
much of the traffic is served by the new application. This is what canary is actually doing: I am deploying my application, but the newer version is available only to a certain group of people. That certain group of people is decided by a feature toggle. The best feature toggle implementation out there is currently the one at Facebook; I have talked with a friend of mine who is an engineer at Facebook. It is called Gatekeeper, and you can find some materials around the internet about it. Basically, Gatekeeper is implemented in such a way that it allows you to specify gender, age, country of origin and a lot more, to define whom the new application will be targeted at.

There was a super nice article, and I laughed so much when I read it, around September or October last year: the newspapers in Poland were so excited because a new Facebook feature was being tested by Polish users, and only Polish users. Obviously Facebook does that every single day, basically with every single feature; it just happened that they decided the Poles would be the ones testing this particular feature. It is not that we are super special or anything like that; they do that on an everyday basis.

I got a question after my talk about how I would implement that. My original answer was that I would rather have the feature toggle be a separate application, but after talking to the Facebook engineer, I changed my mind. I asked him if there is a chance that they will ever release Gatekeeper, and he told me that Gatekeeper is so tied to the entire infrastructure that Facebook runs that it would be almost impossible to release it as a standalone application; it knows so much about how the application is running. So after talking to him, my answer to future questions is that you want to tie the toggle to your application in such a way that it can give you better feedback, because obviously, first of all, you are deploying the new version of the application
just to a small group of people, but you also want to get feedback from those people, so you need to monitor that group even more closely than you monitor the majority of your application. So yeah, that's a very simplistic illustration of how the canary deployment can be achieved. Are there any questions about canary deployment? Yes, there will be a demo, on the next slide. Let's do the demo, then.

I would need two heads for this: one would be here and the other one over there. Actually, if you go to my GitHub account (I'm going to show you my username once again, because it's hard to remember), all those samples are there, easy to run on OpenShift. The setup is pretty much the same as before. I wanted to show what the setup does: it just creates a project, then it creates two deployments, and by default I'm targeting 100% of the traffic to the prod deployment. Let's do a GET. Okay, it's working. I want to show you the console; I need to switch and go back home. There's the canary project. Let's switch to full screen again, because we want to see the traffic.

Well, the current implementation of the router that we have in OpenShift is very simplistic. It allows me to just specify the percentage of the traffic that is going in one or the other direction, so that's what I'm going to leverage here. I'm going to decide that every 10th request... and now we're going to go off full screen, because I want to show you that. See? Approximately every 10th request hits the new canary deployment, the newer version of the application. Obviously, OpenShift is built in such a way that it's very easy to provide your own load balancer or your own router that will allow you to implement more sophisticated routing rules. The one that I'm showing is very simplistic: basically every 10th request hits the newer application. We can switch that to 20% or something like that, and it should work. Okay, let's do
half. I didn't test half, but it should work. It might be that the service answers and decides... but I could replicate it. What, again? Well, we can; I don't see any problem if we want to bump that up to, I don't know, 3 replicas. So that was actually the biggest question: whether you can somehow tie the scaling of the pods to the traffic split. Usually you would want to correlate the two. Currently I'm not sure there's such an option; doing it manually is probably what I would say. There are autoscaling options, but... well, autoscaling might work... although no, because the load balancer will always know that there's just one target currently, so it won't work, right? Sorry, there's a question in the back, and I'm going to give you the coffee.

The question is what the difference is between the services and the route. So, in front of all that: what I currently have deployed is two deployments; on top of that, each of those deployments has a service, and the service always forwards to whatever is behind that deployment; and then on top of that we have the routing layer, and the routing layer is what actually decides how the traffic goes. So combining the two is probably what you are interested in. The thing with services is that services do not expose traffic to the outside; they expose traffic internally, within the cluster. We still need something like a route in OpenShift; in Kubernetes it is called Ingress, and OpenShift will eventually be moving to Ingress as well. So that's basically how it works.

Okay, we're going back to our discussion; we're pretty good on time. So that was canary... yes? How do I tell Kubernetes to switch the traffic? The thing that I was showing you: I was actually modifying a route, the route that is exposing those two services. There's a prod service and there's a canary service, and I'm just modifying the route
configuration, specifically how it should split the traffic between those two services. What, again? The question was whether there can be more than two services. I bet, yeah; there shouldn't be any problems. We can actually look at the raw YAML definition of the route. There's the main service, and then there's something called alternateBackends, and alternateBackends, if I remember correctly and from what I see, is an array of different services that you can route your traffic to. I'm not sure how the division of the traffic will happen when there is more than one alternate backend, but there's no problem in actually playing with that. It's a very simple example, as I've shown you; it's as simple as setting a field. 30 seconds.

Okay, we're back to the summary of the canary deployment. What is good about canary deployment is definitely the gradual verification: because we're deploying the application only partially (partially here meaning that only a certain group of users sees it), we are able to check whether the application is properly deployed and working as expected. It's easier, again, to roll back, though I still would not recommend it. It's easier to get feedback about the new application at a smaller scale, and it's easier to address problems, since only a certain group of users would suffer if something really bad happened. And obviously, reducing the risk, which I was talking about earlier today, is one of the reasons for this approach. The feature toggle is the coolest thing about this approach, especially the Gatekeeper that Facebook has; that is so nice, I still have some reading to do about it. Unfortunately, the usual problems arise, meaning data migrations, N-1 compatibility, and dealing with multiple versions of your application running at the same time. The biggest challenge is probably how to apply canary to a desktop or mobile application; it's almost impossible to
do. So obviously canary is something that you can only apply to a pure web application, because there you control the traffic in a reasonable way; with a mobile or desktop application that would be very hard.

Okay, the last deployment strategy that I will be talking about is the default strategy that we have in OpenShift. It is called rolling deployment. It basically requires you to have N+1 instances of your application running at any point in time, and I'm going to show you what I mean by that with a simple picture, as usual. The N-1 compatibility problem applies here as well. So here's the simplest possible picture showing how a rolling deployment works: on the left we have the old application, on the right there's the newer one, and at any point in time there are always N+1 instances of my application running. How many instances of the new application are running at any point in time can be easily defined by the deployment strategy parameters in OpenShift. Are there any questions? You're very shy, I must say. Okay, demo.

I just need to figure out where I am. Oh, this guy is still hitting that one. We're heading to rolling, if I type it correctly. Of course there's a setup again that just creates a simple deployment. Let's head to our application; that will take some time, it has to spin up nine pods. There are currently 30 pods running on my machine; it's not that big. Okay, we have the application fully running. It's very simplistic: it just prints "Hello OpenShift", right? And now I'm going to need yet another screen, probably; let me think how I can fit it in. My update script is not very sophisticated: all it does is modify the description that is returned by the application, but that change is seen by OpenShift as a configuration change, and a configuration change immediately triggers a new deployment. That can be turned off, obviously, but for the purpose of my experiment that trigger is exactly what I want. So we should be seeing
that the pods are changing one by one, basically. This is the simplest approach; I don't think I have any sophisticated deployment parameters set for this particular one, but it'll take a bit. Hopefully it should work. What do you mean? The question was whether I can tell OpenShift to destroy... to spin up a new cluster and then the new application? You're talking about a new application; the cluster is the entire thing, and you don't want to destroy the cluster, that's a totally different matter. Well, let me show you. It'll take some time, but let's see. Okay, where is the edit... okay, I'm blind. So here is the deployment strategy configuration: you can specify how long the interval between the different rollout steps is, how many unavailable pods are acceptable to you, and how many pods of the newer application should be deployed at a time. By configuring that, you can basically achieve what you were asking for, and that's what I would go for. I haven't played with it that much. I don't see Michal here; he was working hard on the deployments on our side. But basically, that's what you would do to properly set the parameters so that they match whatever you were asking for. Are there any other questions?

This is actually called Recreate, right? What, again? Recreate? Yes, Recreate is one strategy as well, but the problem with Recreate is that it just destroys the entire application first, so Recreate is not always what you want: with Recreate you will actually get downtime, and that's not what you want, obviously. Unless it is okay, for example when you're doing it within the blue-green deployment approach, because then you can recreate a new environment from scratch and that is fine. Are there any other questions? I still have a few coupons. I'm going to have more coffee myself if you don't want it; the coffee is really great here. Okay, let's get back to the presentation. Are you sure you don't have
any questions? Yes: the deployment strategy for deploying OpenShift itself is one that we are actually still thinking about how to do properly. There are so many problems, or rather challenges, that we face with the cluster, especially when the cluster is really big; that is probably one of the biggest challenges, and the time needed to update every single node is quite significant. We're working really hard to address those problems. We'll probably go with a similar approach to the one Kubernetes has, meaning that first you update the master and then every single node, one by one. I'm not sure what the final approach will be; I'm not part of the installer team that is responsible for that, I usually work on other components. Yes, it is one of the key features that we're working on for the installer, because obviously the thing that we're talking about here for applications, being able to run and serve continuously, is something we want OpenShift itself to maintain for our customers, for whom we provide those solutions. So yes, that's a very important topic for us.

A quick summary of the rolling deployment. On the good side, you have an incremental rollout: while your new application is being rolled out, similar to canary, you can verify whether the application is running properly or not. You get gradual verification, because over time you're basically throwing more and more traffic at the newer version of the application, and if for some reason the performance of the new application is not sufficient, you can stop the rollout at any point in time, fix the problem, redeploy the newer version, and then continue with the rolling approach. As a downside, there are the N+1 instances, but compared to what blue-green requires, an entire identical environment, it is probably a better deal. The lost transactions, unfortunately, are something that we will always be
facing during continuous deployment; it's hard to mitigate in any way.

To summarize my entire talk: there's no silver bullet, no one approach that fits all cases, as usual. You need to know what you're dealing with, and it's very important to understand what the good sides and the bad sides of each strategy are. And the most important thing is that the developers, who are working actively on the application, and the operations team cooperate very heavily, for things like N-1 compatibility and for monitoring against previous versions: what is better, what is worse, and whether the worse parts are something we can actually mitigate or fix. Because maybe there is something there: I was just starting up 30 containers, and the difference between Docker versions (that was even before 1.10, I'm not sure if it was all the way back to 1.8 or 1.9) and different Origin versions shows how, over time, starting containers can actually take a lot longer. That's something that you, as an application developer, need to think about, and have a dedicated team actually work on. That's what I said... it probably doesn't have that, but I understand. Thank you. That's good. Great talk, thank you so much.

[Audience question about builds producing new Docker images at every commit, and about databases.] Yes, in your case it would end up like that. You would have a separate deployment that would be responsible for the database; in the case of databases, containers haven't always been great, it's much better now with storage, but I don't see a problem if you already have