Our topic today is Drupal Blue-Green Deployment, and I really hope you came to the right place. So let's start, 'cause we have a lot of things to cover, and the session got shorter, so there's no mercy there. A little bit about myself. My name is Alexei Gorbitz, and I'm a group architect at FFW. My background is basically in backend and architecture, but I've also done DevOps for a couple of years, and this is a side project, you could say: coming from a developer perspective, fixing the operations of a team that was dysfunctional. So we'll talk about that. And I'm pretty sure you guys are doing deployments in Drupal, and you face some of these issues, and if you came to this topic, you probably know what we're gonna talk about, but I wanted to line it up for everyone else. Today we're gonna cover the problem of zero downtime deployment that your bosses all want you to do. We're gonna talk about a solution for that. It's just one solution; there could be multiple different ways to fix that problem. And we're gonna talk about common problems that we're gonna see regarding this solution or other solutions. So let's address the elephant in the room: the zero downtime problem. So a boss comes to you and says, no, no, no, my site should not ever go down. Like, what do you mean ever? Like, you mean like ever, ever? He's like, yeah, ever. But that's not possible. Okay, my site should not go down more than this amount of time. So, I mean, that problem has a name, and it's called the problem of nines. Your organization will typically pick some level of service availability that you're comfortable with, and it basically scales to the number of nines that you want to support. The number of nines comes from the percent of uptime you have per year. So being realistic is what really matters. We're talking about Drupal deployments; Drupal is a stateful system, it has a database, a lot of things can go wrong.
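The arithmetic behind the nines is simple enough to script. A minimal sketch (the helper name is mine, not from any real tool):

```shell
# Downtime budget per year for N nines of availability.
# availability = 1 - 10^(-N), and a year is 525600 minutes,
# so the allowed downtime is 525600 / 10^N minutes.
downtime_minutes_per_year() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", 525600 / (10 ^ n) }'
}

downtime_minutes_per_year 1   # 90%    -> 52560.0 (~36.5 days)
downtime_minutes_per_year 3   # 99.9%  -> 525.6   (~8.8 hours)
downtime_minutes_per_year 4   # 99.99% -> 52.6    (~53 minutes)
```

That's where the numbers in the next part come from: 90% availability is about 36 days down per year, three nines is under nine hours.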
So you want to go and look at the downtime per year, and you want to commit to your boss to something that you're comfortable with. You don't have a blue-green deployment problem if you're at 90% availability: if you're fine with your site being down for 36 days a year, don't do anything. I think you have more of a problem when your boss says, okay, three nines, I commit to three nines. Then you have about eight hours of downtime per year. And depending on what you consider downtime (is the site maintenance page considered downtime?), you might be in a better or worse spot, I don't know. So yeah, think about that. Of course, if you are doing any Drupal deployments, you know what the Drupal deployment problem is: well, they tell you to put the site in maintenance mode, and even if they don't tell you, Drush does this for you. So your site looks like this, a white screen, or maybe you've got a fancy design. That's the first thing, but the second thing is that you're replacing, or rather modifying, your Drupal installation in place. And if you're doing that, you don't have a way to go back to your previous version. Your database might have changed, and your code base is incompatible across the older database versus the newer database. So basically there is no easy rollback for you. Here comes blue-green, which is the strategy of: okay, let's always have a spare set of environments, a spare set of containers or Nginx running, a spare (older) code base, and maybe a spare (older) database, always available for us to roll back to if we want to. Let's prepare a new one and then switch gracefully to it so that the user doesn't notice. It's similar to a canary deployment, you know. And there are different techniques for how to implement that, but the solution is just named blue-green deployment. We're gonna look at some of the infrastructure overview you need for that.
So, I mean, if you've done any kind of ops in Drupal, you know that you need an Nginx and maybe a PHP server. I don't know why it says web server; it doesn't matter. You need PHP and you need Nginx to run Drupal, or maybe you use Apache, it doesn't matter. You need a database, maybe a file storage, some cache, and you typically design your stack so it's highly available, so you want two of these, three of these, however many you want. And now the problem is that you need a clone of that. So all the time you invested in building this stack, you need to do it twice: build a new stack that is basically a mirror of it. So when you're doing a blue-green deployment, what you want to do is deploy a new stack with the newer code base, and you want to copy the database, copy all of the different state, from the previous stack. You might be using Drush for that, or, I don't know, your favorite database copy tool works here. And then you want to test the stack independently from the production stack. So you want your testers to go to that stack, look at the website and say, oh, this is working, or no, this is not working, this is a bad version. As soon as you're happy with your staging stack, happy with the result, you want to flip them. The flip is an implementation detail, but you want users to go to the new stack that you just deployed, and the old stack to become available for testing and comparison. So you're kind of flipping the IPs or the load balancer, and that's an implementation detail we'll cover. So basically that's the idea. But when we go into implementation, this particular session is going to be focused on AWS ECS and on Docker. So we're going to start by modeling that stack on our local machines and figuring out what services we want to run in the cloud.
And yeah, you typically would dedicate an Nginx container and a PHP container for your application to run smoothly, but then there's a question: do you want to put Drush, Composer, and other CLI tools in your PHP container? Maybe you don't; maybe you want to put them in a separate container, so that PHP can scale out to multiple instances while maybe only one Drush instance is required. So maybe you don't put all of them there, and you come up with this third container. You need to know that between those containers there's some shared state and shared configuration. For example, the code base: your code base should be in the Nginx container, because static files are served from there, and in the PHP container and the CLI container, because both of those interact with your code base. And then you have a bunch of other services that you need on your local: MySQL, Redis, whatever caching you're using. So you want to configure those containers using something that can be swapped out. On local, you're going to give a local configuration, so it's going to be an insecure password or something you don't have to keep secret. And on production, you want this configuration to be supplied from somewhere else. So you need to think about that. As you think about that, the first tool that comes to mind for building that stack is Docker Compose. If you've worked with Docker, Docker Compose is basically the tool that assembles multiple containers, orchestrates them, and runs them on your local. There are great tools that use Docker Compose and hide this detail away from you, but if you're doing serious production work, you might get back to the basics and understand how things are working. So to illustrate a basic Docker Compose setup you could have, you have to think about the basic services that are going to be available both on production and on your local, and you want the stacks to be as close as possible.
So we're going to define some basics in a Docker Compose YAML file, which is going to be common for every deployment, local or production. It could include a PHP and a web image, for example. And as we said before, they're going to share a volume. volumes_from is a directive that's going to allow you to share a volume between the two containers, so the code base in the PHP container is going to be available in the web container, which is Nginx. But then, on the other side, for local development you need to add some other services. For example, you want to expose the web container's port so that your local machine can see it and use it. Or you want to add the DB or Redis or anything else your local machine will need for operation. But on the production side of things, you probably want to tweak some of these things. On production, you can actually run a different command when you start your stack, a command that pulls secrets from a service, for example, because locally you had your own local secrets. And maybe you want to configure your production PHP container differently, because you want to turn off error display. Maybe you want to give a different database name depending on the stack you're deploying: is it the blue or the green database, and stuff like that. Maybe you want to have two services, one for web and one for CLI; one is gonna be scaled to three, and one is gonna be just a single Drush container running there. And they also need that kind of configuration. So you're thinking in terms of a Docker Compose stack, and then you're moving this into the cloud. That's the approach I took, and it worked really well. A lot of people will be thinking, oh, does Docker Compose work in AWS ECS? We'll get to that. So as you model your architecture on your local machine, you want to get to the cloud. And in the cloud, there are different terms that developers are not familiar with.
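A sketch of that split, written here as a shell snippet that lays down the two files. The file names, image names, and passwords are all my assumptions, and note that volumes_from requires Compose file format version 2 (it was removed in v3):

```shell
# Base file: services common to every environment.
cat > docker-compose.yml <<'EOF'
version: "2"
services:
  php:
    image: myregistry/mysite-php:latest   # hypothetical image names
  web:
    image: myregistry/mysite-web:latest
    volumes_from:
      - php    # share the code base volume with Nginx
EOF

# Local override: exposed port plus local-only services.
cat > docker-compose.local.yml <<'EOF'
version: "2"
services:
  web:
    ports:
      - "8080:80"   # reachable from the local machine
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: notsecret   # fine locally, never on prod
EOF

# A local run merges both files:
# docker-compose -f docker-compose.yml -f docker-compose.local.yml up
```

A production override file would do the opposite: no exposed DB, a different startup command that pulls secrets, a blue or green database name.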
We'll start with the VPC, a virtual private cloud, which is basically an isolated network in AWS. What we want to do is deploy a cluster, and a cluster is a set of machines that are running, have compute, and can run containers. It could be three, five, 10, 11 machines. You basically treat it as a single set of machines that can go away and come back, and they have resources. Resources are the only limitation of the cluster: you can run five containers or 10, depending on how many resources they consume. In order for users to get to your application, you want to expose a load balancer. A load balancer balances requests from the user's browser to different resources. For the implementation of blue-green specifically, you want to have two listeners, remember, for the blue and the green. One we'll call active, and it's going to be serving traffic on port 80, and one we'll call inactive; it's going to be sitting there on port 8080, basically waiting for you to test it. So as soon as you have these, you want to start deploying... well, not yet. First you want to provision your stateful services. Your stateful services are your database and your cache: before you deploy the code base, you need somewhere to store the Drupal database. Amazon provides these as managed services, so you just deploy them and they're managed for you. It's really convenient; you don't have to put them in containers. And now we start deploying. When we start deploying our code base, let's say it's version 1.0, you can deploy a task, a running container, and you can attach it to the load balancer. And you can also use something called a service in AWS that's going to scale this to multiple instances in your cluster so that you have failover and availability. So you're going to get another container running version 1.0.
And maybe you want a sidecar container, a container that is not serving traffic but is used for deployments, for example. That CLI container is a perfect example: you want to put Drush in it and run deployments from there. So this is your active stack. And then you want to deploy an inactive stack. You're doing the same thing, but on the other side of the road: you're attaching it to the other listener, and you're also deploying the CLI container. Remember how we talked about copying the database from one stack to another? Maybe the role of the CLI container is to do that. So you can see that we actually have a database copy happening that the CLI container is doing for us, from the blue to the green. And as soon as you're ready, we're going to swap those listener rules, so that the listener on port 80 goes to the other stack, and the listener on port 8080 goes to the opposite stack as well. So that's the architecture. In order to implement this, we need to look into AWS constructs. What are these things I just mentioned? To run a container in AWS ECS, you need to create something called a task definition. A task definition is kind of like a Compose file: you define the containers you want to run, but it's in JSON format (I think you can also do YAML), a completely different format, but it works. Then you can run this task definition multiple times, and each running instance is called a task. Those tasks could be running on different instances in AWS. But if you want to maintain a number of tasks, and maintain the level of availability your requirements demand, you want to create a service.
A service in AWS ECS is something that takes a task definition and basically makes sure the minimal number of tasks you require is always running. A service can also be load balanced: you can say these tasks are served by this load balancer on this URL, and people going to that URL will always hit those tasks you're running. All of that runs in the cluster, which is like a boundary of resources. So how are we gonna deploy this? Now we know how we're gonna architect this, but not how we're gonna deploy it. In order to deploy something to AWS ECS, you need an image, a Docker image that's built with your code base and can run anywhere. So you start by committing something in your code base, and it has a commit hash. Now you need a source file describing how to build your image; it's called a Dockerfile. Docker is gonna take the instructions you put in the Dockerfile and build the image for you. After you build that image, you wanna push it to a Docker registry. The Docker registry is storage for those artifacts; it stores Docker images. You wanna version the image the same way as the commit, or a similar way, so that one artifact doesn't collide with another. You want to always have a version to roll back to, and you want it always available; you don't want to replace it. As soon as you build that image, you have to update your task definition, which is that JSON file, give it the new image name, and say, okay, this is what I want to run right now. And then you have to go to that ECS service and say, use the new version of my task definition. That seems very complicated, and we came from the Docker Compose world; we just want two commands to run it, and docker-compose up is my preferred one. So, introducing the ECS CLI.
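The commit-to-service flow just described can be sketched like this. The registry URL, repo, cluster, and service names are made up, and the Docker and AWS calls are shown commented out so the sketch stays self-contained; only the tag helper actually runs:

```shell
# Build an immutable image tag from the commit hash, so every version
# stays in the registry and is always available to roll back to.
image_tag() {
  local registry="$1" repo="$2" commit="$3"
  printf '%s/%s:%s\n' "$registry" "$repo" "$commit"
}

COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo "abc1234")
IMAGE=$(image_tag "123456789012.dkr.ecr.us-east-1.amazonaws.com" "mysite" "$COMMIT")
echo "$IMAGE"

# The rest of the flow, not executed here:
# docker build -t "$IMAGE" .
# docker push "$IMAGE"
# Point a new task definition revision at $IMAGE, then roll the service:
# aws ecs register-task-definition --cli-input-json file://taskdef.json
# aws ecs update-service --cluster mycluster --service mysite-web \
#   --task-definition mysite-web
```

This is exactly the multi-step dance the ECS CLI collapses for you.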
The ECS CLI is an alternative, Amazon-supported way to use the Docker Compose format to manage those task definitions. So if you already know Docker Compose, in theory you can use AWS ECS right now. Basically, it supports the Docker Compose format and also does some cool things for you. Every time you run the equivalent of docker-compose up against AWS ECS, it will detect the changes you made to your Docker Compose file, update and patch the task definition, and then rerun the service and bring up the new containers with the new task definition. Really cool stuff. It also has the ability to create a cluster if you want a quick start, but those are the main things we're gonna use. The ECS CLI commands are very similar to what you use with Docker Compose, so if you're familiar with Docker Compose, you'll be familiar with this. Those are some examples. You will deploy two stacks: we deploy blue, we deploy green. So how does that blue-green thing happen? How do we switch between the two? There are actually multiple techniques for the switch. Some people say use DNS; there are really multiple techniques. The one technique in particular that works best for ECS and gives you no delay is described in the article linked here, but I'm also gonna tell you how it works. Basically, an ALB in Amazon has something called listener rules. You can say, if a request matches this criteria, then go to this target group, where a target group is a set of resources that is balanced. The criteria we want is match-all: you can see that the path is an asterisk, meaning match all. We create two of these, and then we can programmatically swap them with each other. That's how we achieve this blue-green switch. So you want one listener on port 80 and another listener on port 8080.
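For flavor, here's roughly what driving two colored stacks with the ECS CLI looks like. The project naming convention is my assumption (the cluster itself would come from `ecs-cli configure`), and the ecs-cli calls are shown but not executed here:

```shell
# One Compose project per color keeps blue and green as separate
# ECS services built from the same Compose files.
project_name() { printf 'mysite-%s\n' "$1"; }

project_name green   # prints mysite-green

# Bring up (or patch) the green stack's service from the Compose files:
# ecs-cli compose --project-name "$(project_name green)" \
#   --file docker-compose.yml --file docker-compose.prod.yml service up
# See what's running for that color:
# ecs-cli compose --project-name "$(project_name green)" service ps
```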
Maybe the first time you deploy, 80 will go to blue and 8080 will go to green. Then the next time you wanna swap them, and it's just about modifying those listener rules. It's easy to do if you know how to do Amazon API calls. You wanna describe your listeners, look at them, find out which one is the active and which one is the inactive, and then call the modify API, which will swap their places. Call it two times, because one has to become active and the other has to become inactive. So the swap will happen. There is actually a good starting script for that at the link I have here, and that script is the recommendation from Amazon. But we looked at those things, and, I mean, it's getting harder and harder, and we're just Drupal developers. So now we need to take care of not just the deployment, but the blue, the green, and where to deploy. Instead of always deploying to blue or always deploying to green, we now need to infer state from the current load balancer rules and see where they're pointing, so that we know we're not deploying to the production site. For that, we'll need to get the current active color from the load balancer, and then do the deployment on the opposite color: if the active is blue, we wanna deploy to green. To deploy the green, we need to deploy the web service, the CLI service, however many services you have. And all of these deployments, if you remember, involve this huge schema of replacing things and stuff like that. So it's actually getting harder. For that, on this project, we came up with a script that facilitates those things. The ECS Deployer, we call it, or it used to be called orchestrate.py. It's a small Python script, and it's up to you guys to judge how small it is, but it's really small.
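The swap logic above can be sketched in a few lines of shell. The color bookkeeping is pure and testable; the rule modification uses real `aws elbv2 describe-rules` and `modify-rule` calls but is wrapped in a function that isn't invoked here (the rule ARNs would be your two match-all rules):

```shell
# Given the currently active color, the deploy target is the other one.
opposite_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

opposite_color blue   # prints green

# Swap the match-all rules so the port 80 and port 8080 listeners
# exchange target groups: two modify calls, as described above.
swap_listener_rules() {
  local rule_80="$1" rule_8080="$2"   # ARNs of the two match-all rules
  local tg_80 tg_8080
  tg_80=$(aws elbv2 describe-rules --rule-arns "$rule_80" \
    --query 'Rules[0].Actions[0].TargetGroupArn' --output text)
  tg_8080=$(aws elbv2 describe-rules --rule-arns "$rule_8080" \
    --query 'Rules[0].Actions[0].TargetGroupArn' --output text)
  aws elbv2 modify-rule --rule-arn "$rule_80" \
    --actions Type=forward,TargetGroupArn="$tg_8080"
  aws elbv2 modify-rule --rule-arn "$rule_8080" \
    --actions Type=forward,TargetGroupArn="$tg_8080"
  aws elbv2 modify-rule --rule-arn "$rule_8080" \
    --actions Type=forward,TargetGroupArn="$tg_80"
}
```

(The middle duplicate call is a typo guard in no real tool; in practice you make exactly two modify-rule calls, one per listener.)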
So it's a small Python script that allows you to look at your current state, get the current color of your deployment, and then use that color to do one or the other deployment. You can actually get the color, save it in a variable, and then some of the handy tools include things like: okay, let's do a deployment to the target color. You specify the color, it's gonna look for the right stack to replace, and it's gonna attach a load balancer if it's a first-time deployment; if it's not the first-time deployment, it's gonna do nothing. So a lot of cool facilitators for you to use. It also has a promote step, which will basically swap the listener rules' places. You'll use promote when you wanna move the inactive to active, or as a rollback if something went wrong; that's also promote, you just switch back if you want. As a nice feature, it has an exec command. A lot of times, we as Drupal developers wanna execute a command in Drush, and let's say you have Drush running somewhere remote that's very hard to get to. You could expose an SSH server next to Drush, but then you get in trouble with your security team, because they don't like SSH servers in containers. So what you could do is SSH into the actual instance running your container and then do a docker exec in it. That tends to be very error prone, and there's a lot of stuff happening: you need to look at your cluster, find the IP of your instance, figure out how to do all that. Why? I mean, there's a handy command for that. So if you wanna do a Drupal update, you could basically use this exec command. Just some handy tools. This is open source since yesterday; feel free to use it, look at it, and give your feedback. Oh yeah, the ECS Deployer is a script with multiple commands; I just mentioned four here, but there are more.
So let's take a look at a typical CI/CD pipeline for this. With CI/CD, you guys have your own preferences: someone's using CircleCI, someone's using, I don't know, Travis CI. In my experience, for this kind of blue-green deployment you need something that supports workflows or pipelines (people call them different things), something that stops and asks for user input before proceeding. If your CI/CD solution offers that, you're good. Basically, what you wanna do with this blue-green stuff is deploy something to your inactive environment, but then you wanna offer a "do you like it or not" button, and people press proceed, and then the swap, the switch, happens. You don't want this to be completely automatic; sometimes on production you actually want a gating process. We used Jenkins pipelines, and a lot of you probably know Jenkins from going in and configuring your scripts manually, but Jenkins Pipelines is actually a plugin that gives you pipeline as code. You can describe it as code, you can do parallel stuff with it, and it supports all kinds of user input. So that's what our pipeline looks like. On the deploy step, it's gonna ask you where you wanna deploy, and I'll show that in the next slide. On the gating step, it's gonna ask you if you're fine with your deployment; if so, it's gonna swap and go to the next step. If you're not fine, you can cancel it at any time, and it's not gonna happen; you're gonna do another deploy. Basically, Jenkins Pipelines, and Blue Ocean is the new interface for it. For those of you tired of the Jenkins interface, this is a fresh look. It gives you an input possibility, and here we basically have: do you wanna deploy to Dev or to UAT, plus the field with the version you wanna deploy is pre-populated from one of the previous steps.
The build step just builds that version for us and pushes that container, and we wanna pre-populate the version so that users don't have to go and grab it manually. So whatever solution you choose, I really recommend you check that those possibilities are available, because you're gonna need them. Both of these are plugins: Jenkins Pipelines is a plugin and Blue Ocean is a plugin. So let's talk about common problems. We really don't have much time, but we'll try to cover them. A lot of you are gonna ask the question: so we're gonna do blue-green, but what about environments getting out of sync? Let's say I deployed production, then I deployed the inactive, and while I was looking at and testing my inactive, people went on production and modified stuff, and I'm losing all of that, because my database was copied over at some point, and now I'm losing all that content. I'm not saying this is the only solution, but one solution is to restrict users from making any content changes on your production environment while you're doing the deployment. Instead of putting your site in maintenance mode, you could basically put it in read-only mode. There's a module for that, and it will display a pop-up message every time someone wants to modify any content, and you can put in a friendly reminder like: reach out to your IT team, deployment is in progress. And after you're done with the promote, you can basically swap that back. That's my way to handle it. And another way, which is... we'll have questions later. Oh, okay, sure, say again, please. We did. Yeah, yeah, there's nothing manual; it's part of the process. So one way to handle it would be read-only, but there's another way to handle that, and a lot of times this is gonna be controversial.
It depends really on the project, and a lot of you will not have the flexibility to do this, but in my experience doing Drupal blue-green deployments, if data in your database is really critical not to lose, don't put that data in the database. Make Drupal your content system; don't make Drupal your everything system. Drupal should not be a source of record. Form submissions are a good example: you set up the Webform module, it's really cool to configure, but maybe don't store the submissions there, because Drupal goes down, and that's expected sometimes. So the recommended way would be: try to manage the state you can't afford to lose elsewhere, and use REST APIs to communicate with those other services. Those other services will have their own deployment mechanisms; maybe they have a nicer way to manage schema, so they have backwards compatibility with schema changes. Drupal doesn't, so every time you deploy Drupal, you are at risk of a breaking change. Hopefully you tested it before. So that's another way, and the combination of those two things is the recommended approach, because they're not exclusive: read-only mode is really good for people trying to edit content, and microservices are really good for user-facing features like putting comments on your website and submitting forms, since you can't restrict regular users from doing stuff on your website. Another problem we need to address here, and it's something that a lot of Docker deployments have to address, not just blue-green: Drupal assumes a local public file system, but it allows you to replace it. A lot of code, when you're doing Docker-based Drupal, is going to write a file locally into one container and then try to find it. But if you're scaling your containers out across different instances, that file is not gonna be there.
So you can use NFS, GlusterFS, or the other ways to share network storage and make it look native for Drupal to operate on, or you can offload files to a different storage. For example, we use S3 for files, and there's a module called S3FS. There is also another module called Flysystem, which abstracts away storages, not just S3. You wanna look into those for this kind of solution. Just keep in mind that Drupal hardcodes some of these paths, so maybe you'll catch one or another of these; custom code hardcodes these things too, so maybe you wanna catch that. The Twig cache is another one: there's a module that offloads Twig caching from the shared file system to a temporary folder, so you wanna use that as well. Secrets management: we haven't talked about our secrets, but it's a very simple concept. We said you have the configuration for your database somewhere in environment variables, because you want it to differ between local and production. But on production, if environment variables are stored in your task definition, they're plain text. So if you open Amazon, anyone with ECS access can open the task definition, look at the password, and say, hey, I cracked Drupal. They didn't; they cracked people. So it's a problem. If you wanna manage your secrets, my recommended approach is using something called the SSM Parameter Store. The Parameter Store is an Amazon service which allows you to store parameters as key-values, and it's hierarchical, so you can organize your secrets in hierarchies: okay, this is prod, and this is prod/web versus prod/nginx secrets. And some of these parameters can be really secret, a SecureString. SecureString is one of the types that is encrypted with a KMS key, and the KMS key will allow you to encrypt or decrypt only if you have access to it.
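Sketched out, the hierarchy plus chamber usage looks like this. The path layout and service names are my assumptions; chamber's real interface is `chamber exec <service> -- <command>`, and the AWS and chamber calls are shown but not executed here:

```shell
# Build a hierarchical SSM parameter name: /<env>/<service>/<key>.
ssm_path() { printf '/%s/%s/%s\n' "$1" "$2" "$3"; }

ssm_path prod web db_password   # prints /prod/web/db_password

# Store it encrypted with KMS (not executed here):
# aws ssm put-parameter --name "$(ssm_path prod web db_password)" \
#   --type SecureString --value 's3cret'
# At container start, chamber pulls everything under prod/web into
# environment variables and then execs the real process:
# chamber exec prod/web -- php-fpm --nodaemonize
```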
If you don't have access, no one can see this. That way you can manage access, and then, for the task you're running, you can give only that container access to use that KMS key. So no one else needs to carry a key around and say, hey, this is my key to all the locks in this infrastructure; there is no key, basically. You're just allowing a task to use that secret store. There's an excellent piece of software called chamber, and chamber will basically automate this for you. You can provision the secrets in a single command: you run chamber exec, tell it the secrets' namespace, and it will pull all the secrets into environment variables and then run the command, be it an Nginx command or a PHP command that starts the PHP server; it's just gonna provision the secrets before doing that. So that's how you can manage secrets. And that's it for the main part of the session; we're gonna have questions in a second. I just wanted to remind you guys that there are community sprints you can join on Friday, and your feedback on this session and other sessions is very, very welcome, especially for first-time DrupalCon presenters, as they tell you. All right, let's open this up for questions. That's it for me, guys.