All right. Good evening, everyone. Can you hear me from the back? Cool. I'll try to keep the mic here so I don't have to speak very loud. I'm here to talk to you today about some patterns for continuous delivery. But before I do that, I want to talk very briefly about the company I work for, in case you don't know about us, and then a little bit about myself.

So I work at ThoughtWorks. We are a software consulting company with offices all around the world. You might hear from my accent that I'm not from here; I'm from Brazil. I joined ThoughtWorks in the UK, but now I live and work in the United States, far, far away from here. We like to say we help our clients solve their hardest problems with software, so we like to work on very challenging projects where writing custom software makes the difference. As I said, we have a global community, and our culture is very interesting, because everywhere I go around the world, and this is my first time here in Singapore, I meet the ThoughtWorkers from the office and it feels like home; everyone is friendly and very, very nice. We have offices in South America, Europe, everywhere basically, and we keep opening more. That means we have a lot of experience working with all different types of clients and companies all over the place, and I like to hear stories from my colleagues to incorporate the learnings and share with the community. We're very, very keen on sharing. We pride ourselves on striving for excellence in software and then sharing what we learn with the community.

You might have heard of us through someone like Martin Fowler. He's our chief scientist, and he's written many books, like the one on refactoring. But we actually have many other authors. Pramod, for example, wrote a book about database refactoring, which I'm going to cover very briefly today. We have a lot of open source software that we share with the community, based on the work that we do with our clients. The other thing we do is publish the Technology Radar; the latest one was out just last month. It's a biannual publication where all of the big brains of ThoughtWorks come together and fight over a week to figure out which tools and techniques and platforms we as a company have been using with clients and think are interesting, or not so interesting. Then we compile that down into the Tech Radar. So if you're curious about our opinionated view on technology, I'd encourage you to follow the Tech Radar if you haven't done so already.

The other thing is we're always hiring, so I've been told to add this slide here. We're hiring developers, QAs, business analysts, project managers. If you're looking for an opportunity, just go to our website. That's all the spiel I have for ThoughtWorks; happy to talk to you afterwards if you're curious.

Very, very briefly, a little bit about myself: why should I be here talking about this? I've been at ThoughtWorks for the last eight and a half years, and in this industry for almost 15 years by now. I spend most of my time in the developer world, so I've done a lot of work with Java back in the early days, then Rails and web development. More recently I've been moving on to more reactive development; Scala is what I'm using on my current project. I often work with clients as an architect, so I don't do as much coding as I used to.
People at ThoughtWorks make fun of me for that sometimes. But what most people don't know is my background: I actually started my career as a system administrator. That was my first paid job in technology, in the early 2000s. I was still in university, and I was one of the sysadmins for our Linux network for the undergrad students back in Brazil. So I learned some of the lessons here the hard way. We had weekends where we would go to the lab to upgrade our Debian installs. I don't know who's been around since that time, but back in the day you'd get a CD distribution with the latest Linux, and we had to install it machine by machine. That was my idea of fun on a weekend: spending hours installing the latest version of Debian.

But I learned a lot from that time, mostly about what a sysadmin cares about. You have to keep the systems working, make sure you're up to date with patches, and deal with users who complain about things that don't work. I carried that over through my career. So even though I've spent most of my professional career as a developer, building applications and systems, I always had that affinity with the operations side of the house. I would find myself working close to it: when things went wrong, I would be the one who jumped on to figure out, why is this not running? Where are the logs? Let me profile this; why is it running so slow? Or why is the build environment not set up properly? So I sometimes used to be the build guy on the project, setting up how we get this thing actually deployed to the real environments. Because of that background, I always had empathy with the ops people, so I can talk to them and understand their concerns, which a lot of developers unfortunately don't have. Make friends with them, see eye to eye, and learn from each other. So when DevOps came up as a term around 2009, I was like, oh, that's what I do. I've been doing this for a long time and I didn't know it had a name.

I wrote a book on the topic. It's called DevOps in Practice. I was actually inspired by Jez Humble's book, Continuous Delivery, where he talks a lot about the principles of releasing code with high quality and high frequency. But it is not a book about how to do it; it's all about the whys and what to think about. So his book is going to last a hundred years, because it's all principles. I was dumb enough to write a book about how this would work in practice. So my book has a lot of code you can follow along with, and because I had to choose tools, it gets outdated very, very quickly. However, if you're curious to get into the space, it's a good way to jump in, and I'm still very proud of the book. It's published by the Pragmatic Programmers, and if you're interested, I have a discount code that will give you, I think, 20% off. I'll distribute the slides later, so don't worry about copying it right now; it's a bit of a weird, long discount code.

This talk today, unfortunately, is not about the practical side, because that takes a while. I'm going to focus more on the principles and the patterns side, so it's complementary to the book; I'm not repeating the content that's already there. And I like to start with a story.
And I think it's a common story. I call it the day we failed our deployment, and I'll see after I tell it if anyone can relate. This was a client I was working with maybe five or six years ago. We were building a web application, and the team was actually working in a very agile way: two-week iterations, releasing software every two weeks. However, the deployment story was very, very manual at the time. As consultants, we sometimes don't have access to the infrastructure or to the secrets needed to manage the code that we write, so deployment was managed by one of the client's engineers. He was kind of the expert for doing the deployment, and he'd been doing it every two weeks for a while. It was a bit of a ceremony. We had to stay late, because the business didn't want us to run deployments during business hours, and he would run through the ritual. And it was kind of like black magic, because no one knew what he was doing; we relied on this guy to make sure everything was fine.

And one day, something wasn't so fine. He was going through all the manual steps. One of the features of that application was search functionality: we would ingest data from an external data provider and make it available and searchable, so users could run a search, see results, and click through and read stuff. The problem was that during the deployment, he forgot to change one configuration file, which held the database connection string, and he didn't notice that it was pointing to the database in the staging environment. What's interesting is that this didn't blow up the deployment, because it was a valid connection string. For some reason the networking rules didn't block it, and nothing blew up. Every process ran as it was supposed to, but it ingested and indexed the data from staging instead of production, which wasn't the complete data set; it was missing a lot of things, and it wasn't the real data.

So the deployment was fine; nothing blew up. The people who were on call for that deploy, some of the QAs, did some quick smoke tests to make sure the application was still alive. They could still search for stuff; everything looked fine. They didn't realize the search was returning wrong or missing results, but from the application perspective, it was healthy. So they all went home and everything looked fine. Until the next day, when the business was doing a demo of the product for a potential client, and they decided to demo exactly the feature that was broken. They were like, here you can go and search for things, and it didn't work. So they weren't very happy about it, as you can imagine. They came back and they were very, very angry.

So, did anyone ever experience a similar story? This one is not so catastrophic; I have some worse ones, but I tried to pick one that wasn't so bad. Anyone ever see a failed deployment? Come on. Yeah. It doesn't have to be yours; someone else's failed deployment. I've done worse. I've deleted production databases. I have confessions; I can talk about that after the presentation. So this is not a very uncommon story, unfortunately. The usual way people try to tackle it is: oh, the problem is that guy didn't follow the process right. He did the deployment wrong.
So we need to put more guardrails around him to make sure he doesn't make that mistake again next time. And that creates a kind of vicious cycle. When you find a problem and the answer is to add more process, that usually means more bureaucracy around the release: there have to be more approvals, and more people have to look at what's being released. And even though the motives for that are fine (they're trying to improve quality or make the deployment more predictable), the result is that because of all that ceremony and bureaucracy, you end up deploying less frequently, because you have to go through the process. The more process you add, the less frequently you can deploy. Then the less frequently you deploy, the more changes end up going out with every deployment: you batch up all of those changes. And when you have lots of changes, the risk goes up as well; there's a lot more that can go wrong. And of course, when more things can go wrong, one of them is going to go wrong, and then you end up adding more process. You're stuck in this loop, adding more and more process to try to prevent mistakes from happening.

So let me tell you the end of that story, because that's not what we did. We didn't fire the guy; he was still around when I left the project. What we ended up doing wasn't an overnight fix, it took a while, but it was pretty effective. The business was angry because the deployment failed, so I asked them: can I sit with the guy on the next deployment to see if we can find opportunities to improve? And they were very keen to say yes. So what I did was act as a kind of reporter for the next deployment. I sat next to the guy, and I wrote down on sticky notes every manual step he was doing: he changed this configuration file, he restarted that service, he pulled the package from here. I would even write down when he was hesitating, thinking about whether to do this first or that first. Every piece of information that could be relevant, I wrote down. And then we finished the deployment and went home.

The result was a full wall of sticky notes saying: this is what our deployment process actually looks like. So the next day, when I got the rest of the team together, we were talking about the process, and not about the guy anymore. It detached "you're doing stuff wrong" from "this is our deployment process; what can we do to improve it?" And everyone being engineers, they of course found many, many opportunities: let's automate that, let's remove this, we don't need to do this, we can do that better. We ended up with a bunch of improvement stories to automate and improve the deployment process. And because, luckily, we had support from the business after the failed deployment, they allowed us to reserve some capacity for the development team every iteration to work on those improvement stories. Over time, we got a deployment process that was much, much more automated and much more reliable. When I left that client, the business was still not very comfortable running deployments during business hours, so they still wanted those deployment windows.
But it wasn't a horrible thing anymore. For the people who stayed, it was pretty much: you click the button, wait for everything to go green, do some quick smoke tests, and go home. It wasn't a stressful process anymore. Hopefully over time they would get to a stage where the business was comfortable deploying more frequently, but that's when I left.

And this is where I think a lot of DevOps practices can come in and break that vicious cycle. Instead of adding process, bureaucracy, or manual steps, DevOps can help automate a lot of that. When you automate, you basically put the process in code, and then it's a lot more repeatable, because it's not prone to human error anymore. The automation is not going to forget to run one script before the other, and it's not going to forget to change the configuration file. So when you automate the process, you can start deploying more frequently. And when you deploy more frequently, the number of changes that goes out with each deployment decreases, which also decreases the risk. Even if something goes wrong, because fewer things are going out, it's much easier to find the problem.

So when I go to clients and talk about CD and DevOps and why it's important, I always keep one goal in mind, and that goal is to make deployment a non-event. As I said, when I left that client it still wasn't a non-event, because we still had that two-week deployment window. But the goal, ideally, is: I want to be able to click this button at any time of day, the new code gets pushed out, everyone's fine, no harm caused. Even to the point where maybe the business people can click the button and release the code when they want.

One of the key practices for achieving this, of course, is automation, and this is one of the key ideas around DevOps. We know DevOps is not just automation, but automation plays a really, really important role, and we cannot deny that. As we think about our release process from code to production, there are many steps along the way where we can apply automation to make the process better. Some of these are more obvious, and people in this room are probably already doing a lot of them, I would assume: compiling your code, building it, running static code analysis, running your tests, packaging. All of that can easily be automated these days. There are lots of continuous integration tools that run all of that as part of your development workflow: you commit your code, and they pick it up and run those things for you.

I know a lot of clients that actually stop there: they automate the CI process, they run tests automatically, they publish a package, and that's it. So they're doing CI, but they're not doing full CD, because there are many other things you can automate along the way until your code gets to production. The provisioning of your environments, the servers, configuring the servers that you need: these days you can often build environments on demand, if you're running Docker or virtualized environments in the cloud where you can spin things up and down as needed. You can automate all of that. Some stuff will still be manual. You might want to do exploratory testing, where you need human brains to try to tweak and break your app, so you might have a manual gate there.
But then the deployment itself can be automated as well, and you end up with code in production. If you manage to automate this whole workflow, from idea, to code, to commit, to production, that's what we call a deployment pipeline. And that's another key practice of continuous delivery: coming up with a deployment pipeline. So let me talk a little bit more about that.

Here's an example, and this will vary, right? Your release process is not going to be the same as another company's release process, so the shape and the stages and the phases will look different from place to place. I can't give you the one deployment pipeline and tell you to just go implement it, but I can show you some examples and give you some thinking tools for how you could model your release process into a deployment pipeline.

So you have these stages: the things you need to do to get your code to production. And the things that give you faster feedback, you try to put early in the pipeline; that's where the analogy and the name come from, because you can get faster feedback by running those things at the beginning. As an example here: you check into version control; the build and unit tests should be something very, very quick, so you trigger that and run it, and if it fails, it gives you fast feedback saying that was not good. If it passes that stage, you can go to a phase that maybe runs some acceptance tests. That might take a little longer, because you need to bring up the application somewhere and run tests against it, but it still gives you fast-ish feedback. Then you might run some user testing that's manual, where you need time from a person to look at it and say, this is good. So you can model even manual gates into your pipeline, if that's what your release process looks like. There are usually a lot more stages between code and production; you might have multiple environments to go through. We've also been playing at ThoughtWorks with adding security checks to the pipeline, so you can get early feedback about how secure the code you're trying to release is. Is it using any dependency with known vulnerabilities? Things like that. This can evolve over time.

The key idea is that when you have a deployment pipeline, every commit is a potential release candidate. Developers should write code assuming that if their code is good, it's going to get deployed to production. And the pipeline is this series of stages that tries to kill the commit, to say: you're not good enough to go to production. So you add these stages. If it doesn't pass the tests, it's not going to production. If it's not approved by the user, it's not going to production.
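To make that fail-fast idea concrete, here's a minimal, hypothetical sketch in Java of a pipeline as a sequence of stages that each try to kill a release candidate, ordered so the cheapest feedback runs first. The stage names and checks are illustrative, not from any real CI tool:

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model of a deployment pipeline: each stage can "kill" a release
// candidate, and the fast, cheap stages run first for early feedback.
public class Pipeline {
    record Commit(String sha) {}
    record Stage(String name, Predicate<Commit> passes) {}

    static boolean run(List<Stage> stages, Commit candidate) {
        for (Stage stage : stages) {
            if (!stage.passes().test(candidate)) {
                System.out.println(stage.name() + " killed " + candidate.sha());
                return false; // stop at the first failure: fast feedback
            }
        }
        return true; // survived every stage: fit for production
    }

    public static void main(String[] args) {
        List<Stage> stages = List.of(
            new Stage("build + unit tests", c -> true),   // seconds to minutes
            new Stage("acceptance tests", c -> true),     // minutes
            new Stage("security checks", c -> true),      // minutes
            new Stage("manual UAT gate", c -> false));    // human pace, so last
        System.out.println(run(stages, new Commit("abc123")));
    }
}
```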
Here's another example, from another client, just to show you how it changes and how it can get more complicated. This was a client that had different teams working on multiple components; the color indicates which team owns what. This team had two services and one application, this other team had just one service, this one had two, and there was a package. The package was deployed to fewer environments, because it's expensive to deploy the package.

Each of those components might have its own unit, integration, and contract tests. They published some artifacts, but then they deployed to a shared environment. Each team had a development environment where they could try the latest versions of their components integrated. Then you do some more testing, and you deploy to an integrated environment where you can run more end-to-end testing that crosses those team boundaries. And then you get to QA, you can do performance testing, and you get to production. So you try to model your release process as well as you can, to give you confidence to put that code out in production. This will catch things like: this team thinks their components are good and work well together, but when we put them together with the other teams' components, it fails.

But it doesn't end when the code is in production. There's a lot more that happens once the code is live. You need to monitor, right? Make sure your systems are up and running and your services are healthy. You might need alerts when things go wrong so you can fix them. You might have to give support; you might have users saying, this is not working, or I'm not able to do X, Y, or Z. The cool thing is you can start capturing data about actual user behavior, and you can do some analytics to see: was the feature I built actually useful? Is it being used or not? Is it generating the traffic or revenue I was expecting? Some users might be nice enough to give you feedback, so you get real user feedback like, I wish I had this, or, this is not working so well. Ideally, all of these things, once the code is in production, give you more insights and more ideas, and then you go back to the beginning of the cycle. You learn more; maybe you want to change something, add a new feature, or remove a feature.

Going through this cycle is what we consider doing continuous delivery. Continuous delivery is a lot more than continuous integration. Continuous integration was that very early phase; continuous delivery is all about this feedback loop from code to production, with production giving you more ideas, and being able to iterate on that. And when you do that, with frequent deployments, going as fast as you can through that cycle, an interesting kind of paradox happens: you decrease the cycle time (how long it takes from code to production), but the quality goes up, because you can fix things faster. You're reducing the risk as you go along.

So in order to get there, I want to share with you in the remainder of the presentation four principles for how to think when you're trying to move towards continuous delivery and going through that loop. This is usually a hard move to make: going from a very lengthy release cycle, like a release every two months, to these kinds of short cycles requires a lot of supporting things. It's hard to go from one to the other in a day. But a lot of companies, a lot of our clients, are actually releasing code multiple times a day. It is something you can achieve, as long as you have the support along the way. So I'll cover those four principles, and then I'll share some patterns. The principles, I think of them more as thinking tools.
They don't tell you exactly what to do in a given situation, but they're something that triggers your brain: oh, maybe I can apply this principle here. So I'll give you four of them, and then later I'll give you practices and patterns that demonstrate those principles, and I'll try to highlight them as I describe them.

Okay, so the first one I think is very important, so I put it first, and I call it: incremental is better than big bang. With the big bang approach, you're making a change and you say, I'm not going to release this until everything is changed. Or: I'm building a new product, and I don't want to release it to my customers before all the features are implemented, or before I'm happy. So you end up postponing things, you wait a long time, and then you release everything all at once. That's usually not a good thing if you're trying to shorten that cycle time and deploy more frequently. We're trying to move to something more incremental: I build a little bit, maybe it's not the whole thing, but I release it so that I can learn from it. And on the next round, I add something new; maybe I learn that something wasn't so good before, and I change it. Over time you keep adding more, and you get to the end state one step at a time. I would argue that in order to do continuous delivery, you need to shift away from big bang thinking and move towards a much more incremental approach, and try to apply that principle. I'm going to show some of the practices; even when you don't think it applies, there's always a way to break the problem down into smaller ones.

The second one is also very important. It's the idea that deployment doesn't mean release. Here's an example: this is the cathedral in Milan. When I went there, I took this picture. They were doing some renovation work in the back, which is kind of hard to see, so I'll zoom in. They put scaffolding in front of it with a drawing of what it was supposed to look like, and behind that scaffolding all this work was happening, which was probably very ugly. I thought it was interesting, because from afar it doesn't look like they're doing a lot of renovation, but when you look closely, they are. And it illustrates this idea, because every time I talk about deploying more frequently, the first reaction I get is: well, this is going to be risky, because you're going to be releasing all these problems, all these bugs, all the time, and you're going to be releasing things that are half-baked. You say every commit is a release candidate, but who finishes a whole feature in one commit? That doesn't square with small commits. So how can you do that?

You decouple these two concepts: deployment means you put the code in production, but it doesn't mean that code is available to users. It doesn't mean the whole feature is released. The code is there, but it might not be being exercised. When you decouple those two things, it allows you to do those frequent deployments and decouple the release of features from the deployment. I'll show you examples later of how to do that, but as a principle, it's a very interesting one.
The other one is around small batches, or rather, I like to use this analogy to think about how to fix things when things go wrong, because things always go wrong. It's about how you approach that, how you support the team to fix things when they go wrong, and it's the BMW versus Jeep analogy. A BMW is a very expensive car, and it's built to basically not break. If something breaks, it's probably going to cost you a lot of money to fix, and it might take a while, because you might need to procure the parts, and who knows. I don't have a BMW, so I've been told it's expensive. The Jeep, on the other hand, is something that's very breakable: because of the conditions you drive it in, it's much more likely that things are going to break. But it's built in a way that's very easy to fix, and the parts are very common. There are even videos on YouTube of people assembling a whole Jeep in a couple of hours.

This illustrates two different ways to think about failure. The BMW approach optimizes mean time between failures: when things go wrong, you measure how long it's been between the last failure and today, and you try to make that as long as possible, because you don't want things to fail. The Jeep approach is different: it optimizes mean time to recover. I don't care how long it's been since the last failure; I care about how long it takes me to recover as soon as something goes wrong, and I want that to be as short as possible. Even if I fail a lot, I'm going to be able to fix it very, very quickly. And as you might guess, as we move towards continuous delivery and those frequent deployments, valuing mean time to recover is both an enabler and a necessity. Even if things go wrong, the process should allow you to release the fix quickly, and should support that. You're optimizing for a different way of looking at failure, and you're trying to minimize that batch size.

And the fourth one is about building quality in. This one applies to everything that we do; it's part of the agile mantra. As we build things, we shouldn't inspect for quality at the end; we should be building with quality in mind, adding quality as part of the process. And quality is related to confidence, in a way. This is a guy working on the Golden Gate Bridge. I don't know what he was doing there, but he has a safety net underneath that gives him confidence. And confidence is a thing that removes fear. If I need to do the best, highest-quality job that I can, and I'm scared of something, if there's fear that I might lose my job or that someone might judge my work, whatever fears I have will be barriers to doing the best job I can. So in order to build quality in, you need confidence, and you need a safety net to give you that.

So I'm going to jump into some of the patterns that illustrate these principles. I have a bunch of them, I think ten, so try to keep up as we go along. The first one is called parallel change. I've also seen it described as expand and contract. I wrote a bliki entry about it, and Martin was kind enough to publish it and make it widely available to the community. It's something I've seen applied often, in many situations.
I actually didn't set out to write the bliki entry. I asked Martin: hey, Martin, what's the name of this technique? Because I've seen it in many places. And he was like, oh, we don't have an entry for that. Do you want to write one? So I took a stab at it and he published it.

It's well illustrated here. This picture is the Bay Bridge. The Golden Gate Bridge is at one end of San Francisco; this one is on the other side, connecting San Francisco to the Oakland side of the bay. In that area there's a lot of earthquake risk because of the San Andreas Fault, and they say another big one is imminent in the next, I don't know, 5 or 10 years. The Bay Bridge was built a long time ago, and it's not resistant to earthquakes. So they wanted to build a more resilient bridge that would survive if one of those big earthquakes hit. What did they do? I'll tell you what they didn't do. They didn't tell everyone: hey, we're going to shut down the Bay Bridge for the next 15 years and build a better one, okay? Just hold on and come back in 15 years. That would never work, right? People would have to go around; it would be a big hassle. What they did was build a new bridge in parallel to the old one, while people were still using the old one. If the earthquake had hit, that would have been unfortunate; luckily, it didn't. It took them probably ten years to build the new one, and a year or two ago it was ready for prime time and they started shifting people over to it. I don't know if the old one is still there or not; I haven't been there in a while. But it illustrates this pattern, this practice, and it shows how incremental is better than big bang, and how small batches are useful to reduce risk.

I'm going to use a code example to illustrate this, and then I'll talk about how it applies to other, bigger things. This is a Java example; hopefully everyone here can read Java. Let's say I have this grid class, and I'm trying to replace these methods that deal with cells. They all take these integer primitives, x and y, and I want to introduce a coordinate class to encapsulate that. And instead of storing things in an array of arrays, I'm going to key a map by the coordinate. So I could just make this change: go and change the method signature here and say, okay, this will take a coordinate instead of this x and y. But as soon as I do that, all the code that was calling this breaks. Then you have to go and fix that, and then you might break other things.

If you're following the parallel change pattern, instead of doing that, you build the bridge next door. I keep the old code there (I've just grayed it out here for readability, but it's still there), and I add the parallel version with the new way of doing things, which takes two arguments instead of three. This one as well, and this one as well; it's going to save things to the new data structure. So this is the equivalent of the new bridge. And the cool thing about doing this is that I can deploy this code right now, and I haven't broken anything that was working before. Then we go through the migration phase. Let's say I had three clients using the old version. I go and migrate two of them to use the new version; the third is still using the old one. I can still deploy this code and things should still be working.
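As a rough sketch of what that expand-plus-migration state might look like in Java (the class and method names here are made up for illustration, not the talk's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class Grid {
    record Coordinate(int x, int y) {}
    record Cell(String value) {}

    // New data structure, introduced during the expand phase.
    private final Map<Coordinate, Cell> cells = new HashMap<>();

    // Old contract: kept in place and delegating, so un-migrated callers
    // still compile and still work on every deploy.
    @Deprecated
    public Cell cellAt(int x, int y) {
        return cellAt(new Coordinate(x, y));
    }

    @Deprecated
    public void setCell(int x, int y, Cell cell) {
        setCell(new Coordinate(x, y), cell);
    }

    // New contract, added in parallel. Once every caller uses these,
    // the deprecated methods above get deleted: the contract phase.
    public Cell cellAt(Coordinate coordinate) {
        return cells.get(coordinate);
    }

    public void setCell(Coordinate coordinate, Cell cell) {
        cells.put(coordinate, cell);
    }
}
```

Having the old methods delegate to the new ones means there's only one real implementation during the migration window, so the two versions can't drift apart.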
Eventually, you end up with everyone using the new version of the contract, and you can go and retire the old one. Delete it, and everyone's happy. That's the contract phase. Some people call this expand and contract because of these phases: you expand so that you have two versions, then you have the migration phase, then you contract back to one version. Again, very, very cool for thinking in small increments instead of big batches.

Now, this is not a real example. If all you have is three methods to change and three clients, you can probably do the whole refactoring in an IDE with a few keystrokes; it takes a couple of minutes, and you just make the whole change at once. But imagine these methods are called by thousands of other classes. If you have to change all of that at once, you end up with a big batch of change. Instead, you can do it in small batches: I've changed 10 today, I'll change 10 more tomorrow. I can still deploy the code all along, and it keeps working; it never breaks. And eventually you get to the state you want.

This pattern shows up in many, many different ways. I mentioned before Pramod's book on database refactoring, about how you make changes to a database. A lot of those refactorings actually follow this pattern. If you're trying to rename a column in the database, what Pramod says in the book is: don't just go and rename the column, because you'll break every application that's using it, or you'll have to change them all at the same time. What you do is add a new column with the new name, keep the old one with the old name, and keep them in sync for the migration period. You migrate all the applications to use the new name, and when everyone is using the new name, you go and delete the old column. That's applying parallel change to a database change.

[Audience: Do you ever see the contract phase? In practice, do you ever clean up?] Yeah, that requires discipline. Sometimes you start and never finish, and that's a problem: the column stays there forever and nobody dares to take it out. That's not a good thing; you should track those things.

The other time I used this was for an architecture change. The application was moving from using data in the database to getting it from a service. That's a bigger change, right? I need to create a new service and make sure it has feature parity with what I have in the database, and there's a lot of data migration. So what we did was run those two things in parallel for a long time. It took us four months, actually, to be comfortable that the service really had feature parity with what we had before. And then we did the contract phase. It's very important to do the contract phase; otherwise you keep adding complexity, and it gets worse and worse. This is also something you can do if you're doing a cloud migration, for instance. Oh, I have to migrate my whole system, the whole enterprise, to the cloud. Well, think incremental instead of big bang: is there a smaller piece you can migrate first, and maybe have them work together in parallel? This is a very important pattern, and it applies in many, many contexts.
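Going back to the column rename for a second, here's a sketch of how those three phases might look as separate migration steps, written as plain JDBC against a hypothetical customers table (in practice you'd run these through a migration tool, and the in-sync mechanism would be a database trigger or dual writes in the application):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical expand/migrate/contract steps for renaming customers.name
// to customers.full_name, deployed as three separate releases.
public class RenameColumn {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app");
             Statement st = conn.createStatement()) {

            // Release 1 (expand): add the new column and backfill it.
            st.execute("ALTER TABLE customers ADD COLUMN full_name VARCHAR(255)");
            st.execute("UPDATE customers SET full_name = name");
            // ...plus a trigger or application-level dual write to keep
            // name and full_name in sync during the migration period.

            // Releases 2..n (migrate): each application switches its reads
            // and writes from name to full_name, one deployment at a time.

            // Final release (contract): once nothing reads the old column,
            // drop it and remove the syncing mechanism.
            // st.execute("ALTER TABLE customers DROP COLUMN name");
        }
    }
}
```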
The next pattern is blue-green deployment, and I'm going to illustrate it. Let's say this is a web application: you have an application server, a web server, and a database server, a three-tier architecture. The new version you deploy in parallel, so it's a little bit of parallel change as well: you deploy to a parallel set of infrastructure, while users are still being routed to the old one. When you're happy that the new one is good, you flip the switch, and users start using the new version. Then you can either kill the old one or keep it around for the next version. This is very, very interesting: it allows you to do things like zero-downtime deployments, because you can make sure the new version is available and then flip the load balancer over to the new one.

When I show this pattern, this picture, there's always a question that comes up. Anyone see anything weird with this diagram? Yeah, what about the data, right? What about the database? I'm going to answer that question by applying the principle again. With the principle of incremental being better than big bang, I can break this down a little further. Imagine the database deployment is decoupled from the application deployment. Now I can have the new version of my application running in parallel with the old one, but both still using, let's say, the green version of the database. I can switch users to route to the new version of the application while it's still working with the old version of the database. And then I make my database change separately from the application change, and do whatever migration I need to do.

To apply this in practice requires a lot of supporting things. You must make sure the database is forward compatible, so it will work with the new version of the application. The application needs to be backwards compatible, so it can talk to either the new or the old version of the database. Implementing this sometimes requires a little more work, because you need to worry about all these combinations of scenarios, but it ends up reducing the risk, because you can make those changes decoupled. Whereas before, you might have had to lock everything and say: I'm going to have a downtime window in order to migrate the database data, or I'm going to lock the data so it's read-only for a while. If I can decouple that, then application changes and database changes are independent of each other.

Facebook actually follows this principle. When they say they deploy multiple times a day, they usually mean the application. The database, I think, they deploy maybe once a week; it's a lot less frequent than application code. So decoupling also allows those things to change at the frequency they need to change, without having to change everything together all the time.
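As a toy sketch of that traffic flip, assuming a router sitting in front of two identical stacks (in real setups this is usually a load balancer or DNS change rather than application code):

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy blue-green router: two identical environments and an atomic switch
// deciding which one receives live traffic.
public class BlueGreenRouter {
    enum Environment { BLUE, GREEN }

    private final AtomicReference<Environment> live =
            new AtomicReference<>(Environment.BLUE);

    // Every incoming request goes to whichever side is currently live.
    public Environment routeRequest() {
        return live.get();
    }

    // Deploy to the idle side, smoke test it, then flip. Rolling back is
    // just flipping again, because the old version is still running.
    public void flip() {
        live.updateAndGet(e ->
                e == Environment.BLUE ? Environment.GREEN : Environment.BLUE);
    }
}
```

A canary release, which comes next, is essentially the same switch but weighted: instead of moving 100% of the traffic at once, the router sends a small percentage to the new side and raises it gradually.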
Another pattern that's related to blue-green, but a little more evolved, is called canary release. The analogy is kind of a sad one. This picture here is a miner, and he has a canary in a cage. They used to take a canary into the coal mine because of the toxic gases; if the levels get high enough, you can die. So they would bring the bird, and because the bird is much smaller, it would die first. And if the bird died, they would run out, because they were probably going to die next. Not a very happy story. But the idea is that you have these small tests: you test things out with a smaller version, of the human, or of your system. And it shows again how incremental is better than big bang, and how you can split deployment from release.

In a similar way, you have an old version and a new version. All users are going to the old version, with your whole fleet of infrastructure there. With canary release, I can choose some users, say 5%, whatever percentage makes sense, to start using the new version, while most people are still using the old one. They run in parallel for a while, and you're testing the new thing out. If something goes wrong, you just revert back to the old one, and you've only affected 5% of users. This router here, in practice, is usually some form of load balancer, and load balancers let you make this kind of configuration, where you say: I want 5% of the traffic to go here and 95% to go there. The idea is that over time you migrate more and more users to the new version, until everyone is using it, and then you can retire the old one.

Facebook has a very, very elaborate version of canary release, because they can actually target which users are going to see which features. They can say: I'm going to release this feature to, I don't know, teenagers from this state or from this geography. It's very elaborate. They also have another type of canary: if you work for Facebook and you're on their network, you're always seeing the newer version of Facebook, something us normal users don't see. Employees are doing QA for them, for free, because they're using the latest and greatest before it's released to users. So they have multiple layers of canaries where they can try features and things out before they go to the wider audience.

Another practice that's required when you're doing these kinds of things is feature toggles. The idea of a feature toggle is very, very simple. Let's say I have my website here, and I have a feature toggle switched off: share with friends. When I switch that feature toggle on, my application shows a new button, a new way to interact with it. This is how you separate deployment from release. The code for the new feature might have been in production for months, but now, when the business, or I, am happy that it's ready for prime time, I switch the feature on and it becomes available.

This is also how I did that migration from the database to the service, the architecture change. We had a feature toggle where we could say: use the database, or use the service. And we could test it out. We'd go: okay, let's use the service. Oh, this didn't work. Okay, flip it back; we're back to using the database that we know is good. We make the tweaks to the service. So it allows you to do incremental testing, and to split deployment from release. I could also un-release without needing another deploy, by switching the toggle off. So feature toggles can be used for a variety of purposes.
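Here's a bare-bones sketch of that kind of release toggle, in the spirit of the database-versus-service switch just described (the toggle name and interfaces are made up for illustration):

```java
import java.util.Map;

// A release toggle guarding which implementation serves reads: both code
// paths are deployed, but only one is "released" at any moment.
public class Catalog {
    interface CatalogSource { String search(String query); }

    private final Map<String, Boolean> toggles; // e.g. loaded from config
    private final CatalogSource database;       // old, known-good path
    private final CatalogSource service;        // new path being rolled out

    Catalog(Map<String, Boolean> toggles,
            CatalogSource database, CatalogSource service) {
        this.toggles = toggles;
        this.database = database;
        this.service = service;
    }

    public String search(String query) {
        // Flipping "use-catalog-service" releases, or un-releases, the new
        // path without another deployment.
        boolean useService = toggles.getOrDefault("use-catalog-service", false);
        return (useService ? service : database).search(query);
    }
}
```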
We actually have an article that Martin published from a colleague of mine about feature toggles and their different types. You can think about two main axes: dynamism, which is how frequently you might need to flip that toggle, versus longevity, which is how long the toggle lives. Some toggles might live for just a couple of weeks, while others might be permanent features of the app, always available. I'd recommend you look at the article to see how it treats and implements these different types of toggles.

The next pattern is dark launching, and I'm going to use Facebook as an example again, because that's where I heard about the practice. A couple of years ago, Facebook didn't have the chat feature on the website, and they were going to release it as a new feature. They have many millions of users, and they were worried: is our backend going to be able to support this? If people start using chat and talking to each other, are we going to fall over, or are we going to survive? So they did this cool thing called dark launching: they put the chat feature in the app, but it wasn't visible to the user. What it was doing underneath was sending random messages to your friends. They were using you to do QA in production without people knowing about it. It exercised the backend, basically, making sure messages were being routed across to the other users. No one ever saw those messages, but it allowed them to do that testing. So it's launched, but it's in the dark: no one actually sees it, but they're using it without knowing.

Once they were happy that the backend would actually support the feature, they toggled it on and enabled it for real users. They probably did that in more of a canary way, rolling it out to different audiences first. They do a lot of this kind of experimentation. And the interesting thing is, I'm talking about these patterns in the context of reducing deployment risk, but there are also a lot of business benefits once you have this infrastructure and way of working. It allows you to try things out: you can do A/B testing more easily, or try a feature, and if it doesn't work, switch it off. It gives you a lot of flexibility in how you evolve your product. So this is a cool one: testing with real users.
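Here's a sketch of how a dark launch like that might be wired up: real traffic exercises the new backend, but its output never reaches the user (all the names here are hypothetical):

```java
import java.util.concurrent.CompletableFuture;

// Dark launching: the live path serves the user as before, while the new
// backend is exercised in the background and its result is thrown away.
public class SearchFrontend {
    interface SearchBackend { String search(String query); }

    private final SearchBackend current; // what users actually see
    private final SearchBackend dark;    // new backend under silent load

    SearchFrontend(SearchBackend current, SearchBackend dark) {
        this.current = current;
        this.dark = dark;
    }

    public String search(String query) {
        // Mirror the request to the dark backend; log or compare results,
        // but never let a failure there affect the user's response.
        CompletableFuture.runAsync(() -> {
            try {
                dark.search(query);
            } catch (Exception e) {
                // record for analysis; the user never notices
            }
        });
        return current.search(query);
    }
}
```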
I have two more patterns, and these are more on the infrastructure side. The ones so far were about the development and delivery of code; these two are about how you manage your servers. The first one is Phoenix servers. The opposite of a Phoenix server is what we call a snowflake server, which some of you might be familiar with: every one is special in its own way, like a snowflake. You have a server and you need to make a change, so you log into the server, make the change, and life goes on. When you need another change, you log in again. You rename the config file to .old, or if you're diligent, you put the date of the change in the name so you can back it out, and life goes on. And after a while, no one knows how the server got to its current state, because it wasn't just one person making changes, it was tens of people, and who knows how it got to where it is today. Snowflake servers are not a good practice, and Phoenix servers are one way to remedy that.

Instead of making changes directly to the server, you use some kind of infrastructure-as-code tool, an infrastructure automation tool like Chef, Puppet, or Ansible. You document the change as code, and that gives you a delta, and you apply the delta to the server to take it to its new state. When you need more changes, instead of going to the server, you change the code, which gives you a new delta. The Phoenix server idea is that before you apply the delta, you burn the server: you reset it somehow, re-image it, and then apply the delta to take it to the new state. Hence Phoenix: it's reborn from the ashes. You burn it, and it comes back to life.

With Phoenix servers you can keep the same server and keep resetting and re-imaging it. But there's another practice that's the logical conclusion if you're going down this path of managing infrastructure change as code: immutable servers. It's very similar, in that you're still managing change as code, but instead of a delta, the code gives you a whole new version of that server. When you apply it, you burn the old one, and the new one takes its place. Immutable, because once that change is live, you never touch the server again. If you need a change, you create a new version of the server, burn the old one, and the new one takes over. It's a little bit like a Phoenix, but you're not reusing the server: once it's there, it's immutable, and if you need a change, you make a new one.

There are a few different ways to implement this. If you're in a cloud environment, Netflix does this a lot for their deployments: you can use the machine image as the artifact. You run all your tests, then you run all your infrastructure code and generate an image, and once the image is built, you don't change it anymore; you just use it to spin up as many servers as you need. These days, a lot of people use the container as the artifact: if you're in Docker land, you build the Docker image, and that's what you use to spin up the new infrastructure.

Okay, so a quick summary. I covered a bunch of practices and principles for thinking about reducing risk as you deploy and as you move along this journey towards continuous delivery, and I showed you these ten deployment patterns. Parallel change, I think, is a very important one; it applies in many different contexts. You might have heard of some of these before, or maybe not, but that one is particularly interesting. And thinking in smaller batches, more incrementally, reduces the risk. So with that, I have some time for questions.

[Audience question, summarized by the speaker.] So the question was: when you're doing expand and contract, you need to somehow signal which version is the one you're trying to move towards. Because you might be in that migration phase, and then the person who started it moves on, and you end up with the codebase stuck in that in-between state. There are a couple of ways to do that. If it's in code, I've seen it done a couple of ways. If you're in Java, or some language that supports it, you can add deprecation notices to the old methods.
That's a way to document it in code: this version of the method is deprecated, so the IDE will give you warnings, and people are encouraged to use the new one instead of the old one. I have a colleague who was a little more radical: he actually did a rename, so the old method got renamed to something like dontCallThisMethodAnymore, a very ugly name, because no one's ever going to call a method that looks that weird.

There are also just team dynamics, right? You need to communicate the intention: we are doing this refactoring over time, and it's going to take a while. As a tech lead, I had one migration where there were a lot of things to change, so I put a kind of scoreboard on the wall, and at stand-up we would update it every day: hey, did anyone migrate anything else for that change? And people would fill out the X's. Once the scoreboard was done, we would delete the old version. So you need to keep some incentives, or keep reminding people, that you want to get to the contract phase.

The same thing happens with feature toggles, actually: you can add a toggle and never retire it. It's worth reading that article, because if it's one of those long-lived toggles, a toggle you want to maintain as a feature of your product, that's probably fine. But you don't want many of those; you want a manageable number. If it's a release toggle, where you're just hiding something until the feature is ready for release, then once it's released, you should probably retire it. That's another place where you can introduce a lot of complexity if you don't manage the toggle lifecycle.

There was another question there. [Audience comment, partly inaudible:] With feature toggles, one way I've seen it done is that when you create a toggle, you also write a story card for removing it and keep it on the wall. As long as it's on the wall, you're reminded of it, and if the story doesn't move, that's a smell that something's not right.

[Audience question, partly inaudible:] You mentioned canary releases, where you roll a new version out to a percentage of users. How do you actually keep a consistent user experience? Obviously you don't want the same user to flip between versions and see new features appear and disappear. Do you use sticky sessions? Yeah, that's one way to do it. You can do it with sticky sessions, so the same user always goes to the same version. You can also do some application-level routing, where you make the application aware that the feature is enabled for this kind of user. If you're doing the Facebook style, where they have very complicated audiences of potential canaries, then it needs to be another layer; the load balancer is not going to be smart enough to know that. You might have to put a layer in between to do that routing for you, something you have to write. But the most basic approach, sticky sessions, is the easy way to do it.

There was one question here. [Audience:] I'd like to know more about the dynamics of your database-to-service migration.
[Audience:] When you actually cut over from the database, how did you go about doing it? Did you need a lot of sign-off? Okay, not really. In that case, from the application's perspective, the feature was the same. The reason we were moving to the service was that other parts of the organization needed that data, and we didn't want them to integrate directly with our database. We took on conscious tech debt when we implemented the feature at the database level, because we said: okay, we're going to ingest this data into our database, but we know it's not going to be very available to the rest of the company. I forget what Martin's terminology is; it was deliberate tech debt that we took on as a team, for expediency, knowing that we'd have to come back and fix it. We needed to release fast, and then we started building out the service. There wasn't a lot of sign-off, because it was always part of the plan that we were going to build this other service.

[Audience:] Was the data in sync with the database? Yeah, during that migration period the data was being ingested into both at the same time. The service also had its own database, so behind the scenes the data was kept in sync. And then at the application level we had the toggle to say whether we go through the database route or the service route.

The other interesting thing during that migration: we were building a Rails app at the time, and Rails is very coupled to the database. If you've built Rails applications, you know Active Record generates classes that hide the database from you. We actually had to do some refactoring at the code level to make sure we weren't talking to the database without knowing it. So we introduced a facade: this is going to be the facade for this data. And it was interesting, because the facade ended up looking a lot like the API that the service eventually exposed. Doing that preparation, even before we had the parallel run, introducing the abstraction layer where we put the toggle, made it much easier to define the service API. And it reduced the need to spread the feature toggle across the code base. If we hadn't done that, everywhere that Active Record object was used we would have had to wrap it with the feature toggle, and the toggle would have been spread everywhere. So those were all the things we had to do to make that migration happen.

[Audience:] Thank you. Was there a legacy application hitting it somewhere? In our case, no, because we built the database. The service was actually built by another team, but in conjunction with us; we were working pretty closely together. And yes, there were other apps.

Is there a question here? [Audience:] First of all, I want to thank you for your talk; it's very helpful. We have a small team with very limited resources, and we want to release features very fast. But the system is a legacy system, with a lot of mess in the code, and we want to rebuild it. What is the best pattern for a small company to release new features very fast and also rebuild? Are you releasing on the old system?
Was there a legacy application hitting that database somewhere? In our case, no, because we built the database ourselves. The service was actually built by another team, but in conjunction with us; we were working pretty closely together. There were other apps involved, though.

Is there a question here? First of all, I want to thank you for your talk; it's very helpful. We have a small team with very limited resources, and we want to release features fast. But the system is a legacy system, there's a lot of mess in the code, and we want to rebuild it. What is the best pattern for a small company to release new features fast while also rebuilding? Are you releasing on the old system, or are you trying to rebuild the system and release the new one? It's a new system. And have you started already? I haven't started yet.

Applying these things to legacy code bases is a bit challenging, because you hit exactly these problems: the seam where you could put the feature toggle might not be clear, or the code is so messy that you have to change everything at once. The idea of decoupling the changes might not be easy, because everything is tangled together. I don't think there's a magic answer. When we're replacing legacy systems, we notice that a legacy system is usually doing a lot of things, so we apply the same incremental idea: is there a capability or feature I can extract first, instead of rebuilding the whole thing? Then run them in parallel: for that one feature, use the new piece; for everything else, keep using the old one. Sometimes even that isn't easy, because to do that one feature you might have to integrate data back. So the trick is finding the boundaries where you can slice it; if you find a good one, that's what you split off first and rebuild, replacing over time. We call this the strangler approach: you build around the old system and slowly take over the work, so that eventually you've replaced everything. But that's a very long journey. My example was a tiny first step, extracting one piece, and it wasn't even legacy, because we were rebuilding it ourselves; but we've applied this kind of thinking with other clients that have existing systems, rebuilding or building around them, and we look for those seams. I don't know if that's a very satisfying answer, because it's a hard problem, but it's a way to think about it. Thank you. Cool?

A comment on that: we've actually done this on a lot of projects. Say you're looking at a legacy application and you want to move it to a new framework. The first piece you extract might be the billing part, so you start moving billing out, and at the web tier you route billing requests to the new application while the rest of the work stays with the older application. The new services come up behind their own routes, and you combine the rewrite rules at the web tier. That strangler strategy over the application worked for us.
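That web-tier routing can be tiny. In practice it's usually rewrite rules in a reverse proxy or load balancer, but as a Java sketch with a made-up /billing slice, the decision looks like this:

    // Sketch of a strangler router: one extracted capability goes to the new
    // service; everything else still goes to the legacy application.
    public class StranglerRouter {
        public String backendFor(String path) {
            if (path.startsWith("/billing")) {      // hypothetical first slice
                return "http://new-billing-service";
            }
            return "http://legacy-app";             // everything else, for now
        }

        public static void main(String[] args) {
            StranglerRouter router = new StranglerRouter();
            System.out.println(router.backendFor("/billing/invoice/42")); // new service
            System.out.println(router.backendFor("/orders/7"));           // legacy app
        }
    }

As more capabilities move out, more prefixes point at new services, until the legacy branch is dead code.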
I actually have a question more in the context of data science pipelines. It's quite similar to the problem of migrating databases, but for my purposes I have multiple data sets that I have to test the pipeline against, so I'm wondering how to think about this in a more agile way, in your framework. Multiple data sets you have to migrate? The problem is that you need to test the computation from the data sets all the way to the end products. You need multiple data sets to test the entire data science workflow, so you have to maintain multiple data sets and test against all of them to make sure the computation is right. Yeah, okay.

That is actually another hard problem: testing data-intensive applications. I think what you're doing is similar to something we're doing. We have a client here in Singapore where we're doing some data engineering work as well, and they test with what they call fixtures. Instead of testing with the whole data set, which in data science can be very big, you have a fixture, a smaller subset, that contains the inputs you expect and the outputs that should come out of them. Then you run your process against the fixture and check that what comes out actually matches what the fixture says it should; it's a black-box kind of testing, because you have to run the whole thing. For data science I think it's even trickier, because sometimes the outcome is not deterministic; it might only need to be within, say, 95%. So I don't have a better answer than that, but that's what we've been doing as well: black-box testing for the whole pipeline. For the logic itself, you can probably unit test: if a transformation is deterministic, unit test the code for that transformation. But if you want to test the whole pipeline, it's not as easy.
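Here is the fixture idea in miniature, as a Java sketch; runPipeline is a stand-in for whatever your real computation is, and the fixture is a tiny, hand-checked slice of the data together with its expected outputs:

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class PipelineFixtureTest {
        // Stand-in for the real pipeline: here it just sums quantities per SKU.
        static Map<String, Integer> runPipeline(List<Map<String, String>> rows) {
            return rows.stream().collect(Collectors.groupingBy(
                    r -> r.get("sku"),
                    Collectors.summingInt(r -> Integer.parseInt(r.get("qty")))));
        }

        public static void main(String[] args) {
            // Fixture inputs: small enough to reason about by hand.
            List<Map<String, String>> input = List.of(
                    Map.of("sku", "A", "qty", "2"),
                    Map.of("sku", "A", "qty", "3"),
                    Map.of("sku", "B", "qty", "5"));
            // Fixture outputs: what the pipeline should produce for those inputs.
            Map<String, Integer> expected = Map.of("A", 5, "B", 5);

            Map<String, Integer> actual = runPipeline(input);
            if (!actual.equals(expected)) {
                throw new AssertionError("fixture mismatch: " + actual);
            }
            System.out.println("fixture passed");
        }
    }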
Have you ever worked at a company where they completely rebuilt their production systems from scratch? Completely rebuilt, or rewrote? Which one do you mean? The immutable server. Ah. The immutable server is more about how you make changes to the infrastructure; I don't think it applies to building the system itself or making the system static. It's about when you have to make changes to the configuration: you build the thing once, and then you replace it. The thing replacing it could be a newer version of your application, if you're using images or containers as your artifacts. It's not that the thing is immutable forever; it's that once you build it, it doesn't change, and when you need a change, you generate a new version with the new behavior and that replaces the old one. You're not trying to stop change from happening; you're trying to stop making change on top of change in something that's already running, if that makes sense.

Last question, a quick one: from your experience on mobile, people often work in large teams, and there's the device and the platform, the app stores, and maybe another team on the back end. How does this apply there? Good question. I don't have personal experience building many mobile apps, but I've heard stories from ThoughtWorkers who do, so I'll share what I heard; there's also some of this on the Radar. When you're building mobile apps, the app store constrains the frequency of your releases, because every release has to be approved and so on, and that changes how fast you can ship. There are some tools, though, whose names I won't remember, you should check the Tech Radar, that let you fetch changes and apply patches to the application without releasing a new version through the app store. Kevin might remember; we were looking at it earlier today. JSPatch, although that one is on hold, right? Yeah, JSPatch is one that fetches JavaScript code and applies it to your application, but we put it on hold because it's kind of a monkey-patching approach; that Radar blip has some alternative tools with slightly more maintainable ways to do it. Cool, thanks everyone for your time, and thanks for being here.

So that brings us to the end of this meetup. A quick reminder: if you're interested in getting involved, please register your interest. And are you still around for a while, if people want to chat? Yeah, I'll hang out for a while. Please help us clean up on your way out. Thank you so much for being here.