When you say architecture and continuous delivery, people think microservices. That's the trendy word. Who's actually seen this movie? Anyone? Okay, a few of you. What is continuous delivery? I'm going to start with my own definition, which varies over time, but today you're getting this definition. The ability to get changes, all kinds of changes, whether they're features, configuration changes, bug fixes, or experiments, into production, or, if you're building firmware or mobile apps or user-installed software, into the hands of your users, safely and quickly, in a sustainable way. So, as Nicole spoke about this morning, the point with continuous delivery is that it's not a zero-sum game. It's not a trade-off between "can we go really fast and break things" and "can we go really slowly and be safe". What we know from the data is that the high-performing companies do better at delivering, at producing resilient systems, at producing high quality, and at lower cost. This is what the lean manufacturing movement taught us. Toyota was able to produce cars more quickly, at higher quality, and more cheaply than the competition, which is why they won. And what continuous delivery and DevOps do is follow the same path. It's about getting better. It's about improving quality, reducing cost, and increasing throughput, and these things all work together if you do it right. And you might be thinking that I'm selling you snake oil, and I've often wondered that myself, actually, but it seems to actually work, so that's kind of why I'm still here.
There are a couple of golden rules to continuous delivery. The one thing I will say about it is that it's really, really hard to do. It's very easy to describe, and the golden rules are very, very simple. Make sure that trunk is always in a deployable state by using continuous integration, not by using feature branches, which leads to rule two, which is that everyone is checking into trunk daily, at least. So these are the golden rules: make sure trunk is always in a deployable state, and that everyone's checking into trunk on a daily basis. Sounds easy. Not easy. Very, very hard. So who in the audience is practicing continuous integration? If you're practicing continuous integration, put your hand up. OK, keep your hand up, keep your hand up, keep your hand up. Put your hand down unless all the developers on your team are checking into master, into trunk, on a daily basis. If that's not true, if they're working on feature branches that don't get pushed into trunk on a daily basis, put your hands down; otherwise keep them up. If, when the tests run and they fail, they're not fixed within 10 minutes on average, put your hands down; otherwise keep them up. OK, so three people in the audience are actually doing continuous integration. Big round of applause for those people. Continuous integration is not running Jenkins against your feature branches and then ignoring the build when it fails. It is about making sure that your software is always in a working state. You do not even need a CI server for that. Is James Shore kicking around anywhere? Hey! James Shore has a fabulous article called Continuous Integration on a Dollar a Day that talks about how to do continuous integration with an old workstation, a rubber chicken and a bell, which I highly recommend. So go and read it. It's a practice, a mindset; it's not a tool, and it's very hard to do.
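As a toy illustration of the second golden rule, here is a hypothetical sketch of a check you could run against your version control history. The data shape is invented for illustration; in practice you would extract the author and age of each trunk commit from your actual history.

```python
"""Toy check for golden rule two: every developer on the team should
have a commit on trunk at least once a day."""

def stale_developers(team, trunk_commits, max_age_days=1):
    """Return the team members with no trunk commit in the last day.

    trunk_commits is a list of (author, age_in_days) pairs, which in a
    real setup you would derive from your version control history.
    """
    recent = {author for author, age in trunk_commits if age <= max_age_days}
    return sorted(set(team) - recent)
```

Anyone who shows up in that list is working on a branch that isn't flowing into trunk daily, which is exactly the situation the hands-up exercise above is probing for.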
So to achieve continuous delivery, we need comprehensive configuration management, which means that any new engineer working on your team should be able to plug in their new computer and run a single command to check out everything they need from version control, to build and deploy the software, and be able to run a single command to deploy that software to any workstation and any test system that they have access to. So hands up if you can get a new engineer onboarded into the team and productive within one day. That's pretty impressive. All right, between a day and a week, okay? Between a week and a month? Okay, for the vast majority of people it takes longer than a week to get a new engineer onboarded. Who takes longer than a month? I'll cover my eyes. Anyone? Okay. So for anyone where it takes longer than a day: the next person you onboard, their job should be to write up in a wiki all the steps it takes to onboard a new engineer, and then someone, preferably the tech lead, should help that person simplify and automate that process so that it can be done just by checking everything you need out of version control. If we have effective configuration management, it should also be possible to add capacity to our production systems simply by unboxing the servers, plugging them into the rack, plugging in the power and plugging in the network, and having a fully automated process that PXE-boots those boxes, tests them, installs the OS, installs any middleware you need, installs the right version of the software, configures all that stuff, and then configures the router to start sending network traffic to those boxes. That should be a fully automated process. Again, this is hard; it's not easy. Even big companies like Facebook and Amazon took a long time and a great deal of work to achieve that, and it was very expensive. This is not easy for anyone, even the companies that are doing continuous delivery, deploying a billion times a day.
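The "single command" idea can be made concrete with a small bootstrap script. This is a hypothetical sketch; the step names and `make` targets are invented for illustration, and a real script would use whatever build tooling your team already has.

```python
#!/usr/bin/env python3
"""Hypothetical one-command developer bootstrap: every onboarding step
is codified, ordered, and lives in version control with the software."""
import subprocess

# Placeholder steps: the point is that onboarding is an executable
# script checked into version control, not a wiki page of instructions.
STEPS = [
    ("install build deps", ["make", "deps"]),
    ("build the software", ["make", "build"]),
    ("run the test suite", ["make", "test"]),
    ("deploy to a local environment", ["make", "deploy-local"]),
]

def bootstrap(runner=subprocess.run):
    """Run each step in order, failing fast with a clear message."""
    completed = []
    for name, cmd in STEPS:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed: {name}")
        completed.append(name)
    return completed

if __name__ == "__main__":
    for step in bootstrap():
        print(f"done: {step}")
```

The wiki write-up the talk describes is the intermediate artifact; the end state is that the wiki page collapses into a script like this.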
I guarantee you those people invested enormous amounts of money and work in achieving that. It's not easy to do, but it's very powerful. Step two, continuous integration, which we just talked about, and step three, automated testing: again, not easy. Once we have all these things, we build what's called a deployment pipeline. The idea of a deployment pipeline is that any change results in a build. We build packages that can be deployed to any environment. We run a sequence of automated tests and some static checks to validate things like test coverage, duplication, SQL injection vulnerabilities, buffer overruns, these kinds of things. And you want to get feedback within about 10 minutes on whether you did anything really dumb; less than 10 minutes, ideally a few minutes. If that breaks, if the commit stage breaks, nobody else checks in unless they're fixing that problem, and we fix it straight away. Once we have packages that pass the commit stage, that is going to trigger longer-running automated tests. For any reasonably complex system, a comprehensive suite of automated acceptance tests is typically going to take on the order of a day. You want to run those in parallel on a big grid and get feedback within tens of minutes, order of magnitude. Again, if those break, that's not QA's problem, that's the developers' problem. We all stop and we fix the problem straight away, get the software working again. Once we have builds that pass all the automated tests, we're going to then send those downstream for things like exploratory testing, usability testing, performance testing, other kinds of testing. Those kinds of tests are expensive; we don't want to waste the resources and people we need for those tests on builds that are not known to be good according to the automated tests.
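The stage structure described above can be sketched as a simple pipeline: a fast commit stage gates the longer-running acceptance stage, and a failure stops the release candidate from flowing downstream. The stage names and check functions here are illustrative, not a real tool's API.

```python
"""Minimal deployment-pipeline sketch: each stage is a list of checks,
and a failure stops the release candidate from flowing downstream."""

def run_stage(name, checks, build):
    # Each check returns True (pass) or False (fail) for the given build.
    for check in checks:
        if not check(build):
            return (name, "failed")  # stop the line: fix it straight away
    return (name, "passed")

def pipeline(build, commit_checks, acceptance_checks):
    """Run the fast commit stage first, then the slower acceptance stage.

    A failed stage stops the pipeline: later, more expensive stages
    (exploratory, usability, performance testing) never see a build
    that isn't already known to be good.
    """
    results = []
    stages = [("commit", commit_checks), ("acceptance", acceptance_checks)]
    for name, checks in stages:
        result = run_stage(name, checks, build)
        results.append(result)
        if result[1] == "failed":
            break  # nobody checks in until this stage is green again
    return results
```

Real pipelines add parallel test grids and manual approval gates, but the flow control is exactly this: fast, cheap feedback first, expensive validation only for builds that earned it.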
And we want the automated tests to actually give us some level of confidence that the software is working, so that we're not wasting our time debugging performance tests because the software doesn't deploy properly, because we should have found that out earlier. Once we have builds that have passed those kinds of validations, we should be able to push a button to deploy them to production or to any other environment we choose. The point of the deployment pipeline is to give us very fast feedback if there's any kind of problem, whether it's a performance problem or a usability problem, whatever. And what we're trading off here is fast feedback on the left versus more comprehensive feedback on the right. If we had infinite resources, we'd be able to condense this down to one stage and do it instantaneously. Unfortunately, that would require infinite energy, so we can't do that, so we have to split it out somewhat. But what we want to optimise for is short lead times. We want the time from check-in to deploy to production to be as short as possible, so we want the deployment pipeline to be short and to parallelise. So every check-in results in a build, every build is a release candidate, and the job of the deployment pipeline is to prove that the release candidate is unreleasable. If we can't prove that it's unreleasable because of a known defect, we should feel completely confident releasing it. And if we don't feel completely confident releasing it, that means there's something wrong with these validations and we need to improve them. So how do we achieve our golden rules of continuous delivery? There are two architectural characteristics which are very important. So, coming back to architecture, there are two things we care about. Architecture is fundamentally about validating the cross-functional characteristics of our system. Is it going to perform? Is it secure? Is it going to be available? Do we have business continuity?
Typically, the -ilities are what the architecture is for. We have architecture so we can make sure that our software meets these various characteristics, of which there are two that we particularly care about in continuous delivery. One is testability. When a developer says "works on my machine", that should actually mean something useful. And the way we validate that is by actually having a way to reproduce a production-like environment on our developer machine, because we're using containers or virtualization or other kinds of technologies which allow us to replicate something production-like on our development workstation. If you can't replicate the production environment on your workstation in a reasonably comprehensive way, then your architecture is wrong. Anytime someone says, well, I can't reproduce the production environment on my workstation because it's too complex, or because I've got an SAP system over here that has to talk to Oracle Financials over there, I'm like, well, you screwed up your architecture. That doesn't mean that continuous delivery is wrong for me. That means that your architecture is wrong and you need to fix it. Tight coupling between different parts of the system is a defect of your enterprise architecture. We should be able to simulate those remote services by building test doubles of some kind that our software speaks to, and be able to run automated tests against that system, talking to test doubles that simulate the remote systems, in such a way that we can get a reasonably high level of confidence that the software is working with simulated versions of those things. If you can't do that, that's an architectural problem with your system. We also want deployability. Deployability means that we can actually do low-risk automated deployments at the push of a button. If it takes you two days of going through a series of very complex scripts to deploy your software, that's a problem. It's an architectural problem.
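The test-double idea can be shown with a tiny example: the code under test depends on an interface rather than on the real remote system, and in tests we substitute a stub with canned responses. The currency-conversion names here are made up for illustration; the remote system could just as well be SAP or Oracle Financials.

```python
"""A test double standing in for a remote system, so the code under
test can run on a developer workstation with no real integration."""

class CurrencyConverter:
    # Depends on an abstract rate source, not a concrete remote system.
    def __init__(self, rate_source):
        self.rate_source = rate_source

    def convert(self, amount, currency):
        return round(amount * self.rate_source.rate_for(currency), 2)

class StubRateSource:
    """Simulates the remote rate service with canned responses."""
    def __init__(self, rates):
        self.rates = rates

    def rate_for(self, currency):
        return self.rates[currency]

# In tests, the stub replaces the remote call:
converter = CurrencyConverter(StubRateSource({"INR": 83.0}))
```

The production wiring would pass a real client with the same `rate_for` interface; the architecture is testable precisely because that seam exists.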
It's a problem with your architecture that you have to fix. So anytime I go to a place and they say, well, that sounds great, but it won't work here because we have this, I'm like, okay, you have that. That's understandable. You've been going for 20 years. You have all this stuff that has grown up over many, many years. That's fine, but it's a problem, and it's a problem that needs to be fixed. And we're going to talk later about how you address this, but you don't address it by big-bang rearchitecture of the system. You address it by working incrementally, over time, to make things better. And that's something we should all be doing all the time. That's really the essence of continuous delivery. So back to architecture. How do we build systems, and sets of systems, and environments that satisfy testability and deployability? Well, coming back to microservices, there are a couple of really good resources. One is the book by Sam Newman called Building Microservices. And there's also, if you go to 12factor.net, a description of the 12-factor architecture, which Pivotal is promoting very heavily, which talks about all the various things you need to build distributed systems in the cloud, such as having ways to send logs remotely, to be able to do monitoring, all the various things that we need to be able to do in a distributed, cloud-based architecture. So these are both pretty good resources. However, this stuff is not new. The essence of how to build distributed systems at scale has been known, actually, for a really long time. My favorite quote about how to build internet systems comes from an article written by Jesse Robbins, I think in 2006 or 2007, and pretty much everything you need to know about building systems for the internet is in this one sentence. He says: operations at web scale is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally.
And there in that one sentence is most of what you need to know. Firstly, your software needs to be reliable and it needs to be resilient to failures. We know from Eric Brewer's work on the CAP theorem that, fundamentally, in a distributed system you're going to have to trade off between availability and consistency. You don't get a choice about network partitioning; that's going to happen. So the question is, are you going to optimise for consistency or availability? Fundamentally, that means your platform is going to be unreliable. When AWS went down, a lot of people were like, oh, AWS has a terrible SLA and that's why our systems are down. Well, guess what? If the SLA of your software was higher than the SLA of the platform that you were running on, you had an architectural problem. And you fix that problem with architecture, not by blaming the platform for keeping to its non-100% SLA. We know the SLA of a distributed system is never going to be 100%. That's reality. If you need 100% uptime, you need to not use a distributed system. I gave a talk similar to this at the DevOps Enterprise Summit last year. And the next day, a woman who was a distinguished fellow at IBM came up and kind of made fun of this slide. She was like, I work for IBM in the mainframes department. What if you could have a platform that was reliable? And I was like, yes, very good. Buy a mainframe, don't go to the cloud. But if you're not deploying to a mainframe, you're going to fundamentally be working on a platform that's unreliable, and you've got to be able to architect for that. Robbins also talks about consistently creating and deploying. That's the continuous deployment part of it, the continuous delivery part of it. And then there's scaling horizontally. That was the key rearchitecting that Amazon did when they moved from a monolithic architecture to a microservices architecture, or service-oriented architecture, as we called it then.
They rearchitected to be able to scale horizontally and not to rely on a database that was a shared resource that wouldn't scale horizontally. So how do we achieve this? Fundamentally, it goes back to a very, very old concept in software engineering, which is decomposition of systems. What we need to be able to do is decompose our systems into components or into services. What is a component or a service? Fundamentally (this is Martin Fowler's definition), it's a part of your system that could be swapped out for another implementation. Alternatively, something behind an API is a component or a service. I teach at UC Berkeley, and I've got these students and I'm teaching them about this stuff. And they're like, this concept of an API for components like libraries, can you explain that in the context of a web service, because we all do web services? And I'm like, God, 10 years ago we wouldn't be having this discussion. Now everything is understood in terms of service APIs and web APIs, and you have to explain library APIs in the context of something that the youth of today understands, which is service APIs. Times have changed. What's the point of decomposing our systems into components or services? Firstly, we want to make our systems more maintainable. Well-decomposed systems have good encapsulation of the components, which means that the information on how those components are implemented is hidden effectively. We don't know how those components are implemented, which means the people who build them can change the internals without having to change the API. That's the essence of object-oriented programming. And lower coupling, which means there's not tight coupling between those components, which means again that we can change them without having to constantly change the API. So in that way, the system is more maintainable.
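Fowler's "swappable behind an API" definition is easy to show in miniature: callers depend only on the interface, so the implementation can be replaced without anyone noticing. The key-value store example below is invented for illustration.

```python
"""A component is anything behind an API that can be swapped out for
another implementation without its callers changing."""

class KeyValueStore:
    # The API: this is all consumers are allowed to depend on.
    def get(self, key):
        raise NotImplementedError
    def put(self, key, value):
        raise NotImplementedError

class InMemoryStore(KeyValueStore):
    """One implementation; a Redis- or file-backed one could replace it."""
    def __init__(self):
        self._data = {}  # internal detail, free to change at any time

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

def count_visits(store, user):
    # Written against the API only, so any implementation can be swapped in.
    visits = (store.get(user) or 0) + 1
    store.put(user, visits)
    return visits
```

`count_visits` never learns how the store works internally, which is exactly the encapsulation and low coupling the talk describes: the internals can change freely as long as `get` and `put` keep their contract.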
If we change one particular component or service, it doesn't have a knock-on effect that means we have to change everything else that talks to that system. Secondly, we make our system easier to build and to test. Again, we don't have to rebuild the entire universe in order to deploy just one part of it. And this is a key problem we see in big companies, where deployments have to be orchestrated. So I can't just deploy this one service I'm changing. Oh, no, no. There are 12 other systems that talk to that, and I have to wait for those teams to work together, and we have a release once every six months where everything has to be orchestrated perfectly over the course of a weekend, or even longer, to make sure that all the bits fit together perfectly. Who has a friend who has this problem? Okay. Yeah, lots of you, right? Again, that's a symptom that your architecture is wrong. That's not how things should be. That means something needs to be changed in your architecture. There's an architectural defect, or a series of defects, that must be corrected. Who actually releases every day, or more frequently than every day? Okay. Who releases between once a day and once a week? Okay, a good smattering of you. Between once a week and once a month? Lots of hands going up. Between once a month and once a year? About the same number again. Who releases less frequently than once a year? I'm covering my eyes. It's okay. No one's admitting to it. I'll buy you a drink afterwards if you admit to it. Come on. I would totally lie about that. Okay. So a good smattering of you in the once-a-week to once-a-month range, and then once a month to once a year seems to be the bulk of people, with a few outliers on the front end and some very guilty people on the back end.
The third reason to decompose your system into components or services is actually probably the most important, which is to enable collaboration when you're working in large and distributed teams. Who works in a large distributed team? Yeah, lots of you, right? Welcome to India. I should say I worked in Bangalore for 18 months on the back end of a distributed team that was based in the UK and India. So I very much feel that pain. I've experienced both sides of it. So I know that it's difficult and it requires a lot of work. The thing that you need to pay attention to is Conway's Law. Conway's Law has been talked about today by at least one person. Who was at Bridget's talk earlier? Okay, a few of you. So Conway's Law says: organisations which design systems are constrained to produce designs which are copies of the communication structures of those organisations. Who's heard of Conway's Law before? All the people at that table, for some reason. Go on, talk to those people. So Bridget pointed out, very insightfully I thought, that the key word in this sentence is communication. Talking to people. Most software problems are actually communication problems. This can be kind of a bit obscure. Eric Raymond, the very strange and famous open source guy, has a joke. He says, if you build a compiler with four teams, you will end up with a four-pass compiler. Oh, I say joke. There's a very simple way to explain this. Rebecca Parsons, who's CTO of ThoughtWorks, says: if you don't want your product to look like your organisation, change your organisation or change your product. Which I think is a very succinct way of putting it. The popular way to interpret Conway's Law, although by no means the only way, is to have one team per service.
The reason that one team per service is a nice, easy practice, is, fundamentally, assuming those teams are co-located, which, as we all know, is not necessarily a reasonable assumption to make, but assuming the teams are co-located, the communication bandwidth within the teams will be very high, whereas the communication bandwidth between teams will be very low. Now, this is not true if you decide to take your teams and split them up all over the globe. Who's working on a team where the people in your team, by which I mean the small organisational unit of 10 people, or so, that you immediately work with, who works on a team that's actually distributed? Okay, lots of you. So this doesn't work in that situation, this idea of one team per service, because guess what, the communication bandwidth in a distributed team is very, very low. One of the things you always want to try and do is have teams be co-located, but you can have different teams in different parts of the world. That can work very well, because your team, which is all sitting around one table, hopefully, or in one office, the communication bandwidth is very, very high, because you're right next to each other, so you can talk to each other all the time. And so you can work on one service or one component, and that can evolve very quickly. And again, if you have a good API around that, the API doesn't need to change very often, so you don't need to communicate the internal changes you're making to your component or service. All you need to communicate is the API changes, and, of course, you don't need high bandwidth to communicate API changes, and so if those other teams who consume your API are in different parts of the world, that works very effectively. This does not work when the teams are distributed, because what happens when we distribute teams is we have no high bandwidth communication channels in the organisation. So this only works if you're not in that situation. 
I very much recommend keeping teams, if you're working on components or services, in one place, and making them cross-functional. This also doesn't work if you don't have cross-functional teams. So this works, but only assuming those teams are co-located. And the reason it works, basically, is because APIs allow you to turn high-bandwidth communication within a component or service into low-bandwidth communication between services and components. Once we have components or services, we need to combine them to do anything useful. And there are two ways, fundamentally, to bind components. One way is to bind the components at runtime, which is what microservices is all about. This is a visualisation of all of Amazon's services. Each one of the little dots is a service in Amazon. I don't know whether it's just amazon.com or whether it's Amazon including AWS. And then the lines are the connections between the services. Everyone is making these visualisations now. They're called death stars, which is a charming term. But this is a visualisation of what it looks like. And there are just lots of very small services. The original Fred George definition of microservices was a service that's less than 100 lines of code. I said this at a conference recently, and everyone was like, that's absurd, you can't write services that are 100 lines of code or less. But that's actually true, because if that's not true, you don't have microservices. What you have is a service-oriented architecture, which we've known about for 15 years now. But the problem is people didn't do service-oriented architecture right the first time around. The point of service-oriented architecture was that each of the services could be independently tested and deployed. That testability and deployability was an essential part of building an effective service-oriented architecture, but no one talked about it.
You go and read the articles on IBM developerWorks, and no one talked about deployability and testability, which is one of the big reasons why everyone screwed it up in the first place, which is why now we're doing the whole thing again but calling it microservices. So, you know, I'm a bit cynical about this, but if it actually works and people do it right, then I'll be very excited about it, because fundamentally this is about service-oriented architecture, but actually doing it right this time. So in this model, each of these services is independently deployable. If you have to orchestrate the entire Death Star's deployment at one time, you do not have microservices, or indeed a service-oriented architecture. What you've got is a monolithic architecture, a big brain that's been split into lots of different boxes but has to be combined in order to be deployed. The crucial thing about independent deployment, which is difficult, is that when you deploy a service, you can't break the downstream consumers of that service. That's very important. Here's the big rule: if I'm deploying a new version of a service, it's my problem to make sure the things that consume my service don't break. That's not the problem of the deployment team or the QA team. That's my problem. A very effective way to achieve that is through API versioning. If you go to AWS and you access the AWS API, you have to provide the version number that you want to access, as basically a date stamp. And to the best of my knowledge, you can still access the EC2 API from 2007 or 2008. They just keep it running forever. And that incurs a cost. There's a cost to doing that. But it's worthwhile, because it means that no one complains about using AWS. There are still probably systems running on AWS that are consuming that API from 2007, and they have architected for that.
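Date-stamped API versioning of the kind described can be sketched like this: the server keeps old handlers around and dispatches on whichever version the caller asks for. The version stamps and response shapes below are invented for illustration; this is not the actual AWS mechanism, just the idea behind it.

```python
"""Sketch of date-stamped API versioning: old versions keep working,
so deploying a new version cannot break existing consumers."""

HANDLERS = {}

def api_version(stamp):
    # Register a handler under a given version date stamp.
    def register(fn):
        HANDLERS[stamp] = fn
        return fn
    return register

@api_version("2007-03-01")
def describe_instances_v1(req):
    # The original response shape; callers written against it still work.
    return {"instances": req["ids"]}

@api_version("2016-11-15")
def describe_instances_v2(req):
    # A newer, richer shape; v1 callers are unaffected because the
    # v1 handler is still registered and still served.
    return {"reservations": [{"instances": req["ids"]}]}

def handle(request):
    # Callers must declare which version they wrote against.
    return HANDLERS[request["version"]](request)
```

The cost the talk mentions is visible here: every old handler stays in the codebase and keeps being tested and run. The benefit is that deploying `describe_instances_v2` cannot break a consumer pinned to the 2007 stamp.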
Another way to do it is to use blue-green deployments, where you deploy a second version alongside the first version and send traffic to it. And then, if you're monitoring those downstream services and they start throwing exceptions, you can switch people back. So there are a number of different techniques to protect the downstream consumers of your services. But you must do that. If downstream consumers break, that's your problem, and you have to fix it. And you should give people plenty of warning if you're going to retire old versions of your API, to give them time to fix the problem and move to a more recent version. And then monitoring in this kind of system is very complex, because typically what happens is you find a performance problem and you're like, okay, what's the performance problem? And you're like, I think it's this service, and you look at that service and that service is doing fine. And all the other services are doing fine, and what you actually have is a very complex, emergent set of interactions between services that's extremely hard to track down. Or a problem with your SQL, which is also what it usually is. So to actually monitor things, you have to be able to trace calls all the way through the entire Death Star so you can find what's causing the performance problems, and that requires very good instrumentation of your Death Star in order to be able to find out when something's wrong and then fix it. The second way to combine components is at build time. And I'm going to use Google as an example of this. Google isn't at the extreme end of the scale. Google isn't one monolithic big deploy; there are actually services of multiple different sizes within Google. But I think Google is more to the monolithic side of the spectrum than Amazon or Netflix are. Netflix and Amazon are very aggressive microservices people. Google is slightly further towards the monolithic end, but they still have lots of different services that they deploy independently.
But, for example, the services in Google's cloud stack are multiple different layers that are all deployed independently. Gmail, for example, to the best of my knowledge, is deployed as one big property. And certainly Facebook.com is even further towards the monolithic end. Facebook.com, famously, is just an enormous compiled PHP binary that is deployed via, what's that file-sharing, peer-to-peer thing? BitTorrent. They have a modified version of BitTorrent which they use to deploy their multi-hundred-megabyte binary. Essentially what you're doing is combining components at build time and deploying them in much bigger chunks. That's the other way to do this. In order to do that, you need to be able to do continuous integration at scale. This is a slide that I stole from Google. Google has this enormous continuous integration architecture. They have 200,000 test suites in their code base. They run 10 million test suites per day. They run more than 60 million individual test cases per day, and growing, and more than 4,000 CI builds a day. Now, it helps that Google has more servers than God, but nevertheless, this is pretty impressive. And the reason they do this is that all of Google is built monolithically off one trunk, with the exception, I think, of Android and Chrome, which are not built off one trunk. But pretty much everything else in Google is built off one trunk. They used Perforce for a number of years, and everyone just checked into trunk. If I'm working on a library that's consumed by a bunch of different Google properties and I make a change that breaks, say, Gmail, I had better find out about that quickly. Every time they deploy Gmail, they just take whatever is the current version of all the libraries that Gmail consumes, build one big binary, and deploy the whole thing out.
So it just picks whatever happens to be in trunk at the moment that they take the release branch. So if I'm going to break trunk, I'd better know about that as soon as possible and fix it. And that's why they do this: so they can get feedback within minutes, or tens of minutes, if I make a change to a library and it impacts some downstream property, so that we can fix it straight away. If I don't fix it, it's perfectly acceptable for someone in another team in Google to revert my change from version control. That's an okay thing to do. If I leave a broken change on trunk, I'm potentially putting at risk a bunch of different Google properties, and they are totally within their rights to just revert my change straight out so that trunk is good again. So this is the other way to do it. It involves a great deal of CI infrastructure, but the trade-off is that you don't then need to worry so much about things like API versioning and monitoring and so forth. So there are trade-offs. There's no best way to do it. Either of these models works very well. You can be basically at any point on that spectrum and be fine, as long as you're architecting for the trade-offs along that line. The thing that you really have to worry about is the unreliable platform. Resilience, security, scalability, availability, deployability, testability: these are architectural concerns. You cannot take software that doesn't fulfil these architectural characteristics and wave your magic DevOps wand and have the DevOps fairies come and sprinkle security and availability onto your software. That doesn't work. You have to build it into the software from the beginning. This thing where it goes to QA and they say, oh, we only achieved one-tenth of the performance characteristics: that's not something you can fix by spending a couple of days hacking away at it or buying some more hardware.
No, you're going to have to re-architect your system, and that's going to be really, really painful. So who's actually had to do that? Surprisingly few of you; I'm astonished. Seriously, who's run the performance test and then got a really horrible, sinking feeling in their stomach because the software's performance is really horrible? I clearly asked the question wrong: anyone who's written software has had that problem. How easy is that problem to fix? Can you fix it within a couple of days? Anyone? No, it took you weeks, or if you were unlucky, months, to fix that problem. It was super painful, and there was no time in the schedule for it. Everyone was very, very sad and had to work weekends. So these are things you need to care about from the beginning. You need to be testing for them from the beginning using your deployment pipeline. They are architectural concerns, and you need to be thinking about them all the way through the software development life cycle, not in the performance testing stage of your software development life cycle which you scheduled a week before deployment to production. The other thing to bear in mind, probably the most important thing I'm going to say, is that there is no perfect architecture. You know the company where some new architect comes in and draws this beautiful new diagram, and then the VP of engineering says: this is the to-be state of our architecture; in two years we will have achieved the to-be state. Okay, put your hands up. Keep your hands up if the to-be state was actually achieved in two years. It's a lie. We all know it's a lie, because at some point before those two years are up, one of two things happens: either the project gets cancelled, or the VP gets fired, or the VP quits, or all three of those things. So the to-be architectural state is a lie. We all know it's a lie. There is no to-be state, because guess what? The architecture is always evolving, and it is necessary that architecture should evolve.
The right architecture for building a new idea, for doing lean startup, for rapidly iterating and pivoting, is not the same architecture that you need to build a large-scale distributed system with millions of users. But guess what? Number one, that large-scale distributed system with millions of users is not guaranteed, and you're certainly not going to get to that state by building your prototype with an architecture designed for a million users, because it will take you so long to build that system in such a way that it actually works and delivers value that your idea will be out of date by the time you do it. So over-designing your architecture up front is a terrible idea, because it doesn't give you the flexibility to pivot and rapidly change the functionality in order to understand what your users actually want. So architecture will necessarily change. There's a great talk by Randy Shoup from Craft Conference last year that I very much recommend: Google, and eBay, where he was VP of engineering of some department or other. All these different companies had multiple different architectures over the years, and that was part of the key to their success. I love Amazon, because Amazon never tell you what they're doing, because they think that everything they do is a competitive advantage. It's very hard to find out what's going on within Amazon. But sometimes things leak, and one of my favourite leaks was by a guy called Steve Yegge, who worked at Amazon and then went to work at Google, and then got very cross at Google because Google Plus was going very badly wrong and he was mad. So he wrote this very long memo, which I think he deliberately posted publicly on Google Plus, probably to drive traffic to Google Plus, though he claimed that posting it publicly was a terrible mistake and he'd intended it to be internal. Either way, it reflects poorly on Google Plus.
But he talks about the big architectural shift that Amazon.com made in 2001-2005, where basically they ran out of vertical scale for their database. They threw hardware at their database, and in the end they just couldn't scale it any more, and so they were forced to re-architect, because they ran out of capacity in their database. It was their single point of failure. And so, according to Steve Yegge, what happened was that Jeff Bezos, the CEO of Amazon, sent a memo to all the technical staff, which would not be unusual apart from the fact that the memo consisted of a series of architectural orders, which is not the normal thing for a CEO to do. And so this is what Steve Yegge says. Who's seen this, by the way? That table over there again; it's very suspicious. You should be buying everyone else drinks; I think that's your punishment for knowing all this stuff. So Steve Yegge's platform rant says this, which is what was allegedly said by Jeff Bezos. Number one: all teams will henceforth expose their data and functionality through service interfaces. Number two: teams will communicate with each other through these interfaces. Number three: there will be no other form of inter-process communication allowed. No direct linking, no direct reads of another team's data store; very important. The only communication allowed is via service interface calls over the network. Number four: it doesn't matter what technology they use. HTTP, CORBA, pub-sub, custom protocols; it doesn't matter, Bezos doesn't care. (They didn't choose CORBA. That's a spoiler.) Number five: all service interfaces, without exception, must be designed from the ground up to be externalisable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions. This is very important, for reasons I will come to later. Number six: anyone who doesn't do this will be fired. And there's a number seven, which I don't recommend.
Number seven is not reproduced here, but in Steve Yegge's platform rant he says: number seven, have a nice day. And then he's like, ho, ho, ho: anyone who's worked at Amazon will know that Jeff Bezos does not give a shit about your day. So it's a corollary of number six, I guess. Now, I'm not recommending being an asshole. I think it's possible to be very effective and nice to people. That's one of my favourite lines from the latest movie about Steve Jobs, where the other Steve, Wozniak, basically turns to him while they're having a fight and says: it's possible to be brilliant and to be a nice person. I'm like, you go, girl. So this is basically what they did, and Bezos hired this ex-army ranger called Rick Dalzell, who had been CTO of, I think, Walmart, to basically go around and look at all the teams, and if he saw teams talking to other teams' databases instead of their service interfaces, he would shout at you, and if you kept doing it he would try and have you fired. He was a very serious, genial man who, I know, everyone was very afraid of, and so they enforced this thing really, really rigidly. It took them four years to re-architect all their systems to do this, and they did it incrementally over a long period of time.
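The Bezos rules above can be sketched in a few lines of code. This is only a toy illustration of the idea, not anything Amazon actually runs; every class, method and piece of data here is hypothetical. The point is that one team's datastore is private, and the only way in for any other team is the service interface, which can later be re-implemented or exposed externally without breaking callers.

```python
# Toy sketch of the "expose data only through service interfaces" rule.
# All names and data are hypothetical.

class CustomerService:
    """The customer team's service interface: the ONLY way in."""

    def __init__(self):
        # Private datastore: no other team may read this directly
        # (rule three: no direct reads of another team's data store).
        self._db = {"c1": {"name": "Ada", "email": "ada@example.com"}}

    def get_customer(self, customer_id):
        # The interface exposes data, not the storage schema.
        record = self._db.get(customer_id)
        if record is None:
            raise KeyError(customer_id)
        return dict(record)  # defensive copy: callers can't mutate the store


class OrderService:
    """Another team's service: reaches customers only via the interface."""

    def __init__(self, customers):
        self._customers = customers

    def confirmation_email_for(self, customer_id, order_id):
        customer = self._customers.get_customer(customer_id)
        return f"To: {customer['email']} -- order {order_id} confirmed"


customers = CustomerService()
orders = OrderService(customers)
print(orders.confirmation_email_for("c1", "A-42"))
# -> To: ada@example.com -- order A-42 confirmed
```

Because the orders team never touches `_db` directly, the customer team is free to swap the dict for a real database, or expose `get_customer` over HTTP, without the callers noticing: that is what rule five (externalisable interfaces) buys you.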
The other key thing that they did, which was very interesting, going back to my point about co-locating your component or service teams: this is Werner Vogels, CTO of Amazon. I'm going to literally put words into his mouth, which I love doing, but it's okay because he said them. What this means is that the team that builds the service has to run the service in production. It's one team with both development and operational capability. So again, this is back in 2003-2004: DevOps was more or less invented at Amazon. These ideas have been around for decades. All these ideas you're hearing about today, where you're like, oh, that's crazy: people have been doing them for decades. They're just not widely known about, and people come out of university still not having heard about these ideas and have to learn them all over again, which is my depressing reality right now. So he says: you build it, you run it. This is the other very important thing, and again, if the individual teams are distributed, it's very, very hard to do this. It took them four years to do this, and they did it incrementally. They didn't do a big-bang re-architecture; they had to keep amazon.com running, so they had to rebuild the plane while it was flying. This is an important lesson for architecture: if your architecture will necessarily always be evolving, then what we have to accept is that that's a natural state. The evolution of your architecture, as your business conditions change and as your company continues to change, is a necessary and actually a desirable thing. So, who has seen Tomb Raider, the movie? Anyone recognise this? Where's this?
Angkor Wat, correct. So this is in Cambodia: a series of very beautiful Buddhist- and Hindu-inspired temples that I very much recommend you go and see, because they're lovely. And out of this temple is growing a strangler fig. What happens is this: a tree grows, and then a little bird comes and poops on the tree, and then a fig grows up around the tree, and then the tree dies, and all that's left is the fig; and you know this is what happened because there are fig trees. And this is an important lesson for evolutionary architecture: we don't want to cut down the tree and replace it; what we want to do is incrementally evolve our architecture over time. So you don't take the monolithic app and completely rebuild it from scratch and destroy it. What you do is, as you have new requirements coming in and you're building new features, you have a rule: new features must be built using a nice service-oriented architecture, but they will continue to talk to the original monolithic app for a bunch of their stuff. Over time, as you continue to add new features, as you continue to refactor stuff out into the strangler, you will gradually kill the monolithic app and strangle it, and then it will be dead. But this is something that's done consciously over time, and you have an abstraction layer over the top for the purpose of sending your users transparently to one or the other. An individual page on your site might be rendered by some combination of the original bit and the new stuff that is strangling the old bit. It's totally possible to do that, and indeed I recommend it. You can do it by horrible, ugly mashups of different bits of HTML; it doesn't matter, because ultimately you're going to kill it. That's fine. But strangling over time is the way to re-architect. And the great thing about strangulation as an evolutionary architecture technique is that you can do it on trunk. You don't need to create branches. And in fact, what happens when you create branches, and I know because I've lived this, is that you're like, oh,
it'll only take me a month to re-architect this component. And there we are, three months later, and you're like, just a few more days and I'll get this branch back in, I promise. And it's really miserable. There's a great technique called branch by abstraction (so, again, tree metaphors). Branch by abstraction: just type that into Google and you'll find an article I wrote. For a piece of software I was building a few years ago, we were changing the front end from Java to Ruby on Rails, and the back end from iBatis to Hibernate, while continuing to deploy the app on a continuous basis, basically through strangulation. So the important thing here is the abstraction layer over the top, and that's absolutely true of strangler applications too. This is a very under-used technique, but it reflects the reality, which is that evolutionary architecture is necessary. It's a good thing; it's to be embraced. Your architectural evolution is something that you should plan for and build into your software. So with that, hopefully we have time for another couple of questions. You can email the address humble at sendyourslides.com and get a bunch of free stuff; I very much recommend doing that. You won't be spammed, you'll just get free stuff, and everyone loves free stuff. Questions? Bridget: Question: if the SLA for your application is higher than that of your platform, then that may be an architectural problem. It sounded to me like that's something you shouldn't do, but it's possible that I was confused and didn't understand what you said. Could you talk about that a little more, please? So the example is: when you require higher uptime than AWS gives you, you need to be able to architect for that. So you can have a higher SLA than your platform's SLA, but it's something you have to architect for. What did Kyle Kingsbury say?
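The strangler pattern and branch by abstraction described above share one core move: put an abstraction layer in front of the old code and flip features over to the new implementation one at a time, on trunk, with the app deployable throughout. A minimal sketch of that routing layer, with all names hypothetical:

```python
# Toy sketch of strangling a monolith behind an abstraction layer.
# All class and feature names are hypothetical.

class LegacyMonolith:
    def handle(self, feature, request):
        return f"legacy:{feature}:{request}"


class NewService:
    def handle(self, feature, request):
        return f"new:{feature}:{request}"


class AbstractionLayer:
    """Routes each request; the monolith is strangled feature by feature."""

    def __init__(self, legacy, replacement):
        self._legacy = legacy
        self._replacement = replacement
        self._migrated = set()  # features already served by the new code

    def migrate(self, feature):
        # Flipping a feature is a tiny, trunk-friendly change: no branch
        # needs to live for the duration of the rewrite.
        self._migrated.add(feature)

    def handle(self, feature, request):
        target = self._replacement if feature in self._migrated else self._legacy
        return target.handle(feature, request)


router = AbstractionLayer(LegacyMonolith(), NewService())
print(router.handle("search", "q=figs"))   # still the monolith
router.migrate("search")                   # strangle one feature
print(router.handle("search", "q=figs"))   # now the new service
```

Users always go through `AbstractionLayer`, so a single page can be served partly by the old code and partly by the new, and when `_migrated` covers everything, the monolith is dead and can be deleted.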
I said that you shouldn't have one, and he was like, talk about that some more, so clearly he thinks she misheard that. Which I did, thank you. Kyle Kingsbury is a king in the world of distributed systems, so if Kyle Kingsbury says you're wrong, you're probably wrong; I would trust Kyle Kingsbury over me. He goes by the Twitter handle aphyr, and he's built a suite of tests to validate... You know the new crop of NoSQL databases that are like, oh, you can have it all? He built a test suite in Clojure that basically sets up clusters of NoSQL databases, then introduces network partitions, and then sees if they are actually linearisable in the case of network failure. Many of them were not, and it was very embarrassing for a lot of the vendors when Kyle Kingsbury was like, your thing doesn't actually work, and they were like, yes it does, and he's like, no, and here's the test. It's called Jepsen, after the singer Carly Rae Jepsen, who has a song called Call Me Maybe. So anyway, go and follow aphyr on Twitter, and go and read his articles on Jepsen if you're interested in NoSQL distributed systems; they're very entertaining. Also, he lives quite near me and I don't want to get on his wrong side. So, other questions? Is there beer after this or something? What's going on? Oh yeah, go ahead. So, what about a microservice architecture for embedded systems? Well, here's the thing: it doesn't really make sense to talk about a microservice architecture for embedded systems, because fundamentally embedded systems are not being bound at run time. Except if you're building a microservices internet of things, which you would definitely get points for in terms of buzzword bingo, because microservices internet of things, woo!
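The Jepsen approach described a moment ago can be illustrated with a toy model. The real Jepsen is Clojure code run against real database clusters under real network partitions; this is just a single-process sketch, with entirely hypothetical names, of the core check: acknowledge a write while replication is cut off, then see whether a read still returns the last acknowledged write, as a linearisable register must.

```python
# Toy model of a Jepsen-style partition test. Hypothetical names throughout.

class Node:
    def __init__(self):
        self.value = None


class ToyCluster:
    """Primary with naive async replication that is lost under partition."""

    def __init__(self):
        self.primary = Node()
        self.replica = Node()
        self.partitioned = False

    def write(self, value):
        self.primary.value = value      # write is acknowledged immediately
        if not self.partitioned:
            self.replica.value = value  # replication silently fails when cut

    def read_from_replica(self):
        return self.replica.value


cluster = ToyCluster()
cluster.write("a")               # replicates fine
cluster.partitioned = True       # the "nemesis" cuts the network
cluster.write("b")               # acknowledged, but never replicated
observed = cluster.read_from_replica()
# A linearisable register must return the last acknowledged write ("b").
print("linearisable" if observed == "b" else f"violation: read {observed!r}")
# -> violation: read 'a'
```

Real Jepsen tests record whole concurrent histories and check them against a formal consistency model, but the failure mode they expose is exactly this one: the database says "yes" to a write that some clients then cannot observe.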
So yeah, maybe, if that's what you're doing: if you have lots of little widgets distributed onto the internet, then yeah, you might actually be doing this, and this is something you have to worry about. But I can see you're panicking right now; you're like, no, no, please, don't do that. Which I broadly agree with. So in the context of embedded software, basically what you're doing is binding the components at build time, and build time here can mean actually at manufacturing. [Audience] Even the re-architecture, and making it easily buildable, is also a big question. Yeah. I mean, here's the thing: even in embedded systems you're often building large software. One of the better-known case studies for continuous delivery in the context of firmware is HP LaserJet firmware. They had a 10.5-million-line code base, written in C++ I think, which is what runs the firmware for HP LaserJet printers. And again, they componentised it; that was very important. And they built a very powerful continuous integration system for their firmware, which ran about 30,000 hours of automated tests against logic boards in racks, where they were doing everything except actually printing out the paper. So they built this very sophisticated, comprehensive automated test suite for testing their firmware, both on simulators and on emulators. So yeah, this is absolutely something that you should be caring about in the context of firmware. The great thing about continuous delivery is that it applies everywhere. Thank you. You're welcome. What's the financial side?
The financial side is that it's expensive. So here's the thing: it costs a lot to build these things, but you get an enormous return at the end of it. The details are somewhere on my laptop, but if you go to my presentations on SlideShare you'll find one where I talk about the HP LaserJet firmware; it's very interesting. They did a before-and-after where they looked at the amount of time the engineering team was spending on running tests and doing support and all these other activities. Before they were doing the whole continuous delivery thing, the non-value-add activities were, I think, 92% of the spend: only 8% of their money was being spent on actually developing features. The other 92% was being spent on support, running manual testing, merging code between branches, and all these other things. Afterwards, they massively reduced the amount of time spent on non-value-add activities, and they achieved an 8x productivity increase in terms of the amount of money actually going into feature development: around 30% of spend ended up being invested in features. And the numbers on the right didn't add up at first, because there was a new activity: it turned out that 30% of their budget was now being spent on building and maintaining the automated tests. So if you go to your VP of Engineering and you say, please can I invest 30% of my budget in test automation, what's the response going to be? No, you may not. But if you look at these numbers and you see the before and after, what you can see very clearly is that despite investing this enormous amount in test automation, they achieved an 8x productivity increase in terms of the amount of money being invested in actually building features. They were able to go much, much faster; it gave them an enormous competitive advantage. So here's the thing with all of this stuff: you can cast it in terms of, we'd like to spend an enormous amount of money, please, for something with very uncertain returns, even though that's what software
is fundamentally all about. But if you look at it after the fact, the numbers are published and they're really great numbers: they achieved an 8x productivity increase, the cost per program went down by, I think, 78%, and the number of programs they were able to run simultaneously was very, very high. These numbers are great; they're exactly the kind of thing you want to show to a CFO to present a business case for why you should invest in the kind of things I'm talking about. So the finances are very good if you take a long-term view. And that's the problem: most people do not take a long-term view, and that's why they do not want to invest in these things. They're focused on "I have to release this thing next week", not "I have to be able to actually evolve this product, or in three years all my customers will go to someone else". You have to take the long-term view once you're at a certain stage of your evolution: once you've got customers, then you have to take a long-term view. We're probably out of time. Who's in charge? We've got ten minutes? Three minutes. So, one question: maybe you can clarify, is the goal of DevOps to achieve CD, or is CD just a part of the whole of DevOps? Well, that's a great question, and the answer is nobody knows, because there is no definition of DevOps. I mean, here's the thing: I actually have a definition of DevOps, which again I can't show you because it's not in my slides; I don't know where I put my memory stick, which is very frustrating. But DevOps fundamentally is a movement. It's a movement composed of a bunch of people who want to understand how to build large distributed systems that are at scale and are reliable and that you can quickly evolve. Rapidly changing distributed systems: how do we build those? That's basically the people who spawned the DevOps movement: they had to solve these problems that had never been solved before. How do we build reliable distributed systems at enormous scale that we can evolve
rapidly? So that's what DevOps is: it's people who want to solve that problem. Continuous delivery is one of the things that enables achieving that. And DevOps doesn't really apply in the context of embedded, which is not to say that you can't use the practices of DevOps in the context of embedded: you absolutely can, and the same goes for apps and other things like that; they are absolutely applicable. But DevOps basically ignores those problems and is like, yeah, borrow what you want, we don't care about you. Continuous delivery is a set of engineering practices that are definitely essential to solving the DevOps problem, but they also apply to things like embedded and mobile and stuff like that. So I'd say, if you're trying to solve the DevOps problem, you're definitely going to need continuous delivery, but continuous delivery is still applicable in all areas of software: basically anywhere you care about getting better. If you want to be able to deliver higher quality at lower cost and be able to move faster, you need to care about continuous delivery, because continuous delivery basically is: how do we get better? I mean, nothing in the continuous delivery book is really new. It's basically a combination of XP plus the stuff we learnt from the lean manufacturing movement. I didn't invent anything in that book; I just wrote up a bunch of stuff that people already knew about, but it wasn't in dead-tree form. So I'd say, you should do this, and people are like, who the hell are you? And I'm like, that's a very good point: I'm going to write this down on a bit of dead tree and then I can wave it at you. So, I mean, that's great; if you can do that without writing a book, I want to know how you do it, because it's a great trick. So yeah, these are all things we've known about for decades that just haven't been widely implemented, but it's fundamentally about: do we get better? So if you care about getting better, you should care about continuous delivery. If you don't care
about getting better, then, you know, have fun. Are we done?