Rebecca has been a long-time ThoughtWorker, 19 years at the company now, and has been the CTO of ThoughtWorks for over 10 of those years. She's always been passionate about technology and large-scale enterprise implementations, and she's also an advocate for diversity in tech. Neal's been a long-time ThoughtWorker as well, 13 years in the company. He spends most of his time out on the road. He's spoken at three or four hundred conferences. I think he's delivered more than 10,000 presentations through his career, and that's a number that's not going down. So with that, I will hand over to Neal and Rebecca, who will talk about evolutionary architecture and the new book that they recently published, co-authored along with Pat Kua.

Thanks very much. Thanks for having us. We're here tonight to talk about building architectures that can change gracefully over time, which is something that we've struggled with as an industry for a long time. In fact, for many years, the kind of tongue-in-cheek definition of software architecture was "the stuff that's hard to change later." But then we started designing architectures like microservices, where we built change into the architecture. It turns out that when you build change in, it's not as difficult as it was before. And so that's really what this book and this talk are about: exploring the kinds of architectures that allow you to gracefully make changes over time.

So when you think about software, you've got a problem that you're trying to solve by writing software, and you have requirements of some kind. But that's not the only thing you have to think about when building that system or an architecture for it. You have to think about all these other things as well: things like performance, or scalability, or maybe auditability if you have personal information that you need to keep secret.
And this is really the skill of architecture: given the requirements plus all of these implicit and explicit characteristics I need to support, what's the best way to balance all of these characteristics with one another? This is why you hear the word trade-off so often associated with architecture: you can't really maximize all of these things. For example, if you want to improve security, you're almost certainly going to degrade performance, because you have to do more on-the-fly encryption, more indirection for secrets hiding. And so very often these characteristics play off of one another. The further thing that complicates the job of architects is that there are a bunch of these things you can support. Here's a partial list from Wikipedia of system quality attributes. And they often play off of one another.

As I said before, what we're trying to do with this book is add another -ility to this list of -ilities: this idea of evolvability. But what does it really mean to have an evolvable architecture? Let's say that I've chosen something like performance as a really important characteristic of my architecture. To have an evolvable architecture means that as my system changes, performance doesn't change. In other words, if I've said that performance is really important, then as I change my system, my performance doesn't degrade; I can manage and control that over time. So we sometimes use this phrase of creating guardrails around these really important architectural characteristics and protecting them as time goes by.

But let's talk for a second about change. We tend to think of change as just one big thing, but for us in the software world, it's really two different kinds of things. There's technology-driven change, which is the change that happens whether we want it to or not because of our ecosystem.
And then there's business-driven change, which is change to the things that we're trying to model with software. When you look at business-driven change, this is changing requirements. We have one set of requirements, and it changes as the business changes or the market changes, or maybe we merge with another company at some point, and that changes our requirements. And this has really been the focus of agile software development for the past 20 years or so: getting better and better at managing that kind of business change. And we are gradually getting better at it; most of these books are about exactly that.

But the thing we've been less good at managing is technology-driven change. And one of the reasons this is so difficult is that the software development world exists in what we call a dynamic equilibrium. When you think about the software development ecosystem, that's the collection of all the tools and frameworks and best practices and approaches, all the things that we know about building software up until five minutes ago. This is where we all live and work on a regular basis. But it exists in a dynamic equilibrium, meaning that it can shift and change without warning at any given time. A great example of this is when Docker hit our ecosystem a few years ago. It fundamentally changed our ecosystem forever. Even if you're not using Docker yet, it changes the kinds of decisions you can make against that ecosystem going forward. And this happens all the time to us. We're constantly seeing new things pop up that impact our ecosystem in unexpected ways. And in fact, that's not getting any slower; it's accelerating. Everything changes all the time for us. But this is a big problem for some roles in architecture, where part of the job is to do long-term strategic planning for technology choices.
But we posit that that's now impossible to do because of this dynamic equilibrium. I've been traveling around the world meeting with a bunch of enterprise architects, and I keep asking them a question that none of them have a good answer for. So maybe somebody here tonight has a good answer for this question: can you tell me with great certainty exactly what JavaScript web framework you'll be using two years from now? You can't, because it probably hasn't been written yet. That's just the nature of the world that we live in. So why are you trying to do these long-term strategic plans against an ecosystem that can fundamentally change at any time? Well, why are you trying to do this long-term planning? Because change is expensive and difficult, and we want to avoid it as much as possible. But what if change weren't as expensive or difficult, and we could incorporate it into our systems? That's exactly what we're talking about with evolutionary architecture.

But there's an interesting secondary effect that came out of the techniques we talk about here as well, a sort of side effect of building architectures that can evolve over time: you also end up protecting some things that have traditionally been the realm of architectural governance. So another common problem we have in architecture: let's say that as an architect, you've analyzed the requirements and all these other characteristics you're supposed to take into account, and you've designed a really beautiful, elegant solution to this problem, and you hand it over to the messy real world to implement. How can you make sure that the people who are implementing this are going to implement that design correctly? So this is a great example. We have a bunch of examples in our book based on our experience doing consulting work, but we can't really talk about our clients.
And so we cast all of these against a fictional company in our book called Penultimate Widgets, the next-to-last widget manufacturer, and trying to get better. And this is an example from a client that I met with a little while back. They were in the process of replatforming their system, and when I met with them, they had come up with 66 aspirational goals for their new platform. I remember that so well because it was in a spreadsheet with 66 rows in it. And I said, hey, it's great that you've identified all these things that you want in the next version of your architecture, like four nines of availability and elasticity. But when I come back in six months, how many of these things are still going to be true? And maybe more to the point, how easily can you tell me how many of those things are still true?

Another way of stating that is: once I've built an architecture, how can I prevent it from gradually degrading over time? How can I make sure people are implementing that architecture correctly? That is really a question of architectural governance. And it turns out that many of the techniques that we're about to talk about that preserve architectural characteristics also allow you to automate a lot of common kinds of architectural governance activities, and we'll see how that plays against evolutionary architecture.

But now let's talk about the definition of evolutionary architecture. And I will try not to move the microphone around, but I talk with my hands. So when you write a book that's describing a new concept, you have to name it, and then you have to define it. And we'll start with the name. Neal and I have both been talking about this notion of evolutionary architecture for several years. And the first time I heard Neal speak about it, the title of his talk was Emergent Architecture and Emergent Design. And we had a vigorous debate about why he was wrong and that was a very bad name.
I'm perfectly comfortable talking about emergent design, but think about what emergent says to you. It feels kind of ad hoc: I'm just reacting to what's going on without any really clear objective in mind. We use an example in the book of a unicyclist who's juggling flaming torches. And really, all that unicyclist wants to do is not fall over. That's his only objective. Style doesn't enter into it. He just doesn't want to fall over. And when you think about what constitutes good code, and how we decide how we want to evolve a design to incorporate new functionality, what constitutes good code is probably something that's pretty universal. Most of us will agree, in a particular language, on what a good object hierarchy looks like, on what clean code looks like. It's pretty much the same regardless of whether you're writing a financial services app or a health app or a retail app. The domain doesn't really come into it. But when you talk about architecture, what constitutes good does vary. And I want to give you some examples to illustrate this.

With one of the early clients that I worked with at ThoughtWorks, we were building a trading system. Now, I say trading system, and you're all probably thinking high throughput, low latency, really got to worry about performance, every nanosecond counts. For this particular client, in their wildest dreams, they figured they would never do more than 200 trades a day. Not an hour, not a minute: a day. But every one of those trades was worth billions and billions of dollars, almost 20 years ago. So what they cared about was knowing that they never lost a message, that they knew exactly where every trade was in its workflow. They had three data centers spread across three continents, connected by unreliable communication links. So they wanted to know that all of their offices had the same view of all of these trades, because each one was so valuable.
If I had applied the same architectural characteristic guidance to that trading system, it would have been wrong. They wanted to know exactly what the consequences of all of our architectural decisions were on that communication framework. That was what was important.

Now here's another, more humorous example. I will bet there is no one in this room who has this same architectural requirement. We worked with a retailer in the UK, and they really wanted a centralized architecture, so we presented a candidate architecture to them. The response was, and I quote, "But what happens if we lose Scotland?" Several years prior, there had been a catastrophic communication failure between their head office and all of their stores in Scotland, and for several days they could not communicate with any of those stores. And the organizational memory and pain of those days was so acute that they could not accept a purely centralized architecture. They had an organizational architectural imperative to be able to have stores operate for multiple days independent of the home office, in the event that some other random catastrophic communication failure would mean that they would lose Scotland. I strongly suspect none of you have that as one of the architectural requirements that you're working towards, but that was how important it was to them. What constitutes a good architecture for that organization is very different from that of most any other retailer, who probably wouldn't worry about that low-probability event happening again and would be perfectly comfortable with a centralized architecture.

So what constitutes a good architecture varies based on your organization, your industry, your application. And for us to say there is one set of good architectural characteristics is oversimplifying the problem. We don't want the moral equivalent of "don't fall on the sidewalk."
We have to be able to say: for this system and this organization and this application, this is what constitutes good. And over here is something very different.

Now, we've been talking so far a lot about responding to change, and one candidate name that many people have suggested is adaptable. That too, I think, is the wrong name, because when most people think of adaptable, they think there's some mechanism by which you adapt. Maybe you have a plug-in architecture. Maybe you have a configuration parameter. Maybe you've added some kind of throttle so you can change the rates on things. But it all implies that you have anticipated the kind of change you are going to be able to adapt to. And our whole point is that you can't anticipate it anymore. We don't know where the change is going to come from. So if you have tried to build some of those things in, what have you done? You've complicated your code somewhat. You've increased the number of lines of code, which means you've increased the number of bugs, because bugs always go up with lines of code. And chances are you've got it wrong. So you've made your code more difficult to change in response to the change that actually does occur. So rather than trying to adapt, let's say that what we want to focus on is the evolvability of the architecture: how easy is it for us to change? A lot of this was inspired by ideas from evolutionary computation, things like genetic algorithms, and you'll see a lot of that terminology coming through. That's why we believe the phrase evolutionary architecture properly captures everything we're trying to achieve.

Now that we have a name, we can have a definition. An evolutionary architecture supports guided incremental change across multiple dimensions. Each one of those three clauses is important, and we'll step through each of them. Not surprisingly, given what I've said so far, guided is a really important concept.
If you think about this from the perspective of evolutionary computation, we have what's called an evolutionary fitness function. For your particular problem, you define what you want your solution to achieve. Taking something that's very familiar to most computer scientists, you can write a genetic algorithm to try to solve the traveling salesman problem. The fitness function will simply be the distance: what is the length of the route? You want to minimize that distance while making sure that you visit all of the cities. That's a fitness function. And you could write a genetic algorithm that will try to find a reasonable solution to the traveling salesman problem.

Now let's think about something that's a little more complicated. I went to a talk several years ago at a genetic algorithms conference, and there was a group presenting a solution: a new design for an airplane wing. They had used as their fitness function the aerodynamic characteristics of the wing. They ran their genetic algorithm, came up with a solution, and presented it to the other engineers, who all looked at the design and said, there must be a bug in your code; this does not make sense. They tracked it down and realized that, yes, the aerodynamic characteristics of that wing were superior. The problem was it had these strange little things jutting out in various places. It would probably have been much more expensive to manufacture, and they weren't completely convinced it would withstand flight for very long. But aerodynamically, it was really efficient. So when you think about these fitness functions, just like anything you're trying to optimize, you need to make sure that you've looked at all of the dimensions of the problem. They picked a fitness function that made sense but was actually not the right one if you want to actually build an airplane wing, rather than just something that is incredibly aerodynamically efficient.
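The traveling-salesman fitness function just described can be sketched in a few lines. This is a minimal illustration, not code from the book; the city coordinates are invented for the example, and lower scores are fitter.

```python
import math

# Hypothetical city coordinates, invented for this illustration.
CITIES = {"A": (0, 0), "B": (3, 4), "C": (6, 0), "D": (3, -4)}

def route_length(route):
    """Fitness function for the traveling salesman problem: the total
    length of the closed route. A genetic algorithm would try to
    minimize this value."""
    total = 0.0
    for i in range(len(route)):
        x1, y1 = CITIES[route[i]]
        x2, y2 = CITIES[route[(i + 1) % len(route)]]  # wrap back to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def is_valid(route):
    """Guard: the route must visit every city exactly once."""
    return sorted(route) == sorted(CITIES)

# Lower is fitter: the perimeter of this diamond-shaped route is 20.0.
assert is_valid(["A", "B", "C", "D"])
print(route_length(["A", "B", "C", "D"]))  # 20.0
```

A genetic algorithm would mutate and recombine candidate routes, keeping the ones this function scores best. The wing-design story above is the caution: the function only optimizes what you tell it to measure.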
So that's the role a fitness function plays from the perspective of evolutionary computation. We want to apply that same notion to architectural characteristics. So we have this notion of an architectural fitness function, which captures what it is that you are trying to achieve in terms of the characteristics or the behaviors or the outcomes of your system from an architectural perspective. The single most important characteristic of this fitness function is that you and I will never disagree on whether or not a system passes. We need to make sure that it is specified in enough detail that we know exactly what we're looking to achieve and what we're trying to test. So something like "maintainable" isn't going to cut it. It has to be: this is the level of cyclomatic complexity. If you're looking at performance, it's the number of simultaneous users, or the response time, or whatever the measure is. It has to be specific. And what we want this to describe, as I said, is the outcomes that we want the system to achieve. We're not interested in the hows; we're interested in the whats. These are the guardrails that we're establishing around our architecture to ensure that it exhibits the characteristics that we want it to exhibit.

So if we think about this in the context of Penultimate Widgets, we have our spreadsheet of 66 rows. They got high marks for actually specifying what it is they want their architecture to achieve. But now, with this notion of a fitness function, if they had defined a fitness function for every one of those 66 rows, then when Neal shows up in six months' time, all they have to do is show the output from the fitness functions: they all pass, great. Or: we're still working on this one, because we're having a little trouble figuring out how to solve this problem.
Neal, why don't you help us talk through possible solutions for how we might achieve the level of performance we want with the level of encryption we know that we need? So fitness functions provide us the mechanism by which we can specify, and then verify, that our architecture is actually doing what we want it to do in the real world. And now Neal is going to talk through the other two parts of this definition.

We'll give you a bunch of examples and some categories of fitness functions in a second, but we want to finish off our definition first. The second part is about incremental change. There are two important aspects of incremental change in evolutionary architectures. One is the operational side: how do you operationalize an architecture like this? How do you build an architecture that you can gradually change from one thing to another? At the very end of the talk, I'll give you both a case study and some tools that allow you to do that kind of operational change. The other part of incremental change is how you apply these fitness functions on a regular cadence. This is the realm of continuous integration and deployment pipelines, and I'll talk a little bit about using deployment pipelines to automate governance when we talk about using fitness functions as a governance mechanism a little bit later. So that's the incremental change part of our definition.

The last part is about multiple dimensions, and this is really just the pragmatic awareness that you can't talk about evolving a software system and only evolve one part of it, like the structure of the architecture. Certainly, if you're going to build an evolvable system, you have to think about architectural concerns like performance and scalability and all these other things, auditability, but we also include some things that you might not traditionally consider, like relational database design.
So we have a chapter in our book on evolutionary database design, because you can't evolve the architecture if you don't have an evolvable schema. Although the chapter in our book is really not much more than a symbolic link to a much richer resource: 10 years ago, our colleague Pramod Sadalage, along with Scott Ambler, wrote a book called Refactoring Databases. It just had its 10-year anniversary, and if you look at the subtitle of that book, it's Evolutionary Database Design. So it turns out to be the perfect companion to Building Evolutionary Architectures, to have a book about evolutionary database design as well. The point, though, is that you have to think holistically about the entire software system, and there's a benefit to treating some things that you used to treat as different things as different categories of the same thing, and we'll see some side benefits of that in just a second.

Having now defined what we're talking about, we have enough to give you the agenda for the rest of our talk: a deeper exploration into fitness functions, with some categories and some examples; some examples of incremental change at the end; and a little bit about using this fitness function mechanism as a way to automate governance, with some examples of that as well. But let's give some more definition and concrete examples of this idea of fitness functions.

Okay, so let's talk some more about fitness functions. As I said, the fundamental characteristic of a fitness function is that you and I can always agree whether or not a system displays the characteristic that is represented by the fitness function. But these fitness functions are not some new creation. It's more that we've given a name to something that we already know about.
When you look at things like performance monitoring in production, where you check whether your CPU utilization goes over a certain amount, or when you put tests into your build that measure cyclomatic complexity or something like that, those are all fitness functions. We might have things like chaos engineering, or particular unit tests that look at characteristics of the architecture. Those are all things we could consider fitness functions, as long as they are focused on some architectural characteristic, not some domain characteristic.

So let's think about some categories of fitness functions, the kinds of things we might think about in deciding where we want to apply them. The first is the scope of the fitness function. An atomic fitness function is one that addresses just one aspect of architecture. Maybe it's the number of simultaneous users. Maybe it's cyclomatic complexity. But it's talking about one very specific aspect of the architecture. Whereas a holistic fitness function is trying to look at how some of these different dimensions interact with each other. We've talked about security and performance as things that you might have to trade off against each other. The important thing is you're looking at multiple dimensions in a shared context, so you're seeing how some of these different characteristics might interact.

The next thing we want to think about is when these things are run. For a triggered fitness function, there is some event that causes it to occur. Maybe it's a check-in to a build and you've got a continuous integration server. Maybe you have a scheduled move into an SIT environment that happens on the 15th of every month. But there's something that triggers it to occur, versus a continuous fitness function, which is running all the time.
Now, continuous fitness functions actually make more sense in the context of production systems. You probably wouldn't have something that runs all the time in a development environment. You might, but it probably doesn't make sense. But these things are always running. You don't have to schedule them. You don't have to worry about whether or not they've been done. They just run all of the time.

Now, the next distinction might puzzle you a bit at first. A static fitness function is one that always returns the same answer, and it's probably obvious that the other category is dynamic. But how can this be? If I said the most important characteristic of a fitness function is that you and I would never disagree, how could it be that the value might change? This is, again, a question of scope. For a static fitness function, it's a closed scope: the same system or the same code will always give the same answer. Cyclomatic complexity doesn't change as a result of network loading. But you might have a fitness function that is measuring the overall transaction time or response time for something, and there's a system involved that is not under your control, and you might have a variable SLA on it. So the value that comes back from that fitness function may vary on the basis of something that is outside of your control. In that case, you're probably going to have a range of acceptable behaviors, but it may be that your system fails a fitness function test because of the behavior of an external system. This is not ideal; we want these tests to be more deterministic than that. But it is a possibility, and if so, you'll have to take that into account in thinking about the fitness function. It's still well defined what it is that you're measuring; it's just that the result might vary.

The next distinction is how the fitness function test itself is executed. Ideally, these are all automated.
As much as possible, we want these tests to be able to run without any kind of human involvement. But there is nothing wrong with a human having to step in. You might decide that particular tests need to be executed by a human, and so they will be manual. There's nothing wrong with that, but you want to prefer automated as much as possible. You want to think about how you can automate as much of this architectural analysis as possible.

Now, there are some fitness functions whose values will change over time, and probably the easiest example of this is what happens when a library or framework that you're using gets updated. If it's a critical enough framework, you might want to say: as soon as it's updated, I want to break the build, because we don't want to do anything until we actually update our system to accept the new framework. Or maybe the framework isn't quite as critical, and you want to say instead: I'm going to start a 60-day clock. So you have 60 days. I'll put out a warning, but I won't break the build until day 61. So whether or not this fitness function passes changes over time. The update starts the clock, and once the clock expires, it's going to break the build, and now you'd better get around to it; you've used up your grace period.

Now, an interesting question that we still get a lot is: what about domain-specific fitness functions? And my glib answer is that those are called requirements, or acceptance tests. But if you think about it, when we're talking about an architectural requirement, if it's truly an architectural requirement, it should in some sense be describable independent of the domain. So if I'm going to talk about longevity in a shopping cart, that's not an architectural characteristic; that's a business requirement.
But if I'm going to talk about the number of simultaneous users, it doesn't matter if that user is an end consumer, a customer support representative, a doctor, or a trader. Those are all users, and I don't have to say anything about what the users are doing to talk about a requirement for a number of simultaneous users. So that's an architectural characteristic. When we start talking about things like GDPR, we're getting into a bit of a gray area, because there is a lot of GDPR that really is independent of what you're doing with the data; certain kinds of data simply have to be protected in a certain way. And there are things that are more requirements-focused, like you have to be able to forget the data, and so on. So GDPR is in this gray area. But what we're never really going to be able to do, I don't think, is say: here is the list of fitness functions for all financial services applications, and here's the URL you can download it from. They are too specific to a particular technology stack, often an implementation, and very often an organization as well.

And then, when do these fitness functions come into being? Many of them are going to be decided at the early stages of a project. And it's important to do this, because what we want you to focus on with these fitness functions is the architectural characteristics that are going to have the most impact on the design decisions you make during implementation. As I said, on that trading system, many of the things you would normally worry about with performance really didn't matter to us. And so we focused our effort: when we looked at tooling, when we looked at frameworks, when we looked at specifying hardware, all of those decisions were looked at through the lens of what the messaging infrastructure needs.
And so you want to identify very early those primary architectural characteristics that will determine the success of your application. That doesn't mean that you won't uncover some over time. Maybe you've switched an external provider. Maybe you've merged with another company or opened up in a new country, and all of a sudden your perspective on what your number of simultaneous users needs to be changes. So these things can be added over time. Some of them will be more corporate-wide. Some of them will be very project-specific. Some of them might float somewhere in the middle. But you do want to continue to look at these different fitness functions.

Now, this is still probably pretty abstract, so let's get a bit more concrete. We want to give you some examples across this quadrant, looking at both atomic and holistic, as well as triggered and continuous fitness functions. The first thing we want to look at is an atomic triggered fitness function: it looks at one characteristic of the architecture, and it's triggered by some event. First example: let's say you've gone on a quest and rid your code base of all cyclic dependencies, and now you want to make sure that they don't come back again. Well, you can add a test, a fitness function that says my system is not fit unless it is free of cyclic dependencies, and it will identify them for you. You put that in your build, and you never again have to go through a code or architectural review looking for cyclic dependencies, because you can't check them in anymore. Similarly with directionality of imports. I've got a persistence class, I've got a web class, I've got a util class. I'm perfectly happy for persistence and web to make use of util, but we don't want it to go the other way around. Again, normally that's enforced through code reviews, but you can put a test into your build and never have to worry about it in a code review again.
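As a sketch of what those two atomic, triggered fitness functions might look like, here is a toy version. A real build would usually lean on a purpose-built tool (JDepend, ArchUnit, and the like); the module names and the dependency graph here are invented for illustration.

```python
# A toy module dependency graph: module -> modules it imports.
# The layer names (web, persistence, util) are illustrative only.
DEPENDENCIES = {
    "web": ["util"],
    "persistence": ["util"],
    "util": [],
}

def has_cycle(graph):
    """Fitness function 1: the code base must stay free of cyclic
    dependencies. Depth-first search; a back edge means a cycle."""
    visiting, done = set(), set()
    def visit(node):
        if node in visiting:
            return True          # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(visit(dep) for dep in graph.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(visit(n) for n in graph)

# Fitness function 2: directionality of imports. web and persistence
# may use util, but util must never depend back on them.
FORBIDDEN = {("util", "web"), ("util", "persistence")}

def layering_violations(graph):
    return [(src, dst) for src, deps in graph.items()
            for dst in deps if (src, dst) in FORBIDDEN]

# Run in the build: fail loudly instead of relying on code review.
assert not has_cycle(DEPENDENCIES), "cyclic dependency introduced"
assert not layering_violations(DEPENDENCIES), "util must not import web/persistence"
```

Wired into continuous integration, either assertion failing breaks the build, which is exactly the "you can't check them in anymore" guarantee.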
Now how about something that isn't static code analysis? One of the things with evolutionary architecture is that we want to increase the scope of your ability to change the system, and consumer-driven contracts are one way we can allow different parts of the overall system to evolve at their own pace. So I have a service that's being used by three others. Neil and Kevin and Jesse are my three users, and we all exchange tests, put them in our builds, and then cheerfully ignore each other. We don't have to talk, regardless of what we're doing, as long as those tests pass. Then all of a sudden my test with Neil fails. Now I've got to talk to him. I go to him and say: you're making an assumption about me that right now I don't know how to satisfy, so let's mutually agree on how he might change his system or how I might do something different. But we continue to cheerfully ignore Kevin and Jesse, because their tests didn't fail, and as long as whatever Neil and I come up with still allows their tests to pass, they don't even have to know we spoke. Once the tests all pass again, we go back to cheerfully ignoring each other. We can each evolve our system independently of the others, because we know what it is we're relying on. This is a trigger for a conversation: as long as none of these tests fail, we don't have to think about the existence of anything else; we can just proceed. Now let's look at another category: triggered holistic. A holistic fitness function looks at multiple characteristics at the same time. One specific example: I'm using a caching strategy to get the level of performance I think I need. Check, got my performance. But I've also got a data staleness check I need to run, and when I run them together in a shared context, I realize the caching strategy I'm using is violating my staleness requirement.
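As a toy sketch of that triggered holistic check: a TTL cache tuned for performance, with a latency check and a staleness check run together against the same cache instance. The numbers (`MAX_LATENCY`, `MAX_AGE`) and the cache itself are made up for illustration, not anything from a real system:

```python
# Holistic fitness function sketch: performance and data staleness are
# checked together in one shared context. All numbers are illustrative.
import time

class Cache:
    """A look-aside cache whose entries live for ttl seconds."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}                      # key -> (value, written_at)

    def get(self, key, load):
        hit = self.store.get(key)
        if hit and time.time() - hit[1] < self.ttl:
            return hit[0]                    # served from cache: fast path
        value = load(key)                    # cache miss: slow path
        self.store[key] = (value, time.time())
        return value

MAX_LATENCY = 0.05      # performance requirement, seconds (illustrative)
MAX_AGE = 30            # staleness requirement, seconds (illustrative)

def holistic_fitness(cache):
    # Performance check: a warm read must come back fast.
    start = time.perf_counter()
    cache.get("k", lambda k: "v")
    cache.get("k", lambda k: "v")
    fast_enough = time.perf_counter() - start < MAX_LATENCY

    # Staleness check, in the same context: the cache must never be able
    # to serve an entry older than MAX_AGE.
    fresh_enough = cache.ttl <= MAX_AGE
    return fast_enough, fresh_enough
```

A cache configured with `ttl=300` passes the performance check but fails the staleness check, which is exactly the conflict that only shows up when the two checks share a context.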
And so now I have to decide: okay, maybe I can back off on my caching a little, or maybe I have to go have a conversation with the architect. What do you want me to do now? Because I need this caching strategy for my performance requirements, and it's violating the staleness requirement. This is again the trigger for a conversation, but we wouldn't have recognized it if we hadn't run them in a shared context. Now what about continuous fitness functions? A continuous atomic fitness function looks at one aspect of how the architecture is actually behaving over time. A very common example is any of the monitors we put into production: when we're doing logging and log analysis, those are all continuous atomic fitness functions looking for some specific activity or event, and we very commonly use tools like Nagios to monitor these things and put out alerts. Another example is synthetic transactions, which we use to monitor how the system is actually running: we inject some kind of bogus transaction that doesn't actually enter the system of record but allows us to see the timings. Am I still respecting my SLA for the processing of this transaction? This is a very common technique used in microservices.
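The synthetic-transaction probe just described can be sketched in a few lines. Here `process()`, the `synthetic` flag, and the SLA value are illustrative stand-ins for a real processing pipeline:

```python
# Sketch of a continuous atomic fitness function: inject a synthetic
# transaction, flag it so it never reaches the system of record, and check
# the processing time against the SLA. All names and numbers are made up.
import time

SLA_SECONDS = 0.2

def process(txn, ledger):
    time.sleep(0.01)                         # stand-in for real work
    if not txn.get("synthetic"):
        ledger.append(txn)                   # only real txns are recorded
    return {"ok": True}

def synthetic_probe(ledger):
    """Run one bogus transaction through the system and check the SLA."""
    start = time.perf_counter()
    result = process({"synthetic": True, "amount": 0}, ledger)
    elapsed = time.perf_counter() - start
    return result["ok"] and elapsed <= SLA_SECONDS
```

Run on a schedule in production, this gives you a continuous heartbeat: alert whenever the probe returns False, and verify that the system of record never grows from a synthetic transaction.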
But probably the most interesting of these four quadrants is the continuous holistic fitness function, and the best example we have is in fact the Chaos Monkey and, more broadly, the Netflix Simian Army. When Netflix decided they were going to move to the cloud and no longer host their own infrastructure, they worried: how can I guarantee I'm going to provide good service to my users if I don't have control of my infrastructure? So they introduced the Simian Army, and the Chaos Monkey initially, to make sure they would be resilient to whatever might go wrong with the infrastructure. The Chaos Monkey does what its name implies: it goes forth and wreaks havoc and does all kinds of things, and the system has to know how to respond, so the Netflix engineers write their code to be resilient to all of the bad things the Chaos Monkey throws at them. Several months ago the Amazon US-East zone went down. Netflix stayed up. Neil's doorbell became a standard doorbell; he's got one of those doorbells with a video feed, so when somebody rings it, it shows you the video. The magic went away when Amazon East went down, but Netflix stayed up, because of the Chaos Gorilla: they had anticipated this and made their system resilient to it. There's another member of the Simian Army called the Doctor Monkey. What the Doctor Monkey looks for is whether or not the RESTful endpoints in production are properly configured, and if one is not, it's out. So again, you don't have to go through any kind of code review process to make sure those configuration parameters are right, because the Doctor Monkey won't let a misconfigured endpoint in; it ensures that all the RESTful endpoints running in production are properly configured. Now, people often ask: what is the business value of things like these architectural fitness functions? Don't you think Netflix gets significant value out
of the fact that their system is able to continue delivering services to their customers, to their SLA, regardless of what happens with Amazon? The fact that they stayed up when Amazon East went down is an enormous differentiator for Netflix, and it comes from having this Simian Army running. So those are some concrete examples of fitness functions. Now, there are ways to think about these fitness functions collectively. This is a very primitive one: just a spider chart that looks at how my architecture behaves across these different aspects. But more generally, what we actually have is a system-wide fitness function, which is conceptually just the composition of all the individual fitness functions you've defined for your system. What's important is that this is your target architecture. Your target architecture is no longer a diagram that sits on the wall plus a series of architectural guidelines, or requirements, or whatever your architecture group wants to call them, sitting on a SharePoint site somewhere that nobody reads. This is your target architecture: it describes the outcomes and characteristics of your system that we need to maintain in order to deliver to the requirements of the overall system. And this system-wide fitness function also helps us understand, as the ecosystem changes, as the code ages, as the data ages: does my system still exhibit the characteristics I want it to, or do I have to do something about it? Now Neil is going to talk more about the governance aspect as well as incremental change. Thank you. So it turns out that when you design fitness functions to preserve architectural characteristics, what you're also doing, in many cases, is automating governance. The Doctor Monkey is a great example of this, and we keep coming back to it because Netflix engineers never worry that they have RESTful endpoints misconfigured. Some of you in the room are architects, and I can pose this question to you:
can you guarantee me that every single RESTful endpoint in your system is configured correctly? You probably can't. But if you worked for Netflix, you could, because there's that running check. Now, you can't automate every governance activity, but why not automate the low-hanging fruit, so that you can stop worrying about those things and raise your level of thinking much higher? So let's talk about automating some of these governance activities. This is something Rebecca alluded to earlier in this talk, but I want to drill into it a little more. One of the things we're trying to encourage in our book is to get architects away from vague, meaningless terms like "maintainable." Your architect descends from the ivory tower, meets with the developers, says "we'd like to have a maintainable code base," then drops the whiteboard marker and walks out of the room, and the developers look around at each other and go: what does that mean? Because it's not testable. It's a great conversation to have over drinks after work, what this means, but not really good if you're trying to build a repeatable, engineered architecture. So we want to go a level deeper and ask what it means to have a maintainable code base. Maybe it's a certain level of cyclomatic complexity. Maybe it's control over your outgoing coupling. Maybe it's controlling naming conventions, or following Postel's law. These are all testable things you can apply to your architecture. So let's look at some examples of fitness functions as governance mechanisms. As Rebecca said before, a lot of this is just new nomenclature: we've been doing a lot of these techniques for a long time, we just didn't call them fitness functions. This is a true story from a project I was on many years ago. I was working with a company that had hired a lot of very junior developers, and they were concerned about code quality because of all those junior developers,
and so one of those junior developers had been given a task with two components, a back-end component and a front-end component, and the task was to do something with all 50 states in the US. Being a junior developer, they very naively tried to implement it by saying: if state equals Alabama then do this, else if state equals this, for all 50 of them. But the architects on this project had put in a fitness function to fail any method or function with cyclomatic complexity over 40, which is crazily high, and when this developer tried to check it in, it failed that fitness function. Now, my friend Glenn Vanderburg has a great quote: bad developers will move heaven and earth to do the wrong thing. The amount of ingenuity that bad developers will put into making horrible messes is amazing; if they took half that ingenuity and put it into solving the problem correctly, the world would be a better place. But that's not the way things work, and so this being a bad developer, they thought: huh, it just won't let me check that code in. How can I solve this problem? Oh, I know: I'll move that code to JavaScript on the front end instead of the back end, and just avoid that whole messy problem. So that's what they did; the developer wrote the same method in JavaScript. But the architects had outsmarted them, because they had a cyclomatic complexity check on JavaScript too. It turns out you can do that on just about any language. The developer said "this code base won't let me check anything in" and went to the architect, and that's the day they learned about the Strategy design pattern. So we're not trying to create a crazy police state here, where architects build a system nobody can check things into anymore, but we are trying to put some checks on the low-hanging fruit. Again, to all you architects in the room: what's the most complicated method in all your code bases right now? You don't know, do you? I bet you'd like to know, and I bet
you'd like to put some level of control over that, in a way that doesn't require constant code reviews and architecture review boards and other high-touch, ad hoc, manual things you have to do over and over again. So here are some more examples of fitness functions as architectural governance. This is an example using a tool called code-assert. A very common practice in things like microservice architectures is that everything in my architecture is partitioned within a namespace or package: everything lives within the same package or namespace, and I don't want to reach across those packages and namespaces. You can define a test for that in code-assert that basically says everything resides within the same package or namespace; if something violates that rule, you get a failed unit test, just like you would for a domain-based test. In this particular case you may have something like a utility namespace or package you want to allow, and you can build exemptions into this, but in general it preserves those architectural characteristics. Another great tool that allows you to do this level of testing is called ArchUnit, which is designed to work very much like JUnit, including some of the matchers JUnit has. Some examples: one of the things that, as an architect, you might want to keep out of your code base is developers throwing generic exceptions. There are a bunch of predefined rules, including one that says no classes should throw generic exceptions, and it will fail the test if any developer gets lazy and throws a generic exception. You can also do things like controlling naming conventions. One smelly thing is developers creating interfaces whose names contain the word "interface," which sort of defeats the purpose of it being the semantic definition of the thing you're trying to create. So here's a test that says interfaces should not have names ending with the word "interface." You can
define some of these rules you want for naming conventions, for how you want things structured in your code base. Here's another architecture example. Let's say you've built a layered architecture, which is quite common in monolithic applications: a presentation layer at the top, a business layer in the middle, and a data layer. This is one of those common things where you design it as an architect for very good reasons, but now you hand it to the world to implement, and whoever is implementing reporting says: it's taking too much time to get through all those layers, I'm going to go directly from the presentation layer right to the database. As the architect, you don't want to let that happen, because that fossilizes the schema across those layers, etc. ArchUnit will let you define tests that prevent that from happening, and it uses these Hamcrest-style matchers. If you look at a test like this, "no classes that reside in a package service should access classes that reside in a package controller," that's almost English. Now, many of you who are architects have written almost that English sentence in Jira or SharePoint or some wiki somewhere, and you wrote it one time and no one has ever read it; it's a write-once, never-read kind of thing, because when you write that, there's no way to enforce it. If you write it as a test, nobody can ignore it. And this is exactly what Rebecca was saying before: architecture now is not a lines-and-boxes diagram and a whole bunch of stuff on a wiki that no one's ever read; it's the structure of the architecture along with the tests that define what the characteristics of that architecture happen to be. You can also do things like governance where, say, you have a class with some security concerns. You still need to use it, so you always need to make sure you call it through a wrapper class that handles the security problem, and you never want a developer to instantiate that class
directly. That's exactly what this custom governance rule in ArchUnit does: it makes sure that any time that class is called, it's called through the wrapper and never independently of it, so again you don't have to rely on code reviews and other failure-prone activities. Here's another great example of a clever use of this idea of a fitness function. Again, this is from a project I worked on several years ago. On this project the lawyers got really concerned about the viral nature of open source licenses for all the open source projects the developers were using. So the developers chased down the license files for all those projects and let the lawyers look them over to make sure they looked okay. Then one of the lawyers asked a really awkward question: what happens if one of those projects updates its license during one of the updates to that framework and changes the terms? We told the lawyer: look, we're perfectly safe, because every open source project on earth has a staff lawyer who carefully looks over the licensing and notifies every project on earth using it within 24 hours of any license change. Of course, none of that is true; we copied and pasted that license in, never read one word of it, and just put it in with the framework. But lawyers get concerned about this kind of stuff, and what they want is to know if somebody has changed the license file. So we figured out a way to create a fitness function for them. We took those license files, hashed each whole file, and put the hashes in a database, so that now, every time a library updates, we hash its license file and compare against the database. If it has changed, we fail the build and call the lawyer to look at the new license file. If they anoint it and say it's okay, we hash that one, put it in the database, and continue, checking on every update whether that thing has changed or not. And this is a really great
example of the way we tend to use fitness functions. We're not trying to build some crazy AI, cloud-based thing that does a semantic comparison of the licenses and figures out what changed. Most of these things are just a heartbeat on the project: it's okay, it's okay, it's okay, it's okay, whoops, it's broken, come fix me, it's okay, it's okay, it's okay. And that cadence is really important, because when you don't have that cadence, you're always wondering whether it's broken or not, and going back and checking is expensive and difficult. Knowing that it's never going to be silently broken, that it will warn you when it breaks, is a great sense of relief; it frees you from worrying about really low-level stuff and lets you think about much higher-level, more important things in your architecture. So let's talk about what it takes to put all of this into practice. When you think about what we've talked about here, it might not seem all that daunting if you were starting from a green field, but most of us are not in the position of working on a brand-new system. We have all this pesky legacy: there are mainframes out there, there are crusty old databases that haven't been looked at in years, and all of this doesn't really have tests associated with it. So how do you go about at least starting to put some of this into practice? One of the good things about evolutionary architecture, much like continuous delivery, is that while it might seem really hard, and in fact it is really hard to get it end-to-end, each incremental step brings value and so is worth doing independently. The first step is to think about where you actually feel most exposed. I didn't say where you think the change is going to come from; I said where you feel most exposed to a change that might happen. Maybe you had a disastrous biggest trading day of the year last year and everybody got into trouble because it took five hours to get the system back up, so maybe you're most
concerned about mean time to recovery. Or maybe you've just expanded into a new geography and you're really worried about the number of simultaneous users. I don't know what it is, but for your systems there are certain aspects that worry you more than others. Identify what those things are, which of these dimensions you're most concerned about. Pick a small number of them, and write a fitness function for each: define what your objective actually is, then ask where you are relative to what you'd like to achieve, and what you can start to do to move the needle closer to where you want to be. Maybe it has to do with some systems upgrades, maybe with some database changes, who knows. But for each of those fitness functions, identify just how far away you are, and then get them into your deployment pipeline. You want to run these things continually: where am I right now, how far away am I, and am I getting better? Because all of this is going to take an investment. If you know you're far away from what that system really needs to achieve, you're going to have to do something to change it, and probably somebody is going to want some evidence that you're in fact making progress and not backsliding. So you put these things in a deployment pipeline and you run them. You probably don't want to start out breaking the build if you're not meeting some of these things, because you think you're far away, but you at least want to know where you stand, you want to be able to track your progress so you know you're getting better, and you want to identify when some major change you've just made has caused a deterioration on one of the dimensions you identified as important. And once you feel like you're up to standard, then yes, probably put in whatever it takes to break the build. You've completed your quest and gotten rid of all the cyclic dependencies; now make sure
none of them come back: start to break the build on some of those things, and then identify the next set of architectural characteristics that concern you, perhaps not as much as the first ones did. This is a continual process. In addition, we want to make sure we're not just continuing to explore new dimensions but also revisiting the fitness functions we already have. This whole talk has been about the fact that change comes at you at random times, from random places, in unexpected ways, so you want to continually re-evaluate whether these fitness functions are still accurate. Go back to what Neil said in the beginning about the introduction of Docker into the ecosystem: even if you're not using Docker, Docker's presence in our ecosystem is going to change the way you think about some of your architectural decisions, and you want to make sure your fitness functions keep abreast of the changes happening in the environment. Some of them are not likely to change: if you've got cyclomatic complexity down to less than 10 across your code base, you're probably not going to push it much further, at least not across the board. Some things won't change, some will, but you want to continually re-evaluate these fitness functions. They are just as much a part of your living code base as unit tests and other functional tests are. They are an expression of what currently constitutes good for your architecture, and you want to make sure that's maintained. Sounds easy, right? So now Neil is going to talk a little more about incremental change and give you a case study of how this actually plays out. Thank you. So the last thing I want to talk about is incremental change, and we've already talked about the first part of this, the developer side of the equation, which really is reliance on all the agile engineering practices that came before us: things like deployment pipelines, continuous integration,
continuous delivery, automated machine provisioning. The deployment pipeline, of course, is the perfect place to make sure our fitness functions get applied on a regular basis. It's one thing to define a performance fitness function, but if you define it and then don't run it for a month, and then run it and your performance sucks and you've had a thousand check-ins, now you have detective work to do. If you're running it every time you make a change to your code base, you find out instantly when things start breaking, and it's much easier to fix them. So this becomes the perfect place for us to run things like fitness functions, and it also gets toward this idea of automating governance a little more deeply. So here's a good question for you: what do you do right now in your organization if there's a zero-day security exploit for one of the frameworks you're using? This is not an abstract question. This happened to a financial services company in the U.S. that was using affected versions of Struts. So what did they do? They did what any large company would do: their security people ran around and tried to find all the affected projects. But what happens any time you do that sort of manual, ad hoc thing? A bunch of them fell through the cracks, and they were open to a whole bunch of vulnerabilities. Now imagine a world where all of your projects are running in a deployment pipeline. Most fitness functions are very local to a project, but maybe your enterprise architects have a slot in those deployment pipelines where they're checking something like cyclomatic complexity, and maybe your security team has a slot in each deployment pipeline where they're doing something simple, like making sure you're not checking passwords into version control. When that zero-day security exploit comes out, the security team can push a test into those deployment pipelines that says: if you're using this affected version of Struts, fail the build. They know exactly which
projects are affected by this, and if you're not using that version, everything goes on as it did before. But this is toward that idea of automating something that is too expensive and too dangerous to rely on manual, ad hoc behavior for, where things constantly fall through the cracks. So that's the developer side; let's talk about the deployment side of incremental change, the idea of being able to replace things in your architecture. Our fake company, Penultimate Widgets, has a website where you can star-rate things, so they have a star rating service running in their architecture. Then one day they put out a new version of star ratings that allows half-star ratings, and they put it live. They don't force people to start using it, but now this is a new capability in their architecture, and over time the teams that need star ratings will upgrade to the better star rating service, until eventually no one is pointing to the old one anymore. One of the tricks Netflix taught us about architectures like this is that you monitor not only the services but also the routes between services, and when you find a service that hasn't been routed to in a set amount of time, you can disintegrate it out of your architecture. So that's how it works in the abstract; let's talk about tools that let you actually do this. There's a great case study on the GitHub engineering blog called "Move Fast and Fix Things." This is a story about a problem GitHub encountered. GitHub is a very aggressive agile engineering organization: they average 60 deploys a day and do continuous deployment, so every time they make a change to their code base it goes live. The post describes a problem they had with merge. It turns out that since day one at GitHub, the way they've done merge is to shell out to a shell script, have command-line Git do the merge, and then pull the result back into GitHub. That works perfectly, because command-line Git knows exactly how to
merge two projects. The problem is it doesn't scale particularly well, and that's the problem they eventually ran into. They operate at such scale that they eventually bit the bullet and decided: we're going to have to write our own in-memory version of merge. Which they did, and they did their due-diligence testing. But now comes the scary part of our talk: they have to put it live, and here's the problem. You can't break merge. It has been perfect since day one, and the last thing you want to do is start introducing regression bugs into something that's never been broken before, ever. This is a really common problem in architecture: how do you make structural changes without breaking things or introducing unexpected bugs somewhere else? The way these folks did it, they open sourced on your behalf so you can do it too, through a framework called Scientist, which has now been ported to a bunch of different languages, including the Java and .NET platforms. Here's what Scientist does: it allows you to conduct experiments. Scientist has a use block and a try block. The use block is the stuff we always did before, and the try block is the new stuff we're trying out. When it hits one of these experiments, the first thing Scientist does is decide whether or not to run the try block; in the merge experiment, 1% of the people doing merges got the try block. It always executes the use block, and that's what it returns to the user, but in 1% of the cases it also executes try. It randomizes the order in which they run, measures durations, compares the results, swallows and records any exceptions, and publishes all of these results. So here's what happened when they first went live. At 2:20 a.m. they were doing a little over 2,000 merges. Along the bottom you can see some tiny little red tick marks where use and try disagreed, and if you zoom in on the bottom, those are the places where they had bugs in their new code. They had no impact on their users, because users were still getting the use
results back, but the new code was also running in the background, being tested to see if it worked correctly. They were also concerned about performance, so here's the performance graph after they went live: the green line is the new code and the blue line is the old code, so the new code is much more performant. After 24 hours with no mismatches or slow cases, they removed the old merge code and left the new merge code in place. Over the course of those four days they did more than 10 million comparisons between the old code and the new code, so when they left the new code in place, they had really high confidence that they had not introduced any regressions and hadn't broken anything as part of this structural change to their code base. This is a great example of incremental change at the architectural level, and all Scientist really is, at the end of the day, is a feature-toggling framework with some metrics built into it. But it allows you to make this kind of change without worrying about breaking things, without a bunch of regression side effects and other nasty things like that. So here's our definition one more time, and we'll leave you with one additional resource, kind of a bonus for coming here tonight and listening to us talk about this subject. We have a companion website, evolutionaryarchitecture.com, and one of the things it has on it, because as we talk about this we realize people want more and more examples of fitness functions, is a link for fitness function katas, a whole section of the website. When we teach this as a workshop at conferences and places like that, those are the puzzles we use; each kata presents an architectural problem you want to solve. The bonus for coming here is that if you choose any one of those katas and add to the URI either solution=true or, because I'm lazy, s=1, it'll show you the answer to the kata as well. And so it
becomes a resource if you want to see a whole bunch of examples. There are 25 of them up there now, and we're constantly adding new ones, so if you want to see lots of examples, both the puzzles and the solutions are there if you use that little trick. And if you come up with any brilliant ideas on how to solve other architectural problems with fitness functions, email us, and we'll anonymize it and put it out there as well. So that's all we have prepared. Thanks very much for coming tonight, and we're happy to answer questions. Any questions?