So Chris is joining us from Singapore, if I got that right, representing Platform.sh. And just a bit of a bio: Chris is a digital solutions consultant, working with software teams on solution design and technical architecture, mainly within the enterprise and government space. As it's Sunday morning, I'm expecting a few people to come in and out of these sessions, but Chris, take it away. Great. Alright, thank you very much for coming along on a Sunday morning. I know it's early, and it's been a long week. So I'm going to talk to you today about the secrets of high-performance development teams. So, what do we mean by high-performance development teams? This is a really important thing to describe first, because we need to understand what it is we're talking about. Can anyone tell me what they think a high-performance team is? Has anyone got any suggestions? A team that actually meets its goals. Yep, good. Anyone else? Any other ideas? Alright. Let's cross over to a different field that we associate with high performance. Often when we think of high performance, we think of things we've been educated to see as high performance, like motor vehicles. This is a really good example. This is clearly a high-performance motor vehicle, designed to go very fast all the time. Another question for you: does anyone recognise this man? If you can tell me who he is, I'll chuck in a bit of free hosting. There you go, there's an incentive. His name is Bruce McLaren. Bruce McLaren is arguably the greatest Formula 1 racing driver to have ever lived. Bruce was born in New Zealand, just outside of Auckland, where he used to race cars he built himself along the beach. He later went on to found the McLaren motor racing team, where he drove the cars, designed the cars, helped build the cars, and managed the team.
He was the only man ever to compete in Formula 1 as both driver and constructor while at the same time driving in, and winning, the US Can-Am Championship. So he would race in Europe, then fly to America for the weekend to race Can-Am, and come back again. Unfortunately, Bruce was taken from us tragically, as many motor racing drivers are, in 1970, when he crashed while testing a Can-Am car at Goodwood. But his legacy still lives on today in the company we know as McLaren. It's not quite the same company, but essentially this is a company that produces high-performance vehicles. But there's something about these cars, right? While they are powerful and fast and technological marvels, they're also kind of snowflakes. Do you know the term snowflake? A snowflake is something that has to be carefully maintained on its own. The original McLarens (not this one, this is a recent one) were hand-built. They were artisan vehicles. Even these modern ones are expensive to fix and can only be fixed by a certain few people. So while these are great, and sure, we'd all love to have one, they're not high-performing in the sense we care about; the company is not a high-performance company. How else can we define performance? How can we define high performance in a way that's actually relevant to what we do? Can anyone tell me what the highest-selling car of all time is? I heard something. Not a bad guess. Any other ideas? The Prius? No, but good suggestion, you're close. The highest-selling car of all time is the Toyota Corolla. It exceeds the VW Beetle, and the Model T Ford, by at least 10 million sales. The only thing that comes close is Ford trucks, and they're not a single model. And the Corolla has been built continuously by the Toyota Motor Corporation for decades.
And there are instructive lessons in this motor vehicle, in how it was built and how it continues to be built, that have seeped into what we do as software developers. So here's another man who's had a really big impact on motor vehicles. His name is Eiji Toyoda. Eiji championed the Toyota Corolla at a time when... (Are we going to try putting this microphone on again? Yeah, this is for the recording. Cheers, thank you.) Alright, so if you've just joined us on the recording, welcome. Eiji Toyoda championed the Corolla at a time when Toyota did not look like the kind of company that was going to become the greatest car company in the world, the highest-selling car company in the world, the most high-performing car company in the world. Eiji created a system called the Toyota Way, and this is what has led to many of the practices we use today, such as Agile or Lean. But fundamentally, it's founded on the principle of kaizen. Kaizen is about continuous improvement. It's a philosophy that allows us to dig into what we're doing and continuously improve. Okay, so the secret of a high-performing team is that it's able to improve continuously. If you're not improving continuously, how will you ever exceed your goals? You'd have to be perfect all the time. So the moral of this little allegory is that in order to be high-performing, we don't need innovation specifically. We need continuous delivery and continuous improvement in what we do. So we know a little bit about high-performing teams, because we've done some research at Platform.sh. And it's not just our research; other companies, Puppet and IBM among them, have done research too. We kind of know what a high-performing software team is. We know, for example, that they deploy more often than low-performing teams. They fail less often, and they recover quicker. So these are the kinds of characteristics that a high-performing team exhibits.
In fact, the difference between a high-performing team and a low-performing team is enormous. A high-performing team recovers 96 times faster on average, according to the research, than a low-performing team. Now, if a high-performing team can recover in half an hour, how long is a low-performing team waiting to recover from a problem? Ninety-six times half an hour is two full days. So there are huge flow-on effects. These are some of the high-performing and low-performing values identified by Puppet in their research. Essentially, high-performing teams, statistically speaking, are deploying more than once a day. That's actually reassuring, because often we get told high-performing teams are deploying 65 times a second and you've got to be really fast. You don't actually need to be that fast; more than once a day is good. Down the other end of the spectrum, low-performing teams are deploying once a week or less. They also define 'deploy', because deploying here means releasing a piece of code. That doesn't necessarily mean it's released to a production environment or released to a device. It just means you have a build that is functional and ready to release. That's kind of the point, and it can work for a small project or a large project or whatever. So the key here is that high performers have high deployment frequencies. That's actually the single identifier that defines them. So what are they doing? That's what you're here to find out. But I'd like to know first, do you have any ideas? What do you think high performers are doing that enables them to have high deployment frequencies? Yeah? What sort of processes do you think? Automated processes? Yeah, we're getting somewhere. Do you think they use Agile or Lean? Do you think they have practices that encourage innovation? Let's find out. These are the eight high-performance practices.
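The deployment-frequency bands above can be sketched as a tiny classifier. This is only an illustration of the thresholds from the talk (more than once a day is high-performing, once a week or less is low); the function name and the "medium" band in between are my own labels, not Puppet's.

```python
# Toy classifier for the deployment-frequency bands described in the talk.
# Thresholds: more than once a day = high performer,
# once a week or less = low performer. The "medium" label for everything
# in between is an assumption, not from the research.

def performance_band(deploys_per_week: float) -> str:
    """Classify a team by deployment frequency alone."""
    if deploys_per_week > 7:      # more than once a day
        return "high"
    if deploys_per_week <= 1:     # once a week or less
        return "low"
    return "medium"

print(performance_band(14))   # two deploys a day -> high
print(performance_band(0.5))  # a deploy every two weeks -> low
```

The point of the single-metric classifier is the speaker's own: deployment frequency by itself is the identifier that separates the bands.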
These are the eight practices that actually lead to this level of performance. Alright, first of all, the companies that are able to do this are cloud-native. This seems like kind of a no-brainer, but fundamentally these businesses are automated, right? They're ready to deliver into the cloud. They're not in a position where they can only deliver to their own fixed infrastructure or to certain locations. Being cloud-native is like saying: we're flexible enough that we can move and redeploy and adjust in an agile way. And I use 'agile' there to mean nimble, not Agile the project management system. They use microservices and APIs as an architectural decision, and they're familiar with containers and use them. That's the first one. Number two, they standardize. Standardization is really important. If we're software engineers or developers, we tend to think of standardization as something that happens at the code level, so often we're talking about standardizing, say, our code formatting or our testing procedures. But to be truly agile, a business needs to standardize everywhere. It has to standardize configuration and infrastructure. It has to standardize the tools that are used. It has to standardize the processes that are used. And it has to do this not just for tech; it has to do it for everything that touches the value chain. This is obviously potentially really big, right? I know in our company we're struggling to standardize in some areas. We're a company that sells a continuous delivery product, and it's all about standardizing infrastructure, but some of the things we do are not standardized at all. Our marketing functions are a mess. They're all over the place, you know? And that directly feeds into other things we do. Alright, let's get into something slightly more meaty. Has anyone heard of anti-fragility before?
Can you raise your hands if you've come across anti-fragility? A couple? Yeah, good. Awesome. Alright. So anti-fragility is a concept defined by a writer called Nassim Nicholas Taleb, who also wrote the very famous book The Black Swan. Anti-fragility is about systems that respond well under pressure and in chaotic environments. Anti-fragile systems assume failure will happen, and when failure happens, they're designed to recover from it. Even better, sometimes they're designed to work better when things are failing, which is kind of cool, but it's hard. There's a metric, identified by IBM, that is a great way to measure your anti-fragility. A lot of people measure things in terms of mean time between errors, or mean time between failures: we've been up for 22 weeks, or nobody has been injured on this construction site for six weeks. On a construction site, where people can get injured and hurt, it's really important for that not to happen. In software, though, systems can fail, and they will, and that's not really a problem as long as they're designed to recover fast and not cause other damage to the system. So rather than mean time between failures, the metric that matters is mean time to recovery: the amount of time it takes you to recover from a failure. It seems pretty obvious, but what we're missing with mean time between failures is how big those red and yellow parts are. How long are we sitting in the failure mode? If we're up for six weeks and then down for six weeks, time between failures doesn't help us. So we want to measure time to recovery instead. This is a really key metric. Measuring time to recovery tells us how anti-fragile we are and how quick our recovery process is. You remember at the start I put up that big number?
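To make the distinction between the two metrics concrete, here is a minimal sketch of both, computed from a list of outage windows. The function names and the data shape are mine, assuming incidents are recorded as (went_down, came_back) timestamp pairs.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents):
    """Average length of an outage: how long we sit in failure mode."""
    outages = [came_back - went_down for went_down, came_back in incidents]
    return sum(outages, timedelta()) / len(outages)

def mean_time_between_failures(incidents):
    """Average healthy stretch between one recovery and the next failure."""
    gaps = [incidents[i + 1][0] - incidents[i][1]
            for i in range(len(incidents) - 1)]
    return sum(gaps, timedelta()) / len(gaps)

t0 = datetime(2018, 1, 1)
incidents = [
    (t0, t0 + timedelta(minutes=30)),                                  # 30 min outage
    (t0 + timedelta(hours=10), t0 + timedelta(hours=11, minutes=30)),  # 90 min outage
]
print(mean_time_to_recovery(incidents))      # 1:00:00
print(mean_time_between_failures(incidents)) # 9:30:00
```

A team can have a flattering mean time between failures and a terrible mean time to recovery at the same time; tracking the second is what exposes it.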
Teams that are doing well are 96 times faster in this metric, so this is really important. There's tooling you can use to help you understand it, and the concept is chaos animals. Some of you might have heard of Chaos Monkey. Chaos Monkey is a cool little tool that goes around and knocks running services over so you can see what happens when they fail. At Platform.sh we have one called Chaos Kitty. If you've seen the Lego Movie, you know where that comes from. Unfortunately I was not able to obtain any Chaos Kitties to hand out here, but sometimes we have them. We run smoke-test environments in all of our regions. We've got 16 regions, and we're running these smoke-test environments, spinning them up automatically. And then we just go around and kill stuff and see what happens. Do the services recover? How do they break? It's really important for maintaining long-term stability for our customers, even though it costs us a bit of money in terms of maintaining extra resources. The next one is production-like environments, though somehow I've managed to chop that off the slide by adding the number 4 there. Deploying to production, to your final destination if you like, should never be the first time you deploy to that kind of environment. The purpose of a deploy is not just to get the code into that location, but also to test that the release of that code works correctly, right? We know that. In order to do that properly, you need running environments that are as close to identical to your production environment as possible. Now, this is not always feasible, in the sense that we might be running some massive production cluster, and replicating that in any serious way, or replicating the kinds of loads and pressures it's under, is difficult. But we can do other things.
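The chaos-animal idea can be shown with a toy supervisor loop. This is not how Chaos Monkey or Chaos Kitty actually work internally; it's a minimal sketch, with an invented Service class and supervisor function, of the experiment they run: kill something at random, then check the platform brings it back.

```python
import random

class Service:
    """A stand-in for a running service with a simple health flag."""
    def __init__(self, name):
        self.name = name
        self.up = True

def chaos_round(services, rng):
    """The chaos animal: pick a random victim and knock it over."""
    victim = rng.choice(services)
    victim.up = False
    return victim.name

def supervise(services):
    """The recovery loop: restart anything that's down, report what recovered."""
    recovered = [s.name for s in services if not s.up]
    for s in services:
        s.up = True
    return recovered

services = [Service(n) for n in ("web", "db", "cache")]
victim = chaos_round(services, random.Random(42))
recovered = supervise(services)
print(victim in recovered, all(s.up for s in services))  # True True
```

The interesting measurement in a real chaos experiment is the gap between the kill and the recovery, which is exactly the mean-time-to-recovery metric from the previous section.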
We can ensure that every version of every piece of software running in those two systems is identical, and we can ensure we do that every time we release and every time we deploy. So we want to be using systems that allow configuration as code and the ability to spin up environments on demand from that code. We also don't want to have to do the same build twice. This is another aspect of production-like environments. If you do the same build twice and get a different result, it's not the same build, is it? So doing one build, once, keeping a copy of it, and being able to deploy that exact artifact into your production environment from your production-like environment is really important. That's one way of knowing with certainty that your deploy is going to work. Production-like environments also have a fantastic speed benefit: they improve the rate at which you can release by 3 to 15 times. This is really key in getting you to that fast deployment cadence we talked about. Alright, you've got production-like environments, and you know everything has to be the same. What do you have to do to get that? You have to use continuous delivery. Continuous delivery takes continuous integration and extends it into all sorts of different areas. Continuous delivery starts with the very first commit of your code and takes it all the way to a release in a production-like environment. This ensures there are no variables, or as few as possible, that can creep in and change that build for that particular piece of code. So what's in the continuous delivery stack? What does a change have to flow through? These are the common ones, the key ones. The source code, obviously. The build process. Any continuous integration you have to run. Your deployment automation, which gets the thing you continuously integrated into a release.
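The "build once, deploy the same artifact everywhere" rule can be enforced by fingerprinting the build at build time and checking it at every deploy. A minimal sketch, assuming the artifact is available as bytes; the function names are mine.

```python
import hashlib

def build_digest(artifact: bytes) -> str:
    """One build, one fingerprint, recorded once at build time."""
    return hashlib.sha256(artifact).hexdigest()

def promote(artifact: bytes, recorded_digest: str, env: str) -> str:
    """Refuse to deploy anything that isn't byte-for-byte the tested build."""
    if build_digest(artifact) != recorded_digest:
        raise ValueError(f"artifact does not match the build tested for {env}")
    return f"deployed to {env}"

artifact = b"app-build-1234"
digest = build_digest(artifact)  # recorded when the build is made

print(promote(artifact, digest, "staging"))     # deployed to staging
print(promote(artifact, digest, "production"))  # deployed to production
```

Promoting the identical artifact from the production-like environment to production is what gives the certainty the talk describes: the thing you tested is, provably, the thing you shipped.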
The middleware configuration, because sometimes there'll be additional middleware that has to be tied into that process and automated as well. The environment configuration, which is the configuration of the running environments you're deploying to. And finally, the environment provisioning. Continuous delivery should roll all of this out for you automatically. There are a number of different ways you can do that, and lots of different services that do it, though they don't all cover the full stack. Indeed, at Platform.sh we don't do the continuous integration part particularly well at the moment, and we know this, right? So not every service is able to do all of these things completely. Now I want to talk about quality assurance for a minute, because I think quality assurance is the hardest part of what software engineers have to do. Literally the worst thing in the world is putting your heart and soul into something and then sending it to QA, where it enters some kind of hell where you need a fire hose, because QA can be really hard, and I would never be a QA tester for the world, because I think it's a really rough job. But we can make it a lot easier for them. We can create a situation where you don't need the fire hose, and maybe you can just hand them chocolate and that will be sufficient to keep them going for a while. And that's a technique we call continuous isolation testing. Continuous isolation testing stems from two things: the combination of continuous delivery and production-like environments. For things not to fail, they have to be tested. For things to be tested properly, they have to be tested in isolation. To do this, you need automated tests, virtualized services, accurate test data and production-like environments. Right, let's talk about the linearity of change. Often when we're building stuff, we put lots of things in together.
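The seven stages of the stack above can be modelled as an ordered pipeline that a commit flows through, where any stage can halt the run. This is a toy sketch under my own naming; real continuous delivery tools wire these stages to actual build and provisioning systems.

```python
def run_pipeline(commit, stages):
    """Push a commit through each stage in order; a stage raises to halt."""
    artifact, completed = commit, []
    for name, stage in stages:
        artifact = stage(artifact)
        completed.append(name)
    return artifact, completed

# The stack from the talk, as placeholder stages.
STAGES = [
    ("source code",              lambda c: c),
    ("build",                    lambda c: f"build({c})"),
    ("continuous integration",   lambda b: b),
    ("deployment automation",    lambda b: f"release({b})"),
    ("middleware config",        lambda r: r),
    ("environment config",       lambda r: r),
    ("environment provisioning", lambda r: f"running({r})"),
]

artifact, completed = run_pipeline("commit-abc123", STAGES)
print(artifact)  # running(release(build(commit-abc123)))
```

The ordering is the point: every change takes the same automated path from commit to a running, production-like environment, so no manual variable can creep in along the way.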
We do this for two reasons. One is because we're people and we just fix stuff as we go, and things happen, right? This is a tendency we have in ourselves. It's about batch size, which I'll talk about in a minute; it's a process thing we have to reduce. The other reason is that we often have a limited number of ways to test. We have a limited number of environments in which our QA testers, who are not necessarily technical, can actually review and test what we're doing. So we have to batch groups of changes together, and that causes problems. The main problem is the number of tests, the number of regressions that might occur, sometimes known in software as N-path complexity. First of all: one change affecting one thing. One test. Easy. We're really changing one thing; maybe we've just added a letter. Easy. One change affecting two things: that's two tests. It's only one change, but it's affecting two things, so we have to test it twice. What do people think two changes affecting one thing is? How many tests is that? Two? Four? It's three tests. There's the test for the first change, the test for the second change, and the test for the two changes together. Thankfully, there's only one thing being affected, so that's pretty easy. What about two changes affecting two things? How many tests do we need for that? Shout out a number. Five? Eight? It's hard to do in your head, but the answer is six. I'm sure you can work that out now that I've shown you: it's effectively every combination of the changes, alone and together, multiplied by the number of things affected. So we want to isolate the changes that we make. We want to break them down so that we're not adding unreasonable numbers of tests. I know you can't always test only one thing in isolation; it's not always possible. But you've got to limit them as much as possible.
Being able to do that is really where one of the huge benefits of continuous delivery and production-like environments comes in: the ability to do faster sign-off and UAT on changes. Faster sign-off and UAT means you can get this stuff through really quickly. I said you have to optimize batch size. This is about making changes as small as possible and testing them in isolation. Optimizing your batch size reduces your delivery risk, increases your integration frequency and improves your testing quality. But there are tricks to optimizing batch size, and we have project management processes that are all about this: Waterfall, Lean, they're all about setting batch size. When you're trying to optimize your batch size, there are dependencies on your environment and your systems. So the different system changes we've talked about are also there to put you in a position where you can set your batch size appropriately. Agile methodologies are pretty good at this, and Lean especially is often why people use them, but if you don't do the other things, you're not going to get those benefits. That's one of the reasons agile sometimes doesn't work very well, right? The last one, because we've only got a couple of minutes, is knowledge management. It seems like an odd thing to have in a presentation about software teams, but knowledge management is really key. As a team, you want to have all the skills you need and none of the skills you don't, and you want those skills to be well distributed across the team. You don't want any of those hit-by-a-bus people, because that's not good, and your capability is only really as good as your weakest link. If you've got three or four people and one of them falls over and they're important, then that's it; that's where your capability is at. It's also important to note that your core business doesn't really require innovation.
It just requires the ability to deliver at a certain speed, at a certain rate. So those are the eight high-performance practices: cloud-native, standardization, anti-fragile systems, production-like environments, continuous delivery, isolated testing, optimized batch size and knowledge management. You'll notice I haven't really talked about agile or innovation at any point. They're actually not important, right? You don't need to do those things in order to be a high-performing team. You can; they can be part of your business strategy, and that's cool. In case you're interested in looking at this yourself, these are some of the things we've looked at. The State of DevOps Report is a really good resource for this kind of stuff. We also reference the World Quality Report, because they look at some of these testing things as well. A fantastic book that covers some of the practices is The DevOps Adoption Playbook by Sanjeev Sharma from IBM, and he gets around the world and speaks at events too, so check him out. And finally, Platform.sh has its own internal research and customer polling that we look at too. So, we're done. Thank you very much for your time. Do we have a moment for questions, if there are any? Yes, and a round of applause, please. One question, maybe two? So you just said that at Platform.sh you don't have continuous integration properly implemented. Why did you go further and implement continuous delivery if continuous integration was not there? We don't have full continuous integration functionality. We do have some, right, where you can trigger certain things to occur and other things will happen, and there are build steps. The whole build is there, but some of the things we associate with continuous integration, around testing and reporting failures for example, are not there yet. The ability to define a set of steps yourself is not there yet.
These things are coming, but we think they're less critical to the process than some of the other parts. But yeah, I think it's worth pointing out that even when you claim to know what you're doing, it's still very hard to do all of those things fully. It's still a lot of work, right? So you implement what makes sense and leave improvement for later? Exactly, yes. No worries, thanks. Do feel free to grab Christopher after the session for a bit of technical conversation on setting up continuous builds, because it's not easy. So great, Christopher, thank you very much. Thanks. So we'll wrap up this session and move on to the future of our data.