Hi, everybody. My name is Quinn Tran. I'm a Raytheon software engineer, and I'm also Kessel Run's engineering practice lead.

And I'm Brian, founder and CEO of Kessel Run. By the way, if you have nice things to tweet about this talk, it's @BJKroger. If you have unkind things to tweet, it's @USArmy.

Kessel Run is a disruptive government program that is transforming the way the Air Force builds and delivers software. We started with a single legacy system, and our portfolio has grown substantially as we demonstrated results, like reducing initial product launch to 120 days on average, with weekly deployments from every team thereafter. Getting there was quite a journey, and a very important part of that journey is testing. So today we're going to talk about how to test all of the things, not just your units. As we reflect on all of the practices and lessons we've learned along the way, a central theme running through our successes is testing. Now, this won't be a technical deep dive into advanced testing implementations. Those weren't actually what broke us through the QA barriers. They were important, but what is equally important is the shift in the conversation, and that's what got us through. So the purpose of this talk is to change how we think and communicate about testing in relation to all the things you're hopefully already doing. That will help you better align with stakeholders who might be resisting the idea of continuous delivery.

As Brian mentioned, testing is in every aspect of our practice, whether you're a PM, a designer, or an engineer. From the moment we're thinking about creating a product, to writing user stories, to implementing the stories, to pushing to production, we always need to have that testing mindset. And as we take you on the journey from pre-build to post-build, we'll also share some of the meta concepts, the things we're thinking about as we move to the next step.

Any time you're engaging stakeholders, it's a really great idea to start with a set of mutual agreements, and here are some that I recommend for the test conversation. First, our priority is to satisfy the user with early and continuous delivery of value. Test doesn't exist for its own sake; it exists to enable an outcome in a safe and sustainable way. So as we think about how we could improve those outcomes, we have to compare outcomes to outcomes. Talking about people's intent, especially misaligned or unrealized intent: those things are off the table. It's also important to realize that security, quality, and risk are all relative. There's no such thing as quality assured, only quality improved or quality decreased. And finally, it's important to remember that quality and performance aren't the only risks to operations. Delay is its own operational risk, and we need to balance it against the other two. Stated another way: speed reduces risk. You need to be able to dodge 21 punches in 10 seconds.

If you saw Abby's keynote yesterday, she said it's time to drop the butterfly analogy because digital transformation isn't magic. She said it takes mastery, and I love that. Muhammad Ali was also a master of his craft. I love the rapids analogy, I really do, but my experience with digital transformation has looked a lot more like a 12-round boxing match. Maybe that means I'm doing it wrong, I'm not sure. But regardless, I'm not ready to throw out the butterflies just yet.
I'd rather take that analogy and apply it to the boxing ring. As Muhammad Ali said: DevOps like a butterfly, test like a bee. How's that for a punchline? Bad, it's so bad. Anyways, it's easy to imagine that your opponent in the ring is the other, right? The other guy, the other gal, the ones that don't get it, the ones that resist your change, the ones you have to navigate around to get anything done. In reality, your opponent in the ring is your own collection of assumptions. They hit you where it hurts when you least expect it. They're the things you have to destroy. It's not enough to just turn your caterpillar IT shop into a magical butterfly; sooner or later you have to knock out your assumptions if you want to win the market. To us, that means you have to be tested AF. And that's why we want to test all the things, especially our assumptions, all the time.

And when we say all the time, we mean we test every time we build a new feature, every time we implement something and push it into production. We make sure we're testing daily, as often as we can: running our test suites, for instance, and also testing in production to get feedback as often as possible. So when we say we're testing all the time, we're never really done. As long as we continue to deliver our product into production (it's continuous delivery), we're not done. We just keep going.

I love the saying: good engineers meet the requirements; great engineers question the requirements. Going back to this theme again, everybody needs to be thinking about how we test our assumptions. Software requirements are always changing, and we can't afford to wait until after we've built a product to find out that the requirements are outdated or irrelevant. So we want to adopt the mindset of testing our assumptions often, validating and learning from them even before we start building these products.

Right, absolutely. There's nothing as useless as being really good at building the wrong things. The requirements we often get in waterfall (or, for those of you that are really advanced at waterfall, water-scrum-fall) are really just assumptions about value and priority, and those are the first things that actually need to be tested. A great contrast showing how important it is to test your riskiest assumptions before you start building is Zappos versus Webvan. At Zappos, the founder had this great idea to sell shoes online. The riskiest assumption, believe it or not, back then was: would people actually buy shoes online? So he went into a local shoe mart, photographed shoes, and put them online. Customers ordered them; he went back to the shoe store, bought the shoes, put them in the mail, and sent them to the customer. The customer never knew the difference. He was able to validate his business model using a kind of Wizard of Oz approach (you know, don't look behind the curtain) and ended up with a really successful business. Webvan, on the other hand, at the peak of dot-com madness, went out and got tons of investment. They built out a full set of warehouses, a full fleet of vans, and a full product offering, very robust up front, without doing a lot of the initial validation work. And, no surprise here, shortly after launching they went bankrupt.
Now, people will say there are a lot of reasons they didn't succeed, but I think you can trace all of them back to not validating the riskiest assumptions before they started building out their product.

So prior to building anything, we do a lot of activities like event storming, discovery and framing, and inception, so that we understand our problem space and where we're going. That allows us to lay out our assumptions ahead of time, and as we implement, build, and get feedback, we validate whether those assumptions are true or not. In every one of these roles, we need the test mindset we mentioned earlier. A PM always has to be thinking about the next most valuable thing we could build so that it meets the business need. Product designers are always thinking about the next most valuable thing for our users: is this feature helping change the user's behavior in the way we're looking for? And finally, engineers are thinking about feasibility. As we implement these stories, does this make sense? Can we actually accomplish this, and can we actually ship it to production? When it comes to building these products, there's always going to be risk, so we think about how to buy down risk: how to address those risks early on and make decisions with all the information we've learned. We practice lean, so we're constantly testing our riskiest assumptions, seeking feedback, and constantly learning and validating. All of that learning helps us decide whether we should pivot or persevere.

And perhaps more than anyone else on the team, a good designer has to make the riskiest assumptions of all. If you want to avoid building faster horses (the user-driven design Henry Ford cautioned against) and instead synthesize pains and deliver solutions that nobody could have ever imagined, the truly innovative solutions, you're really going to have to push the boundaries on making assumptions about what your users need versus what they want. So it follows that designers, as much as anyone on the team, if not more, really need to validate their assumptions before coders code. There are a lot of ways to do this. You're probably doing a lot of these things already, but you probably aren't taking credit for them as tests that reduce risk and improve quality. Of course, we don't get final validation until we put things in the hands of users in production, in a real live environment, but there's a lot we can do along the way to buy down our risk. Maybe it starts with an interview, then hand-drawn sketches, mockups, wireframes, maybe finally ending in a clickable prototype like you see in the top left, which we hosted in InVision. And the user then says: wow, this isn't what I was expecting at all, but I love it. So now it's finally time to start thinking about how we might build the thing.

And in engineering, we're thinking about that as well. Building the thing is easy, because as engineers we can do anything, right? But building the thing right is hard, especially when it comes to designing the architecture; it takes time. As a result, we should take a lean approach to how we implement, too.
For instance, as we grow our architecture, we're evolving it toward the way we want it to be. And when we're implementing something that's technically challenging, we also need to give feedback to our PMs and designers: hey, I know this is exactly what you want, but there are alternative solutions we can implement to reduce that risk technically, and then work together to achieve the same outcome. Another way engineers reduce risk is with spikes: stories where we explore certain technologies or practices before we introduce them into our code base. Those are opportunities to learn, to pay that time up front to learn about the challenges and the risks. In addition, we look at tech debt. Tech debt is also risk, and we always want to keep an eye on it and pay it off as often as we can.

And now we're entering the build phase. At Kessel Run, we practice XP. We do pairing, we do TDD, and we do CI/CD as well, to allow us to go fast. Pairing allows us to build the thing right: as you pair with another engineer, you're constantly making sure you're focused on completing the story and delivering the right thing, building in quality, sharing context, and catching bugs as early as possible. And with TDD, we build high quality into our product. When we talk about writing tests first, we're talking about being able to go fast forever; speed reduces risk. In order to go fast forever, we need clean code, and clean code is not easy to come by. It takes ongoing investment to care for our code and to care for our tests. To even do that, we're constantly refactoring, like I mentioned earlier: evolving our architecture and cleaning up our code while making sure we don't change behavior. And to do all that, we need confidence, and confidence comes from having tests. Not just any tests: test-driven development. We take the approach of thinking about all the behaviors we want to achieve and what success should look like before we even implement. And when we implement, we ask: what is the simplest thing I can do to get this test to pass? In that sense, we're reducing waste as well.

As we grow our code base and our test suite, we're growing our test strategy and figuring out what it should look like. If you flip the test pyramid upside down, it can be seen as a filter. At the top are all the unit tests: cheap and fast. You can think of them as a coarse-grained filter that catches all the reproducible bugs, and when you catch one, you add another test to help catch that bug in the future. As you go down the filter, there are integration tests, then end-to-end tests. And at the very end, the bugs that actually made it through, the ones that reach production, are the ones that need a human to catch; they're not something we can automate a test to capture. It pays to be more intentional about how we build this test strategy, so that it does the work of catching bugs early for us.
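To make the test-first rhythm concrete, here's a minimal red-green sketch in JUnit 5. The TargetList class and its behavior are hypothetical, invented for illustration; this isn't from Kessel Run's code base.

```java
// A minimal red-green sketch in JUnit 5. TargetList is a hypothetical
// class invented for illustration, not from Kessel Run's code base.
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class TargetListTest {

    // Red: this test is written before TargetList.approve() exists.
    @Test
    void approvingATargetMovesItToTheApprovedList() {
        TargetList list = new TargetList();
        list.add("target-42");

        list.approve("target-42");

        assertTrue(list.approved().contains("target-42"));
    }
}

// Green: the simplest implementation that makes the test pass.
class TargetList {
    private final java.util.Set<String> all = new java.util.HashSet<>();
    private final java.util.Set<String> approved = new java.util.HashSet<>();

    void add(String id) { all.add(id); }
    void approve(String id) { if (all.contains(id)) approved.add(id); }
    java.util.Set<String> approved() { return approved; }
}
```

Refactoring then happens behind that green test, which is where the confidence Quinn describes comes from.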
So I mentioned the outcomes earlier. The outcomes of building a really efficient funnel are extraordinary, but it's important to note that we shouldn't make villains out of the people in the legacy system, the legacy way of doing business. What they do today is actually very appropriate given the environment they live in. And I should also point out that in our case, they've gone to great lengths to make their process more agile. It used to be one big test event every couple of years, and now they're testing quarterly, and my hat's off to them for doing that. It's also a huge boon to us as we look to integrate with the legacy system: it gives us the ability to put the changes we need for integration into their backlog and get them into production much more quickly. So hats off to them, but in the legacy system, we don't have a lot of automated tests. There are systemic reasons for that, most of which extend well beyond our program offices and the people that test within them. But the result is that the system, for the most part, has humans as the filter. We have humans catching regressions, which is a terribly inefficient use of human capital.

Right, and fun fact: since we started in August 2017, we've grown from one team to 18 product teams. Up until now, we've run our test suites across those 18 teams at least 33,000 times, and that's huge. On average, we're committing our code and running our test suites at least twice a day per team.

Yeah, and so my goal with Kessel Run was to change that environment that QA exists in. We wanted to do that so we could actually change the way we test, because scaling the thing on the left is very costly. Scaling humans is expensive, and it also comes at a cost to quality: humans aren't nearly as efficient at catching things like regressions. There are also inevitable delays on the left that create really long feedback loops, which increase our risk and carry a high opportunity cost when we get our product to market too late, or get there with too many assumptions. So as we replace the legacy system with the new, my hope is that we can take those same humans, with their context and domain expertise, and use them in new and unique ways. For instance, exploratory testing is still a thing. Quinn mentioned that some things still get through the filter, and maybe we could debate whether you can technically catch them with automation or not, but regardless, the cost of catching them in an automated way starts to grow exponentially, and humans actually become the more efficient ones at the bottom of the funnel, if you've built the funnel correctly. So let's take those same people and still have them do quarterly events just like they do today, but instead of that being a gateway to production that stops us if the test doesn't go well, it becomes one more feedback mechanism that informs our backlog.

Also, sometimes the balanced team, as I've drawn it here, doesn't actually overlap. We always draw it that way, but it's usually never true. The overlap is often very unequal, based on the backgrounds of the people on the team. Sometimes you have an inexperienced team, or a team in a new domain that doesn't have a lot of domain knowledge, and these gaps start to emerge. That's when it's actually appropriate to take an exploratory tester and embed them on the team to fill in a lot of those context gaps.
And you want to do that during the build phase too, because they can identify assumptions and bugs that would otherwise be missed by the balanced team. I call these unknown unknowns. When you see that your team is being slowed down by things you didn't anticipate, unknown unknowns, and every effort you make to identify those things beforehand is failing, it's a good indication that your team lacks experience or context in the domain, and that you should bring in an exploratory tester and establish an exploratory charter.

All right. Besides automating test suites, we also want to automate things like code quality scans and security scans, testing our code every time we commit. As we continuously integrate, we can feel confident that our code is in a good state, and if it's not, we can always roll back. All of this automation gets us feedback very quickly: as soon as we commit our code into the pipeline, we learn whether we're missing something, whether quality has slipped, or whether we've caused a test to fail. That feedback is really effective because we can address problems as early as possible. And it gives us the freedom to ship at any time.

So what happens after we finish building? Yeah, as developers complete their stories, PMs go through the acceptance criteria to test whether the implementation really does what we as a team agreed to from the start. It's also a really great way to naturally enforce the CI that Quinn described. Many organizations throw the term around and say they're doing continuous integration, but if you actually measure and look, they'll have very long-lived feature branches. It's very common, and it's a hard thing to combat without being very directive, which usually isn't the kind of culture we want in an engineering organization. So if you're struggling to do trunk-based development, what I would offer up is: make it so that you can only get credit for your work by checking it into the acceptance environment, and you can only get to the acceptance environment by merging into the main branch.

Once a PM accepts the story, then we get to see if the users agree. For us, we actually do this at a small scale before we push out to production. Once we release the software into production, though, is when we really get to test all of our assumptions. Did we build the right thing? Did we build it right? Does it actually produce the business value? Sometimes the business-value conversation gets lost in empathy for users, but ultimately we do want to make sure users find our software joyful to use, and this is the ultimate test. There are a couple of other things, though. We can achieve the outcomes of reduced risk and sustainability in other ways with this. As we release iteratively, we're actually training our users in small increments. In our business context, the domain we exist in, our users are employees of the business we're working for; it's internal IT. Training is a huge risk for people in large-scale change, so for new and existing users, the barrier to entry is much lower. And as we create change, we can do it intentionally and create ease of use that caters to the user's workflow, so it's much easier for them to adopt.
And the other thing, as a follow-on to that: you'll often get large-scale rejection from users when you're trying to influence or change user behavior and you try to do it in too big a batch. Part of our task is not only to automate the Air Operations Center, but also to change some of the business processes that have been around for a really long time and are quite inefficient. Doing that in one large batch is really difficult; going iteratively allows us to influence their behavior over time.

All right, so what happens when something does make it through, and how do we prevent that from happening again? Our organization is maturing; we have a mix of new and maturing application teams. The new ones care a lot about delivering features into production as fast as possible. The mature app teams start to care a lot more about availability and how fast we can respond to production issues. So we're getting into a practice of monitoring-driven development, or MDD. It's very similar to TDD: where TDD gives you feedback on your code design, MDD gives you feedback on your application's business logic. In this sense, monitoring and testing are very closely related, because you're monitoring your application to test whether it has fulfilled its requirements in production. And when it doesn't meet a requirement, you want to be notified and take action on fixing it, whether that's automated or figuring out the next step (there's a minimal sketch of this idea below). Moving testing and monitoring toward the earliest phases of the delivery lifecycle like this is what's called shifting left. For us, a lot of factors feed our key performance indicators, such as production failures, production bugs, and system performance issues. As your organization decides which metrics you care most about, that's what you focus on, and you work with application and platform operations teams to ensure you actually meet those metrics.

So Kessel Run exists to build a product for the Air Operations Center that I mentioned. But we understand that, at the end of the day, our product is a system of systems, not just a collection of individual applications. And we didn't nail this one out of the gate at all. At this time we have 20 applications, and I would say they're only loosely integrated at this point. Our next challenge is figuring out how to put these applications together to form a cohesive system, and we learned the hard way what it takes to do that successfully. We've had to really think about how our organization is structured, testing some of the assumptions we might have about organization as it relates to implementation: Conway's law. Conway's law tells us that organizational structure is important; it's often reflected in our system's architecture. So a really important thing we did was employ the inverse Conway maneuver, where we restructured our organization to achieve some of the architectural outcomes we wanted to see. We divided our organization into three portfolios, which are our bounded contexts within this particular domain. And as our product teams grow, we have portfolio teams influencing and aligning the teams under the same vision while still giving them the autonomy they need to serve their specific users, which is really important.
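Here's the monitoring-driven development sketch mentioned above, a minimal illustration using Micrometer's metrics API. The service, the metric names, and the alerting note are all hypothetical assumptions made for illustration, not Kessel Run's actual monitors.

```java
// A minimal MDD-style sketch using Micrometer. SortieApprovalService
// and the metric names are hypothetical, invented for illustration.
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

class SortieApprovalService {
    private final Counter approvals;
    private final Counter failures;

    SortieApprovalService(MeterRegistry registry) {
        // Counters are defined up front, alongside the business
        // requirement, so production behavior is observable from day one.
        this.approvals = Counter.builder("sorties.approved").register(registry);
        this.failures  = Counter.builder("sorties.approval.failed").register(registry);
    }

    void approve(String sortieId) {
        try {
            // ... business logic elided ...
            approvals.increment();
        } catch (RuntimeException e) {
            failures.increment(); // the monitor, not a log line, is the signal
            throw e;
        }
    }
}

class Demo {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        SortieApprovalService service = new SortieApprovalService(registry);
        service.approve("sortie-1");
        // An alerting rule in the monitoring stack would watch the failure
        // counter and notify the team when it crosses a threshold.
        System.out.println("approved so far: "
                + registry.counter("sorties.approved").count());
    }
}
```

The point is the TDD parallel: the counters are declared alongside the business requirement, so "is this requirement being met in production?" becomes a question a dashboard or alerting rule can answer continuously.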
And we're advocating for our balanced teams at every level, because we understand that we also have different sets of users at every level inside this business, and we have to make sure our balanced teams can address their needs. So at every level, we have to be sure we have a backlog of stories, using the same process we use at the lower levels, that represents the integration between applications, between portfolios, and ultimately at the system level. Quinn brought the practice of capability stories into our organization to help with that.

Yeah, so when it comes to building a system, we should think about it in lean terms. Building a system should be no different from how you build apps. In this case we take it to another level, where capability stories represent units of integration, slices of the system we're trying to build. An example of a capability story is something like this: given I have some targets, when the targets are in the approved list in application one, then I see assets mapped to approved targets in application two. These are two applications that would otherwise be independent; they're not talking to each other. The capability story tells the teams: here's a behavior we want you to observe; come together and figure out how to build this integration. The other thing we introduced is challenge day: an opportunity for application teams to come together and share their vision and where they're going, in addition to their integration accomplishments.

When we only had a few product teams, business leadership was able to maintain broad and deep context on every team we had. But as we grew, the context started to drift, and we came up with the idea of context anchors, not to be confused with middle management. They comprise a balanced team of their own at the portfolio level, for a group of five or so product teams; it varies based on the bounded context we're looking at. We took some patterns from commercial organizations and applied them to our context, with positions like associate director, engineering lead, technical strategy lead, and product lead. One of their stated goals is to override team prioritization as infrequently as possible. Quinn kind of hinted at this, but a great way to get product alignment without being too rigid, or using a command-and-control style of requirements process, is by writing the capability stories that Quinn described.

All right, so here's a little demonstration of what we mean. Assume we have these five independently running apps. Each capability story encourages two of those teams to work together: hey, here's the integration we want you to do first, because it's really important to us. As we iterate through each capability story, we're constantly making those connections, and eventually we iterate to the point where we have a system with the desired capabilities we're looking for. And these are the desired benefits we want to talk about. The capability stories are meant to align teams toward a common goal. We're creating opportunities for teams that are co-located to come together and say: this is the integration we need to do; it's really important for all of us.
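To show how a capability story like the one above could be made executable, here's a sketch of Cucumber step definitions in Java. The step text maps to the given/when/then above; the TargetClient and AssetClient helpers and their stubbed behavior are hypothetical stand-ins for the two applications' APIs, not anything from the talk.

```java
// A sketch of Cucumber step definitions (cucumber-java) for the
// capability story above. TargetClient and AssetClient are hypothetical
// stand-ins for the two applications' integration APIs.
import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import static org.junit.jupiter.api.Assertions.assertFalse;

public class CapabilityStorySteps {
    private final TargetClient appOne = new TargetClient();
    private final AssetClient appTwo = new AssetClient();

    @Given("I have some targets")
    public void iHaveSomeTargets() {
        appOne.createTarget("target-42");
    }

    @When("the targets are in the approved list in application one")
    public void theTargetsAreApproved() {
        appOne.approve("target-42");
    }

    @Then("I see assets mapped to approved targets in application two")
    public void iSeeAssetsMappedInApplicationTwo() {
        // The integration under test: application two reflects
        // application one's approvals.
        assertFalse(appTwo.assetsFor("target-42").isEmpty());
    }
}

// Hypothetical stubs standing in for calls to the real applications.
class TargetClient {
    void createTarget(String id) { /* would call application one's API */ }
    void approve(String id)      { /* would call application one's API */ }
}

class AssetClient {
    java.util.List<String> assetsFor(String targetId) {
        // Would call application two's API; stubbed for illustration.
        return java.util.List.of("asset-7");
    }
}
```

In practice the Gherkin text would live in a feature file and these steps would exercise the two real applications through their integration points, so a passing scenario is the signal that the capability can ship.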
And in addition, every time we complete one of these capability stories, it means we can deliver that capability out to production. And then there's challenge day. It gives all teams a sense of urgency: there's a due date, there's something I need to show. It also creates a safe space: when teams are demoing in our development environment, it's an opportunity for them to fail and to learn as often as they can as they iterate, prior to going to production. We don't want to learn that lesson in production, so that's an opportunity to shorten the feedback loop. And finally, on these challenge days we know where we are every step of the way: are we making progress toward building the system we're looking for?

So to wrap it up: remember that your opponent in the digital transformation boxing ring is actually yourself, and the collection of assumptions that you carry. To be the greatest of all time, to be the Muhammad Ali, is to be tested AF, which means you really have to test all the things, including your assumptions, all the time. Test your assumptions before you start building, while you're building, when you complete building, and after you ship; you should always be validating. And don't be fooled by randomness into believing in your own myth; it's something that gets people all the time. Black swan farming is the only way to gain and maintain a position in today's market, and that means optimizing everything you do for being wrong. And to win against your false assumptions, against your wrongness, you need to DevOps like a butterfly and test like a bee. So thank you, everyone. We're hiring; check us out at kesselrun.af.mil. Any questions?

Yeah, so you asked whether we're actually doing TDD from the start, or, where we're dealing with legacy applications, how we introduce it. A lot of our applications right now start greenfield, so they're all brand new, and through our engagement with Pivotal we practice TDD from the beginning. So we always have a test suite giving us the confidence to deliver as often as we can. To answer your question: yes, we use TDD every day.

We have also done some legacy re-platforms. In those engagements, we employed domain-driven design to section off portions of the code base; where there's low test coverage, we treat those sections as black boxes and put as much testing as we can around them to buy down our risk (there's a sketch of that kind of test below). But ultimately you're still dealing with a fair amount of risk when you do that, so you want to be very deliberate about which sections of your code you choose. We talk about the strangler pattern, and I think people sometimes assume it's just magic, that you deploy all the things and all of a sudden you're strangled. But you have to be really deliberate about which things you choose to strangle; otherwise, especially if it's a production system, you're introducing a lot of risk if you don't do it correctly.

Yeah, so on the technology stack we're using for CI/CD: we're currently using Concourse as the way we continuously deliver. For unit tests, we use JUnit and other Java test libraries. For end-to-end tests, a lot of teams use Cypress, and some also use Puppeteer. Different teams are exploring different technologies for how they can test, and over time we'll come out with best practices on which test tools are best for us. So we're still exploring at this time.
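As an illustration of the black-box approach described in that legacy re-platform answer, here's a minimal characterization test in JUnit 5. LegacyTaskScheduler and its output format are hypothetical; the pattern is what matters: pin down what the legacy code does today, not what it should do, so a re-platform that changes behavior fails fast.

```java
// A sketch of a characterization test around a black-boxed legacy
// component. LegacyTaskScheduler and its output format are hypothetical.
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class LegacySchedulerCharacterizationTest {

    // We don't assert what the legacy code *should* do, only what it
    // *does* today, so any behavior change during re-platforming is caught.
    @Test
    void recordsCurrentBehaviorForKnownInput() {
        LegacyTaskScheduler scheduler = new LegacyTaskScheduler();
        String plan = scheduler.schedule("tanker", 3);
        assertEquals("tanker:3:PRIORITY_NORMAL", plan);
    }
}

// Stub standing in for the untested legacy code being black-boxed.
class LegacyTaskScheduler {
    String schedule(String assetType, int count) {
        return assetType + ":" + count + ":PRIORITY_NORMAL";
    }
}
```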
It's also important to throw security in there as well. I think sometimes we separate that out as a different kind of testing, but we're also running SonarQube, Fortify, OWASP, oh my gosh, a whole bunch. It just depends on the team and the language they're using, obviously. And we didn't mention journey tests, but we employ those as well.

So you're asking: when certain jobs fail in the pipeline, do we skip them? The answer is no. We have monitors in our workspace, so if the pipeline is red, it's already telling everybody around that we failed something; you can't really sneak by it. Our PMs in particular keep a close eye on that pipeline. They're not going to let us sneak through and say, hey, let's release and then figure out how to solve this after. That's not the most responsible thing to do.

And a couple more notes on that; two things that are really helpful here. Our release pipeline is owned by an independent team. All application teams have their own dev pipeline, and they're free to do whatever they want in it, but the release pipeline is managed by a separate entity and enforced ruthlessly. Likewise, the security jobs are actually managed by our security assessors. We had a lot of luck here (I say luck; it was a really great strategy) in giving our security assessors an unprecedented level of visibility. Most of them would only ever receive stacks of paper about our security scans and findings. Now they have full access to our Git repository, to our Pivotal Tracker, and to all of the security scan software. They own our Fortify rule sets, and ultimately they own the pipeline. So if you don't pass the security jobs in the release pipeline, you can't deploy to prod. The next question would probably be: well, couldn't people just fake their tests? Because that's actually not that hard to do, at the unit test level especially. Pairing is your friend here. We rotate pairs every day, so if you wanted to fake your tests, you'd have to convince probably everybody on your team to go along with it.

And in addition, we're not like other commercial companies: in order to push our application to production, we need to have a continuous ATO. So yes, we do pairing and we do TDD, but we also have to make sure we address all the security vulnerabilities before we push; otherwise we're violating that, and we can't get into production. And, since we're really bad about acronyms: ATO is authority to operate. There's a thing in the DoD called authority to operate, and you cannot push to our production environments without it.

All right, well, thank you, everyone. Thank you. Appreciate your time.