Welcome to today's Postgres Conference webinar, Episode 3, Accelerating Test Execution. We're joined by Justin Reock, Chief Evangelist and Field CTO of Gradle Enterprise, who will discuss DPE, developer productivity engineering, as a practice, along with specific tools and techniques used to accelerate the software build process. He'll cover existing build systems that execute tests serially and leave out metrics tracking, such as test flakiness, failures, and execution time across development organizations; the danger of context switching between multiple activities; and several techniques for speeding up test execution, including distributing tests across multiple concurrent agents and analyzing test result trends to isolate and fix flaky tests. And he's going to demo an approach for using machine learning to optimize testing in a predictive and selective manner. This is the third webinar in a series of three; I'll include links to the previous two talks in my follow-up email later this week.

My name is Lindsay Hooper. I'm one of your Postgres Conference organizers, and I'll be your moderator for this webinar. A little bit about your speaker: Justin is an outspoken blogger, speaker, and free software evangelist. He has over 20 years of experience working in various software roles and has delivered enterprise solutions, technical leadership, and community education on a range of topics. So with that, I'm going to hand it off. Take it away, Justin.

Thanks so much, Lindsay, and thanks, everybody, for showing up to listen to our third episode on developer productivity engineering. Just as a quick recap: our first episode covered the practice itself, the emerging practice of developer productivity engineering, which we'll recap a little at the beginning of this presentation. Our second webinar focused on one of the primary bottlenecks addressed by productivity engineering, which is build feedback time, the time it takes for a developer to get feedback about whether the code they've changed has actually done what they expected it to do. In this episode, we're going to talk about something similar: accelerating the test portion of this cycle. We'll look at some specific acceleration technologies, but we're really going to focus on how data is used to make tests run faster, and how, by looking at certain metrics that are often ignored by other DevOps and observability platforms, we can eliminate a lot of the bottlenecks that hinder the developer experience.

So let's get right into it. I like to introduce the concept of DPE with this quote from Eric Pearson, who was the CIO at InterContinental Hotels Group: "It's no longer the big beating the small, but the fast beating the slow." I think that adequately sums up the state of the industry right now. If you look at the businesses that are disrupting and the ones that are winning, they're the ones able to get their features out fastest, usually in some kind of SaaS model. They're the ones who have adopted 12-factor principles, who have more mature DevOps and CI/CD practices, and who can engage with their customers and deal with a tumultuous, ever-changing marketplace most effectively. Everyone's a software company now; we're all tired of hearing that, but there are some really compelling statistics behind it.
Some of you may have heard that IDC has predicted that about 65% of the entire global GDP will be digitally transformed by the end of this year. So we've got about another month, and we're well on track for that. It means we now have a particular workforce, software professionals and software developers, who are literally building two thirds of the goods and services that we pay for, and that really means that, to a significant degree, the quality of the products they build will impact the quality of our lives. So we need to take this back to the quality of their experience as professionals in this field.

Software development is, and we sometimes forget this, a creative process. It's also scientific, but at the end of the day, it's a creative flow that the developer is in. And the way we develop enterprise applications now can negatively impact toolchain efficiency, which in turn impacts the creative flow of the developer. A developer who constantly has to stop what they're doing to wait for a build to finish, or even worse, to troubleshoot a failed build, that's just absolutely toxic to that developer's ability to carry on their creative flow. And somewhat ironically, as an enterprise project becomes more successful, as the number of lines of code and the number of developers increase, the number of tests that run increases, the overall build time increases, and you may be increasing the number of dependencies involved in the project. All of that leads to a developer experience that looks a lot more like this today: developers come in ready to go, they want to start coding, then they run into some type of failure in their local build and have to spend a bunch of time debugging it. Maybe they go to lunch, come back refreshed, get their local build fixed, push it out to build in CI, and hit another failure; now they've got to spend time debugging a CI build. This is just what days look like for a lot of developers right now whose organizations haven't invested in DPE.

And this isn't just us saying this. We recently ran a survey across a number of users who have invested heavily in developer productivity engineering, and we proposed a number of pain points that we thought they were experiencing. These are the top four that shook out of that survey. Number one, overwhelmingly, the frustration was too much time spent waiting on build and test feedback, either locally or during CI; 90% of respondents indicated that this was a real problem for them, a real source of frustration.

So what does all that mean? Well, if we can turn this around, if we can fix these bottlenecks and decrease the amount of time it takes for a developer to get feedback about what they're doing, or, sometimes even more importantly, help them troubleshoot what they're doing faster and use data to do it, then the acceleration itself can make software development fun again.
And by capturing data, observation, and analysis on metrics like the amount of time developers spend waiting on test cycles, or dealing with flaky tests and avoidable failures, and really making that data actionable within an organization, we can keep software development fun. This is different from what a lot of other DevOps platforms, and what we might call productivity management platforms, are doing.

So we have to ask: what comes after DevOps? We think it's productivity engineering, because if you look at the journey of productivity going back to the 70s, we've gone further and further left in the process to find, identify, and then remove bottlenecks. You can look at all these different business philosophies, going all the way back to just-in-time manufacturing, fast-forwarding through things like Agile and Lean Six Sigma, and ultimately all the DevOps work we've seen happen over the last decade or so. If you study the business theory behind all this, you go back to things like the theory of constraints and the work of Eli Goldratt, and at the end of the day it's still the same question: where can we find impediments to converting work in progress into throughput? All we did was find that there were still a bunch of bottlenecks left, a bunch of dark costs that weren't really being covered by DevOps. So we started asking what we could do to accelerate those, to get rid of those bottlenecks, and where we could stay vigilant to make sure they don't re-enter the process. That became the practice of developer productivity engineering, and the term has been around for a few years now.

Leading technology companies are doing this; some of our biggest customers are Netflix, Twitter, LinkedIn, and WhatsApp. Gradle is obviously very popular within the Android community, so we have a lot of folks who rely on mobile for their digital transformation and use our acceleration technologies and practices to keep things fast. But I believe productivity engineering is not something that's just for Silicon Valley and Bay Area companies; this is something that can benefit everyone. We're starting to see the practice move from the emerging technology companies into fast-following technology companies, and they're seeing just as much benefit.

So when we say it's the next thing, we mean it literally. We want to address the next set of bottlenecks that are not in scope for DevOps. We want to focus on the developer experience and give developers a more reliable toolchain, because we believe that's the essence of fixing productivity problems: not productivity management, not leaning on developers to crank out more lines of code, but leaning on their toolchain to be more efficient so that their experience is better. I'm not going to go through the whole solution breakdown; we're already further into the reintroduction than I wanted to be, and I want to get into the meat of today's discussion, which is how we accelerate the test cycle. But across the entire solution overview, DPE hits all of these pains: faster feedback cycles, faster troubleshooting.
We want to eliminate avoidable failures and understand where those avoidable failures are. We want to spot inefficiencies in the build tool. And through all of this, we get a side effect of being able to reduce CI cost. But today we're really just going to focus on two pillars. The first is failure analytics, specifically test failure analytics, which helps us identify flaky tests. That means we can remove part of the friction from the build process altogether, so developers don't even have to deal with the problem in the first place, which is really the best outcome. The second is faster feedback cycles in our tests through two acceleration technologies, and we'll tease a third one that's still in the works for us right now. The first technology is caching; we covered caching from a build task perspective in episode two, but now we'll look at what we can do by caching test result outputs. The second is what we call distributed testing, or parallel test execution.

Now, full disclosure: Gradle Enterprise is an enabling set of technologies for the practice of developer productivity engineering, but you're really only going to see Gradle Enterprise used in the demos to walk through some of these practices. I hope everyone can separate the practice of DPE, following these five pillars, from whatever enabling technology you use to do it. Other companies are doing this too; Launchable, for example, has flaky test and predictive test selection features that I think came out pretty recently, and they focus a lot on the testing side. They would definitely be considered a DPE vendor. So I want to make that clear: we're talking, hopefully agnostically, about the process today, and I want you to come away saying, if I start capturing some of this data within my own development organization and taking action on it, I can be a hero for my organization.

We call it build caching because tests are part of the build, but that's not necessarily a fair name; we could call it a build and test cache, because what we really recommend caching is the output of any part of your build that doesn't have to be repeated if related code hasn't changed. So the scenario, before we get into this, is that you're a developer working on a project that's hopefully fairly modular, alongside a number of other developers, and you don't want to repeat the entire build and test process when you've only made a couple of small changes. You only want to repeat the parts of the build and test that could actually be impacted by the work you've done.

Build caching is one enabling technology that allows us to do this. It was introduced to the Java world by Gradle back in 2017. It's part of the open source Gradle build tool, so it's accessible to everybody, it's available for Maven and Gradle, and it supports both local and remote caching for distributed teams. Basically, any test task or goal whose inputs haven't changed doesn't have to run again. And this isn't done statefully; if you implement a cache this way, we recommend you not rely on stored state, but rather key it dynamically off of the inputs to that part of the task.
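To make that input-keying idea concrete before we get into the details, here's a minimal sketch of the concept in plain Java. This is not Gradle's actual implementation, and the directory and storage scheme are just placeholders; it only shows how identical inputs always fingerprint to the same key.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.List;
import java.util.stream.Stream;

// Minimal sketch of input-keyed caching -- the concept, not Gradle's code.
// Every input file is folded into a single SHA-256 digest (sorted so the
// key is stable), and that digest becomes the cache key for stored outputs.
public class CacheKeySketch {

    static String cacheKey(Path inputDir) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        List<Path> inputs;
        try (Stream<Path> walk = Files.walk(inputDir)) {
            inputs = walk.filter(Files::isRegularFile).sorted().toList();
        }
        for (Path f : inputs) {
            md.update(f.toString().getBytes()); // the path is part of the input
            md.update(Files.readAllBytes(f));   // and so are the file contents
        }
        return HexFormat.of().formatHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        // Any developer or CI agent with identical inputs computes the same
        // key, so each can reuse outputs stored under it.
        System.out.println(cacheKey(Path.of("build/classes"))); // placeholder dir
    }
}
```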
So here's how it works: we generate a cryptographic hash based on the inputs coming into, in our case, the Gradle task or the Maven goal. We look at the code, at hashes of the classes themselves, to see whether code has changed, and we look at the inputs to, say, the test task. We generate a cryptographic hash, key off of that, and store the outcome, including any bytecode or other artifacts generated as part of the output; all of it gets stored in the cache against that cryptographic key. That means that if the cache is distributed, it doesn't matter which developer or CI process is running the test. If it generates the same cryptographic key, which it will if the inputs are identical, because it's a fingerprint, a hash, then it can pull those outputs from the cache no matter what. So that's what we recommend: caching based on a cryptographic key generated from inputs. I know that's not a unique pattern, but you wouldn't want something like a historic cache that relies on time series or some other state; you really want one that's based off of inputs.

So let's look at one. We can do a demo right now: a Maven application that I've set up to allow some of its goals to cache. If you're curious, I started this application from the Spring Boot Camel archetype, which is a decent, zippy little program that runs some tests and uses Spring, so it has some meat to it, but it's also pretty easy to play around with. It's just a Maven application, and what I've done is give it the ability to cache by hooking it up to a Gradle Enterprise server.

The first thing I'm going to do is make sure we're starting clean; I'll make this a little smaller, because it's a bit crowded for me to look at. We'll get rid of the existing build cache, so we're basically starting clean, and then run a clean verify on my Maven project. This should be a normal run, basically what everyone is used to dealing with if you're building with Maven. We run our project, we run our tests. We have four separate tests inside our unit test class, and they're very simple, just "Hello World" recognition. Camel, for those who aren't familiar, is a message-driven integration platform, so this is effectively sending a message through the platform saying "Hello World"; it runs four separate tests and asserts that each gets the right output.

Okay, so uncached, we're talking about a 12-second run, with eight goals, all eight of them executed. Let's take a look at what those goals were so we know what we're dealing with. What I'm looking at here is a Maven Build Scan, which is also part of the free tooling for Maven.
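If you want to try this on your own Maven project, wiring in the extension is roughly a matter of adding it to .mvn/extensions.xml, something like the sketch below. The version shown is illustrative, so check Maven Central for the current release; the server that scans and cache entries go to is configured separately (in .mvn/gradle-enterprise.xml).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- .mvn/extensions.xml -->
<extensions>
  <extension>
    <groupId>com.gradle</groupId>
    <artifactId>gradle-enterprise-maven-extension</artifactId>
    <version>1.16</version> <!-- illustrative; use the current release -->
  </extension>
</extensions>
```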
If you want to do this, you can drop the Gradle Enterprise Maven extension into your Maven project today; it's hosted at Maven Central, so you can just pull it down, and you're allowed to redistribute the jar if you want to. So you can run a scan today, and a scan is what I'm looking at right now.

Let's look at the goals that executed. Eight goals ran, and our tests actually took 6.4 seconds of the entire run. That's pretty typical: the test portion is usually a large part of the overall execution, and it's what we're talking about today. Now, we can see that three of these goals are actually cacheable. That's good; the Maven Surefire plugin is something we know how to cache. In that case, let's just run the build again and see how much we can improve with caching alone. Look at that: checking our overall performance, we took the build down from 12.345 seconds to 2.8 seconds by pulling a number of things from the cache, including our testing step, which now took less than a second. If we look at the goal execution in the performance view, we can see that three goals were executed from the cache: we pulled our testing from the cache, and because we didn't change any code, we were able to pull the compile step from the cache as well. If you want more detail on how that part works, see the last episode, where we focused deeply on the build cache.

Okay, great, so that's a local build. Does the same thing carry over to CI? Let's log into a Jenkins server. This is that same project checked into Git, and I've created a small Jenkins project to pull it and run the builds. Let's trigger a build. This initial run is generated from CI, so of course it took a little longer: we had to pull code, and we're running within a CI environment. Now let's say we've checked in and merged our code, and it builds again. Excellent: we just took that down to five seconds. So caching can also create avoidance at the CI level, which does two things. It decreases the amount of time developers have to wait on feedback from CI and remote builds, and it can reduce costs: with so many people running CI builds in the cloud, there's a direct correlation between avoiding some of this testing work and recouping CI spend. That's not really about productivity, but it's a happy side effect.

Okay, so that's one acceleration technology. But in DPE, developer productivity engineering, the engineering part of the phrase is very much a verb: we want to engineer a better productivity experience for our developers through technology, and we don't want to stop at just one approach. Caching helps a lot where we have many tests and make small, incremental changes that don't affect all of those test sets. But what about scenarios where we're making widespread changes, or we simply can't avoid running the tests?
Well, we should distribute those tests, which is pretty much what it sounds like: executing multiple stateless tests at once, in parallel, which reduces the overall time spent waiting on test feedback. These tests can be horizontally distributed across a pool of test agents, which is the pattern we'd recommend. Several test agent platforms are available out there. Ours, of course, works with Gradle, and it works with Surefire, TestNG, and JUnit, and any of the testing frameworks that extend from those, like Spock; and there are plenty of other testing frameworks with their own agent-based environments that you can look into. We recommend distributing across a pool of test agents, taking advantage of cloud and container patterns, and, wherever possible, horizontal auto scaling patterns. There has never been an easier time to take advantage of cloud and container orchestration. Why build a horizontal scaling strategy into your own test bed when you can rely on something like a Kubernetes pod autoscaler to distribute your tests for you? Then all you have to worry about is building some pooling mechanism that those autoscalers can understand, usually by implementing fairly standard REST components in your project, which is exactly what we recommend here.

So let's take a look at this: distributing multiple tests running inside agents in Kubernetes. We'll look at a couple of approaches: a single-agent approach, a locally scaled approach, and horizontal auto scaling in Kubernetes, so you can see how this methodology can be put in place in a number of different ways. I've focused on agent-based cloud distribution, but for a single developer or a small group, just distributing your tests locally through local executors might provide the benefits you're looking for, so we'll look at that too.

All right, let's start with the most basic example, local test distribution. This demo is actually available from the Gradle GitHub; if you look for the test distribution experiments repository, you can run the same tests. Here's effectively what we've done: we've created a fairly modular project with a whole bunch of tests, something like a thousand in total, that will run when we kick this off. The first thing I'll do is run with distribution off, and we probably won't even let it finish, but it gives you a feel for the baseline. This one is a Gradle project, not a Maven project, so we're going to do just a standard Gradle build.
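For reference, the "bump up the workers" step you'll see in a moment lives in gradle.properties. The exact knob in the demo project may differ, but in plain open source Gradle, the analogous settings look roughly like this:

```properties
# gradle.properties -- illustrative values
# Build project modules in parallel, with up to 8 worker processes
# available for tasks (including test execution).
org.gradle.parallel=true
org.gradle.workers.max=8
```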
It'll take a little time just to generate the execution, and here we go: we're executing tests. I'll cut this off at about the 45-second mark in the interest of time. We can already tell, though, that this is going to take about a minute and a half; we know we've got about a thousand tests. So let's kill that and turn on local test distribution first. To do that, we go into our Gradle properties and bump up the number of workers available to us, which should let us execute across multiple local workers. And you can see now that we're distributing across several individual test-running agents; it's even naming which ones are executing for us here. We can already tell we've moved well past where we were in the previous testing cycle: at about 26 or 27 seconds we're about to finish the whole thing, roughly 15 seconds faster than it took to not even get through the entire build before. And think about bigger builds: among open source projects, for instance, I think the Kafka project has over 17,000 tests that execute on every run. So you can imagine, especially for builds with a lot of tests, how much time you can save here, even in cases where you have to run all the tests, where something's been merged in and you have no way to avoid or cache them.

Okay, so that's local, great. But we're going to hit a ceiling there; even if everybody gets the new MacBook Pro with an M1 Max, we still have a ceiling, because we're running everything locally. So the next thing we can do is distribute these agents across something more cloud scale. I've got a minikube instance here, and right now I've only got one deployment running; I'll show you what that looks like. We've got the test distribution agent: a single pod, one available. I haven't deployed an autoscaler or anything like that; I've deployed one agent, and I haven't asked the deployment to scale, so right now we're running just the one pod. I've configured my project to use this, so let me turn distribution back on. Okay, test distribution is back on, and we have just the one agent running in my minikube instance, but we should be able to offload work to it. So we're now distributing tests between our one local executor and the distributed agent running out in Kubernetes. Not bad. Let's see how we're doing progress-wise: we finished everything in about 35 seconds. Not quite as good as the four local workers we had, but certainly better than running without any distribution.

But of course you would never do this; why would you deploy a single agent in Kubernetes and not take advantage of scale? So let's use native scaling first and give it five replicas to work with, and give it a minute to deploy. Now the distribution should take over and exploit those multiple workers. There we go: we can see that it's starting to distribute to some of those others.
And now we've finished the whole test run at about the 25-second mark. We could of course continue to add workers, though obviously you'll hit diminishing returns depending on how the tests are configured. So you can see, especially when we combine caching, the cached outputs of tests that haven't changed, with the ability to distribute the tests that have changed or do have to rerun, that combining these two technologies produces very significant savings, not just from a test feedback perspective that improves the developer experience, but certainly from a CI savings perspective too.

Then let's do the real magic and deploy some actual auto scaling. Let's go back to our deployment and set the scale back to our original one replica, and while we're doing that, I'll deploy a KEDA-based horizontal pod autoscaler, which should look familiar to anybody who's worked with something like this before. I've opened a new terminal to demonstrate; I'm already in there. I've got a YAML descriptor for KEDA in here that we can use; it's just a KEDA-provided ScaledObject with authentication, that's all it is. We're giving it a polling interval of two seconds, probably a little tighter than we'd want for production, but this is a demo, and a maximum of 100 replicas, depending on what kind of load hits it. We won't get anywhere near that running on my Mac; even my minikube instance only has four cores and maybe 16 gigs of RAM associated with it right now. Remember, I mentioned before that if you want to take advantage of another platform's auto scaling, something like Kubernetes, you have to give it a standard way to gather the metadata it needs to decide how to scale, and how that's done depends on the auto scaling platform you're dealing with. In this case, we've provided a REST API that describes the available agent pools, to let KEDA understand how to adequately scale the pods. So we've got all of that infrastructure, not yet deployed, but I'll kubectl apply the KEDA descriptor, great, and make sure the deployment worked. We should see a new autoscaler here, and we do, and it matches everything we wanted it to match. So now let's kick the tests off again.

Now, full disclosure: the best performance is probably going to remain that distributed run we just did with five agents already available, because this project isn't modular enough, and doesn't have enough tests, to overcome the diminishing returns of waiting for the auto scaling to ramp up; we just don't give it enough time. If this had been a project that runs tests for half an hour, the ramp-up time would be negligible. And this run has already seriously outperformed the first one. With auto scaling turned on, it didn't perform quite as well as the previous run, and that's because the previous run didn't have to wait for scaling to ramp up: there, we had scaled manually ahead of time. All right, so that's test distribution.
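For reference, a ScaledObject along the lines of the one in that demo might look roughly like this. The names, URL, and trigger details below are illustrative guesses built on KEDA's generic metrics-api trigger, not the actual descriptor from the demo:

```yaml
# Illustrative sketch only. KEDA polls a REST endpoint (here, a hypothetical
# agent-pool metrics API) and scales the agent Deployment to match.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: test-agent-scaler
spec:
  scaleTargetRef:
    name: test-distribution-agent   # the agent Deployment to scale
  pollingInterval: 2                # seconds; demo-tight, loosen for production
  minReplicaCount: 1
  maxReplicaCount: 100
  triggers:
    - type: metrics-api
      metadata:
        url: "https://ge.example.com/agent-pools/default/metrics"  # hypothetical endpoint
        valueLocation: "desiredAgents"                             # hypothetical JSON field
        targetValue: "1"
```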
So we've looked at two main acceleration technologies, and this is just on the speed side of the test cycles. We know we can cache our test outputs, as long as we're good and smart about the way we generate the cryptographic cache keys, so that what we're storing is accurate and we don't, for instance, get a cache hit on something that shouldn't have been pulled from the cache and really needed to be rebuilt cleanly. And anything we can't pull from the cache, we know we can now distribute. Great. How are we doing time-wise? Oh, we're perfect.

So, I mentioned that the practice of productivity engineering has two real areas of focus. One is the acceleration technologies themselves. The other, and the reason we're here talking at Postgres Conf, is the data aspect. You really have to use data to keep track of the local and CI developer experience, the amount of time developers spend waiting on feedback from test cycles, in CI or locally; and, equally important, developers and build engineers need all the tools they need to troubleshoot any kind of test flakiness or test failure as well as possible. Even better, where possible, work on failure avoidance: if we can aggregate common test failures across the entire organization up to a central productivity engineering team, they have the visibility to look across the organization and say, oh my gosh, these 300 developers have wasted 30 minutes of their time today dealing with this one flaky test that they have to keep rerunning or debugging. This is just not data that gets tallied by most of these types of platforms. If you ask the average development organization right now what their average build time is, they probably don't know, because it's not tracked. What we track are things like commit-to-deploy, which is important, we need it to understand how we deliver, but we don't pay nearly enough attention to the bottlenecks in the developer process. So this is about gathering new data, and using that data to make sure test cycles are as fast as they can possibly be, that feedback arrives with minimal delay and disruption, and that the troubleshooting process and the toolchain itself stay reliable.

Spotify is a company that invests heavily in productivity engineering; they have a lot of blog posts out about what they do, and they have a lot to say about flaky tests. In fact, I think it was Spotify who first called flaky tests "the pit of infinite sorrow." So first, let's define a flaky test: it's a non-deterministic test, a test that, given the exact same set of inputs, sometimes passes and sometimes fails. We've all seen things like that. There are usually two types here, verification tests and non-verification tests. Verification tests are code assertions: say I've got a small calculator project, and I want to make sure that when I put two plus two into it, it spits out four every time. The flaky version is, it spits out two plus two equals four, and then suddenly it's spitting out null for this one developer. We put the same things in, but it's not giving us consistent results. It's flaky.
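As a concrete illustration, here's a hypothetical example of such a non-deterministic verification test. Nothing about its inputs ever changes, but the assertion depends on wall-clock timing, so the same test can pass on a fast machine and fail on a slower or busier one:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Hypothetical flaky test: identical inputs, non-deterministic outcome.
class FlakyTimingTest {

    @Test
    void workFinishesQuickly() throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(10); // stand-in for the code under test
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Flaky: scheduling jitter alone can push this past the limit.
        assertTrue(elapsedMs < 15, "expected the work to take under 15ms");
    }
}
```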
This is a problem on multiple fronts. First, it's a problem psychologically. A lot of developers, when faced with a flaky test, are just going to kick the can down the road: hold on, it's red, it's red, it's red... oh, it's green! Okay, merge, pass it into CI, and pretend it was never red in the first place. Then what happens? That flaky test turns into software flakiness down the road. Spotify has a really wonderful roadmap showing that pretty much every little outage they deal with, any blip in their streaming, anything that goes wrong, can usually be linked back to some flakiness in some test suite that was ignored. That's software quality at that scale: when you start kicking these things down the road, they can have real, visceral effects. And then, to deal with a flaky test, you need data: you have to be able to watch how the test performed under the same circumstances across multiple developers, and that's hard to do if you don't have something aggregating it.

So that's really where, although the acceleration technologies are super interesting and helpful for developers, I think this becomes very interesting for a data community: there is all of this data that can be captured from this part of the developer experience and currently isn't, and capturing it can have huge, transformative payoffs for an organization once we start measuring and improving these things. Developer happiness is everybody's responsibility. As a global society, we benefit from software on so many levels, from biomedical to consumer products to conveniences. There's no question that software development as a workforce is lifting all boats in the harbor, so we have a responsibility to keep this workforce happy and not frustrated in their work, because that will directly impact the quality of our lives. And the way we do that most effectively is through data.

So, some of the other parts of productivity engineering. We want outcomes that include reliable builds and tests for developers, and we want to address the pain of flaky tests, and not just flaky tests but other avoidable failures. If we know that a certain build failed 300 times for developers across an organization over the last day, and that each of them had to rerun their build after some debugging, and we were aggregating that data up, we could say: hey, we noticed 300 of you were dealing with this failure; everyone apply this patch, or adopt this practice, so you don't have to deal with it going forward. That's the best outcome: a failure, a waste of time, that a developer never even experiences, because we've used productivity engineering proactively to work around it.

So let's start with tests. Test results should be compared across runs to surface common failures and flaky tests. Again, that really is what a flaky test is: I ran this thing, it was supposed to do this thing, it did it this one way for this one developer and maybe passed, and then it immediately failed for some other developer somewhere else. And as before, we have two classes of these. The verification test is the assertion test: two plus two should always equal four. The non-verification tests are the ones that tend to be more common in organizations and harder to deal with.
They're also the ones I believe we're going to see more and more of as our deployment substrate grows more complex. Non-verification failures are things like: I went to hit an API on this web server, and it timed out; another developer went to hit the same web server, and it was fine. Maybe one developer got a timeout or a 500 or something like that. That's non-verification: there was no assertion we wrote for it; something in the infrastructure failed. And that's a big pit for developers, because not only do they see a failure like that and have to deal with it, they're delayed, they're inhibited, they have to context switch, but they also feel like they can't do anything about it, because it's usually something in CI or remote infrastructure that takes collaboration with a build engineer to sort out.

So that's where our Build Scans come in, and there are free Build Scans you can use with Gradle or Maven; we covered them heavily in the last two sessions, so I won't go back into them. By looking at how these tests perform en masse across a whole organization, we can start seeing when tests are flaky. Businesses deal with this in different ways; personally, I like the way LinkedIn deals with it. They schedule flaky test days, where literally all they do is get their build engineers and developers together, go through a dashboard that looks a lot like this one, take the flakiest tests in the organization that they can identify, and just start fixing them. These things have a tendency to really build up inside a code base, because they're difficult, nearly impossible, to find if you're just relying on people comparing notes from their console logs: a couple of developers over Slack going, "Have you noticed this test is kind of weird sometimes?" "Yeah, I kind of have noticed that. Weird." And the conversation just ends there. But when the data is actually aggregated up, focused on, and looked at, it becomes actionable, and you can enforce things like flaky test days to go through and clear the queue.

And by reducing that flakiness further left, we solve so many problems. We address developer frustration and pain. We address overutilization of resources, because flaky tests tend to get rerun, especially remote ones: if a developer runs into a flaky test in remote CI, you'd better believe there's a tendency to rerun that test five or ten times until it goes green, and that's expensive if it's running in the cloud; those are costs that aren't recouped. And if they rerun it without ever addressing the flakiness, then, like we said, there's nothing to keep that flakiness from going all the way out to production and affecting the quality of the product. So, so many reasons to deal with this. By pinpointing these common failures, we can eliminate them before developers even encounter them.
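As an aside, if you aren't aggregating scan data across an organization yet, one low-tech way to at least surface flakiness within a single build is the open source Gradle test-retry plugin: a test that fails and then passes on retry gets reported as flaky rather than silently going green. A minimal sketch, with an illustrative version number:

```groovy
// build.gradle
plugins {
    id 'org.gradle.test-retry' version '1.5.8' // illustrative; check for the current release
}

test {
    retry {
        maxRetries = 3
        // Report a fail-then-pass as flaky instead of failing the build.
        failOnPassedAfterRetry = false
    }
}
```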
To demo what we do here, I want to show you the Gradle build tool itself. The Gradle organization as a whole has two parts: Gradle Enterprise, our productivity engineering solution like we've been talking about, and the Gradle build tool, which is our open source tool. And as we do for a lot of open source organizations, we give the Gradle project the ability to use our enterprise dashboards for free. We do this for a lot of open source projects, by the way; Spring is probably the most well known one that uses it. So this is our dashboard right here. We can look any time we want at this overview of testing data, data that's been aggregated from multiple developers across the organization. When we go look at all the different scans that have pushed data in, these are the free Build Scans I mentioned earlier, you can see we've got automation, we've got TeamCity, we've got individual developers pushing changes and scanning, and all of them are aggregating data back up, all contributing to these dashboards that show us things like the amount of test flakiness in the organization.

So we can say that we had 47 builds with flaky tests. Let's sort by the number of flaky tests, and we find that this test right here is about the flakiest test we have in the organization. This outcome trend is kind of nice too: we can see how it's been behaving recently in this one view. So it would be fair to say that if we were having a flaky test day today, we'd want to focus on the tests that are flaky a large percentage of the time, and also on the ones with a long mean execution time, because this could all be wasted time: each of these runs, if its output was unreasonable or not useful, basically wasted a developer ten minutes, because they have to rerun it, not knowing whether it's going to pass properly, not knowing whether their code affected it.

Then we can drill in, and this is where the magic happens. Once we get in here and start looking at the individual runs, these are individual runs, each on a single box, where the test turned out to be flaky. Each of these is representative of a single developer experience, whether it was a developer waiting on feedback from CI or a developer hoping for test feedback right at their workstation, and we can roll all of that data back up. Between that and the ability to look at failed task executions across the entire organization, the failures that developers and engineers have had to deal with en masse, this is where we can get really proactive. We can look, for instance, and say: one of the most impacted hosts is this person's MacBook; we should talk to them and figure out what's going on. I mean, we've had organizations come to us where it turned out that a large number of their developers were running really bad antivirus software. They had really long build times, we were seeing 20 to 30 minute local build times for some developers, and just by getting that data out there, a discussion started that led to: oh gosh, this antivirus software is killing these workstations, and the developers are having to deal with that.
That's a common problem that just doesn't get talked about, because no one's tracking the data, and as we know, what gets measured gets improved.

Okay, so let's wrap this up by talking about some technology we have coming around the corner that uses machine learning, taking big data sets of test analytics and history information, to give us something called predictive test selection. We're not the only company working on this. It's a new approach to test impact analysis: trying to determine, if we make these changes to this code, which of these tests are likely to produce interesting results, based on what types of code changes were made, which tests have run in the past, and whether they passed or failed. We use a gradient boosting learning algorithm to predict, for a given set of code changes, whether we should or should not rerun tests, based on whether we think a test will produce something useful for the developer. Because if a developer runs 20,000 tests, it takes an hour, and they all pass, well, that's good validation, but it's kind of a waste of time; it wasn't really a good use of that developer's hour. There's a possibility of being able to predict, with some accuracy, which tests could actually produce something useful for the developer, and run those.

So I hope you come away from this third episode of DPE with this attitude: it's not about how fast things are; it's about how fast things can be when we apply acceleration technologies, data, and observation. We don't quit on this stuff. Build caching, the first approach we talked about today to accelerate testing, is a great start, but it's not going to work in all scenarios. So we asked what we could do next: anything we do have to run, let's parallelize; let's take advantage of concurrency and use horizontal scaling to get that feedback faster. And even that can be improved upon, by accurately predicting whether a test even needs to run in the first place. That's where we're heading now. Again, we're not the only ones working on this; Google and Facebook are, I think, the two most famous examples of companies using predictive test selection today to improve developer efficiency. So if you look up this concept, you'll find a lot of material from us, white papers and research we've put out there, but you'll also find a lot of really good information from Google and Facebook as well.
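To make the shape of that decision concrete, here's a toy sketch. The real systems train a gradient-boosted model over large test-history datasets; this stand-in just hand-weights two hypothetical features so you can see what "predict whether a test is worth running" means in code:

```java
import java.util.List;

// Toy illustration of predictive test selection -- NOT Gradle's implementation.
public class PredictiveSelectionSketch {

    // Hypothetical features: how often this test failed recently, and how
    // close the changed files sit to the code the test exercises.
    record TestStats(String name, double recentFailureRate, double changeProximity) {}

    static boolean shouldRun(TestStats t) {
        // A real system would get this score from a trained model.
        double score = 0.7 * t.recentFailureRate() + 0.3 * t.changeProximity();
        return score > 0.2; // below this, a run is unlikely to tell us anything new
    }

    public static void main(String[] args) {
        List<TestStats> suite = List.of(
                new TestStats("CheckoutFlowTest", 0.30, 0.90),  // likely interesting: run
                new TestStats("LegacyReportTest", 0.01, 0.05)); // stable, unrelated: skip
        suite.forEach(t ->
                System.out.println(t.name() + " -> " + (shouldRun(t) ? "run" : "skip")));
    }
}
```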
So that's it for today and for this episode series. Just a couple of takeaways. We do have a white paper available, and as I promised earlier, it's a vendor-agnostic paper; it's not a pitch for Gradle Enterprise, it's a handbook for putting developer productivity engineering practices in place in your organization. You can do everything we discussed today without ever touching a piece of Gradle technology if you wish. And that gives us just a couple of minutes for questions. We also have this fun speed challenge: if you'd like to try it, you can win yourself some Gradle swag. Let me bring that QR code back up so you can see it. In this contest, if you go in and look at the instructions, basically what you'll do is hit up a free remote caching server, and if you hit your Maven or Gradle project with it and show us your avoidance savings, we'll ship you a swag kit. It's pretty nice; it's got socks and stickers and a t-shirt and everything like that.

Thank you, Justin, this is great, not just this one but the other two presentations as well.

Yeah, and of course you're free to reach out afterwards. I'm pretty easy to reach by email, and I'm also on LinkedIn; I'm pretty open on there and accept pretty much anything that comes in.

Justin, thank you so much. And to our attendees, thanks for spending a little bit of your day with us; I hope to see you on future Postgres Conference webinars. Thanks, all.