Ah, there we go. Thank you. Sorry about that. So yeah, as Dave said, my name's Will. I worked at FoundationDB way back in the ancient mists of time, and I'm now doing something a little bit different. I'm here to talk about the future of software development. What on earth could I mean by that? Well, as somebody who's been asked to give a big-picture talk at a software conference, I believe I'm contractually obligated to begin with a slide depicting Moore's Law. And also, my pointer doesn't work. OK, there it is, Moore's Law. Clearly, you can see here we've got years on the x-axis, and we've got clock speed on the y-axis, right? And as you can clearly see, it's going up and to the right, kind of exponentially. And... hold on a second. Are you sure that's the... did we get the right slide? Is this the Moore's Law? Is that clock speed, or is that hard drive density, or maybe... oh my goodness, it's weaving and spinning productivity. I wasn't expecting that. That's really weird. There must have been some kind of mix-up. OK, let's try this again. OK, OK, yeah, here we go. That's totally the Moore's Law slide. You've got that nice logarithmic y-axis there. You see that? Yeah, that's... what? It's steam engine efficiency. What is going on here? All right, let's try one more time. All right, anybody want to guess what this one is? I'll give you a hint: the x-axis here is the actual years over which Moore's Law actually did operate. It's industrial diamonds.

So why have I wasted your time with three different graphs, none of which are Moore's Law? The reason is that I think we in the software industry, or in the computing industry more broadly, have a little bit of a tendency to think that we're special. We think that we're somehow different from every other technological endeavor that mankind has engaged in. And something that people point out a lot is this Moore's Law thing. People say, where else in human history has there ever been this exponential increase in the capabilities of the underlying hardware of what we're doing? And the answer, as I hope I've convinced you, is actually everywhere. It's incredibly, incredibly common, at least at first. And the reason is that exponential growth is really, really easy when what you have is really, really primitive. And so the fact that we observe this Moore's Law relationship, maybe even still, is actually not telling us that software is special. It's telling us that software is early. It's telling us that we're at the beginning, that we haven't trailed off yet into a logistic curve, which is what usually happens in technology.

And I actually find this fact to be incredibly empowering. It means we get the chance to be pioneers too. It means those guys in the 60s and the 70s didn't have all the fun. It means there's a lot more to do. And I think one implication is that we should be looking for the possibilities of radical transformations in software engineering, right? Things that will completely change what it means to be a software engineer, what it means to write programs. There are a couple of reasons for this. One is that we're early in history, and so there's still low-hanging fruit. Another is that we're early in history, which means that the vast majority of software has not been written yet. It's gonna be written in the future, which means that if we make process improvements now, they have a very long time horizon over which they can pay off.
I think both of those facts argue in favor of us taking a really serious, hard look at and re-examining what it is we do. And so that's the meaning behind the title of my talk, the future of software engineering. I think the future is out there in front of us. We are not there yet, but maybe we will help create it.

Okay, so, right, the future. There are a lot of different possible radical transformations I could talk about, right? I could talk about new programming languages and how something from category theory is gonna totally change how everybody does everything. Might be true. That would be a radical transformation if it worked. Or I could talk about the sociology and economics of software, how who becomes a software engineer might be changing drastically. That could be a radical transformation of the software industry. But I'm not gonna talk about either of those things. Instead, I'm gonna talk about testing. Yeah, right. Because why? Because if you wanna look for the potential for radical transformations, one way of doing that, one pretty good heuristic, is to look at where things are currently the most absolutely awful and terrible and painstaking and brutal. And I think testing kinda qualifies, right?

I saw a report from some analysts, it was Gartner or Forrester or somebody, and they estimated that 30% of all effort in software development was spent on testing, or on QA more broadly. And if you're like me, you hear that number and you're like, 30%? That's way, way, way too low, right? You add together the amount of time you spend writing tests, the amount of time you spend fixing the bugs that your tests turn up, the amount of time you spend fixing the bugs your tests don't turn up, the amount of time you spend working around and dealing with and mitigating all the bugs in the layers below you that they haven't fixed, those jerks. All the time and effort that goes into dealing with the fallout of inadequate testing, the outages, the economic damage, right? We're talking at least 50% of my time, and I don't think I'm extremely unusual. And you multiply that by a lot of software engineers in the world, and we are talking about trillions and trillions of dollars of productivity going into this. If you could take that and eliminate it, or make it much more efficient, a much smaller fraction of what we do, that would be a radical transformation of the software industry.

And the reason that I'm giving this talk at this particular conference is that, as I think people are generally aware, FoundationDB had a different relationship with testing than many products did. Testing for us was not an ordeal. It did not suck up vast quantities of developer energy. So I'm not actually gonna dive into this a whole lot, both because I think a lot of people here already know how FoundationDB did testing; Evan and Ben both covered it and gave you some information about it. Instead, what I'm gonna do is try to take what we did at FDB and recontextualize it. I'm gonna try to give you a different perspective on it, situate it as one technique within a much broader class of approaches, and talk about why you should be using them, why you might not be using them, and why, no really, you actually should be using them. And if I have time, I will close with some slightly more speculative stuff. Okay, but before I get to all that, let's first talk about why testing is so miserable right now.
One reason that I think a lot of people are very familiar with is that your tests are usually non-deterministic, right? As soon as your test relies on any kind of external service, or as soon as you run it in an even slightly noisy environment, suddenly it fails one out of every 100 times, or one out of every 10,000 times. Either way, it's a nightmare. It's a maintainability nightmare, and it's a nightmare when you actually go to try and debug something: you don't have confidence as to whether you've actually fixed it or not. And if the test does fail, you're awfully tempted to say, well, that's a known flaky test, right? So you ignore it, and then it turns out that no, actually, it really was a problem. I think people are also generally familiar with the way that the FoundationDB team made heroic efforts to claw back this property in our testing using a technique called deterministic simulation. Evan touched on that briefly, there's tons of material online about it, and as Dave said, I gave a talk about it years ago. You can go read about that, so I'm not gonna talk about it. But all that said, I think we in this room should be aware that most of the world does not have this, right? Most of the world is still in the dark ages, can't do this, and it's an ongoing source of pain and suffering, and it's really bad.

Okay, but there are more reasons that testing is terrible. Another reason is that your tests are very, very often fragile. What do I mean by that? I mean that your test comes to rely on properties of your system that are incidental in some way, that are not the ones that you thought you were testing. A really simple example of this: suppose you're trying to test a computer game, all right, like a 2D platforming game. One way you might do that is by recording an input, right? So you say, if I hold down the right arrow for 6.7 seconds, and then I hit the jump key for 3.2 seconds, and then I shoot the bad guy, assert that blah happens. And that's a great test until you change literally anything at all about your program, right? You change the level layout, you change the physics engine, you change the input sampling rate, you run that test on a different computer: it's probably gonna fail. This is not some weirdness about testing computer games, either; this happens in all kinds of testing domains. It happens in UI testing frequently. It happens in distributed systems testing, right? Imagine you're trying to play forward some series of events, and something changes towards the beginning. The two test histories are gonna diverge, and you're gonna be off testing some totally different thing than you thought you were, right? So this is a really bad problem. Like the flakiness problem, it's a maintainability nightmare; like the flakiness problem, you're very, very tempted to ignore the test because of it. I heard about a postmortem recently of some giant outage at some giant company, whose name I shall not mention, where literally the same check-in that caused the bug, that caused the giant outage, also disabled the test that would have told them that they were about to have a giant outage. Why? Because everybody knew that test breaks every time you check something in. So, like, of course, right? That's really bad. An even more insidious failure mode, by the way, is that sometimes your test can fail open, in the sense that you make some change, your test continues to pass, but it's now passing for a totally different reason than you thought it was, right?
And so now it's a source of false confidence, and, yeah, it's really, really bad. And I don't think people have a great answer to this problem.

Okay, but there's an even worse one, which is that even if you somehow make your test neither flaky nor fragile, it's probably not testing the right stuff. I think there's actually a really deep reason behind this, almost like an analytic proof, okay? It goes something like this. The reason that people write tests is because human beings are astonishingly bad at thinking through all the possible branches of control flow that a program could take, right? We're astonishingly bad at thinking through all the possible orders in which two threads, or some other asynchronous construct, could interleave. We're astonishingly bad at thinking through all the possible points in the execution of a distributed algorithm where a failure might occur, and what form that failure might take. But that very fact, right, that's why we need tests, but that very fact means that we're unable to write tests to cover all the things that we actually need to cover. Because if we could, if we were smart enough and had the right kind of brains to write all the tests we needed to write, then we would have just written the code correctly in the first place, right? And so I think this is really scary and really true, and the implication is that tests can be useful for turning up regressions, but almost completely useless for telling you about unknown unknowns. That's bad, right? The latter is, I think, the more important problem. It's certainly the harder problem.

People try and fight against this in a bunch of different ways. People try and use coverage metrics to fight against it, which I think is a nice idea, but ultimately kind of useless too. Like, imagine you have a distributed system and your test suite gives you 100% branch coverage on it. Are you sure your system has no bugs? No, obviously not, right? You could have a bug that depended on whether some code executed on process A before or after it executed on process B. You'd have 100% branch coverage either way. And this is not just a distributed systems thing, right? Imagine you were writing a Python interpreter and you got 100% branch coverage. Have you tested all the interesting behavior in that Python interpreter? No, Python is a Turing-complete language. There's an infinite mountain of interesting behavior that you could continue to exercise. So branch coverage is a nice idea, but doesn't actually solve this problem at all.
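To make that concrete, here's a toy sketch in Python: a test that achieves 100% branch coverage of a little counter and passes every time, while saying nothing about the interleaving bug hiding in it. The `increment` function and its test are hypothetical illustrations, not anything from the talk.

```python
# 100% branch coverage, yet a latent race: test_increment() executes every
# line and branch of increment(), but coverage cannot see the interleaving
# where two threads both read `count` before either writes it back.
import threading

count = 0

def increment(times: int) -> None:
    global count
    for _ in range(times):
        tmp = count      # read...
        count = tmp + 1  # ...then write; another thread can run in between

def test_increment() -> None:
    """Full branch coverage of increment(), green every run."""
    global count
    count = 0
    increment(10)
    assert count == 10

if __name__ == "__main__":
    test_increment()  # passes, with 100% coverage
    # But run two threads and the lost-update race appears, with no new
    # branches executed at all:
    count = 0
    threads = [threading.Thread(target=increment, args=(1_000_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(count)  # can come out well below 2_000_000: updates were lost
```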
Okay. I actually think that all of these problems are just symptoms of a deeper underlying problem, the real problem. The real problem is that testing is still totally manual, right? What we call automated testing is not automated in the slightest. We have automated the absolute dumbest part of it, which is executing the tests, running them. Automated testing, to most people, means having Jenkins run your tests. And that's nice. That's certainly better than not doing that, right, better than having somebody sitting there typing them in by hand. Who does that? Actually, tons of industries do that. So, good start, but still: every one of those tests is painstakingly, manually constructed by a human being in the first place. Imagine if this were any other industry, right? Imagine if I told you that there's an industry out there where scads of highly paid, intelligent people are sitting doing mind-numbing, rote, repetitive, predictable tasks with no automation. What would you say? What would your VC say? They would say that's an industry that's ripe for disruption. And welcome, ladies and gentlemen, to software engineering, ripe for disruption.

And I think this is the real secret sauce behind what FoundationDB did. It's not so much just the deterministic simulation, although that was a very important part of it. It's that we didn't go down this path. Whenever we had a new piece of functionality, we didn't say, how can I write some tests to cover this? It was more like, how can I write a system that will forever be generating new and interesting tests that will exercise this? That's a shift of mindset. It's a shift of philosophy. And it's such a big change that I think it needs a new name. And so for the rest of this talk, I'm gonna refer to the automated creation of tests, in addition to the merely automated execution of tests, as autonomous testing, which I think is a better word for this stuff.

And FoundationDB is not totally special. If you know where to look, if you squint at the industry, you can see glimmerings of autonomous testing everywhere, actually. A really simple example is just fuzzing. Suppose you're writing a parser that's parsing some untrusted bytes from the network. Do you sit there and say, well, I'd better write some unit tests for this? No, if that's all you do, your parser sucks, I assure you. It's wrong. What you do is you get a fuzzer to feed millions and millions and millions of random strings of input into it and see if any of them cause it to crash. That's autonomous testing, a primitive form of autonomous testing. Property-based testing is an example from a totally different side of the software industry. This is something the functional programming people came up with, the Haskell people, I think. The idea here is that if you have a piece of your program, or a data structure, or something with a very easy-to-specify interface, what you can do is construct a specification of its contract and its guarantees, and then you tell the computer to sit there and try to come up with counterexamples to that contract. This is a form of autonomous testing. This is actually something that was used exhaustively in the testing of the FoundationDB document layer, which, as you heard, was just open sourced. Thank you, Apple. So it's not like at FoundationDB we just had one magical tool, the simulation stuff, and we hit everything with it. It's that every engineer, there was a culture of this, was constantly thinking about how to take testing problems and make them less human-involved.
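To make those two glimmerings concrete: the most primitive fuzz loop looks something like this sketch, where `parse()` is a hypothetical stand-in for the parser under test. Real fuzzers like AFL or libFuzzer are coverage-guided and far smarter; this is just the shape of the idea.

```python
# The most primitive form of fuzzing: hurl random byte strings at a parser
# and see whether anything escapes. parse() is a hypothetical stand-in.
import random

def parse(data: bytes) -> None:
    """Stand-in for the parser under test."""
    if not data.startswith(b"MAGIC"):
        raise ValueError("bad header")  # gracefully rejecting input is fine
    # ... real parsing logic would go here ...

for i in range(1_000_000):
    data = random.randbytes(random.randrange(1, 4096))
    try:
        parse(data)
    except ValueError:
        pass  # an expected, graceful rejection
    # Anything else that escapes (unhandled exception, hang, crash) is a
    # bug, and `data` is a reproducer you can save.
```

And here is the shape of property-based testing, sketched with Python's Hypothesis library rather than Haskell's QuickCheck. The little run-length encoder is an invented toy, not document layer code:

```python
# Property-based testing: state the contract, let the machine hunt for
# counterexamples. The run-length encoder below is a toy example.
from hypothesis import given, strategies as st

def encode(s: str) -> list:
    """Run-length encode a string into (char, count) pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def decode(pairs: list) -> str:
    """The inverse of encode()."""
    return "".join(ch * n for ch, n in pairs)

# The contract: decoding an encoding must return the original input.
# Hypothesis generates inputs, searches for a counterexample, and shrinks
# any failure it finds down to a minimal reproducing case.
@given(st.text())
def test_decode_inverts_encode(s: str) -> None:
    assert decode(encode(s)) == s
```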
And so the question that I've been grappling with in my career since leaving FoundationDB is: why doesn't everybody do this all the time? Because I've seen how good things are when you do it this way. And I've also seen, but am prohibited by NDA from telling you about, how bad things can get when you don't do it this way. The answer is really bad, by the way. So seriously, why isn't everybody in the universe using this stuff? It's a real conundrum. And it turns out that there are actually some really, really good reasons why people don't do this stuff.

But before I tell you about the good reasons, I'm gonna tell you about a really bad reason, which is, and I hear this with surprising regularity, that the Turing halting theorem proves that software can never test software. Yeah, that's right. And you know, it's kind of true. You can come to me and be like, Will, if your program can find bugs, then surely I can write a program which will crash if and only if the Riemann hypothesis is true, and then, in order to tell me if it has a bug, your program has to solve all of math. Are you saying you can do that? And, like, you know, fine, that's true, that's correct. But to me, this has a little bit of the feel of people who say that the CAP theorem means you can't write a good distributed database, right? It's kind of an almost trivial impossibility result which nevertheless doesn't actually address the issue, right? Humans debug software. Humans are not mystical Turing oracles. If humans can do it, then so can computers. And we don't need to be perfect, right? We don't need to find every single bug in every single program that could ever be written, which indeed the Turing theorem says you can't do. All we have to do is be better than humans, or faster than humans, or more scalable than humans, or, you know, different from humans. So I think this is a dumb objection. That said, I think it's actually really closely tied to a smart objection. The same is maybe arguably true of the CAP theorem. And the smart objection that this is tied to is that it's insanely difficult. And that, I think, is true.

An easy way to think about how difficult it is, is just to imagine the size of the space you're trying to explore, right? When you're trying to find a bug in a program, you are implicitly trying to explore the space of all possible execution histories your program could have. The size of that space is 256 raised to the power of the number of bytes of input your program has ever received. That is a mind-bogglingly large space. That is inconceivably vast, right? You could stumble around in some tiny, stupid corner of that space for the entire lifetime of the universe and never find anything. And yet, I'm up here saying that you can hit most of the interesting parts of that space in a very small amount of time. How on earth can we do that? The answer is: it's insanely, insanely hard, and it's a challenge that people trying to use autonomous testing frequently get shipwrecked on. At FoundationDB, we put an insane amount of effort, I should really say Evan put an insane amount of effort, into fighting this exact problem, right? Tweaking and tuning all the knobs, buggifying things to turn up bugs more often, tweaking the random number generators so that you have processes crashing just often enough to turn up interesting behavior, but not so often that you never make any progress. This is a really hard thing to do. It takes somebody with deep knowledge of the system to do it. It's really draining and tough work. And if you get it wrong, the failure mode is: your tests say everything's good. That's a really, really scary failure mode, right? You need some kind of side channel to tell you that no, actually, everything is not good. Hopefully that side channel is not your customers. So I think that's super scary.
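Here's the shape of that seed-and-knobs style of testing as a minimal Python sketch. This is the general pattern, not FoundationDB's actual implementation: the whole run is a deterministic function of one seed, fault injection is biased by a tunable probability, and any failure reduces to a seed you can replay. All the names and the toy commit/durability model are invented for illustration.

```python
# A sketch of seed-driven testing with tunable fault injection. The whole
# test run is a deterministic function of the seed, so a failure is a
# perfect reproducer. Just the shape of the idea, not FoundationDB's code.
import random

CRASH_PROBABILITY = 0.01  # the knob: often enough to find bugs,
                          # rarely enough to still make progress

def buggify(rng: random.Random) -> bool:
    """Should we inject a rare fault at this point in the run?"""
    return rng.random() < CRASH_PROBABILITY

def run_simulation(seed: int) -> None:
    """One deterministic run: every random choice flows from the seed."""
    rng = random.Random(seed)
    committed, durable = 0, 0
    for step in range(10_000):
        committed += 1
        if rng.random() < 0.5:
            durable = committed      # pretend we fsynced
        if buggify(rng):             # simulated crash and restart
            committed = durable      # anything not durable is lost
        assert committed >= durable  # the invariant under test

if __name__ == "__main__":
    for seed in range(100_000):
        try:
            run_simulation(seed)
        except AssertionError:
            # The seed is the entire input, so the failure replays exactly.
            print(f"bug found! replay with run_simulation(seed={seed})")
            raise
```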
The next thing is also super, super scary, and I think it's probably an even bigger barrier to adoption. All of the autonomous testing techniques I listed, and other ones besides which I haven't listed, all depend in some way, they're all a little different, but in some way they all depend on your software being in the right form for them to test it. You can think of the whole FoundationDB deterministic simulation system as having the job of wrangling a big, hairy, stateful, distributed system into that form, right? Taking it and turning it into a box that took an input, which was a random seed. Pretending it was a pure function, so that we could do that kind of thing. The only reason that wasn't a complete and utter debacle was that FoundationDB was constructed with tremendous, tremendous foresight to enable it. Most people can't do that. I mean, start with the fact that most people aren't writing software from scratch, right? Most people are working on some ginormous Java monstrosity from the 1990s. Even the people who are writing something from scratch rarely have the foresight to say, oh, we're gonna write a distributed database, so first we're gonna spend a year writing a simulator. Who does that? That takes a special kind of person. And, you know, a lot of people, even if they did have the inclination, just wouldn't have the time, right? They need code now, not in a year, not in two years. So I think this is a huge, huge problem, and so long as it's almost impossible to apply these techniques after the fact, it's gonna greatly limit their use.

Okay, and the last thing here is what I'm calling test oracles, which is just a fancy way of saying that even if your test turns up all the bugs in the world, it's useless if it doesn't know when to alert you that it's found a bug. Depending on the domain, this can be either really, really challenging or really, really easy. If you only care that your program doesn't crash, good news: it's pretty easy to tell when a program has crashed. It's pretty easy to tell when it's run out of memory. It's pretty easy to tell when an assert has fired, say. But suppose you were working on a graphical application, and suppose that bugs come in the form, this border was five pixels too far to the right, or, the screen turned purple, right? You might need a much more sophisticated system to tell you when you've actually found a bug. This happened with FoundationDB, too. There were database invariants, ACID invariants, which were too complex to check with a simple assert statement. And so what we had to do was design entire workloads dedicated to setting up those invariants and then checking that they still held at the end. So again, depending on your domain, that can be a real challenge.
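A minimal sketch of that workload-as-oracle pattern, under invented assumptions: a toy key-value `Database` class stands in for the real system, and the invariant is that a set of keys always forms one closed cycle of pointers. Correct transactional swaps preserve the cycle, so a check at the end catches lost or torn writes no matter when the chaos happened during the run.

```python
# A workload-style test oracle: set up data with a known global invariant,
# let the system (plus whatever faults you inject) run, then verify the
# invariant at the end. The Database class is a toy stand-in; this is the
# shape of the pattern, not FoundationDB's actual workloads.
class Database:
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

def setup_cycle(db: Database, n: int) -> None:
    """Write n keys whose values form one closed cycle: 0 -> 1 -> ... -> 0."""
    for i in range(n):
        db.set(i, (i + 1) % n)

def check_cycle(db: Database, n: int) -> None:
    """The oracle: following the pointers must visit all n keys exactly once."""
    seen, key = set(), 0
    for _ in range(n):
        assert key is not None and key not in seen, "cycle invariant violated"
        seen.add(key)
        key = db.get(key)
    assert key == 0 and len(seen) == n, "cycle invariant violated"

db = Database()
setup_cycle(db, 100)
# ... the actual test runs here: concurrent transactional swaps, injected
# crashes, network partitions, whatever the simulator throws at the system ...
check_cycle(db, 100)  # if any of that chaos broke atomicity, this fires
```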
Okay, but despite all that, you should all still be using these techniques. This is what you should be doing, and there's a bunch of reasons why, some obvious, some not so obvious. First, and most obviously, computers are cheaper than people; you might have heard this. They're especially cheaper than software engineers in the San Francisco Bay Area. So when you have computers writing your tests instead of people writing your tests, you can write many, many, many more tests, and you will get a whole lot more testing done. And that will probably be good. Almost more important than that, though, is not just that you're gonna do more testing, it's that you're gonna do different testing.

Sometimes when I talk about this stuff, people ask me, Will, are you saying that I should throw away my lovingly crafted test suite and just use your crazy computer thing? No, don't do that, right? Keep your test suite. That will find certain kinds of bugs, and the computers will find other kinds of bugs. Because computers and humans, like, we're all unique and special snowflakes and different and that's wonderful, but we're all humans. A computer is a total alien. All that stuff I said before about things humans are bad at thinking about? Computers are really good at thinking about that stuff. And so humans will find some bugs, computers will find others, and this beautiful synthesis of man and machine will come together, and it's awesome.

Another cool thing about computers is you can turn them on and off very easily. In fact, I hear that you can even go to Amazon and ask for a computer for a limited amount of time, and they will turn it on and off for you. This is really great. This is, again, something that's a little bit different from human beings. If you want to hire more test engineers or more software engineers, that takes time. You cannot hire and fire them instantaneously. They take time to be trained. They have to sleep, they have to eat. It's so annoying. Computers are not like that. If you want to do a medium amount of testing or a small amount of testing all the time, and then a really, really, really large amount of testing all at once, you can do that very easily using autonomous testing. That said, I actually think that's not as important as people think it is, because I think that most of you should be doing way more testing all the time than you think you should be doing.

And the reason has to do with the last of these reasons here, which I think might be the most subtle, which is latency. By that, I don't mean the latency of running a single test. I mean the latency of running enough tests that you find a single bug. The reason this matters is that the amount of effort it takes to fix a bug is extremely strongly tied to how long it's been since that bug was introduced. I think we all get this intuitively, but just to hammer the point home, consider two situations. In one of them, you're writing some code, you go to submit it to your repo, and some pre-submit hook runs and says, blah, you have a bug, you can't submit that right now. That's great. That's so awesome. You have all the state of the problem still in your head. You have a literal diff in front of you, right? You know the bug is probably in one of those lines of code. This is about your best possible case for finding and fixing that bug. Compare that to situation number two, where your friend causes a bug, and it slips through your testing, and then it slips through your release testing and into a release, and then some time goes by and it slips into your next release, and then six more months go by and your friend quits, and then four more years go by, and you're on call, and your largest production customer calls you in the middle of the night and says, hey, something's not working. What do you do? I mean, you're doomed, right? It's gonna be so bad. Even if it were literally the same bug, the amount of effort that will go into even determining whether there is a bug in the second case is gonna be orders of magnitude larger than it would have been to just fix it in the first place. But the good news about testing, or at least about autonomous testing, is that it's an embarrassingly parallel problem.
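Concretely, independent seeded runs share nothing, so they scale out across as many cores as you can rent. A minimal sketch, with a trivial stand-in for the hypothetical seed-driven harness sketched earlier:

```python
# Autonomous testing is embarrassingly parallel: seeded runs are
# independent, so a million of them can be farmed out across cores or
# machines. run_simulation() here is a trivial stand-in for the
# seed-driven harness sketched earlier.
from multiprocessing import Pool
import random

def run_simulation(seed: int) -> None:
    """Stand-in: one deterministic, seed-driven test run."""
    rng = random.Random(seed)
    assert sum(rng.random() for _ in range(1000)) >= 0  # placeholder invariant

def try_seed(seed: int):
    try:
        run_simulation(seed)
        return None
    except AssertionError:
        return seed  # the seed is the whole input: a perfect reproducer

if __name__ == "__main__":
    with Pool() as pool:  # one worker per CPU core by default
        for bad_seed in pool.imap_unordered(try_seed, range(1_000_000)):
            if bad_seed is not None:
                print(f"bug found! replay with run_simulation({bad_seed})")
                break
```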
You can get a million CPU-seconds of testing done in one second, or five seconds, Amdahl's Law, by running it on a million different CPU cores. That's really cool. It means that if you're willing to spend the money, you can drive down the latency between causing bugs and fixing bugs, and that is going to repay you in money and productivity and saved frustration and everything, so fast. It's going to change your life. It's gonna qualitatively change the kinds of things your team can do.

Okay, so I'm gonna wrap up now; I'm running out of time. And I wanna wrap up by returning to that question I raised before, which is: why don't we have computers test all our software? I mean, I know, I told you the reasons, right? There were a ton of, like, really terrible reasons. But seriously, what would it take? What would it take to change that? How can we bring about the kind of epochal transformation, analogous to the invention of the compiler, where only in really weird or exceptional situations do you go back to doing it the old way? And the answer is: I don't actually know the answer to that question, but I'm currently trying to figure it out. So Dave Scherer over there and I have recently started a company, and our goal is to try and figure out the answer to this question, and then thereby make autonomous testing super easy, thereby making it universal, thereby radically transforming the software industry, thereby ushering in a new golden age of peace and prosperity for all mankind. It's pretty early days yet. It's really early days. But we have a vision, and we have a technical roadmap that we think has a decent chance of getting us there.

So, I would love to talk to you if you're interested in any of the stuff I've talked about here today. There are two kinds of people I'm especially interested in talking to: people who might be interested in coming to work with us, and people who might be good early customers. The latter being people who have really scary QA problems that they feel they don't have a handle on, and who think some of these techniques could maybe help, because we have a prototype that might kind of work. But that said, I'm also just interested in talking to anybody who's into this stuff, because this is still a really small niche world, and not a lot of people think about this stuff. And so if you do, I would love to meet you. And I am definitely over time, so I won't take questions. Thank you.