So good morning, everyone, from me too. First of all, it's great to see so many people here today. With all the Boston parties going on, that's kind of a surprise, right? People don't tend to show up for the morning sessions after a night of drinking. So thank you very much for coming here today.

But before we start, who can tell me what's in the picture here? Any guesses? Shout if you know. Right. Excellent. That's the venerable ENIAC, the Electronic Numerical Integrator and Computer, in a photo from the late 40s or early 50s in the US. I find this picture very fascinating for various reasons. The first is all the wires; they are just mesmerizing, and I don't know if I should be ashamed or proud that my desk looks like this sometimes. But the most important thing here is the two programmers. In particular, the programmer on the right, who is Betty Holberton, the creator, or rather the inventor, of the breakpoint. The breakpoint is a fundamental tool of debugging, but also of testing, especially in those days. Because how did you do testing in those days? Well, you could test end to end and check the end result. But if you wanted to test anything internal, you had to stop the world and inspect the hardware state. And that's where the breakpoint is very useful.

We have come a long, long way since then. We have better tools, better processes, and improved practices. But at the same time, it feels that we may be a bit stuck in that era too: we tend to underutilize the tools and forget the lessons. In this talk, I'd like to explore this from a free software perspective: where I feel things go wrong, and how we can improve.

So this is a picture I took perhaps ten years ago at a shop, and the sign in the picture caught my attention immediately, even though at first I hadn't realized what was going on. Take a minute to see if you can spot what's wrong here. This was my subconscious ringing alarm bells: something is wrong, danger, danger. In the same way, over the years I had internally developed this impression that not everything is rosy in the automated testing world concerning free and open source software. Back then it was mostly based on anecdotal evidence, a general feeling that I was seeing project bugs that were inconsistent with having comprehensive automated testing. This kept bugging me for many years, and at some point I decided to explore it. But how do you explore this? How can you make a large-scale survey and reach some kind of conclusion for free and open source software in general?

The canonical metric for test comprehensiveness is test coverage, but that is quite hard to extract across many code bases: you have different languages, different tools, different build systems. So I was thinking, what else can we use? In the end I considered two metrics: test commit percent and test size percent. Test commit percent is the percentage of commits in a code base that affect the tests in some way. It makes intuitive sense that if this number is large, the tests are being taken care of, developed in sync with the code, and are thus probably more comprehensive. And similarly for test size percent, the proportion of the code base that is test code. Neither of these metrics is foolproof, but I think they are good enough for at least large-scale surveys.
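As a rough illustration of how these two metrics could be extracted from a git repository, here is a minimal sketch. This is my own reimplementation of the idea rather than the speaker's actual tooling, and it assumes the crude heuristic that a file is test-related if its path contains "test":

```python
#!/usr/bin/env python3
"""Minimal sketches of the two metrics for a git repository.

Heuristic (an assumption, not the speaker's exact method): a file is
test-related if its path contains "test".
"""
import os
import subprocess
import sys

def looks_like_test(path):
    return "test" in path.lower()

def test_commit_percent(repo):
    """Percentage of commits that touch at least one test-related file."""
    out = subprocess.run(
        # "@@" marks the start of each commit; the changed paths follow.
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:@@"],
        capture_output=True, text=True, check=True,
    ).stdout
    chunks = out.split("@@")[1:]  # one chunk of changed paths per commit
    if not chunks:
        return 0.0
    touching = sum(
        1 for chunk in chunks
        if any(looks_like_test(p) for p in chunk.splitlines() if p)
    )
    return 100.0 * touching / len(chunks)

def test_size_percent(repo):
    """Percentage of tracked file lines that live in test-related files."""
    files = subprocess.run(
        ["git", "-C", repo, "ls-files"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    test_lines = total_lines = 0
    for rel in files:
        try:
            with open(os.path.join(repo, rel), "rb") as f:
                n = sum(1 for _ in f)
        except OSError:
            continue  # submodule entries, broken symlinks, etc.
        total_lines += n
        if looks_like_test(rel):
            test_lines += n
    return 100.0 * test_lines / total_lines if total_lines else 0.0

if __name__ == "__main__":
    repo = sys.argv[1]
    print(f"test commit percent: {test_commit_percent(repo):.1f}%")
    print(f"test size percent:   {test_size_percent(repo):.1f}%")
```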
Then the other question is: which projects do you extract the metrics from? I decided to use the GNOME and KDE projects, basically all the thousands of sub-projects that make up these two umbrella projects. The reason is that they contain a wide variety of sub-projects, ranging from end-user applications to core infrastructure libraries, graphics libraries, and command-line tools: a bit of everything, basically. So I ran some tools I developed to extract the metrics. Who wants to hazard a guess at what the result looks like? OK, no brave souls here. This is what I got.

Notice the very tall bar at the zero point: about 50% of the projects don't have any testing at all. And that's kind of sad, right? And the rest of the distribution tells us that in 80% of the projects, at most one in ten commits affects the tests. Think about this for a minute: for every nine changes you make to the code base, bug fixes or features, you have at most one commit that touches the tests. That's a very big red flag. It's not the end of the world, perhaps, but something's very fishy here. And I got similar results for the test code size ratio for these projects.

One other thing I did was to compute the ratios for projects that I knew were better tested, just to have something to compare against. You can see that the metrics are much better there, which is a good sign that our metrics are actually working correctly.

OK, so we have some indication that not everything is right in the free software world concerning automated testing. But why is this so? That's the burning question, right? There are various reasons. Some of them are not particular to free and open source software: for example, many people feel that testing code is not worth their time, or that it's too expensive, especially in the beginning. That has nothing to do with free software versus proprietary software or any other development model. But I believe there are a few reasons that are particular to free and open source software, and they are worth exploring.

To do that, I want to go back to the past, back to 1968, when this conference happened. This is the first software engineering conference ever held, and it was organized by NATO. Many of the big minds of the era gathered to discuss all the things they thought needed improvement in the field. They sometimes came up with solutions, but often with very witty and insightful quotes about the state of things. If you haven't read the proceedings, I highly recommend that you do.

Here's one interesting quote from those proceedings. It is from Alick Glennie, who is often credited with writing the first compiler. He said: software manufacturers should desist from using their customers as their means of testing their systems. And in the 50 years since this was written, I'm not sure we have learned the lesson properly. Sadly, I think that for free software it's a bit worse than the general industry standard. Why is that? You see, in free software, having a bug in the code is often not considered such a big deal, right? The software is provided as is and without any warranty. And to be honest, that's completely fair. Myself, as a free software developer, I wouldn't have it any other way.
But at the same time, that means there's this conception that bugs are cheap and that fixing them is also cheap. And in a way that's a pragmatic attitude to have: why spend precious resources trying hard to prevent bugs when fixing them is actually quite cheap from the developer's perspective? But here's the caveat: this is a very developer-centric idea. From the user's perspective, things may not be so simple. If, as a user, you have lost data, or had your system compromised because of a security bug, then you certainly don't feel that bugs are cheap. At this point you may say: so are you saying that free software sucks in terms of quality? No, because there's another force pulling the rope in the other direction, and that's professional pride. Free and open source software has a lot of that. The developer is in the spotlight for the good and the bad, right? We get the blame and we get the praise; everyone can see what we're doing. It's just that many incentives in free software seem to point to a more reactive rather than proactive approach. And this is something we're going to see in the next topic as well.

Moving to the next topic, I want to go back to the future, to 1999, when this book was published: The Cathedral and the Bazaar. The book explores two different ways of developing free software, the cathedral model and the bazaar model, and it argues for the bazaar model in particular. It contains a number of quotes, or lessons as they're called, about how free software works or should work. Here are two very interesting ones: "Every good work of software starts by scratching a developer's personal itch." And: "Release early, release often, and listen to your customers."

For the first one, you may have doubts about the absolute terms in which it's phrased. But one thing that is true is that many projects in the free software world do start by scratching a developer's itch, and they often start small, without any plans for significant growth or adoption. At that point, the incentive to spend time on an automated test suite is limited. And even in cases where projects start with loftier adoption goals, they may follow, to a great extent, the release-early, release-often mentality. That mentality has great benefits, but followed to an extreme, and especially very early in a project's development, it leads projects to focus too much on features, on becoming as relevant to the public as possible, as soon as possible. And again the incentives point the same way: spending limited time and resources on writing tests may seem like a bad use of time.

But regardless of how a project starts, if it starts to grow large, bugs start creeping in. And at that point the developers say: OK, perhaps now it's time to have an automated test suite. But by then it's usually too late: the code has become test-unfriendly, it's difficult to add tests at that point, and most projects simply don't. So again we see tests treated as an afterthought rather than a forethought: a reactive rather than a proactive approach.
Now, that book contains another very interesting quote, probably the most well-known in the free software world: given enough eyeballs, all bugs are shallow. This is called Linus's Law, in honor of Linus Torvalds; Linus didn't actually say it, it's just named after him in the book. And it refers to code reviews.

Free software is very privileged here, because code reviews are at the heart of its development model. That is because only a limited number of people have commit access to the code base, so every change, every merge request or pull request, needs to be reviewed before it goes in. Unfortunately, some projects again take this to an extreme, and their trust in code reviews is so great that other practices, including automated testing, are forsaken. Don't get me wrong: code reviews are one of the best ways to maintain quality in a code base. They help maintain a sane design; they ensure that changes align well with the architecture and the overall goals of the project; and they also help catch bugs, but only some of them, some of the time. The problem here is not with the idea of code review itself, but with the fact that code reviews are done by humans. And we humans have inherent limitations, right? Our brains are limited: we are great at creative thought, but we are also great at overlooking details. And when there are gaps in our understanding, we are very happy to fill them with our own unicorns-and-rainbows version of reality. In addition, as a code base grows, the interactions and the possible states sometimes grow exponentially, and it's very difficult for us to keep all that state in our minds, to follow all the code paths and the implications that one change may have.

In theory, this problem of human limitation is offset by the open nature of the code and the fact that we have enough eyeballs. But what number is enough? Even the biggest free software projects, the Linux kernel for example, have only a limited number of reviewers checking each change. More is better than one, right? But that "enough" is very indeterminate: sometimes it works, sometimes it doesn't. Which means, in the end, for all the reasons I mentioned, that code reviews by themselves, as excellent as they are, cannot stand as the only tool. We need to be careful not to place all our trust in code reviews and forsake the other tools we have.

For the next topic, I'd like to explore a more fundamental question, the question of learning. How do people learn in general, and in particular, how do they learn that certain software practices are beneficial, so that they can follow them? They learn by example. They learn by mimicking what the best in the field are doing. They learn about this from books, from videos, and, if they are lucky, from a mentor. And again, free software is very privileged here, because we have this huge Library of Alexandria of code that we can go into and explore. That is indeed what many people do: they explore this library and try to imitate what their role models are doing. And when they go into this library, what do they see? Based on the graphs I showed you earlier, they see a big pile of nothing in terms of automated testing.
And at the other extreme, they see monstrous code bases with huge, arcane test suites. This is an example of a test that belongs in that category. I made the font very small on purpose, because I didn't want to scar you for life, so please don't read it in detail. The whole thing creates a negative network effect: why should you bother with automated testing when your role-model project doesn't, or when what you see looks like that? I mean, this is ugly; you don't want to write something as ugly as that. We need to be very careful with the examples and the culture we create and promote, because it's a slippery slope and things can really get out of hand. I'm sure the 80s started with all the best intentions, and then we got this, right? This is not something we want to happen to free software and automated testing. So it's very important that we create more and better examples. And as the saying goes, beware of advice, but follow good examples.

I'm now going to break that rule a bit, because I'll try to give you some advice, starting with this: embrace automated testing from day zero. As we talked about before, and as many of you have perhaps experienced, the larger a project gets without testing, the more difficult it is to add testing after the fact. This is particularly important for free software because, as I mentioned, the incentives push toward being reactive rather than proactive. So having tests from day zero is an important step.

But it's not enough to just start with tests; we need to maintain a good testing culture for the lifetime of the project, and this is where the next piece of advice comes in. We need to set the bar high, first for ourselves as, for example, maintainers of a project, and then for contributors too. Basically, we need to lead by example. It's often the case that contributors need some help to get over that bar, and we can provide it a bit proactively. For example, we can keep a very nice and clean test suite that's easy to understand, so people can just jump in. We can use well-known testing frameworks, so that the barrier to entry is low. And sometimes someone will come to the project with an interesting bug fix but no interest in writing a test; they just throw the fix at you. It's up to us to try to encourage them but, if that doesn't work, to write the test ourselves, and that's fine.
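To illustrate the point about keeping the barrier to entry low, here is what a minimal, welcoming test file might look like. This is my own hypothetical sketch, using pytest as the well-known framework; the slugify function is invented for the example:

```python
# test_slugify.py -- a minimal, self-explanatory test file.
# Using a well-known framework (pytest here) means a new contributor
# can add a case by copying three lines, not by learning a bespoke harness.

def slugify(title: str) -> str:
    """Turn a title into a URL-friendly slug (the code under test)."""
    return "-".join(title.lower().split())

def test_slugify_lowercases_and_joins():
    assert slugify("Hello World") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  many   spaces  ") == "many-spaces"
```

Running it is a single command, `pytest test_slugify.py`, which is about as low as the bar can go for a drive-by contributor.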
And finally, the last point here: be humble. You may wonder, what does humility have to do with automated testing? And I'd say basically everything, because automated testing is an acknowledgement of our inherent limitations, of our inability to invariably create infallible complex systems. It's basically humility in the face of overbearing complexity. And we need to be humble in order to accept all the help we can get from tools like automated testing and code reviews, whatever works for people. An interesting note here: when I started out programming many, many years ago, I had the impression that the more experience I got, the more I got exposed to projects and people and teams, the more confident I would become in just sitting down, getting in the zone, and streaming out perfectly correct code, right? That didn't happen. Actually, the exact opposite happened. The more experience I got, the more I realized how fragile this whole process is, how intricate the act of writing code is, and how much I needed to depend on external tools to help me maintain quality.

In closing, I would like to mention one last thing. Free software is a culture of openness. It's a culture of cooperation. It's a culture of respect. So we want to promote automated testing, but we cannot demand it from others. We can only encourage it, and most importantly, we need to lead by example. We need to be the change that we want to see. Thank you.

[Audience question: Is there any data to correlate these metrics with maintainability; for example, that the roughly 50% of projects without tests are harder to maintain, have more problems, or attract more bug reports?]

Okay, so the question is: we have all this data telling us that projects don't have tests; does this correlate with the projects actually being harder to maintain? I don't have hard data on the maintenance side of things, but I do have personal experience of it. And yes, that's exactly the right question: it would be good to have data, a number of bug reports for example, to check whether the correlation is there. But no, I don't have that data.

[Audience question: Supposing for a moment that all tests are written properly, what would you consider a healthy percentage for these metrics? 40%, 60%?]

Right, so there's no good answer here. For example, look at this graph: there's a variety of projects spanning all kinds of things, from databases to display servers to core libraries, which are considered well tested, and their ratios differ dramatically, but they are all high. I think we can perhaps set a minimum: from the graph here, at least 20% should be something we aim for. For example, note the Mir project there, with its very high test commit ratio; that is because I know it uses test-driven development, so the metric reflects that. The exact numbers also depend on commit practices, right? Some people squash their commits, others merge series of separate commits. So there's no single good answer; it's more a feeling of being comprehensive, and by comprehensive I mean having at least the core functionality covered. Because in my experience, what actually started my interest in this topic was that I would update projects I was using and then something very core would break. And I would ask myself, did no one test this? And of course, the answer was no; there was no testing there.

[Audience question: Do you test the tests themselves?]

Oh, the tests themselves? It depends, right? I normally don't, to be honest, unless I feel a test may be too complex. For example, I may have test doubles that try to imitate some complex part of the system. In one project, I had a test double for D-Bus, for interactions with D-Bus, so I actually wrote tests for the double, to ensure that the double itself was working correctly, and only then the tests that used it.
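To make the shape of that concrete, here is a hypothetical sketch of the pattern in Python. The talk's actual D-Bus double isn't shown, so FakeBus and its methods are invented for illustration:

```python
import pytest

class FakeBus:
    """In-memory test double standing in for a message bus such as D-Bus."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        """Expose a callable under a method name, as a bus service would."""
        self._handlers[name] = handler

    def call(self, name, *args):
        """Dispatch a method call, mimicking the real bus's behavior."""
        if name not in self._handlers:
            raise KeyError(f"no such method: {name}")
        return self._handlers[name](*args)

# Tests for the double itself: a complex double is code too, and it must
# honor the real component's contract before other tests can rely on it.

def test_fake_bus_dispatches_to_registered_handler():
    bus = FakeBus()
    bus.register("Echo", lambda x: x)
    assert bus.call("Echo", "hello") == "hello"

def test_fake_bus_rejects_unknown_method():
    bus = FakeBus()
    with pytest.raises(KeyError):
        bus.call("Missing")
```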
So probably I'd only do that for the more complex parts of a test suite. Because otherwise, where do you stop, right? Testing the tests of the tests; you never stop.

[Audience comment: The Linux kernel has tests for its tests.]

Yeah, so the comment is that the Linux kernel tests its tests. That's a great thing to do if it makes sense for your project. So I guess it depends, right?

[Audience question: Is there a way out for a project that has no tests?]

Yeah, so the question is, is there a way out of the misery of not being tested? There are some books and articles about dealing with legacy code, and those should help here: you consider the untested project legacy code in some way, you go through the process of figuring out, first of all, what it should be doing, you write some tests for that, and then you start refactoring it to be more testable and repeat the cycle. But in most cases, I would say no, unfortunately. I don't want to sound pessimistic, but I haven't seen it happen very often, unless it's a very high-profile project that someone is very interested in.

The thing is that for many people, if it's too late, it's not worth going back and adding tests. It's just a big pain in a complicated project; you don't have all the internal interfaces to check things. But of course, it depends on the project. If you can do it, that's great. That's awesome.

Thank you.