So I'm talking about finding bugs in open source system software using a tool that we've developed, which is called Coccinelle.
Everyone knows bugs are everywhere. What we're interested in, in particular, is bugs in the Linux kernel. Linux is, as everyone knows, quite important software. It's used in all kinds of systems. It's also extremely large: there are over 14,000 C files. Earlier today, Andrew Tanenbaum told us it was 6 million lines of code, but when I tested the most recent release, it was actually 8 million lines of code. So it's growing at quite a rapid rate. I say it has increased by almost 50% since 2006; it could even be more than that.
Linux, as an open source system, also has the property of having more experienced developers, the kernel maintainers, alongside less experienced developers who just want to contribute one particular fix or feature. And there are also the developers of proprietary device drivers, who are maybe more expert in the properties of the device than they are in the Linux kernel itself. The effect of all of this change, and of all of these different people contributing, is that bugs keep entering the Linux code. They remain in the code for a while, and then maybe a whole bunch of them get fixed. But then they come back.
So we'll look at one where the comment says "strikes again", so you can see that it's something that has come up from time to time. The bug here is something that's allowed in C. It's not allowed in C++, or it's warned about in C++. What we have is a mixture of operations. This one is a Boolean operation, a negation, so its result will be either true or false, that is, either one or zero. And here we have a bit-and operation.
And if you bit-and a value that is either one or zero with, for example, an even number, then the result is always zero and the test is always going to be false. That's probably not what was intended. What was intended is that we take this expression, see how its bits correspond to the bits of the constant, and then negate the result, giving true or false. So the solution is quite simple: we take the original expression and put some parentheses in it.
So we can see here how this bug has appeared and disappeared. It's perhaps not so visible, but there are light blue lines. We have a whole big light blue region here, which is places where the bug existed. We've developed another tool in our group, called Herodotos, which correlates occurrences of bugs in different versions, so that you can see that this is the same bug even though it has moved to a different place because other things were added to and removed from the code. So we see lots of occurrences of the bug here, and then at this point they all get fixed. Actually, there's a place here where there's a version with none of these bugs left. But then you can see over here that they creep back into the code. These black lines mean places where the particular file didn't exist. So what has happened is that the bug was fixed in all of the existing files, but new code has been added, and that code has the bug in it again.
So what we need is some kind of automatic way of finding and ideally fixing bugs. That's part of the goal of our tool, Coccinelle. It has two features. One is that we provide a static analysis that matches patterns in C code. The other is that we can also specify transformations. So once you write a pattern that recognizes a bug, you can express how to fix that bug as well.
Even if it's a simple fix, like adding those two parentheses, perhaps you would be happier to have a tool do it and then check the result, rather than adding the parentheses yourself and perhaps making some kind of mistake. The important point is that our approach is user-configurable and is based on the patch notation. Basically, you can write things that look like patches, that look like real code: you have some code you would like to find. But the problem with normal patches is that they're very specific to a particular place in the code. They refer to particular variables, they have a particular amount of white space in front of them, and so on. So the goal of our tool is to allow you to express things at the code level in a patch-like way, but make them a little more abstract so that they can be applied more generally.
So here's an example. This is how we can find and fix the bug that I just showed you. The idea is that we have a Boolean negation operator and a bit-and operation. The argument of the Boolean negation can be any arbitrary expression, so what I've said is that E is a variable that can match any expression. And if you look around a bit in the kernel, you find that this problem often occurs when the other argument of the bit-and is not an arbitrary expression but a constant. So we're going to specify that, because if you had two expressions, you might actually want to be doing an and of two Boolean values. So we say we have this pattern, and we would like to replace it by the same code but with parentheses before and afterwards. What our tool does is take this pattern, search through your entire code base, which can be Linux or any other C code, find every occurrence of the pattern, and modify it like that.
So unlike a normal patch, it doesn't have to appear on a specific line, this doesn't have to be a variable called E, and so on. So we can see an example. Here's a particularly pathological example. In this case the person decided for some reason to put a newline here, after this little arrow. So we have the exclamation point, the Boolean negation, and the bit-and on different lines. If you were just going to do a grep and try to find all the lines that have an exclamation point and an ampersand, you wouldn't find this particular case. With the semantic patch, the E is going to match this expression here, the C is going to match this expression, and then it will recreate the term according to this rule. And so we get this out as the answer.
Okay, so this is just showing the diagram we had before. If we had applied the semantic patch back here, at this time, we could have fixed all of those bugs at once with no particular manual effort, except that you would want to check that the result is okay. The number of them declined after a while, but in October 2009 there were still two. And I think there are still two today as well. One of them has just been sitting there: I've reported it, but no one has fixed it.
So let's look at another example, which is a little bit more complicated. This is a bug that was found last summer, and someone showed how you could exploit it in order to become root. So it might look quite simple, but it has some impact. Here the issue is that we have a parameter called tun. We have a local variable which is initialized with a dereference of that parameter, then we have a null test on that parameter, and then we have the rest of the code. And this is contradictory.
Either the null test is useless, because tun can never be null, or the null test is useful and this code here is going to crash. The solution is to not dereference here, and instead move the initialization to after the null test, and then it becomes safe.
If we want to specify finding these kinds of bugs using our notation, the basic idea is that you have a dereference and then you have a null test. So in the pattern I've made here, x is going to be an arbitrary expression and field is going to be an arbitrary field name, any kind of identifier. This should come at some point, and then somewhere later in the code the null test should appear. The dot-dot-dot notation means that some unknown code occurs in between the two, and that there's actually an execution path that will get you from here to here. So it's not just a syntactic notation. And these little stars mean that I don't have any particular transformation I want to perform; I just want to be informed about the presence of these things.
Okay, so if we take this and apply it to the entire Linux kernel, we get a lot of reports, I think 300-and-some, but if we start studying them, we find that a lot of them are actually not bugs. Here's an example. In this case we have bridge->bus, and then we have a null test here on bridge, but this is not actually a bug, because bridge has been redefined in between. So one thing we could do is just look through those 300 bug reports and filter out the ones that are not okay.
Another thing we could do is try to refine this specification so that it would find everything except this one. And another thing we could do, which is quite useful in practice, is to think about a special case where the specification is more likely to always give you the right answer. You can work on that, and then you can write another specification later to find the bugs that got left out. So that's what we're going to do.
We can think back to the original problem, which was that we had a variable declaration and then a null test right at the beginning of the function. This rule is trying to capture that. Here I say we have some arbitrary type and some arbitrary identifier, and it's being initialized to some dereference. Then I say that in between this initialization and the null test there's no reference to E and no reference to i. I want to be sure that the code is in a form where nothing can go wrong, so that everything the tool tells me will be a correct change, and I can go off and submit the changes to Linux. In this case we also know exactly how to fix the problem, which is an advantage: the solution is just to take the initialization and move it down after the null test.
If we apply this more constrained rule, we can see again that a long time ago there were lots of these bugs. This is a problem that can cause your machine to crash, so it's not surprising that they have been fixed slowly over time. But if you look at the top of the graph, they have been added over time as well. Sometimes they're added because new files are added, as in the example we saw before. And some of these have white space to the left, which means that existing code was changed in some way that added the bug.
Okay, so in conclusion, we have developed a patch-like program matching and transformation language, so you can write your specification in something that's very close to C code; there aren't a lot of different APIs to learn. We've used this to create over 450 patches which have been submitted to Linux. Some other Linux developers have been using it, sometimes without asking us very many questions, so I think it should be very easy to get started with. We haven't worked very much on other software projects, but we have looked in some detail at various things, especially OpenSSL, VLC and Wine.
Questions? Yes. So the question is, do we support C++, and the answer is unfortunately no. And any other language you'd like to ask about, unfortunately we don't support it either; we only support C.
Yes. So you could write one rule that describes the conditions in which you call that function, and then another rule that shows the body of that function and has the error pattern inside of it. You can write multiple rules, and those rules will apply at different places in your file.
So the question is, should we integrate this into GCC, or make it an option in GCC, so that it could check your code every time you compile. It would depend to some extent on the reliability of your rule file whether that would be desirable or not. The rules that I showed are things that have a very high chance of finding real bugs and fixing them in the proper way. But the tool just does what you tell it to do, and other rules have maybe a 50-50 chance, so maybe you don't want to be bothered with that every single time. One idea would be to instruct the tool: this one is a real bug, this one is not a real bug, don't bother me about it anymore. But we haven't worked on that yet. I think there are some questions over here. Sorry, I have absolutely no idea. I don't know.
Sorry, what? There are a number of rules on our website, but you can also write any rules that you want yourself, so it's not a black-box sort of tool.