Thanks, Justina. I'm very excited to be here remotely. So this talk is about Smatch, which is a static analysis tool that I wrote. I think you're in a series on static analysis tools, and the last one was Julia's talk about Coccinelle. So this is just a brief introduction to Smatch, some things about the Linux kernel, and some ideas for different uses that you could put static analysis to. So anyway, let me start with the slides.

If you could download Smatch while I'm talking, there's a Git clone URL there. And then you'll need to install some libraries. Those are the instructions for Debian. It's going to be similar on Red Hat, but they might have a different name for their SQLite library. You could start on that while I'm talking, because the first part is not really about code. It's just my biography.

So this is me growing up. I'm on the far left, in boarding school in Africa. And then after I left Africa, I went to college. This is me in Minnesota. But I always felt like I would come back to Africa; I always wanted to. So after I finished university in Minnesota (that's not actually me), that's when I graduated from college. That's the dot-com collapse. There were no jobs, so I was unemployed for a bit. And I wrote Smatch.

Initially, right after college, I wrote a version of Smatch which was terrible. What it did was it took C code and output something that I could grep. Because what I realized was that part of the problem in finding bugs is finding the actual code. So if you can just grep for it, then that simplifies finding bugs. And the first check that I wrote was looking for locking bugs. At that time Linux was very old. This was, of course, 20 years ago, and there was a big kernel lock. If you took the big kernel lock, it incremented the lock count by one. And if you released the big kernel lock, it decremented it.
And then at the end of the function, it's like, oh, are we at an odd number or an even number? And that's the error decision. But I got a job, and I stopped working on Smatch. Then I thought that company was going to go out of business, so I rewrote Smatch in C, using Sparse as a front end. But then the company survived and made a little bit of profit, and I forgot about Smatch again.

In 2009, I came back to Africa. I took a bike trip through Africa, starting in Egypt, going down to South Africa, and then back to Zambia. I'm hearing some voices. That's not on my end. Unless it's a question, but I can't hear what the question is. Anyway, while I was cycling, that's when I did a lot of the critical work on Smatch. The important parts of flow analysis, I figured out on my bike trip. I had a little laptop. And I reported the bugs to the Linux kernel mailing list, and nobody did anything with them. Not one bug was fixed. So I told my brother about it, and he's like, you should just fix it yourself. If you fix it yourself, somebody will hire you.

So after I finished my bike trip, I did fix a lot of bugs. This is in 2010. If you look at the top committers by patches, I was number two, under Joe. And sure enough, I was hired at Oracle after that, because somebody saw me, Chris Mason, and he's like, we're building out the Oracle kernel team, let's hire Dan. That was, of course, 10 years ago. In 2020, I'm not on the list. And if you look at the list, Mauro is a repeat name. A lot of these names are old names, but Mauro has got 12,000 commits. You have to be in the top three to even be here on this list. So it has become a lot harder these days to compete with everybody else. The competition is very fierce.

So why am I no longer the superstar developer? Number one, laziness. Number two is that I report over 100 bugs, and instead of fixing them, I just report them. It's faster for me.
And people are much better at fixing bugs these days when you report them. It doesn't take the original author long to figure out what he intended. For me, I have to look at it and think, what's the right way to fix it? But the original author knows. And we tend to report bugs right away, within a week, maybe, or days, even hours. In 2010, there were a lot of ancient bugs, and nobody was fixing static analysis bugs. There was one guy who fixed 30 static analysis bugs, I would say. But basically, it didn't happen.

Then there's a lot of increased competition for easy bugs. These days, static analysis is very popular. I started work on Smatch in 2002, and then I stopped working on it until 2008. And I imagined that somebody was going to write Smatch. Somebody was going to write a very good static analysis tool, and I wouldn't be able to compete with them. I knew what I wanted it to be, but somebody was going to do it and take away that opportunity. But in 20 years, static analysis has not really developed that much. Smatch has developed a little bit. What has happened is that the Linux kernel community does a lot of static analysis.

In 2010, I reported about 33 bugs. Stephen Rothwell, he does build errors. Randy is doing build errors. But then in 2020, everything is static analysis. The Hulk Robot is reporting 800 bugs. kbuild is here three times, so they're reporting, again, 800 bugs. I'm reporting 150. Stephen and Randy are doing build errors. That's a different thing. Syzbot, those are runtime bugs, but again, it's an automated system. So in the Linux kernel, we have automated bug reports.

In 2010, people were skeptical of static analysis, and they complained a lot about the false positives. These days, thousands of patches come from static analysis. The other two names are doing runtime and static analysis for reporting. So there's a lot of static analysis. Of course, this is just the top 10.
Other people are looking at it and fixing bugs here and there. So everybody's doing static analysis these days.

We have three main static analysis tools that we use in the kernel, at least open source ones. There's Sparse, which is very high quality code. Linus Torvalds wrote it. Luc is the maintainer of it now, but Linus is still very involved. It's very fast. It warns about endian bugs and about user annotations, when you mix user space pointers with kernel space pointers. And it warns about type issues. Smatch uses Sparse as a C front end.

Coccinelle: Julia Lawall presented about it last week. It's easy to write checks in Coccinelle. And it works on the code before it's been preprocessed, so you can write checks for macros. You can write checks for macros in Smatch as well, but it's a lot more complicated. And it generates patches automatically. That's a very valuable thing. Some months ago, Kees Cook went through the kernel and changed a lot of allocations to use kcalloc() and kmalloc_array() and the struct_size() macro. What those changes do is prevent integer overflows, so they're just a much safer API. If you were writing that in Smatch and you reported to people, oh, this is a much safer API, they don't want that report. They get annoyed with you. They say, it might be safer, but in this case we know that it can't overflow. But if you send them a patch which changes everything, then they just apply it. So in that way, Coccinelle makes the code a lot safer, which is just not possible with another tool.

And of course, my tool is Smatch. It has the best flow analysis, certainly in open source, and probably it's the best that is there. I'm going to explain a little bit more what that means. Smatch tracks all the values of all the variables. Of course, that's not always possible, but that's what we're trying for. Then it tracks relationships between variables.
If we know A is less than B, it keeps a record of that, so that we can answer questions later on. It does cross-function analysis. How that works is you do a build of the kernel, which spits out a lot of SQL, and then it builds a database. You can look up in the database how a function is called. Then you run Smatch again, and it uses all the information about how a function is called. And if you run it again with the more info output, it will rebuild the database. Every time you rebuild the database, the call tree extends. So you know that this is called from this function, which is called from this function, which is called from this function.

But unfortunately, Smatch has some downsides. The documentation is not very good. And it only works on the Linux kernel, which is not quite true. Soyin is using Smatch. They're probably the other big user of it, besides the Linux kernel. In theory, it should work on all C code, but in real life, I've only tested it on the kernel. My job is as a kernel developer, so it's hard for me to invest a lot of time into making it work on things that aren't the Linux kernel. Another thing I would say there is I'm not really a user space programmer. I don't know anything about user space networking. I mean, I can Google that stuff, but I'm not an experienced networking developer in terms of user space, and there's a lot of things you would want to check there. So there are different levels of working, I guess. I've tested on the Linux kernel, and then I've written stuff to get around bugs that I had on the Linux kernel. So it works really well on the Linux kernel and perhaps less well on other software.

So flow analysis is a must to understand code.
When you're looking at code and you're explaining it to your friend, you can say, oh, no, that's not possible. That kfree(), it's not a double free on that path. Somebody sent me a patch the other day, and I'm like, you think you're fixing a bug, but it's absolutely not fixing a bug, because that code never returns that value. The way I figured that out is a kind of logic, and computers can do that same logic. That logic is flow analysis. That's what that word means.

The flow analysis is pretty simple in terms of Smatch, which is that we track where variables come from and we tie them together in lists. And I think you can answer any question that way. The problem is sometimes there are too many variables and they can be tied together in too many ways, so you run out of memory. For example, let's say a function is called over 200 times. In Smatch you can tie the variables together in a meaningful sense, but if you're trying to parse 200 different calls, it just takes too long. So instead of that, I think what happens is we try to pull out anything common amongst all those 200 calls and just say it's called one time like this. But of course those are going to be vague details about how it's called. I'm saying that now, trying to remember the code. That's how I wish the code would work, so don't get annoyed if that's not actually how the code works.

So anyway, flow analysis lets you answer questions about the code. Is the pointer NULL? Are we holding the lock at this point? Has this pointer been freed? Sometimes those questions are complicated to answer, but Smatch does the work for you in the background. Flow analysis can answer almost any question, but some questions are too difficult. Sometimes people debate, is this a bug or is it a feature? Sometimes you can't tell from the code if it's working correctly; you need the hardware.
And again, as I mentioned before, sometimes there are too many variables and you can't track everything, but it's surprising how much stuff you can keep track of. In the Linux kernel, when I'm running Smatch, there's a timeout to say, oh, we've been trying to parse this function for over five minutes now, let's just give up. And then we skip to the next function. I don't know if it's actually five minutes, I might have changed that, but whether it's five minutes or one minute, at some point you just give up. You're like, this code is too complicated. But there are maybe 200 functions like that, out of thousands and thousands of functions in the Linux kernel. So most things you can parse to some extent.

When you're reading the code, you can infer things from looking at it. If you have an if statement saying if X is less than 42, then you infer that X is zero to 41, or if X is a signed value, it might be negative. If there's a NULL check, that means the pointer can probably be NULL. So one warning that Smatch will print out is if you dereference a pointer and then check it for NULL. Can it be NULL or can it not be NULL? The author of the check thinks it can be NULL, but you already dereferenced it, so somebody else thinks it can't be NULL. It's pretty important to get that correct, because otherwise the kernel is going to crash. A lot of times the answer is that it can't be NULL, so it's not really a bug, but we still think that's worth fixing just so the code makes sense.

If you have a pr_err(), that means you have an error, and it probably means you should return an error code. A lot of people think error codes are not that important, but they are. For example, if we're checking user data and we don't return an error code, then it could be a security bug.
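To make the two inferences above concrete (the range implied by "X is less than 42" and the dereference followed by a NULL check), here is a minimal toy sketch in C. It is not Smatch's implementation: `struct range`, `narrow_less_than()`, `enum ptr_event` and `find_check_after_deref()` are hypothetical names invented for illustration, and the range model is a single interval that is assumed not to become empty.

```c
#include <limits.h>
#include <stddef.h>

/* Toy closed interval [min, max]; Smatch's real range lists are richer. */
struct range {
	long long min;
	long long max;
};

/* Range of x on the true branch of "if (x < limit)". */
static struct range narrow_less_than(struct range r, long long limit)
{
	if (r.max > limit - 1)
		r.max = limit - 1;
	return r;
}

/* Events seen for one pointer while walking a function top to bottom. */
enum ptr_event { PTR_DEREF, PTR_NULL_CHECK };

/*
 * Return the index of the first NULL check that comes after a
 * dereference of the same pointer, or -1 if there is none.
 */
static int find_check_after_deref(const enum ptr_event *ev, size_t n)
{
	int seen_deref = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ev[i] == PTR_DEREF)
			seen_deref = 1;
		else if (ev[i] == PTR_NULL_CHECK && seen_deref)
			return (int)i;
	}
	return -1;
}
```

Starting from 0 to UINT_MAX, `narrow_less_than(r, 42)` leaves 0 to 41. Starting from INT_MIN to INT_MAX, the lower bound stays negative, which is the signed case mentioned in the talk. A real checker would also have to notice when the pointer is reassigned between the dereference and the check; this sketch ignores that.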
A lot of times, if you don't return an error code, you return success when you should have returned an error code, and that results in a use after free or some kind of a crash. So it seems like a trivial thing, but actually it's pretty important.

This code here is a famous bug from 2016, four years ago, five years ago, something like that. It's the Heartbleed bug. What it is is a copy-and-paste error where they put goto fail twice. So when you're looking at this code from a static analysis point of view, there are several warnings that you'd look at. If you wrote that code in the Linux kernel, you would get three error messages. The next line after that extra goto fail is unreachable code; it's dead code. Then the extra goto fail is indented, so it looks like it should be part of the if statement, but there are no curly braces. That's another warning message. And then the third warning message is that the goto fail should be pulled back; the indenting is not right. Another warning message that might be useful to print is to say that you've got two lines which are exactly the same, one after the other. I've looked at that to see if that's a valid warning that we could do, and I wasn't able to make it work. It seems like a good idea, but sometimes there are copy-and-paste things like that which are not bugs. Some of that code was very interesting, but it didn't work. So sometimes you have ideas for things that static analysis could find and it doesn't work.

So now I'm going to talk about how to write a Smatch check. Smatch is based on Sparse, and there are some Sparse data types that are important. There's a symbol. Each variable is a symbol, and then a symbol is inside of an expression.
So if you have A equals 100, that's a symbol expression, then an assignment expression and a value expression. And if you add a semicolon on the end of that, then it's a statement. So you've got symbols, which are variables, and expressions and statements. Those are the important data types in Sparse.

Smatch is a kind of state engine, which means you set a state and then it transitions to a different state. So the most important data type is a state, which is normally just a name, like the state freed. And then you normally have a variable connected to a state, to say, oh, the variable is freed, and that's an sm_state. You can say set_state_expr() with freed, which will set the variable to freed, and then you can call get_state_expr(). And when you have a group of states, all the states that are present are in a state tree. That's an stree struct. The states which are in an stree are tied together, they preserve history, and you can do a lot of stuff to manipulate them. You can say, oh, assume that X equals a hundred, then what does the stree look like for that? Which states are true at that point?

Then in Smatch you have numbers, which are svals, and they have a data type and a value. And you've got a range list, which is a range of numbers. With the numbers, there are various levels of certainty. You can say get implied range list, and that might fail, to say, I don't know which numbers this can be. Get absolute range list doesn't fail. It'll tell you that it can be zero to UINT_MAX, depending on the type. Get user range list will tell you if the user can set that data and what the user can set it to. So there are different levels of certainty, and there's also a hard max and a fuzzy max, different levels of max.

So I'm going to just walk you through a sample check. This is a check to look for freed variables: use after free and double free.
I guess use after free. You first declare the state, which is freed. All Smatch checks have access to the global states of undefined and merged. If you don't know what the state is, it's undefined. And if you combine an undefined with a freed, then it becomes a merged state. Then you have to hook everything into the core. So we'll say add a function hook for kfree() that'll call the match_free() function. We'll say, if the variable gets modified, then we'll set it to undefined. And then if the variable is dereferenced, we'll check it. In the match_free() function, you take the first argument of kfree() and you set the state to freed. In the match_dereferences() function, you get the sm_state, which is different from the state. In this example, let's say the sm_state is merged. So you get this sm_state and it's merged, but one of the possible values of the merged sm_state is freed. So then you'll print a warning about use after free.

So it's pretty simple in some ways. In the background, all the states are controlled, and Smatch does a lot of flow analysis magically for you to track if it's freed or not. The checks themselves are quite simple, though. Not all Smatch checks use that much flow analysis, and they're all quite different. That's just an example.

Of course, a lot of people upstream use Smatch, and I'm also checking the code every day on linux-next. You have the Huawei people with the Hulk Robot, who report 800 bugs every year. So using Smatch is something that everybody's doing. What we need is people to take Smatch and do something new, something different. I've got a lot of ideas at the end of this presentation, and the simplest one, and maybe the best one, is to just take Smatch and use it on a different C project, your favorite one, let's say.
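Going back to the sample use-after-free check walked through above, the state handling can be imitated with a small self-contained toy in C. This is only a sketch of the idea, not Smatch's API: the `struct sm_state` here is a simplified stand-in, and `set_state()`, `merge()`, `on_kfree()`, `on_modified()` and `on_dereference()` are hypothetical names. It assumes that a merged state remembers which states fed into it as a bitmask.

```c
/* Toy states. UNDEFINED and MERGED mirror the global states in the talk. */
enum state { UNDEFINED = 1, FREED = 2, MERGED = 4 };

/* A variable tied to a state, plus the states possible on incoming paths. */
struct sm_state {
	const char *name;   /* variable name */
	enum state state;   /* current state */
	unsigned possible;  /* bitmask of states that fed into this one */
};

static void set_state(struct sm_state *sm, enum state s)
{
	sm->state = s;
	sm->possible = s;
}

/* Join two code paths: differing states become MERGED, history is kept. */
static struct sm_state merge(struct sm_state a, struct sm_state b)
{
	struct sm_state out = a;

	if (a.state != b.state)
		out.state = MERGED;
	out.possible = a.possible | b.possible;
	return out;
}

/* Hook for kfree(p): the argument becomes freed. */
static void on_kfree(struct sm_state *sm)
{
	set_state(sm, FREED);
}

/* Hook for "p = something": we no longer know anything about p. */
static void on_modified(struct sm_state *sm)
{
	set_state(sm, UNDEFINED);
}

/* Hook for "*p": returns 1 if a use-after-free warning should be printed. */
static int on_dereference(const struct sm_state *sm)
{
	return (sm->possible & FREED) != 0;
}
```

If one branch calls `on_kfree()` and the other does not, `merge()` yields a MERGED state whose possible set still contains FREED, so `on_dereference()` reports it. After `on_modified()` the warning goes away, which matches the modification hook described in the talk.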
Or another idea would be to run it on all of the Debian C code. There is a Debian static analysis project, and I don't know how far along they are or what they're doing, but that would be cool if somebody did that. Or it would be cool if somebody integrated Smatch with Jenkins or one of the other continuous integration tools. In the Linux kernel, the kbuild bot, the 0-day bot, uses Smatch, but they don't release the code for that. They just send the warnings out. I'm assuming the code for that is quite hacky, as they just wanted to get the bug reports out. But it's a fantastically valuable service, and they're the number one bug reporter right now. If somebody else were to integrate Smatch with Jenkins, everybody would love that.

It would be fun if somebody took Smatch and ran it on GitHub. Or another fun project would be to take Smatch and strip out all the cross-function analysis, don't build the database, remove all the other checks, and keep only the off-by-ones, because that's a pretty good check. Did somebody ask a question?

Another idea would be to use Smatch to create a website. The Linux cross-reference Elixir website, I love it. But it would be cool if somebody integrated that with the Smatch cross-function database. So you could see where a struct member is set, or if a function is called from a function pointer, Smatch is aware of that. Or you could see if anybody passed a NULL pointer to a function. All those things are in the cross-function database, and it's hard to access that. You have to rebuild your cross-function database. Unfortunately, accessing a lot of that information is quite slow right now, but that's simple enough to fix.

This one is very, very difficult, which would be to integrate Smatch with a code editor. You could parse a file up to where you've written. With C, you could read part of the file and then stop.
You don't have to have a fully finished file to run Smatch on it.

Another idea is to integrate Smatch with syzkaller. There were some students who did something like this, in a way. What they did was look at system calls which modified different struct members. They took a system call and looked at Smatch to say, which struct members does this modify? Then if they had two system calls which modify the same struct member, they tried to race them against each other. They were able to find like 10 race conditions with that process. And then they wrote their thesis and disappeared. But the code is still there in Smatch, and I think it's a useful thing. Potentially there are other useful things like that.

Of course, number four is very difficult, which is to just rewrite Smatch but with a Clang front end. What that would give you is that you could test C++ code, I think. I don't know. When I was writing Smatch, Clang was not there and Sparse was. I love Sparse, and I'm probably not going to make that work. I'm not a C++ programmer, but somebody else could copy Smatch pretty easily, I think. Copy the data types, copy the stree manipulation code. And it'd be, I think, a very useful tool. Unfortunately, it was hard for me to sell anybody on static analysis, because when I was developing Smatch, nobody really wanted to fund that. They wanted to fund kernel developers, so that's what my job is. But other people might have different luck.

These are much simpler and probably much better ideas. You could just take a check from Coccinelle and rewrite it in Smatch. Of course, Julia and I, we borrow ideas from each other at times. It's not polite to do it too quickly. You should wait for them to get all their bug fixes in. But when you rewrite it, just because the code works differently, you're going to try a different trick and you're going to end up finding different bugs.
And also, if two tools generate a warning, that's fantastic. People look at it like maybe that's annoying. It's not annoying, it's fantastic. It means that two people will email the original author about it, and that's good.

If you're on any development lists, whenever you see review comments, think about how that could be translated into a Smatch check. I have avoided putting style checks into Smatch. I think I will add a pedantic option, which is only for reviewing new code. For example, Julia recently went through and changed all the statements that use a comma instead of a semicolon. I had had that check for a long time, but I hadn't found any real bugs with it, so I hadn't committed that check. But it's a good idea to do it for the pedantic option.

One thing I do is I look through the fixes in the last RC releases and think about whether any of them can be translated into a check. A lot of them are hardware specific, or it's like, is this a bug? Is this a feature? It's not clear. Some of them are build errors, a lot of build errors. But occasionally I'll find a simple bug there, and those should be static analysis. They should be figured out immediately. And I also review CVEs to see how well we're doing at fixing CVEs and preventing them in the future.

This one may not be a great idea, but I used to have a scheduling-in-atomic warning. If you're holding a spin lock, you can't call kmalloc() with GFP_KERNEL and you can't call schedule(). I don't think you can take a mutex there either. I used to have that check, and I think it's there in Coccinelle, but I don't think it's great. I deleted my check somewhere along the line, and I tried to reimplement it in a cross-function way and I just didn't figure it out. So that might be a good idea.

This is a good idea: when you take user data and then you save it as an enum.
I don't know if there's a way in Sparse to figure out the highest value an enum can be, but it should probably be capped at something. Smatch has two ways of tracking capped data. One is to say that it's capped to unknown data, and the other is to say that it's capped to a literal like 43. But an enum should be capped at a literal like 43, I think. So if somebody could do that, they might find a lot of security vulnerabilities, or they might find nothing, I don't know.

If you take memory which is allocated with kmalloc() and pass it to devm_kfree(): somebody did that today. It's an example from today. If you're checking IS_ERR() or IS_ERR_VALUE(), somebody was doing that today, but you already know what the result is, maybe that's a good warning. Or if you have an if condition and you know the result already. Sometimes that will just be the last in a series of if else statements: if this, else if do this, else if, and then the last one will be guaranteed to be true. But if there's dead code, then maybe that's useful. I have played with this one and I've never actually made it work.

This is Julia Lawall's comma one. I had written this comma check. In C, you can replace almost every semicolon with a comma and it won't break. So what happens is people cut and paste from an initializer and it becomes code, and then they forget to switch the comma to a semicolon. I had never found any bugs, but Julia just swapped them all, and she did find bugs where a statement was supposed to not be part of the if statement, but because the line before ends in a comma, it is part of the if statement. Some people do that and they'll put the second statement on the same line, which is terrible, terrible style, terrible everything. But if it's indented back, then that's probably a bug.

So Dan, if I can ask a question, based on your experience: I was just doing a kernel git log and grepping for Smatch.
I know it's not exact, but we have about 571 bugs fixed that were found by Smatch. What kind of bugs would be the top bugs that Smatch finds? I am seeing some dereferencing before the NULL check, pointer type errors. Is that something Smatch would be useful for, in your estimation?

Of course, we have that check, to say that you're dereferencing before the check. You're asking which is the most common bug?

Right, right, which is the most common bug. I have a two-part question: which is the most common bug that Smatch has found, in your experience? And then also, what's the best time you would say to run Smatch? If somebody is writing a new driver or a new substantial feature, when would be a good time to run it?

Okay, the most common bug that I see these days is uninitialized variables. Linus got annoyed with the uninitialized variable check because GCC has had a series of bugs. There were a lot of things where this version of GCC had the warning and this version did not have the warning. So he just got annoyed, like four months ago, and disabled all of that. You have to say W=1 to get the warning, and not many people do that, because the number of warnings you get is just overwhelming. So I am seeing tons and tons of uninitialized variable warnings. The uninitialized variable warnings are hard to determine if they're false positives or not, because if you have a loop which has a for I is less than a limit, and the limit is unknown, Smatch will say, well, what about if the limit is zero? And quite often the author will know, oh, in this code, this list is never empty. And it might have continue statements in it, right? You go through a list and it's got a continue statement, and me as a stranger coming in, I'm like, I don't know if the list can be empty. I don't know if the continue statements get hit all the time.
So I sometimes don't report those. But then there are a lot of straight-up bugs where you dereference a pointer before it's initialized. So that's my number one source of bugs right now.

Thank you. We do have a hand up for a question. Go ahead and unmute and then ask the question, or just type it in the chat. Is there a question? I see a hand up, so I'm looking to see if they can ask.

I can answer your other question. I run Smatch on every patch I send.

Oh, okay. So pretty much every patch, that is what you recommend?

Yeah. Most of the maintainers rely on me to some extent, right? Or on kbuild. They know that somebody's going to warn them about static analysis. There's a bunch of people who will warn you if you have a static analysis thing. So a lot of maintainers don't run Smatch. Some of them do. But as somebody sending code, I run it on every patch, yeah.

That's good to know. I run it, but I have to admit I don't constantly run it. When Smatch reports come up, which happens, bots run it and send them, I go, oh, I wish I had. So that's a good tip, to remember to run Smatch.

Yeah, of course I have a QC script that I run on every patch.

Okay. Thank you.

Let me just finish this slide. I think it's the last one. It is the last one. So today, there was somebody with sscanf() and they wrote this code here. I generated three checks based on this code. The first is that sscanf() doesn't return error codes, it returns positive values, so you shouldn't return the return value from sscanf(). Then I wrote a check, I don't know if it'll work, to say, oh, if sscanf() didn't return what you expected, probably we should return an error. And the final one is that the buffer in this code was twice as big as the first destination, so it can actually lead to a buffer overflow. It turns out, if you just grep for sscanf(), it's not that common to copy a string.
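The slide with the original code is not part of this transcript, so the following is only a hedged user-space sketch of what code that passes all three checks might look like. The function `parse_name()` and the `NAME_LEN` constant are hypothetical. It relies on standard `sscanf()` behaviour: the return value is the number of items assigned (or EOF), not a negative error code, and a field width in the format bounds how much `%s` copies.

```c
#include <errno.h>
#include <stdio.h>

#define NAME_LEN 16

/*
 * Copy the first word of buf into name.
 * Returns 0 on success or -EINVAL if no word could be parsed.
 */
static int parse_name(const char *buf, char name[NAME_LEN])
{
	int ret;

	/*
	 * "%15s" caps the copy at NAME_LEN - 1 characters plus the
	 * terminating NUL, so a longer buf cannot overflow name.
	 */
	ret = sscanf(buf, "%15s", name);

	/* sscanf() gives a match count, so translate it into an error code. */
	if (ret != 1)
		return -EINVAL;

	return 0;
}
```

A bare `%s` with a source buffer larger than the destination is the pattern the simple rule in the talk is meant to catch. The width has to be kept in sync with `NAME_LEN` by hand here, which is one reason a static check is still useful.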
But I've tried to write this check before and got discouraged. Today what I did is I just said, if the format string starts with a percent S, then check buf and the first destination and warn if buf is larger. It's a very simple rule and it will miss some bugs, but it would have caught today's bug. So it doesn't have to be complicated. It doesn't have to be smart. It's useful even if it's not a hundred percent. So yeah, that's my advice: don't try to be fancy. Just catch the bug that somebody wrote; probably somebody else will write it soon.

That's the last slide. Let me exit out of here. Am I on mute?

We can hear you, Dan. It's fine. So just a reminder to everyone, if you do have a question for Dan, feel free to raise your hand and then we can unmute you and you are welcome to ask your question live. If you prefer to type your question, you can see a Q&A box at the bottom of your screen, and we'll be happy to answer it live as well.

I have another question, Dan. In the last couple of days, I was debugging a problem with a routine that should be called with the RCU read lock held. There were places where it wasn't, and that was the bug I sent a patch for. Can Smatch or Coccinelle be used to detect such bugs?

Yes, absolutely. I haven't pushed this code yet. The question is, how do I know which functions assume the RCU read lock is held? Smatch is tracking RCU. It's tracking that we're holding an RCU lock. But I don't use that information at this time in the published code. So I guess that's my question to you: how do I find out if I need to be holding the RCU lock?

Right, I am coming from a different angle. I have had to go through code and then figure out, okay, what are all the places? What are the routines?
The only giveaway for me is a comment block at the top saying it should be held. And then I went and scanned the code manually for all the instances where it's not being held. So I don't know the easy answer to it. In some cases you can run with lockdep on and get lockdep warnings. But that would be during a debug session, where you would enable lockdep warnings and then find the places. So I'm wondering, is that a static analysis type of application, or is it not possible?
No, let me say that I'm going to push this code soon. And then if you have the Smatch database, right, you can just run smdb with the function name and it'll tell you if the RCU read lock is held.
Okay, great. I mean, if you have a version you can give me, I can experiment with it, definitely.
Okay, sure, I'll probably just push it, why not? That code seems like it's basically working.
Looks like we have a couple of questions. One is in the Q&A box from Norbert: How do I generate a Smatch report for a patch? Run Smatch, store the output, apply the change, run Smatch again, and perform a diff?
No, I mean, after you finish your patch, you just run Smatch on the file with the kchecker script. In the smatch_scripts directory there's a kchecker script, and generally, if you give it a .c file in the kernel, it will check that.
Okay, and Lukas, you have a question. Would you like to ask it?
Sure, can you hear me?
Oh, yes.
Okay, great. I have a question similar to what Shuah, I think, pointed out. Is there a possibility of adding more annotations so that your analysis can get better? Let's take a couple of examples. You check whether locking and unlocking match each other, right, with your analysis. But of course, when I look at the code, I can see that immediately before or after a lock definition, there's usually a global variable that is protected by that lock.
Now, your analysis does not take into account whether that data is accessed without the lock taken, right? So you're kind of blind to all these bugs.
Sure.
Could I now add an annotation saying that access to that global variable has to be guarded by this lock, so you could extend your check? How difficult would it be to implement and add annotations of that kind?
Of course, you can't know until you try. I think that's a useful annotation. The Linux kernel has a lot of annotations to do with locking, and in my experience, I didn't need them. But what you're saying, to say that this variable is protected by this lock, that would be quite useful. Part of the issue there is that, like in probe, you don't need to take a lock because it's not exported or whatever. So you're going to get false positives on that, probably, but I feel like that's a useful annotation. In general, I haven't found many annotations that useful. Most things you can figure out just from looking at the code, but that's one thing that I can't figure out. I don't think humans can figure that out generally, unless you're a lot smarter than me. So computers can't figure that out either. We do have a comment check in checkpatch that will complain if a lock doesn't say what it's protecting. The other annotation that would maybe be quite useful is to say what returns are possible. That's one thing. In Smatch, I hard-code a bunch of those. I've got a special file called return fixups, and I hard-code some returns in there to say this always returns negative values or success or whatever. But other than that, I can't think of many annotations that I need. But yeah, that sounds like a useful one, and I think it'd be easy to use if it existed, but how to make it is not necessarily very clear. I don't know how to actually implement that.
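To make the proposed annotation concrete, here is a minimal userspace sketch of data that is meant to be protected by a lock. The `guarded_by:` comment is a hypothetical marker invented for this illustration; it is not an existing Smatch or kernel feature, and the struct and function names are assumptions as well. The sketch only shows which accesses such a check would accept, which it would flag, and where the probe-time false positive mentioned in the answer would come from.

```c
#include <pthread.h>

struct dev_stats {
	pthread_mutex_t lock;
	long packets; /* guarded_by: lock (hypothetical comment annotation) */
};

static struct dev_stats stats = { .lock = PTHREAD_MUTEX_INITIALIZER };

/*
 * Probe-style setup: nothing else can see stats yet, so no lock is taken.
 * A naive "guarded_by" check would still warn here (a false positive).
 */
void stats_probe(void)
{
	stats.packets = 0;
}

/* Accepted: packets is only modified with the lock held. */
void stats_add(long n)
{
	pthread_mutex_lock(&stats.lock);
	stats.packets += n;
	pthread_mutex_unlock(&stats.lock);
}

/* Accepted: the read happens under the lock. */
long stats_read(void)
{
	long val;

	pthread_mutex_lock(&stats.lock);
	val = stats.packets;
	pthread_mutex_unlock(&stats.lock);
	return val;
}

/* Would be flagged: packets is read without holding the lock. */
long stats_read_unlocked(void)
{
	return stats.packets;
}
```

Keeping the marker in a comment matches the later remark that a script could collect such annotations from comments, at the cost of the compiler never checking them.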
Yeah. And something that I always came across is that, when I look at the code, I know there's a relationship between the return code of the function and a certain property of, let's say, the input arguments, right? If a function returns zero, I know a certain value will be initialized. Does your program automatically guess such relations?
It tries to save everything. Yeah, it tries to save everything. So it saves that to some extent, because sometimes, if you get over like 3,000 variables changed, it'll be like, okay, we're just going to say it's called and forget about it. But for most functions that works, yeah.
Okay. And would it help you if the programmers provided kind of empty templates of what kind of relations they're looking for, so that you would store those and get your analysis more precise?
Yeah, I'm going to save everything. Speaking of annotations, I mean, if we had the locking annotation, I don't mind if we had to run a Perl script to get those. If it was in the comments or whatever, you could run a Perl script that makes those connections. But yeah, anyways, I think that might work.
Okay, yeah, thanks.
Hi, this is Norbert. If I run Smatch multiple times... I mean, like you said earlier, I want to generate a report, and I also want the propagation state to be as precise as possible. I would rerun Smatch multiple times, right?
No, I wouldn't bother with that. What I'm doing is, every day I download linux-next and rebuild the database. So after a week, it's filled up. It takes a long time to build the database on the kernel. I've got a quad core with a lot of memory. It's not, you know, whatever; it's got hyper-threading, so it's like eight virtual CPUs, I guess. And it takes me, I mean, most of the day to build linux-next. My other system's bigger, and it's supposed to... it died. I'm waiting for a part. But anyways, I don't care.
As long as it finishes within a day, that works for me for linux-next. Yeah, so that's what I do. I rebuild my database every day, and then after a week I've got a full database, I guess. But it's not like I would wait, you know, rebuild it seven times or for seven days, for this kind of thing.
Sure. Okay. So let's assume I have a box that has 72 cores, so it's much faster, right? So I could actually spend that time to do that precision thing. When I also run the build concurrently, so I compile elements in parallel, is the outcome of Smatch deterministic? Or would you expect that, on two different days with the same Linux, I would get the same results?
No, it's not deterministic. Yeah. It depends on what data you have in the database, and then some of those things are time-based, to say that, oh, we've been parsing this function for whatever, I don't remember exactly the time, it might be a minute or five minutes, now I'm done with the function. So, yeah, it's not deterministic. And even though you have 72 cores, the database is like 24 gigs. It starts out as two 12 gig files, and then you read them in one at a time, and the database is 24 gigs. That's my database. I don't actually know what it would be in the release version, but it's probably the same. And so it takes forever.
Okay. Can you envision?
Just do it once. That will give you a good idea; you can look up all the information you want about how functions are called. Yeah. It's very frustrating that it's not deterministic, but it absolutely is not. And then, of course, I'm changing mine every day, so I cannot reproduce bugs ever.
I see. Thanks.
I think we're almost out of time.
Yeah. I think that's the final question. We don't have any more questions. I'm sorry, I don't see any questions in the chat. I'm just scanning one more time.
Any last-minute questions?
So, let me just say that I wish people would look at the review comments and always ask themselves, can we just prevent this in the future? And I'm going to add that, a dandy option, so that people can add, like, our questions. Some things are very subtle, but there are a lot of things you could be checking that we're not. Just today I found so many things. Keep your eyes open for new chances to write checks.
All right. Thank you so much.
Thank you so much.
Thank you so much. Thank you for joining us today. Thank you, Dan, for your time. Just a reminder to everyone that this recording will be on the Linux Foundation YouTube page later today. And we hope you're able to join us for future mentorship sessions. Have a wonderful day. Thank you.