Hello and welcome. My name is Dmitry Vyukov, I am very excited to be here, and today I'm going to present the talk "Dynamic Program Analysis for Fun and Profit". I hope to give you some useful information about dynamic analysis tooling that you can use to find very real bugs in programs as serious as the Linux kernel. First I will give an overview of what dynamic analysis is and compare it with static analysis, because I know there were several talks in this series about static analysis. Then I will give an overview of the existing dynamic tools in the Linux kernel, and then I will talk in more detail about KASAN, the Kernel Address Sanitizer, which is one of the dynamic tools that we developed.

For the past 10 years I have worked at Google in a team that is literally called Dynamic Tools, and we build a large set of tools for bug detection. Those tools work for user space, for the kernel, for different languages, and find many different types of bugs. We also build fuzzing tools for user space, the kernel, and different languages, production hardening tools that can be used right in production to detect or prevent some types of bugs, and some other tools: continuous fuzzing, coverage, and so on. So I know a little bit about this area.

The first question we want to answer is: why? Why do we want to use dynamic analysis? Why is it useful? The answer is bugs. Bugs can lead to security issues, which is a very hot topic today. Bugs can also lead to stability issues, and they simply result in low quality, and we don't want to produce low-quality software, we want to produce high-quality software. Bugs also mean wasted developer time, because fixing a bug after some period of time is much more expensive: it involves reporting, debugging, bisection, and so on. And bugs also mean moving slowly: you may need a long stabilization period to shake out all the bugs before you can release and use the software. Dynamic analysis is a cost-effective way to find bugs and later get rid of them.

So what is dynamic program analysis? It's analysis of the properties of a running program. What properties can be analyzed? The first one is bugs, or the absence of bugs, and this will be the focus of this talk. But there are other properties, like performance: we can analyze, say, execution speed or memory consumption of a program, or code coverage, or we can build a dynamic call graph of a program to understand whether it's possible to get from this function to that function and how exactly, or we can track data flow throughout the program, and so on.

If dynamic program analysis is analysis of the properties of a running program, then static analysis is analysis of the properties of the program code, which can be source code or binary code, it doesn't matter. As a result, dynamic analysis proves properties that hold only on a single execution. We can of course run several different tests, or the same test several times, and check several executions, but each time we still check a single execution. In contrast, static analysis proves properties that hold on all executions at once. You may think that proving properties that hold on all executions is just strictly better, right? Why would we want to prove properties on only one execution? This is true in theory, but it is not true in practice. To explain why, we need to consider two things. The first is called a true positive: it's when a tool reports a real bug. This is a good thing.
We want tools to report all real bugs. The second thing is a false positive: it's when a tool reports something that is not a bug. This is a negative property; we don't want tools to report things that are not bugs. Ideally, we want a tool that reports all true bugs and none of the false ones, but unfortunately that's not always possible in reality. What we usually have is a slider: on one side, tools that report no false positives but also fewer of the true bugs; on the other side, tools that report more real bugs but may also report more false ones. The degenerate case on one end would be a tool that just flags every expression in the program with every type of bug: it reports all true bugs, but also lots of false positives, so it's not very useful. The degenerate case on the other end would be a tool that reports nothing: no false positives, but also none of the true bugs, so it's also not very useful. In practice, we want to find a balance where we report a good number of true bugs but not too many false ones.

If we compare static and dynamic analysis in this respect, then generally static analysis is better for true positives, because it analyzes all possible executions, so it can give us more true bugs. But on false positives, dynamic analysis is usually much better in practice, simply because it turns out to be much easier to analyze the properties of a single concrete execution than to analyze the source code in general.

To explain this, let's consider a simple example. Say we have a function that allocates a heap block of 10 bytes and then writes at offset 20. This causes an out-of-bounds access, which is a bad bug. Static analysis can hopefully find this bug because it is very local: we can see both the size of the heap block and the actual offset, and we can prove that this is a real bug. We would like static analysis to report such bugs, and there are indeed tools that will report this just by looking at the source code. However, if we make this example just a little bit more complex and use a dynamic index to access the heap object, you can see how things quickly become much harder to analyze. Is it an out-of-bounds access or not? Let's say that to calculate the index, we take some offset into a buffer, convert the string representation there to an integer, and use that as the index. Now, to prove whether an out-of-bounds access is possible, we would need to figure out what bytes can actually be written to that buffer, and whether, converted to an integer, they can be larger than 10 or not. We also don't know the actual offset into the buffer, so we would need to figure out what offsets are possible, where the buffer is written, and what bytes are written to it. Only then can we try to prove that this is an out-of-bounds access. You can see how hard it becomes very quickly, and this is reasonably realistic code; you can find something similar in basically every program.
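A rough reconstruction of the two examples in code (the function names and the exact index computation are my own illustration, not the slide's code):

```c
#include <stdlib.h>

/* Easy case: the allocation size and the offset are both visible locally,
 * so a static tool can prove the out-of-bounds write. */
void simple_case(void)
{
	char *p = malloc(10);

	p[20] = 1;	/* write past the 10-byte block, provable statically */
	free(p);
}

/* Harder case: the index comes from data.  To prove anything statically,
 * a tool would have to reason about every byte that can end up in buf
 * and every offset that can be passed in. */
void dynamic_index_case(const char *buf, size_t off)
{
	char *p = malloc(10);
	int idx = atoi(&buf[off]);	/* string -> integer conversion */

	p[idx] = 1;	/* out of bounds only if idx ends up >= 10 */
	free(p);
}
```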
In reality it's usually even more complex, because there can also be unions, which are known to be hard for static analysis; there can be arrays, threads, function pointers, and crypto algorithms that by definition are not possible to reverse-engineer statically. So with static analysis it's very likely that we will either miss a real bug here or report a false positive.

Now let's consider what we can do with dynamic analysis in this example. When we execute the malloc call, we memorize the size of the heap block somewhere on the side, in some metadata. Then, when we execute the indexing operation, we fetch the size of the heap block and compare it with the actual index, because during this execution we know the concrete value of the index; we just need to compare those two numbers, and if the index is larger than the size, we report the bug. You can see that it doesn't really matter how the index was computed, whether any conversion from strings or any unions were involved, or whether the malloc and the indexing are in completely different parts of the program. We will also never report a false positive, because we have the concrete size of the heap block and the concrete index, and we will always report the true bug. The caveat is that we will only report the bug on a test that actually triggers it; this is the downside of dynamic analysis.

To give you some idea of the number of false positives produced by static tools, I ran four static analysis tools on an allyesconfig kernel and collected the number of unique warnings they produced and the amount of CPU days required for the analysis. This table is not meant to show that one tool is better than another; it's completely apples to oranges, to begin with because these tools find different types of bugs. But you can see that, say, the Clang analyzer produced more than 10,000 warnings, and you can guess that most likely not all of them are true bugs; probably a majority are actually false positives. Some tools produce significantly fewer warnings, so they are probably on the other side of the spectrum, but they may also report fewer of the true bugs. And you can see that, say, for Smatch the amount of CPU time required for the analysis is almost two CPU days. That's a significant amount of CPU time; with dynamic analysis, we could use that time to run a significant number of different tests, also to find bugs.

So it's not that dynamic or static analysis is better than the other; I would say they are complementary, and each has its own strengths and weaknesses. One more thing: most static analysis tools don't analyze the source code as is, they analyze a concrete compilation of that source code. For the kernel, that means a particular configuration for a particular architecture. There are more than 12,000 config options, so you can get something like two to the power of 12,000 different kernels, and each architecture is significantly different and affects, say, the size of pointers and the size of longs, which again can introduce bugs.
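Coming back to the malloc-and-index example, here is a toy user-space sketch of the idea of checking a single concrete execution. It only illustrates the principle of recording allocation sizes "on the side" and comparing them with the concrete index; the names are made up, and this is not how KASAN actually implements it:

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy metadata kept "on the side": the size of every tracked allocation. */
struct alloc_info {
	void *ptr;
	size_t size;
};

static struct alloc_info allocs[1024];
static int nallocs;

static void *checked_malloc(size_t size)
{
	void *p = malloc(size);

	if (p && nallocs < 1024) {
		allocs[nallocs].ptr = p;
		allocs[nallocs].size = size;
		nallocs++;
	}
	return p;
}

/* Called before every indexed access: compare the concrete index with the
 * recorded size.  How the index was computed does not matter. */
static void check_index(const void *p, size_t idx)
{
	for (int i = 0; i < nallocs; i++) {
		if (allocs[i].ptr == p && idx >= allocs[i].size)
			fprintf(stderr,
				"BUG: index %zu out of bounds of a %zu-byte object\n",
				idx, allocs[i].size);
	}
}
```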
Dmitry, this is Joak. Can I ask a question on the previous slide? The Clang analyzer seems to have a lot of warnings. Does that indicate the coverage it provides compared to GCC's analyzer? Does it look at more things, so that it also ends up giving more warnings? I'm asking to see whether the coverage is the same.

Yes, I understand the question. I'm pretty sure they just do different types of analysis and probably find different types of bugs; I didn't really look at the details. For the Clang analyzer, I remember it produced one type of warning very frequently, maybe a potential null pointer dereference or something. And I suspect that Clang and GCC just chose different points on the spectrum between reporting more true positives and more false positives. Thank you.

So as I said, static and dynamic analysis are complementary; it's not that one is better than the other. Static analysis is better for simpler and more local bugs, as we've seen in the example. It also has the great advantage of giving more code coverage, because it immediately analyzes all of your code. And it's generally faster and provides more deterministic feedback, not subject to flaky tests or failures that depend on timing or on the concrete hardware you use, which is a very nice property. Dynamic analysis tends to be better at finding more complex bugs, and its great advantage is that it generally has no false positives. Because of this, it also allows a simpler usage model, especially for projects like the Linux kernel where we have lots of contributors and the code changes very fast. What I mean is: if we know that a tool produces no false positives and we get a new failure introduced by a commit, we can be sure that this is a true bug, so we can mark the commit as bad and say it should not be merged. If a tool produces false positives, we cannot do this, because maybe it's just a false positive, so we should not block merging of the commit because of the failure. Also, in the kernel we have lots of accumulated false positives, as we've seen in the table, something like thousands. When you run such a tool for the first time, you need to analyze all of them, and other people have probably already analyzed those warnings before, but now we need to analyze them again, because the information about whether something is a false positive or not is not really recorded anywhere.

Okay, so the main disadvantage of dynamic analysis is that it only finds bugs that are actually triggered, so we need more coverage to make dynamic analysis more useful. There are several ways to get it. First of all, you want to run all of your tests with dynamic analysis enabled: unit tests, system tests, and so on. Fuzzing, or randomized testing, has also proved to be a great combination with dynamic analysis, and it finds lots of bugs; we have a kernel fuzzer called syzkaller, but that will be the topic of the talk next week. You can also use dynamic analysis during everyday development. For example, I have lots of dynamic tools enabled in the kernel whenever I build and use it for anything, and this prevents me from introducing very stupid bugs that could have been found if only the tool had been enabled. Sometimes I actually see such cases, where a commit introduces a bug that is very easy to trigger, but the change just wasn't tested with the tool enabled. There are more interesting ways to get coverage as well. For example, some of the tools can be enabled on pre-production servers, or servers that handle some portion of real traffic, or maybe on dogfood clients.
For example, we use Android phones with heavy instrumentation enabled to find bugs that actually happen during use of those devices, and we also have Chrome builds with some of the heavy dynamic analysis enabled. Some tools can even be used right in production if their overhead is suitable for production. So dynamic analysis is only as good as your tests are, and you want to run as many different and varied workloads as you can with dynamic analysis enabled.

A few words on kernel tests. I will not go into details; this topic probably deserves a whole separate session. There are two kernel test suites right in the kernel tree: one is called KUnit and the other is kselftest. There are also test suites that live out of tree, for example the Linux Test Project, xfstests, and some others. I found those somewhat harder to use, because each of them lives in a different place and each is built and run in a different way, so I would suggest you concentrate on KUnit and kselftest first.

Okay, this is the end of the first section with the overview and general information on dynamic tools. If you have any questions so far, I'm happy to answer them now. Do we have any questions? We do not have any questions yet. But just a quick reminder: you are welcome to unmute yourselves and ask a question, or you can raise your hand and we will ask you to unmute yourself whenever Dmitry pauses. You can also submit a question via text in the Q&A box. Okay, I will have another Q&A after the next section as well, so let me move on.

The next section is called DIY tools, do-it-yourself tools, and it's about a set of, let's say, simpler tools that are available in the kernel. Dynamic analysis doesn't need to be very involved; in fact, it can be quite simple. For example, there is a config called CONFIG_DEBUG_LIST in the kernel. If you enable it, the kernel will do some dynamic checking of linked lists. There is a structure in the kernel called list_head, and as you would expect, it contains next and prev pointers. With this config enabled, the kernel adds additional debug checks to all list operations that verify the consistency of the list: that the links make sense and the nodes are appropriately linked, rather than just causing silent memory corruption. If you enable this and kernel code tries to execute a bad list operation, you will see something like the report on the slide on the console. It says that a list corruption happened in a delete operation and gives you a stack trace, and hopefully this is enough to pinpoint the problem and fix it.

There is another one called CONFIG_FORTIFY_SOURCE, and it can find out-of-bounds accesses in simple code like the one on the slide. For example, if we allocate a buffer of 10 bytes on the stack and then try to memset it with a size that is larger than the buffer, it can catch that. How does it do it? It inlines the memset function and uses compiler magic called __builtin_object_size to ask the compiler whether it can tell the static size of the buffer we are trying to memset. In this case the compiler will tell us that the buffer is 10 bytes, because the buffer is declared right there, and then memset will compare the size of the buffer with the size passed to memset.
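To make the two configs concrete, here is a rough sketch of the kind of code they catch, assuming a kernel context; this is illustrative code, not taken from the slides:

```c
#include <linux/list.h>
#include <linux/string.h>

/* CONFIG_DEBUG_LIST: deleting the same node twice normally just corrupts
 * memory silently; with the debug checks enabled, the second list_del()
 * sees the poisoned next/prev pointers and reports list corruption. */
static void list_double_del(struct list_head *node)
{
	list_del(node);
	list_del(node);		/* DEBUG_LIST reports corruption here */
}

/* CONFIG_FORTIFY_SOURCE: memset() is inlined, and the compiler's
 * __builtin_object_size() tells it that buf is 10 bytes, so the
 * oversized length is detected. */
static void fortify_example(void)
{
	char buf[10];

	memset(buf, 0, 20);	/* larger than the buffer: fortify fires */
}
```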
If we are trying to memset more than the size of the buffer, it will report the bug.

Dmitry, we have two questions. Do you want to field them now or later? Yes, let's do it now. Okay. The first question is: I want to hear about the future of these tools, referring to your previous slides about the tools, syzbot and so on. Let's take this question at the end, I think; it's just better at the end.

Okay, there is another question: how are you dealing with the timing issues introduced by dynamic analysis instrumentation, like printk calls causing or hiding kernel races? We do not, really; we generally just test the execution as is. Yes, dynamic analysis can change the timing somewhat, so we rely on executing the test many times and eventually getting the right timing to trigger the bug. It's possible that because of dynamic analysis the timing changes and some bug that used to happen frequently starts happening infrequently, or the other way around. Actually, what we see more often is that bugs that happen very infrequently start happening more frequently. But still, if something is possible without the tool, in your production build, it can also happen with dynamic analysis, maybe just with lower probability, and as I said, we rely on repeating the test enough times that eventually we catch the bug.

There is a follow-on question coming in on chat: are there any classes of bugs that are impossible to spot with runtime analysis in principle, just due to the changes it makes to runtime timing? I don't think so; as I said, the probability can be much lower for some bugs. But also, interestingly, because we run on virtual machines, it's possible that in some executions the virtual machine or the host changes the timing significantly as well, so the timing shift between two threads introduced by the dynamic analysis can be offset by a slowdown introduced by, say, the virtual machine or an interrupt arriving at the right time. So I think in the end all bugs can still be triggered. In some cases we actually try to additionally disturb timings, both to provoke more unlikely bugs and to recover bugs that were masked by the slower execution due to dynamic analysis. Thank you. Okay, I don't see any other questions, so let me continue.

So CONFIG_FORTIFY_SOURCE can find this simple type of bug. And the simplest tools of all are the macros BUG_ON and WARN_ON. They accept a condition and check that it holds at runtime; they are the same idea as the assert macro in C/C++. They are super simple, there is nothing involved here, but they are also super useful, because the condition will be checked in all kernel executions out there, during all testing, now and in the future. Another good thing about them is that they serve as great comments in the code. For example, the kernel's do_group_exit() function, which is called on process exit, contains a BUG_ON related to the exit code, and it has a comment saying "core dumps don't get here".
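The check itself is roughly this (paraphrased from kernel/exit.c; the surrounding code is elided):

```c
void do_group_exit(int exit_code)
{
	/* The assumption is encoded as an always-on runtime check: if a
	 * core-dump exit code ever reaches this function, every kernel
	 * running this code will tell us. */
	BUG_ON(exit_code & 0x80); /* core dumps don't get here */

	/* ... the rest of the exit path is elided ... */
}
```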
And that is a comment we can actually trust, because it is checked in all kernel executions out there; if it were failing in some corner case, we would know about it. You cannot necessarily trust normal comments, because they may not cover some corner cases, or maybe they were true when they were written but stopped being true later. So encoding your assumptions in BUG_ON and WARN_ON is very useful both for testing and for communicating your assumptions to other developers.

There are actually lots and lots of similar configs in the kernel. I tried to compose a more or less complete list; it's probably not complete, but it's complete enough. There are configs related to lists, configs related to checking user copy operations, a set of configs related to detecting lockups and hangs, and more: lockdep, for example, is super useful, it's a tool that finds potential deadlocks. There is a whole set of configs that can find misuses of synchronization operations, and even more that find other types of bugs, and more still that find even more bugs. For all of those, you just need to enable them in the config and you don't need to do much else: the kernel will analyze itself for all those different types of bugs, and if anything fails, you get a bug report on the console.

There are other debug configs where you need to do something to actually use them. For example, with kmemleak you need to run periodic scanning to find leaked objects. There is also fault injection, which is super useful for finding bugs in error handling paths, and there are lots of such bugs, but you need to enable it and set it up: tell it how to inject faults, when, how many, and so on. And there are some auxiliary configs: for example, you want to enable debug info so that you can get line numbers in your bug reports, and in some testing we enable configs that make the kernel panic whenever it detects any bug, just so that it's simpler to spot. I will not go into the details of all these configs because there are lots of them, but you will be able to find this list in the slides and use it for your kernel testing.

And there is one small thing, a script called decode_stacktrace.sh, which is very useful for dynamic testing: it lets you get line numbers for your crashes. Normally in a kernel crash you see lines which tell you the function name and then some magic hex numbers, which are not very useful. What this script does is add the file name and line number to each entry, so you can actually look up the line and follow what crashed where. And that's it for this part. Do we have any other questions?

A couple of comments, more than questions. The comment is that how to bridge static and dynamic analysis is a hard question; I answered saying we do some of that now with KCOV coupled with syzbot. Maybe you can elaborate on that, Dmitry. But I'm not sure what is meant by bridging static and dynamic analysis; what do we want to do? Dave, do you want to unmute yourself and elaborate on that? Hey, great presentation so far, by the way. Thanks so much for giving it.
I guess one of the things that you have a ton of experience in, and that I've always tried to work really hard on, is finding a way to minimize the effort once you have a vulnerability, or in this case a bug, that you've detected. You have a bunch of static analysis tools that generate 4,000 warnings or whatever it is, and you also have your dynamic analysis, and finding a way to trigger the statically generated warnings with dynamic analysis is always really valuable in my opinion, because now you have a test case that you could in theory use to help triage and analyze the event. So I guess the thing is, you've got a very broad overview here, but I haven't yet figured out how some of these pieces connect; that's kind of what we were talking about in the main chat.

I see what you mean. There is some research on doing roughly what you described: using static analysis results to guide dynamic testing so that it actually triggers the bug at runtime. This research tends to be, in my opinion, quite complex, and this type of thing is frequently hard to incorporate into an actual production testing pipeline, so we don't use anything like this so far. But yes, it's possible, it's just quite non-trivial, and it may also depend on the type of bug. I know it's somewhat easier for data races: for example, if static analysis points to locations where a potential data race can happen, you can run dynamic tests and pause threads at those points and see whether you can pause them there at the same time. If you catch both of them at those locations, that's proof that those memory accesses can actually race. So that's one way of doing this, but it's specific to data race detection. There are other ways dynamic and static analysis connect. For example, dynamic analysis instrumentation frequently uses static analysis to prove that some constructs cannot cause a bug, so we don't need to check them at runtime: if we allocate a buffer on the stack, we see the size, and we see that an access is provably within the bounds of the buffer, we don't really need to check it at runtime. This can be used to remove some of the dynamic checks and make execution faster. Well, thanks so much, by the way.

Okay, I don't see any other questions, so let me continue. The next part is about KASAN, the Kernel Address Sanitizer, which is one of the dynamic tools we developed. First of all, the pillars we use for our dynamic tools. We absolutely don't tolerate false positives; I explained why they're nasty and why we want to get rid of them. Whenever we have a choice between reporting a false positive or not reporting a true bug, we tend to go to the side of not reporting false positives, even if it means missing some true bugs; we consider this better than reporting more true bugs along with false positives. We try to make our tools work out of the box and be easy to use, without requiring any special tuning or setup. We try to provide informative reports that give the developer enough information to understand what happened and fix the bug. And we try to keep the overhead as low as possible so that the tools can be used more widely. So what is KASAN?
KASAN is a tool that can detect out-of-bounds accesses and use-after-free bugs on the heap, the stack, and global variables, and it's enabled with CONFIG_KASAN. What is out-of-bounds? It happens when we allocate a buffer and then access it outside of its bounds. Use-after-free happens when we free a pointer and then access the memory after we freed it. These are bad bugs. What you might expect to happen when you do this is some kind of loud boom: something happens, the program crashes. But unfortunately, in most cases nothing happens: when we write out of bounds we just corrupt some other, unrelated memory, and nothing happens yet. Later, maybe, the program crashes and then we get the loud boom, but at that point it is very, very hard to understand what happened and what caused the corruption. It's known that some such bugs required literally months of developer time to debug. What KASAN does is make the program boom at the point of the bad memory access. On the slide you can see an example report produced by KASAN for a use-after-free bug. The title says it's a use-after-free in this function and that it was a write of size eight done by this task. Then you see the call trace where the bad memory access happened, and then the tool tells you where this heap block was allocated and where it was freed. This information is hopefully enough to pinpoint the problem and fix the bug.

Okay, now let's consider how the tool actually works. The tool has a notion of shadow memory: for every aligned eight bytes of kernel memory, the tool has one shadow byte, and this shadow byte encodes whether the corresponding kernel memory is good to access or bad to access. If all eight kernel bytes are good to access, meaning the memory is not freed, it's within bounds and so on, the shadow byte contains the value zero. If only the first N bytes of the kernel memory are good and the rest are bad, the shadow byte contains that number of good bytes; for example, if the first five bytes are good and the remaining three are bad, the shadow contains the number five. Bad memory means the memory was either freed, or it's outside the bounds of the allocated heap block, and so on. And if all eight kernel bytes are bad, the shadow contains a negative value such as minus one.

Now, where are these shadow bytes located? The kernel virtual address space has a particular layout: there is, for example, a region mapping physical memory at a particular address, a region for vmalloc memory, and the kernel text at some address. But we're not actually interested in those details; what we're interested in is the new region called the KASAN shadow, which holds all the shadow bytes. This region is quite large, 16 terabytes, but it's only virtual address space; it doesn't actually consume physical RAM. It holds shadow bytes for all of the other kernel memory, so there is a mapping between a kernel address and its shadow byte, and because the shadow region is a single continuous address range, the mapping is very simple: take the kernel address, divide it by eight, and add some offset.
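In code, the address-to-shadow mapping is just arithmetic. A simplified sketch follows; the names mirror the kernel's, but the offset value here is only a placeholder, since the real value is architecture- and configuration-specific:

```c
#include <stdint.h>

/* One shadow byte covers 8 bytes of kernel memory, so the mapping is a
 * shift by 3 plus a fixed offset into the shadow region. */
#define KASAN_SHADOW_SCALE_SHIFT	3
#define KASAN_SHADOW_OFFSET		0xdffffc0000000000UL	/* placeholder */

static inline int8_t *kasan_mem_to_shadow(const void *addr)
{
	return (int8_t *)(((uintptr_t)addr >> KASAN_SHADOW_SCALE_SHIFT)
			  + KASAN_SHADOW_OFFSET);
}
```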
This gives us the mapping from a kernel address to its shadow byte, and we divide by eight because we have one shadow byte for every eight kernel bytes.

The next thing we need to consider is red zones around heap objects. Normally, without KASAN, heap objects are allocated next to each other. What KASAN does is insert red zones between heap objects, and the shadow for these red zones is marked as bad. If the kernel tries to access a red zone, we can detect the out-of-bounds access. The compiler also arranges red zones for stack and global variables.

The next thing is the quarantine for heap objects. When you free a heap object, normally, without KASAN, it is reused in a last-in, first-out manner: the next kmalloc call of the same size will most likely return the same object we just freed. What KASAN does is put freed objects into a so-called quarantine, which delays reuse of the heap blocks, and while a heap block sits in the quarantine, its shadow is marked as bad again. If kernel code tries to access the block while it's in the quarantine, we can detect the use-after-free.

The last piece is the compiler instrumentation that actually checks the validity of memory accesses. If we have an access of eight bytes, and it doesn't matter whether it's a write or a read, the compiler adds the following code before the memory access. Let's consider what it does. First, it computes the address of the shadow byte from the original address the kernel is about to access, by shifting the address right by three (dividing by eight) and adding a particular offset; this gives the address of the corresponding shadow byte. Then it loads the shadow byte and compares it with zero, and if it's not zero, it reports a bug. Because the access is eight bytes, we want all eight kernel bytes to be good to access, and "all eight bytes good" is encoded as zero; that's why we compare the shadow with zero. If the access is smaller than eight bytes, for example one, two, or four bytes, say we're accessing a bool, a short, or an int, then the code is very similar, but there is an additional part that figures out whether we are touching good or bad bytes within that eight-byte granule. For example, the first four bytes may be good and the last four bad, and we are accessing two bytes, so we need to figure out whether those two bytes fall in the good part or the bad part. This code looks at the offset of the pointer within the granule and the size of the memory access and compares them with the shadow value. And this is basically it; this is the main algorithm of how KASAN works.
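A simplified sketch of what the inserted checks amount to; kasan_mem_to_shadow() is the mapping from the previous sketch and kasan_report() is a stand-in for the real reporting function, so this is an illustration of the logic, not the exact generated code:

```c
#include <stddef.h>
#include <stdint.h>

int8_t *kasan_mem_to_shadow(const void *addr);		/* from the sketch above */
void kasan_report(const void *addr, size_t size);	/* stand-in for the reporter */

/* 8-byte access: all 8 covered bytes must be good, i.e. the shadow is 0. */
static void check_access_8(const void *addr)
{
	if (*kasan_mem_to_shadow(addr) != 0)
		kasan_report(addr, 8);
}

/* 1-, 2- or 4-byte access: if the shadow is non-zero, only the first
 * 'shadow' bytes of the 8-byte granule are good, so check whether the
 * last byte touched still falls within them. */
static void check_access_small(const void *addr, size_t size)
{
	int8_t shadow = *kasan_mem_to_shadow(addr);

	if (shadow != 0) {
		int8_t last_byte = ((uintptr_t)addr & 7) + size - 1;

		if (last_byte >= shadow)
			kasan_report(addr, size);
	}
}
```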
So to recap: we have shadow memory that marks kernel memory as good or bad; we have red zones around heap objects, marked as bad, which is what lets us detect out-of-bounds accesses; we have the quarantine, which delays reuse of freed heap objects, and objects in quarantine are marked as bad, which is what lets us detect use-after-free bugs; and we have the shadow checks inserted by the compiler before each memory access, which is what actually detects the bugs when they happen.

Let's look at what we've got. We've got no false positives by design, because of how the tool is built: if some object is freed and kernel code tries to access it, there is no way it is not a bug. The tool works out of the box; you just need to enable the config. It gives relatively informative reports: it provides allocation and free stacks, it describes heap, stack, and global objects, and lately it even does cool things like giving you the last call_rcu() stack for a heap object. And it has a relatively low overhead of about 2x slowdown and 2x memory overhead. This is a visible slowdown, but it's usually fine for running any tests.

Now you may wonder how good this dynamic testing actually is: does it find any bugs? We use the dynamic tools together with fuzzing, with our syzkaller fuzzer, and we have been testing the kernel continuously for about five years now. We have the syzbot dashboard at syzkaller.appspot.com with all the bugs we've found; you can see an example on the slide. So far we have reported more than 5,000 bugs, more than 3,000 of those have been fixed, and we counted more than 4,500 backports of the fixes into long-term stable kernel releases, which I would say are pretty good numbers. The types of bugs we're finding are varied. For example, I counted about 1,000 bugs detected with KASAN, about 370 bugs detected with KMSAN, which is another tool we developed that finds uses of uninitialized values, about 480 data races detected with KCSAN, and lockdep detected about 170 deadlocks. For WARN_ON plus BUG_ON, we found about 1,000 cases where they fired, about 500 null pointer dereferences, and some other types of bugs. So it's really a mix of different types of bugs; we don't try to use just one tool, we try to use all of them.

Dmitry, do you remember when KASAN went in? It's a while back, right? About 5.5, I think, something like that. Oh no, it's older than 5.5 for sure; I'll look it up. Somebody asked about the versions. KASAN was introduced quite long ago, in 4.3 I think, and KCSAN relatively recently, in 5.5 or 5.6 I think. The question was about KASAN. Okay, so about 4.3 for KASAN and 5.5 for KCSAN. Thank you. And for KASAN, I think people backported it to older kernels as well; there should be patches for kernels as old as 3.8, I think.

Okay, so to conclude: dynamic tools are your friends. Use them, enable all of those DEBUG_* configs, enable lockdep, enable KASAN, use them during development. As I said, I have almost all of them enabled basically all the time. If you develop new code, insert BUG_ON and WARN_ON; those are useful macros. Also add and run tests. And decode_stacktrace.sh is your friend for adding line numbers to crashes. Also, a bug fix is usually a perfect first contribution: if you're looking to contribute to the kernel for the first time, that may be a good thing to do, because bug fixes are usually relatively small, they don't require lots of high-level design, and they are always welcome and not controversial. You can look, for example, at the syzbot dashboard to find bugs that are not fixed yet and try to fix them. There is also the Linux kernel bug fixing mentorship program from the Linux Foundation that ran last year; it may run again in the future. It requires you to sign up beforehand, but it may be a good opportunity to learn how to fix these bugs. Yes, we just started one this spring; we just selected 10 candidates for the spring Linux kernel bug fixing program, and we might have one coming up in summer and fall. Check out the last slide, it has all the information. Thank you.
And if you want to go hardcore, you can also contribute to the dynamic analysis tools themselves. For example, we have a list of open issues in the kernel Bugzilla with various feature requests and improvements for KASAN and the other tools, and you can make KASAN, for example, find more bugs or produce better reports, and so on. Contributions here are also welcome. And last but not least, the sanitizers that we are developing are also available in user space, as part of both the Clang and GCC compilers. You basically just need to add a single flag to your compilation and you can detect a wide range of bug types. If you don't use them yet, that's the first thing to do. With this, I would like to thank you, and I'm ready to answer any questions.

We have three questions in the Q&A. The first question is: what's the status of using Clang to compile the kernel? Where can one find information on using Clang to compile the kernel? Are the learnings from getting the kernel to compile with Clang generalizable to other code bases? The kernel can now be compiled with Clang just fine. There was indeed some struggle for a long time, but I think for multiple releases already the upstream kernel can be compiled without any additional changes to Clang or the kernel. We actually have Clang-built kernels in our syzbot testing, so we continuously build and test those kernels as well, and I know Linaro and some other companies are also doing this. So basically, you can just use Clang with Linux. I'm not sure I can answer whether the findings generalize to other code bases; I wasn't directly involved much in this effort. I've seen some of the bugs: generally they were things like assumptions about GCC, or uses of GCC-specific extensions that are not supported by Clang. I think it also found lots of latent bugs that were just bugs in the kernel code but happened to work and not fire loudly with GCC, and then the code started breaking with Clang. I think it also uncovered some bugs in Clang itself, which were fixed. Overall, I don't think there is a general answer: if you port a big code base, there will be some set of different issues, and each of them just needs to be debugged and fixed, in the code base or in the compiler; maybe the code needs to be changed to not use compiler-specific features, and so on.

We have a question in the chat as well, Dmitry. Does syzkaller consider that kernels are non-deterministic? How does that affect the PoCs generated by syzkaller? Do you see many false positives as a result? Yes, the kernel is a very non-deterministic code base, and it makes things much harder. Lots of user space testing, and fuzzing in particular, just assumes that the program is completely deterministic, that the same input always gives you the same coverage, and that if you found a bug, it can always be reproduced. That's not the case with the kernel, for lots of reasons: because of timings, because of inherent non-determinism in, for example, random number generation and interrupts arriving at random times, and also lots of accumulated state, so it's not really possible to make test cases isolated. This affects the design of syzkaller in a number of ways. For example, when we collect coverage, we need to run a test multiple times and see what coverage we get on each execution, because we can get some flaky coverage when execution goes into, say, the mutex lock slow path, but it happened only once and next time it will not happen.
But that's not interesting coverage for fuzzing anyway. Or kmalloc can suddenly start reclaiming memory, trying to free kernel memory and calling into a file system and so on; that's also coverage that we will most likely not see next time. So we run each test multiple times and then try to figure out whether we're actually seeing stable coverage or not. It's the same for crashes: we can't always immediately figure out what crashed the kernel. We try to reproduce and trigger it again, which is a somewhat complex procedure. We try to run the same things we ran before the crash; sometimes it succeeds, sometimes it doesn't. Sometimes it was the interaction of two tests that were running at the same time; sometimes it's related to the sequence in which the tests were run, because there is accumulated state in the kernel. So we try running different tests in different combinations and see if we can trigger the crash again. Sometimes it works, sometimes it doesn't, and then we just try later; maybe next time we will be luckier. Lots of crashes are just flaky because races are involved, so they crash, say, one out of 10,000 times, and when we try to reproduce, we may be lucky or not. So yes, this non-determinism affects lots of things in syzkaller, and there are still some unsolved problems.

Another question, a follow-on to the same one: I don't mean false positives, but reports or PoCs that are impossible to follow; that is, KASAN finds a bug caused by some behavior elsewhere in the kernel. Whoever asked said that you have already answered it, but if you have something more to add, please do; otherwise we have more questions. Yeah, sometimes we just can't reproduce a crash at all, but we still report it, because we know the crash itself was real: we actually crashed the kernel in some way, it's just that we may not be able to provide a reproducer. I'm not sure if this answers the question. If we do provide a reproducer, then we know that this reproducer actually triggered the reported bug on a freshly booted kernel. So if syzbot provides a reproducer, it's not a false positive. It may not trigger the bug reliably, or it may also trigger other bugs besides this one, but at least we know that that combination of reproducer and bug is valid.

Okay, another question in the Q&A box: do you do any analysis of the components in which the bugs are found? For example, are certain components more prone to bugs? We don't do this analysis directly. We have seen some components that contain lots of bugs, but that may just be because those components are large; for example, networking is very large, and USB is very large because there are lots of different drivers. We also have different coverage for different subsystems. So we don't do a direct classification of each bug, and we don't compute bugs per line of code at this point. There are some feature requests to classify bugs by subsystem, and then we would have those numbers, but not yet. Thank you.
There is another question, a longer one. Something I have been trying to do is run syzkaller on a Samsung Android phone, but I have been having trouble compiling the Samsung kernel with KASAN plus KCOV. I have followed the instructions in the Android KASAN/KCOV guide. The thing I don't understand is the "modifying AOSP code" section; it doesn't help with other devices. For example, it's not obvious how to adjust the board parameters for a device: is there any logic to increasing these addresses, and how do I calculate new values?

Yes, I see it. I think we need to take this offline, to a mailing list, and I'm actually not sure which mailing list. There is the kasan-dev mailing list, but it's mostly for the upstream kernel, so if it's Android-specific or specific to a particular device, there are not necessarily people there who have experience with that. Generally, KASAN doesn't need to be adjusted in any way: the shadow region address should work for all kernels and for all boards and configurations. But there surely can be other issues related to, I don't know, the bootloader and other things, so it's hard to talk about it in the abstract. Thank you.

There is another question: how hard is it to prevent exploitation of out-of-bounds and use-after-free bugs in the kernel? Is anyone working on this? How hard it is depends on what runtime overhead you can tolerate. It's very easy to prevent if you're ready to tolerate, I don't know, a 10x slowdown and a 10x memory consumption increase: there are techniques, for example fat pointers, that can do precise bounds checks and prevent all of those bugs reliably. But generally this amount of overhead is not acceptable for production environments. There is also hardware support called memory tagging, which is being developed by ARM; it's already in the ARM specification, and we hope other CPU vendors may provide something similar. This hardware support allows KASAN-like detection with very low overhead. There is no physical hardware yet, so we don't have exact numbers, but maybe something starting from 5% or so. That is our closest and most realistic hope for preventing exploitation of these types of bugs; it essentially moves the KASAN-type functionality into the hardware so that it happens quickly.

We have another question: is it possible to educate syzbot with some reinforcement learning from the manual "#syz dup" and "#syz fix" commands, for better performance? There is actually some work on incorporating reinforcement learning, but in the fuzzer logic, so that the fuzzer comes up with more interesting programs and triggers more bugs. But I understand this question is about duplicating bugs, marking bugs as duplicates. For fix commands, we do bisection, and this handles some amount of missed fix commands. Marking duplicates with reinforcement learning: maybe it's possible, hard to say. It will also have some false positives, I think, and this needs to be evaluated. For example, if 80% of duplicates are marked correctly, but the other 20% are marked incorrectly, we just make some bugs disappear that are actually not duplicates but unique bugs. Is that acceptable or not? I don't know. So I would say it's open research, and somebody needs to build prototypes and see what quality and what results we can get with this.
There is still the last question about the future of the tools, but before that, as a follow-on to the Android KASAN question, Adam is wondering if he can reach out to get help offline. Let me post the kasan-dev mailing list address to the chat; I think that may be the place to start, at least. There have already been a number of questions there about porting KASAN to some Android kernels and dealing with some boot issues on Android, so maybe it's already answered there. I personally don't have a lot of expertise with backporting and enabling KASAN on specific Android devices, but maybe there are other people on the list. Thank you.

Another question: what is the best tool for a dynamic call graph? I actually don't know. I know of gcov; it may do this. I mentioned dynamic call graphs just as an abstract thing, I didn't mean any specific tool for the kernel, so sorry, I can't give more info.

So now we are at the last question: I want to hear about the future of these tools. For KASAN, it's mostly done besides some local improvements. We do KCSAN, which is a data race detection tool. The issue currently is that the kernel has too many benign or intentional data races, and they confuse the tool a lot: it detects those races, but developers don't consider them bugs. That's where we are currently stuck with deployment of the tool. The future in this direction would be to somehow mark all those intentional data races, and then the tool can start finding the unintentional data races that are real bugs. We also have a tool called KMSAN, a memory sanitizer that finds uses of uninitialized values. We already use it, but it's not upstream; we never managed to do that last step, because upstreaming it and incorporating it into the upstream kernel is actually quite a large step. We are also working on a set of tools that can detect bugs right in production, because of the coverage problem: we run tests, we run fuzzing, but we are still not covering everything that's important for production, so we want to deploy a tool right on production systems. For that approach, we currently use sampling, very low-rate sampling, and check just a small subset of heap objects for use-after-free and out-of-bounds bugs. The idea is that if you deploy this tool on a large fleet of devices, for example a data center or all of the phones, and each of those devices checks just a handful of heap objects with very low overhead, then you still find bugs across the whole fleet. This tool is called KFENCE, and it should be upstreamed basically within this week; at least we very much hope it will be sent to Linus and Linus will merge it. In the future, we also want a tool that detects uses of uninitialized memory in production, but we don't even have a prototype for user space yet, so for the kernel that's a long-term plan. And there is memory tagging, which I mentioned: there is also active work in the kernel to support memory tagging and be able to do all of this checking without sampling, checking all heap objects in production with acceptable overhead.
So there is work to support this in the kernel and to support it for user space, and we're also waiting for the actual hardware. Yes, I think that's it.

That's KFENCE, Dmitry, that you mentioned; hopefully people can find it, I will post the name to the chat. Okay, somebody posted an LWN article link as well. Okay, it's there, sounds good. And you mentioned benign races; do you mean the intentional ones, and do they need to be tagged in some way to ignore them? Is that what you meant by benign races? Yes, yes, that's what I meant. The tool just requires some kind of model that allows it to distinguish between bugs and non-bugs, and if they are not distinguishable, if a bad race looks exactly the same as an intentional, harmless race, then the tool just can't do anything, it can't tell them apart. So the idea is to tag the intentional races.

Right, so something like what static analysis tools do to ignore false positives with tagging; Coverity does that, and similar tools. Yes, to some degree, yes.

So what is the nature of these benign races? Do you have any examples from the kernel that would tell us more about them? Yes, we have lots of examples; there are actually lots of KCSAN reports on the syzbot dashboard. Lots of them are related, for example, to statistics counters, where multiple tasks increment some statistics variable at the same time without using an atomic increment, just a normal C increment expression, and they're okay with some statistics increments being lost. But of course, if you do the same for an index into an array, that can be a very bad bug, and the tool can't tell the difference. Lots of them are also related to, say, setting some flag in one task and then checking it in another, but when you analyze the code, the code is actually fine with this update being somewhat sloppy, or it ensures that multiple threads don't modify the same variable at the same time, and so on.

Some of this lossiness is okay, for example for statistics counters; I have been looking at those in the past few months, and you don't really care about them. The others are the flags: there might actually be a lock held that KCSAN doesn't pick up when the flag is updated, or it is a real update without a lock held, but one that's okay. Well, KCSAN catches two concurrent accesses that happen exactly at the same time, so it won't be confused by a missing lock or missing synchronization: when it points to a race, we know the accesses actually happened at the same time. But say one is a write and the other is a read, and the logic is arranged so that this is okay; it's that type of case. Okay, so you have one writer and multiple readers? Yes, and maybe there is actually a mutex for all of the writers, so those are serialized and don't corrupt the variable, but on the read side the read is done without the lock, and we catch the race between the read and one of the writers. Okay, great, thank you.

Any other questions? How are we doing on time, Christina? We have about nine minutes left; are there any other questions? This was a great presentation, Dmitry, just awesome. Thank you. Thank you. Yeah, thank you so much.
Well, if there are no other questions, I just want to thank Dmitry for his time today, and thank you to all the participants who joined us. As a reminder, this recording will be on the Linux Foundation YouTube page later today, and we will also be posting the slides on our website. We hope you are able to join us for future mentorship sessions. Have a wonderful day. Thank you. Bye-bye.