All right, so about a year ago, I was invited by the Linux Foundation to do a mentorship session on Linux kernel fuzzing. And the main point of that session was that fuzzing is such an awesome tool to find Linux kernel bugs. Well, today I want to revise that statement a little bit. Fuzzing is still an awesome approach, but fuzzing by itself is usually not enough. One of the things that makes fuzzing work is the dynamic bug detectors that are used together with the fuzzer. The idea is that the fuzzer is able to trigger different code paths within the kernel, and then the bug detectors detect the bugs — because if you had no bug detectors, maybe you would get some random kernel crashes sometimes, but those crashes are very hard to debug. So my name is Andrei, and the talk I'm gonna give today is about Linux kernel sanitizers. It's a family of bug detection tools for the Linux kernel, and they find a lot of different kinds of bugs. The reason I started making this talk is that I wanted to give a bit of an update on how the generic mode of KASAN works, because the last presentation I gave about the generic KASAN mode was, I think, seven years ago, and quite a few bits have changed since then. Almost everybody around here probably knows how KASAN works, but still, I'm gonna give an update. Besides that, I'm gonna talk briefly about the other types of sanitizers: we now have the kernel memory sanitizer, the kernel concurrency sanitizer, and so on. A few of the sanitizers we now have can be run not only in testing and fuzzing, but also in the field, which means you can deploy them to beta-testing or maybe even production devices. And we now also have a new KASAN mode, called hardware tag-based KASAN, which is intended to eventually become a mitigation.
It's not quite a mitigation yet, but maybe we'll get there at some point. And the final part of the talk is about how to make some tweaks to the sanitizers to make them find even more bugs. This will be particularly useful for people who do vulnerability research, because if you just run the same tools that everybody else runs, you'll find the same bugs — but if you extend the tools, maybe you'll find something else. All right, so the sanitizers are a family of bug detectors, and initially they were implemented for user space. I believe the first one that was made was the thread sanitizer, but the first one that got famous was the address sanitizer, which finds different kinds of memory corruptions. Later these tools were ported to the Linux kernel and the letter K was added to their names. There are a few reasons the sanitizers are so popular. First of all, they're easy to use: in user space you just add another compiler flag, and in the kernel you just enable another kernel config, and the tool starts working. You don't need to do any complicated setup or anything like that. Then, they're relatively fast: at least given the features they provide, they are much faster than other tools that provide the same features. Another important thing is that they are precise, in the sense that all the bugs they report are true positives, which means there are no false positive reports. Sometimes false positives do happen, but usually it's because there is a bug in one of the sanitizers — and then the bug gets fixed and there are no more false positives. And also — this is a very underestimated feature — the sanitizers provide very detailed reports. If you have a bug detector that just prints some very simple report, you have no way to debug what the actual problem is. The sanitizers try to give you as much information as possible.
And we're gonna start with generic KASAN. Initially this was just called KASAN — it's the original mode of KASAN — but right now we have a few modes, so the oldest one was rebranded as generic. KASAN stands for Kernel Address Sanitizer, and it's a detector that finds memory corruptions. In particular, it finds out-of-bounds, use-after-free, and invalid-free bugs in slab, page_alloc, and vmalloc memory, and it also finds out-of-bounds bugs in stack and global memory. So it covers quite a wide range of different memory corruptions. It does require compiler support, but the support was integrated into both Clang and GCC quite a few years ago, so usually there are no issues — if you have too old a compiler, then of course it's not gonna work, but most systems now have quite modern compilers, at least compilers with KASAN support. As I mentioned, it has three modes, but I'm gonna focus on the generic one for now and briefly mention the other ones later. And there is a page on google.github.io — there are pages for each of the sanitizers, except for UBSAN, I think, but maybe that will get fixed — and it contains information like links to the documentation, links to the found bugs, and so on. All right, so generic KASAN consists of two parts: there is a compiler module, which is implemented in Clang and GCC, and there is a runtime part, which is implemented in the Linux kernel. I'm gonna briefly mention what each of the parts does and then go into detail. Briefly: the compiler module, first, instruments every memory access — essentially, before each memory access done by the kernel, the compiler adds some validity checks for this access. The compiler also inserts red zones for stack and global variables, and I'm gonna describe what red zones are later.
The runtime part does a few things. First, it maintains shadow memory: essentially, shadow memory reflects the state of the kernel memory — which memory is accessible and which memory is not. Then, it needs to update the shadow memory whenever new memory is allocated or freed, so it hooks into the kernel allocators. And finally, it contains the logic to detect bugs and print bug reports. I'm gonna jump around this list a little bit to make it easier to understand, and we're gonna start with the idea of the shadow memory. The algorithm that KASAN uses comes from the observation that almost any aligned eight bytes of memory are usually in one of only nine different states: the first few bytes are good, which means accessible, and the rest of the bytes are not accessible. This is a natural idea, because most of the time the memory that's allocated in the kernel — or allocated in general — comes in contiguous chunks. There is no way we'd have one byte accessible, then another byte inaccessible, then one byte accessible again; that's a very untypical situation. And because of this idea of nine states, we can encode the number of the state in just a few bits. KASAN actually uses a whole byte for this, and this byte is called a shadow byte. The encoding works like this: if all of the eight aligned bytes are accessible, the corresponding value of the shadow byte is zero. If only the first few bytes are accessible and the last few are not, it uses the number of accessible bytes — for example, if seven bytes are accessible, then the value is gonna be seven. And if all bytes are inaccessible, KASAN uses a certain negative value — or, if you cast it to unsigned, a value in the higher part of the 256 values that a byte can take.
And these different values for inaccessible memory allow KASAN to differentiate between different types of inaccessible memory. Here is a snippet from the KASAN source code: you can see that it defines quite a few values. For example, there is a value that corresponds to freed page_alloc memory, and there is a value that corresponds to freed slab memory. And there are values for stack, for globals, and so on. This is just a few of them — most of them, but not all. All of the shadow bytes are contained in a special region called the shadow memory region, and this region is contiguous. If you want to get the address of the shadow byte that matches an address of kernel memory, you take the address of the kernel memory, shift it right by three — divide it by eight, because we have slots of eight bytes — and then add the offset of where the shadow memory starts. And if you check the documentation for x86 — this is the mm.rst file — you can actually find the region where the KASAN shadow memory is mapped. Here you can also see that the shadow memory is 16 terabytes, which makes sense, because the whole kernel address space for the most typical x86 configuration, with four-level page tables, is 128 terabytes, and you divide by eight to get 16. The shadow memory is mapped by the kernel, and the mapping procedure for the shadow memory is not entirely straightforward. First of all, during very early boot, the shadow memory is mapped to a zero page, which means that all of the memory the kernel has during early boot is considered fully accessible. After the page tables are initialized, the proper shadow is mapped — the shadow that's actually changeable — and the kernel can start changing the values in the shadow at that point, and the accesses start being checked as well.
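The address arithmetic just described can be written down in a few lines. This is a userspace sketch, not the kernel's code; the KASAN_SHADOW_OFFSET value below is the one commonly used on x86-64, but treat the exact constant as an assumption of this example — the real value depends on the kernel config.

```c
#include <stdint.h>

/* One shadow byte covers 2^3 = 8 bytes of kernel memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Where the shadow region starts; config-dependent in a real kernel. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000UL

/* The mem-to-shadow mapping: shift right by 3, add the offset. */
static uint64_t kasan_mem_to_shadow(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

The 16-terabyte figure from the slide falls out of the same arithmetic: a 128 TB address space divided by 8 bytes per shadow byte gives 16 TB of shadow.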
And the shadow is not mapped for the whole range of kernel memory, but only for the parts that actually contain something, like the kernel text, the fixmap, and so on. That's what happens during boot; after boot is done, the kernel can still map new memory via vmap or vmalloc, and KASAN handles those cases and maps the proper shadow for them as well. I left a few links to the source code in this slide. All right, we're done with the shadow memory: we now have a certain region of memory that allows us to track the state of kernel memory — whether it's accessible or not. Let's now move on to the compiler module and see how the instrumentation works. Let's say we have an eight-byte access within the kernel: we have a pointer A, and the kernel writes eight bytes through this pointer. What the compiler is gonna do with KASAN enabled is add a few checks before this access. First, the check calculates the address of the corresponding shadow byte, using the same mapping scheme I mentioned. Then it checks the value of the shadow byte. If the value is zero, that means all eight bytes pointed to by A are accessible, and the kernel can proceed. If the shadow value is not zero, that means at least one of the bytes in the memory A points to is bad, and at that point KASAN prints a report. It works similarly for N-byte accesses where N is less than eight — there are just a few more checks, because we need to take into account the alignment of A, or rather where A points within the aligned slot of eight bytes, and also how many bytes we access. But if you stare at this check long enough, you'll figure out that it does the correct thing. All right, so at this point we do have the shadow memory, and we do have the compiler checks.
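The whole scheme can be modeled in a few lines: a miniature "kernel memory" with its shadow, plus the check the compiler conceptually inlines before an access. This is a toy sketch with made-up helper names, not kernel or compiler output:

```c
#include <stdint.h>

#define GRANULE 8
#define MEM_SIZE 64

/* Shadow for a toy 64-byte memory region: one signed byte per granule.
 * 0 = fully accessible, 1..7 = only the first N bytes accessible,
 * negative = fully poisoned. */
static int8_t shadow[MEM_SIZE / GRANULE];

/* Mark [addr, addr + size) accessible and everything else poisoned,
 * the way unpoisoning an allocation would (assumes `addr` is
 * granule-aligned). */
static void unpoison_range(unsigned addr, unsigned size)
{
    for (unsigned i = 0; i < MEM_SIZE / GRANULE; i++)
        shadow[i] = -1;
    unsigned g = addr / GRANULE;
    for (unsigned i = 0; i < size / GRANULE; i++)
        shadow[g + i] = 0;
    if (size % GRANULE)
        shadow[g + size / GRANULE] = size % GRANULE;
}

/* What the inlined check conceptually does before an N-byte access
 * (N <= 8, access assumed not to cross a granule boundary). */
static int access_ok(unsigned addr, unsigned n)
{
    int8_t s = shadow[addr / GRANULE];
    if (s == 0)
        return 1;                                /* fast path: granule ok */
    if (s < 0)
        return 0;                                /* poisoned granule */
    return (int)(addr % GRANULE) + (int)n <= s;  /* partial granule */
}
```

The fast path — a load of one shadow byte and a compare against zero — is all that runs for the vast majority of accesses, which is why the slowdown stays moderate.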
Those are the checks inserted by the compiler. But the kernel keeps allocating and freeing memory all the time, and KASAN should keep track of that: it should update the shadow memory whenever something happens. This works fairly simply: KASAN just adds a bunch of hooks to the different allocators, and if you open the allocator code and grep for kasan_, you'll find quite a lot of callbacks into the KASAN runtime. I'm not gonna go into a lot of detail here, but I'll show how it works for the slab allocator — or rather the SLUB allocator in this case. Without KASAN, every slab just contains a bunch of slots for different objects. When KASAN is enabled, first, whenever a new slab is allocated, it's fully poisoned by KASAN — poisoned means marked as inaccessible in shadow memory. And this makes sense, because when a new slab is allocated, all of the objects in it are in the freed state, so the kernel should not be accessing any of them. Then, when an object is allocated, it gets marked as accessible — unpoisoned, as it's called in KASAN slang. KASAN also does another thing: it adds a red zone between each two objects, and the red zones always stay poisoned. The red zones allow KASAN to detect buffer overflows, because if we have a few allocated objects that go one after another and there are no red zones between them, an overflow from one object into another will not be detected by KASAN — they are both marked with zero in shadow memory, right? So KASAN adds red zones, which allows it to catch buffer overflows. One more thing KASAN does — it's not mentioned on the slide — is that whenever an object is freed, it gets poisoned, because when the object is freed, the kernel should not be accessing it.
For kmalloc, KASAN has some additional red zones, because of a feature of the kmalloc caches: when you use kmalloc to request a certain size, the slab allocator is going to use the best-fitting kmalloc cache. For example, if you request 100 bytes via kmalloc, the allocation is gonna be served from the kmalloc-128 cache. But generally, if the kernel requests 100 bytes, it should not be accessing any bytes past those 100, even though it has 128. So what KASAN does is add an additional red zone after the requested 100 bytes, until the end of the object. And then, of course, there is still the red zone that comes between the objects. But if somebody calls ksize after allocating such a pointer, it's legal within the kernel to access the full slab object, so whenever ksize is called, KASAN unpoisons this smaller kmalloc red zone. So this is essentially how detecting out-of-bounds accesses works. KASAN also wants to detect use-after-frees, and as I mentioned, whenever memory is freed, it's poisoned — marked as inaccessible. But at the same time, the slab allocator works in such a way that when an object is freed, it gets put on the free list, and whenever another object is allocated, the first object from the free list is taken. That means that freed memory is unlikely to stay freed for long, and this makes it hard to detect use-after-frees. So what KASAN does is implement a quarantine: whenever slab memory is freed with KASAN, it gets put into a queue and its reuse is delayed. Essentially, it's not returned back to the allocator, but kept in a KASAN-specific queue. The queue has a fixed size — I haven't mentioned it here, but I think it's 1/32 of the RAM size. This allows KASAN to detect use-after-frees, because instead of being immediately reallocated, the objects stay freed — so they stay poisoned — while they are kept in the queue.
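The quarantine idea can be sketched as a small fixed-size FIFO. The names and the tiny size here are hypothetical stand-ins, not the kernel's:

```c
#include <stddef.h>

#define QUARANTINE_SLOTS 4   /* the real size is a fraction of RAM */

static void *quarantine[QUARANTINE_SLOTS];
static size_t q_head, q_len;

/* Put a freed object into quarantine instead of returning it to the
 * allocator. If the queue is full, the oldest entry is evicted and
 * returned so the caller can actually free it; otherwise NULL. */
static void *quarantine_put(void *obj)
{
    void *evicted = NULL;
    if (q_len == QUARANTINE_SLOTS) {
        evicted = quarantine[q_head];
        q_head = (q_head + 1) % QUARANTINE_SLOTS;
        q_len--;
    }
    quarantine[(q_head + q_len) % QUARANTINE_SLOTS] = obj;
    q_len++;
    return evicted;
}
```

While an object sits in the queue, its shadow stays poisoned, so a use-after-free access during that window gets reported.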
All right, so now we have the shadow memory and the hooks in the allocators, and we know how KASAN handles slab objects and other dynamic allocations. But KASAN is also able to detect bugs in stack and global variables, and this is done with the help of the compiler. Let's say we have a function foo which has an array of 10 bytes on its stack, and it also has some code that handles this array. When compiling this function, KASAN is going to add red zones on the stack around this variable, and then it's going to add some code into the function to poison the red zones and unpoison the stack allocation. So before the original function code, we have code to unpoison x and poison the red zones, and after the original function code, the compiler inserts code to unpoison the whole stack frame — it just cleans up after itself. This is how out-of-bounds detection for stack variables works. If we have multiple stack variables, there's going to be a red zone between each of them: a red zone at the beginning, then a variable, a red zone, a variable, a red zone. And it works similarly for global variables. Let's say we also have a global array of 10 bytes. What KASAN does is transform it into, essentially, a structure: technically, it just adds a red zone after the object, and the red zone is poisoned by a constructor. This works both for the main vmlinux binary and for module binaries — KASAN essentially inserts a new constructor into the binary that's being built, and these constructors are called by the kernel. Okay, and the final part is that KASAN is able to print bug reports.
And the bug reports should be as useful as possible, as I mentioned, because the developer — or whoever is running KASAN and gets a bug report — needs to be able to figure out what's happening in the kernel. One of the most important things KASAN does is print allocation and free stack traces for dynamic allocations. It does this for slab, and for page_alloc it can be done with the page_owner functionality. For slab, the way KASAN does it is that these stack traces are collected on each allocation and free event and then saved into stackdepot. Stackdepot was initially developed specifically for KASAN, but right now it's just another subsystem in the kernel, and quite a lot of other kernel parts use it. It allows you to store stack traces, and whenever you store a stack trace, you get back a handle — a four-byte handle that denotes that particular stack trace. These stack trace handles need to be somehow associated with the allocated or freed object. KASAN stores the handle for the allocation stack trace in the red zone — for slab objects we have the red zone anyway, so it makes sense to reuse it somehow. For the free stack trace, KASAN usually tries to save the handle in the object itself, to make the red zone maybe a little bit smaller — the handles are only four bytes, so it doesn't matter that much — but for some types of objects, for example the ones that can be accessed via RCU at some point after being freed, it stores that in the red zone as well. And let's see what a KASAN report looks like. Say we have a slab out-of-bounds bug — this is just a part of the KASAN test suite — and essentially this function allocates 115 bytes and then accesses one byte out of bounds.
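The test in question looks roughly like this — a paraphrased sketch of kernel-side KUnit test code, not the exact source, and it only compiles inside the kernel tree:

```c
static void kmalloc_oob_right(struct kunit *test)
{
	char *ptr = kmalloc(115, GFP_KERNEL);

	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, ptr);
	/* One byte past the 115 requested bytes: slab-out-of-bounds. */
	KUNIT_EXPECT_KASAN_FAIL(test, ptr[115] = 'x');
	kfree(ptr);
}
```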
And when you compile this code and run it with KASAN, you're gonna get a report. The report says that there is a slab-out-of-bounds bug in this particular function, that there is a write of size one at this address, and then comes the stack trace of the bad access. After that, KASAN prints the stack traces related to the particular slab object, if a slab object was accessed. In this case, we only care about the allocation stack, because the object was not freed, so KASAN prints the allocation stack trace. If we had a free stack trace associated with the object — which is relevant when you have a use-after-free — KASAN would also print that. After that, KASAN tries to describe the memory address that was accessed. In this case, KASAN knows this was a slab memory address, so it figures out which cache the address belongs to, and it calculates how far inside the object the accessed offset is. And since all slab allocations also belong to page_alloc — they are technically also page_alloc allocations and have some corresponding physical pages — KASAN also prints some information about the physical page. The final part of the report contains information about the memory state around the accessed address, and the memory state is essentially the shadow bytes I mentioned. Here you can see that the middle row represents the 128-byte allocation: since each shadow byte covers eight bytes, representing 128 bytes takes 16 shadow bytes, and that's the middle row. There you can see 14 shadow bytes marked with zero, which means those bytes are fully accessible, and 14 times eight is 112. And then we have a shadow byte with the value three, which means there are three more bytes that are accessible.
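Reading off the size from the shadow row, as we just did by hand (14 × 8 + 3), can be mechanized. This sketch decodes a row of shadow bytes into the usable object size:

```c
#include <stdint.h>

/* Given a row of shadow bytes covering an object, recover how many
 * bytes are accessible: each 0 contributes 8 bytes, a positive value
 * contributes that many bytes and ends the object, and a negative
 * (poison) value ends it immediately. */
static unsigned shadow_row_usable_size(const int8_t *row, unsigned n)
{
    unsigned size = 0;
    for (unsigned i = 0; i < n; i++) {
        if (row[i] == 0) {
            size += 8;
        } else if (row[i] > 0) {
            size += row[i];
            break;
        } else {
            break;
        }
    }
    return size;
}
```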
And in total, the size of the area we can figure out from the shadow memory is 115 bytes, which matches the size of the allocation that the test initially did. Then we can see the smaller kmalloc red zone — marked after the 03 byte with FC values — and then the larger red zone, which is quite large: 128 bytes. This is a particular feature of the kmalloc caches: all of the objects allocated from a kmalloc cache of a certain size are supposed to be aligned to the size of the cache. I think this change was added a couple of years ago, and KASAN has to deal with it, so the red zones for kmalloc allocations are quite large — they usually match the size of the allocation. But anyway, from this representation you can sometimes figure out even more details of what went wrong. All right, this was the main part of how KASAN works. I'm gonna add a few other notes; they're not specific to either the compiler or the runtime, I'm gonna talk about both. First, some kernel code is not instrumented by the compiler. First of all, this is the assembly code, and the reason is simply that the instrumentation module inside the compiler does not handle assembly. And some code is deliberately not instrumented, such as early boot code and the allocators themselves. Early boot code — well, we cannot instrument any code until we have some kind of shadow mapped, right? So that code is just not instrumented. And the allocators are not instrumented to avoid different kinds of recursion. You can grep for KASAN_SANITIZE in the Makefiles to see which files KASAN avoids instrumenting. Then, the checks I described before were added by the so-called inline instrumentation mode. KASAN also has an outline instrumentation mode: instead of directly embedding the shadow checks, the compiler just adds function calls.
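In outline mode, the compiler emits calls like __asan_load8()/__asan_store8() that the KASAN runtime defines. A toy stand-in for such a runtime hook — the shadow lookup here is faked with a single global, and the _toy suffix marks it as not the real symbol — could look like this:

```c
#include <stdint.h>

/* Fake one-granule shadow and a report counter standing in for the
 * real runtime's shadow walk and report printing. */
static int8_t fake_shadow;
static int reports;

/* Toy version of the outline-mode runtime hook the compiler calls
 * before an 8-byte store (the real one takes the accessed address
 * and walks real shadow memory). */
static void __asan_store8_toy(uint64_t addr)
{
    (void)addr;
    if (fake_shadow != 0)   /* an 8-byte access needs a fully-good granule */
        reports++;
}
```

The call overhead on every access is exactly what makes outline mode slower than inline mode, in exchange for a smaller image.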
And this outline mode is a bit slower, just because of all the function calls the runtime has to handle, but it makes the kernel image smaller. This is sometimes useful if you have a device that does not allow the kernel image to be very big: if you enable the inline instrumentation mode, the device is just not gonna work, but if you enable the outline instrumentation mode instead, the image will be smaller and it might just fit. And a few other notes. First, KASAN has the three modes that I mentioned. The one I just described is called generic; the other two have to do with memory tagging. The generic mode is called generic because it can be supported by any architecture — there are maybe five, or maybe seven or eight architectures that support KASAN now, I don't know exactly — it's only a question of implementation, there are no architecture-specific limitations to the generic mode. But the other two modes are specific to arm64, and they require certain arm64 features. Then, there is a KASAN test suite that checks that KASAN is able to detect certain types of memory bugs. For the most part, the test suite is ported to the KUnit framework, but there are a few tests that are not. All except one of them are easily ported — I already have the patches to do that — but the problem with that one test is that it checks the copy_to_user and copy_from_user functions, and those functions require a user space component. The KUnit tests are run from a kernel thread, where there is no user space component, and it's unclear yet how to deal with that. And, unlike quite a lot of kernel subsystems, KASAN does use Bugzilla: all of the bugs and feature requests are tracked in Bugzilla. And if you check the Bugzilla, you can actually find out that quite a few features are missing from certain modes.
So there is still quite a lot of work to do to make KASAN detect everything it could potentially detect. All right, so the summary is this: we have generic KASAN, which is a bug detector for memory corruptions. It allows us to find out-of-bounds, use-after-free, and different kinds of invalid-free bugs in slab, page_alloc, vmalloc, stack, and global memory. It does require compiler support, but your compiler probably supports it anyway. A note about its performance: as I mentioned, it's relatively fast — usually it gives you about a 2x slowdown, maybe 3x if you enable the outline mode; it depends on the exact features of your system. This is quite fast compared to other tools that provide similar functionality. But at the same time, generic KASAN suffers from a big RAM impact: it requires some RAM for the shadow, it requires some RAM for the quarantine, and because of the red zones it also adds some overhead for slab. You can still use it — it works great — but it's not perfect in that sense. And the most basic way to use KASAN is to enable the kernel config called CONFIG_KASAN_GENERIC and then run your kernel tests, your fuzzer, or whatever you want. KASAN is just going to print bug reports to the kernel log, and you need to monitor for those. All right, with that I'm done with KASAN; now let's move on to a few more sanitizers. For the other sanitizers, I'm not going to give very detailed descriptions — just a few notes to give you some idea of what kinds of sanitizers there are. So KASAN covers most of the memory corruption types that we have. It was developed first, and then the question was: what about the other types of bugs? This is where more sanitizers came in. First, we have the kernel memory sanitizer, KMSAN, which allows us to detect uses of uninitialized memory.
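The kind of tracking involved can be shown in miniature: carry a per-byte "initialized?" shadow alongside each value, propagate it on copies, and only report when a decision is made on poisoned data. Everything below is a toy model, not KMSAN's implementation:

```c
#include <stdint.h>

/* A tracked byte: its value plus an init-shadow flag.
 * shadow != 0 means "uninitialized". */
struct tracked {
    uint8_t value;
    uint8_t shadow;
};

static int reports;

/* Declaring a variable leaves its shadow poisoned. */
static struct tracked declare(void)
{
    struct tracked t = { 0, 0xFF };
    return t;
}

/* Initializing clears the shadow. */
static void assign(struct tracked *t, uint8_t v)
{
    t->value = v;
    t->shadow = 0;
}

/* A copy propagates the shadow instead of reporting: uninitialized
 * data is only a bug once it influences behavior. */
static void copy(struct tracked *dst, const struct tracked *src)
{
    *dst = *src;
}

/* Branching on a value is a "use": report if it's poisoned. */
static int branch_on(const struct tracked *t)
{
    if (t->shadow)
        reports++;
    return t->value != 0;
}
```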
So essentially, if you declared, say, a stack variable and forgot to initialize it, and then used it in an if condition, that's clearly wrong, and KMSAN is able to detect this type of bug. Besides that, it's able to detect information leaks across different security boundaries. For example, it's able to detect leaks of uninitialized kernel memory to user space: if somebody kmallocs some memory and then copies it to user space without initializing it, that's clearly a bug, and a potential security vulnerability, and KMSAN is able to detect that. Like KASAN, it uses compiler instrumentation and shadow memory, but instead of storing information about accessibility in the shadow memory, it stores information about whether certain bytes — or bits — are initialized. Otherwise, it's a very similar approach and a very similar implementation to KASAN. It's still not in the mainline, but I hope it's gonna be in the mainline soon — I've been seeing quite a lot of patch series lately. Let's hope. Another type of bug we have is data races, and the data races are handled by the kernel concurrency sanitizer, KCSAN. This sanitizer also uses compiler instrumentation, and it uses what's called a soft watchpoint. Essentially, whenever it sees an access to some kernel memory, it sets up a watchpoint and delays — just sleeps for a while — to see if there are any concurrent accesses to the same memory. So if you stall on an access, and you check the value before and after the delay, and the value of the memory has changed, that means somebody else accessed it and you have a data race. This is a very basic description — the exact implementation is much more complex — but that's the idea, at least. And there was also another attempt to detect data races, which was called the kernel thread sanitizer, KTSAN.
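The soft-watchpoint trick can be sketched deterministically: snapshot the value, "stall" (here, just run some other code standing in for concurrently scheduled work), then re-read and compare. Real KCSAN does this with actual delays and watchpoints that other instrumented accesses check; this sketch only shows the re-read side of the idea:

```c
#include <stdint.h>

static int data_races;

/* Watch one memory location across a stall. `stall` stands in for
 * the delay during which other CPUs may touch *p concurrently. */
static void kcsan_check(volatile uint32_t *p,
                        void (*stall)(volatile uint32_t *))
{
    uint32_t before = *p;   /* snapshot */
    stall(p);               /* the soft-watchpoint window */
    if (*p != before)       /* value changed under us: a data race */
        data_races++;
    /* (the real tool also catches concurrent reads via its watchpoint
     * list; value comparison alone can't see those) */
}

/* Two stand-ins for what happens during the window. */
static void quiet(volatile uint32_t *p) { (void)p; }
static void racy_writer(volatile uint32_t *p) { *p += 1; }
```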
But right now, I'd say that KTSAN attempt failed. The kernel concurrency sanitizer is kind of a best-effort tool — it doesn't check all of the accesses it could check — but KTSAN tried to check everything, and it tried to follow all of the synchronization primitives around the kernel. There are quite a lot of them, and some of them are very weird and hard to track. The implementation is still there, based on an ancient kernel, but I don't think anybody's going to continue working on it. And finally, we have UBSAN, the undefined behavior sanitizer, which finds different kinds of undefined behavior — undefined behavior from the C standpoint. For example, if you have a variable of 64 bits and you shift this variable by 64, that's undefined behavior from the point of view of the C standard, and the undefined behavior sanitizer finds these kinds of things. The most annoying thing about this sanitizer is that it doesn't have the letter K in its config name. I don't know if it's easy to rename the configs, but that's just some legacy thing that wasn't fixed when it was merged. All right, so the sanitizers I mentioned cover the other types of kernel bugs. But besides covering new types of kernel bugs, we sometimes want to test the kernel in different environments. Sanitizers were initially intended for use with testing and fuzzing, but it would be really cool if we could deploy some of the sanitizers to an actual production workload and see if they find any bugs there. And for this, there are two sanitizers — well, one sanitizer and one sanitizer mode — that could be used.
Maybe you can only use them in beta testing on actual devices — that's also fine — or maybe you can even deploy them in production. The first tool is not technically a sanitizer, because it doesn't have "sanitizer" in its name: it's called KFENCE, which stands for Kernel Electric-Fence. This is also a memory corruption detector, and as far as I know, it only works for slab memory. The way it works is that it samples some of the allocations, and instead of using the normal slab allocation path, it puts each sampled allocation into a separate page, right next to a protected guard page. If there is an out-of-bounds access from this allocation into the guard page, the kernel is going to catch that, right? It does the same thing for use-after-free: since the allocation is in a separate page, once the allocation is freed, KFENCE just protects that page. Because this tool uses sampling, you can configure how much overhead it has: you can just take one out of a million allocations, or something like that, and the overhead is going to be essentially zero. This allows the tool to be deployed in production. But the problem is that if you only deploy it to a single device, it's very unlikely to find any bugs. So the way this works is that you deploy it across a fleet of devices: if you have a thousand or a million devices, the chance of catching a bug in at least one of them is much higher. And yeah, that's essentially it. Then, another thing we have is the KASAN mode I mentioned: the software tag-based KASAN mode, which is based on software memory tagging.
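The memory-tagging idea behind the tag-based modes can be modeled in miniature: memory is split into 16-byte granules, each granule gets a tag, every pointer carries a tag in its unused top byte, and the two must match on every access. The granule size and the top-byte placement are the real scheme's; the code itself is a toy sketch:

```c
#include <stdint.h>

#define TAG_GRANULE 16
#define NGRANULES 8

static uint8_t mem_tags[NGRANULES];
static int reports;

/* "Allocate" granule range [g, g+n) with a given tag and return a
 * tagged pointer (tag in bits 56..63, offset in the low bits). */
static uint64_t tagged_alloc(unsigned g, unsigned n, uint8_t tag)
{
    for (unsigned i = 0; i < n; i++)
        mem_tags[g + i] = tag;
    return ((uint64_t)tag << 56) | (uint64_t)(g * TAG_GRANULE);
}

/* The check done on each access: pointer tag vs memory tag. */
static void check_access(uint64_t ptr)
{
    uint8_t ptr_tag = ptr >> 56;
    unsigned g = (unsigned)(ptr & 0x00ffffffffffffffULL) / TAG_GRANULE;
    if (mem_tags[g] != ptr_tag)
        reports++;
}
```

In software tag-based KASAN this check is compiler-inserted code; with MTE, the same comparison is done by the CPU on every load and store.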
And I'm not going to go into details, but I've left a link to the presentation that I gave a couple of years ago at the Android Security Symposium, and it has all of the details: how it works, how it uses compiler instrumentation, and so on. It has a very similar performance impact to generic KASAN, about a 2x slowdown. So this tool does not quite work for production, but maybe it can be used in dogfood. At the same time, compared to generic KASAN, it has a much lower memory impact, just because it doesn't need that much shadow memory and it doesn't require quarantine. So if you have some kind of Android device, let's say you're fuzzing an old Android phone, maybe even a modern Android phone, and when you run generic KASAN on it, it just keeps crashing with out-of-memory bugs, what you could try is software tag-based KASAN, just because the memory impact is much lower. Compared to KFENCE, of course, this is not a tool that you can run in production: KFENCE is much faster and has much lower overhead, so software tag-based KASAN is only okay for dogfood. But at the same time, it still makes sense to try software tag-based KASAN, because it prepares you for the next thing that might be coming, which is the hardware tag-based KASAN mode, which is what I'm gonna be talking about now. So essentially, the first idea that we had was: let's use sanitizers to find bugs while fuzzing. The second idea is: let's deploy them to detect bugs in production. And the third idea we can come to is: how about we use some sanitizers as an actual mitigation? Because if sanitizers are able to find bugs, they can just panic the kernel whenever a bug is found. And this might make sense from the security perspective, because if somebody is trying to exploit a memory corruption, one of the sanitizers might catch that and then just panic the kernel and this way protect it.
And the mode that was developed for this case is called hardware tag-based KASAN. This mode is not based on software memory tagging but on a hardware memory tagging approach. Right now there are no CPUs that support this thing, but there is an ARM specification for it: the ARM feature that supports memory tagging is called MTE, the Memory Tagging Extension, and the hardware tag-based KASAN mode works based on that. So essentially, if your CPU supports MTE, the validity checks of memory accesses are going to be done by the CPU itself, which makes it really fast. And this will hopefully at some point be used as a production mitigation; nobody knows yet, but even if it's never used as a production mitigation, it can still be used as an in-field bug detector, like the thing that I mentioned in the previous section. The RAM impact is noticeable but quite low: it's 3% for storing memory tags. Essentially, with MTE, for each 16 bytes of memory you have a four-bit memory tag, and that's where the 3% comes from. The performance impact is still unknown, because at this point there are no ARM CPUs released with the MTE feature. But as for the expected impact: MTE has two modes, called sync and async, and there are certain differences between them. The sync one is the most precise one, and its expected performance impact is around 10%. And it's a question whether people are ready to take a 10% impact to deal with a certain type of memory corruptions, a certain type of vulnerabilities; we'll see. But the async mode can probably be deployed. It's not as good, but the overhead is probably going to be around nothing. And yeah, if you want to learn more details about hardware tag-based KASAN, I gave a talk, a virtual talk, but still, last year at the Linux Security Summit.
Yeah, and you can find the slides and the video on my website. One last thing I wanna mention about hardware tag-based KASAN is the thing that I'm working on right now. So let's say we deploy hardware tag-based KASAN to production, or maybe to dogfood. At that point we will start getting some kernel crashes, we'll start getting kernel reports, right? And these reports still need to be useful. And the worst part of KASAN right now, the part that's unsuitable for production, is the stack trace collection code. So the thing that we need is some kind of stack trace collection and storage that is, first, fast, and second, memory-bounded, because usually there is such a requirement: if the memory used by your stack tracing just grows indefinitely, that's not very suitable for production. And the approach I'm planning is to start implementing some things. First, we need to collect stack traces fast, and one of the ideas that was proposed is to use the shadow call stack. The shadow call stack is essentially a mitigation that deals with certain types of stack corruptions, but at any point in time the current stack trace is actually contained in the shadow call stack. So to collect the stack trace you can just do a memcpy, and this is the fastest stack trace collection you can get. I tried sending some patches, but so far the arm64 maintainers, in particular Mark, I don't know if he's around, I saw him yesterday, they resist. I mean, they do have valid reasons for not liking the idea of including it into the kernel, but I think ultimately it will come down to the actual performance numbers, if we see that this is indeed much faster. We'll see. The second thing, which I've already implemented: right now KASAN stores stack traces in red zones, and this is not good, because the red zones add additional memory.
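The shadow-call-stack idea can be sketched roughly like this. Treat it as kernel-side pseudocode, my own illustration rather than the actual patches; the accessor names are assumptions:

```c
/* Pseudocode sketch: on arm64 with CONFIG_SHADOW_CALL_STACK, each
 * task's shadow call stack already holds the return addresses of the
 * current call chain, so "collecting" a stack trace degenerates into
 * a single memcpy instead of an expensive frame walk. */
static unsigned int save_stack_from_scs(unsigned long *store,
                                        unsigned int max_entries)
{
    unsigned long *base = scs_base_of_current_task();   /* hypothetical */
    unsigned long *top  = scs_sp_of_current_task();     /* hypothetical */
    unsigned int nr = min(max_entries, (unsigned int)(top - base));

    memcpy(store, base, nr * sizeof(*store));
    return nr;
}
```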
So what I implemented instead is kind of a global stack trace handle storage; I called it the stack ring. I'm not gonna go into details, but I've linked the patches here, and they are already in the mm tree. Then the next thing: we have the stack depot, which is not very good for production. We need a memory-bounded stack storage, and stack depot has a way to limit its memory, but the problem is that the moment the stack depot memory is depleted, it just stops saving stack traces, and this is not what we want. We want to keep saving new stack traces, just to make sure that we're not only able to detect a few bugs during boot and then have nothing. And yeah, this is something that I'm gonna look into next, and potentially, to limit the performance impact from collecting and saving stack traces, I can just implement something, but we'll see how fast or slow it is. All right, how are we doing on time? Okay, perfect, then I have the last extra section that I wanna talk about, and this one is about extending sanitizers; hopefully this will be useful for some vulnerability researchers. So the whole idea is similar to what I talked about during my fuzzing mentorship session. If you just take a standard fuzzer and run it, you're probably gonna be finding the same things everybody else does. And this is also applicable to the bug finding tools. So if you just take the stock sanitizers and run them, you will find some things, but probably these are gonna be the same things that everybody else finds. So what you could do instead is extend the sanitizers, and this is what I call the advanced usage of the sanitizers. So if you look at, let's say, KASAN and KMSAN, they're not just tools by themselves; you can consider them frameworks.
KASAN is a framework that allows you to mark certain memory as accessible or inaccessible, and then, whenever a memory access happens, it checks and acts according to that marking. And KMSAN is a framework that allows you to keep track of initialized and uninitialized memory. And you can build certain features on top of that. So the first feature that I wanna mention is adding custom red zones with KASAN. And the example that I have: there is a data structure in the Linux kernel called a socket buffer, and essentially every time you send some data through a socket, a socket buffer gets allocated. This buffer contains the data that's being sent through the socket, but at the very end, within the same allocation, an skb_shared_info structure is placed. And if there is some kind of bug in the kernel that allows you to overflow from the socket buffer data into skb_shared_info, there is a potential vulnerability. And KASAN is not able to detect that, just because these two things happen to be within the same allocation. So what you could do is add a small red zone between the socket data buffer and skb_shared_info. You would need to add some additional annotations to the socket buffer handling code, because sometimes a socket buffer might be copied and then fully accessed, and then you need to unpoison the red zone and maybe poison it again in the copy. But this is certainly possible. And this type of custom red zone allows you to find what are called intra-object overflows: when you have a single object and the overflow happens between its parts. And this type of behavior can be exploitable; in one of my exploits I targeted exactly this behavior of the socket buffers. Another thing you can do, it's more complicated, but you can add support for more kernel allocators.
So I mentioned that KASAN has support for slab, page_alloc, and vmalloc, but there is, for example, the per-CPU allocator, which I recently found out exists, and this allocator is not supported by KASAN. The per-CPU allocator essentially allows you to dynamically allocate per-CPU regions, like per-CPU variables. And this allocator is used by the kernel: I just grepped for the per-CPU alloc functions, and in the right two columns you can see that there are quite a lot of usages in the networking subsystem. And it's quite likely that at least some of these uses are reachable from user space, which means these are potentially security-relevant places. And I actually checked that KASAN does not handle out-of-bounds or use-after-free bugs in the per-CPU allocator; I wrote a small test. The first test just allocates 128 bytes with the per-CPU allocator and then accesses one byte out of bounds, and KASAN is not able to detect that. And the same thing happens with use-after-free: alloc_percpu, free_percpu, then you do the access, and KASAN is not able to handle that either. So theoretically, if you add annotations for the per-CPU allocator, you can find even more bugs, and these are bugs that nobody found before. And potentially, if you overflow from one per-CPU variable to another per-CPU variable, that might have some security impact. And this is not only applicable to the per-CPU allocator: if you know about some other allocator, you can implement it there as well. I'm pretty sure quite a lot of big drivers, like GPU drivers, have their own custom allocators; maybe you can implement it for those. And I also know that Android has a few allocators: it has binder, it has ashmem, it has ION and other stuff. I don't know if this will actually be applicable to those allocators, but theoretically it could be.
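The little test described above looks roughly like this. Treat it as kernel-side pseudocode, my own reconstruction of the experiment rather than the literal test code:

```c
/* Pseudocode sketch of the missing-coverage test: __alloc_percpu(),
 * per_cpu_ptr() and free_percpu() are the real per-CPU allocator API,
 * but KASAN does not instrument this allocator, so neither access
 * below produces a report. */
void __percpu *p = __alloc_percpu(128, sizeof(long));
char *ptr = per_cpu_ptr((char __percpu *)p, /* cpu */ 0);

ptr[128] = 'x';    /* one byte out of bounds: KASAN stays silent */

free_percpu(p);
ptr[0] = 'y';      /* use-after-free: KASAN stays silent as well */
```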
So maybe this is something to explore as well. And now for extending KMSAN: the example that I have was already implemented. Essentially, KMSAN allows you to track which memory is initialized or not. A few years ago I was working on fuzzing USB; I was trying to fuzz the USB stack, the USB subsystem in the kernel, from the point of view of an external device. And if the kernel takes some uninitialized memory and sends it to a USB device, that is a vulnerability: if the device is trying to exploit the kernel, the device needs some kind of information leak to know where the kernel structures are, to create a memory corruption exploit. And these types of checks were added to KMSAN, and after that syzbot started finding quite a lot of infoleaks happening over the USB bus. All right, I'm done with the core part of the presentation; now just a few summary slides. So first, if you want to find even more bugs, you can still improve the fuzzer. Improving the fuzzer is a good idea, like improving syzkaller or writing new syscall descriptions; this is probably even easier than modifying bug detectors. But you can also improve the bug detectors, and this is the thing that nobody else does, as far as I know. I guess it's just more complicated, but at the same time, because nobody does it, it might lead to some new results, some unknown results, because at this point everybody knows how to write syscall descriptions, everybody does it, and probably people even cover the same subsystems over and over with their private descriptions. And to know how to extend sanitizers, how to improve memory bug detectors, first you need to learn what kind of bug detectors there already are. And for this you can learn about the sanitizers. And this is the summary slide.
We have sanitizers for testing and fuzzing: KASAN, KMSAN, KCSAN, and UBSAN. Then we have sanitizers that can be run in the field, either in beta testing, in dogfood, or in production: this is KFENCE and the software tag-based mode of KASAN. And finally we have the hardware tag-based mode of KASAN, which is hopefully gonna be used as a production mitigation. And once you know how the existing tools work, you can make them even better. So first, you can extend the sanitizers, as I mentioned: custom red zones, new allocators, maybe build something on top of KMSAN. But besides that, you can also build your own custom detectors, maybe a detector for other types of bugs. I have type confusions here as an example. I don't know if there are a lot of type confusions within the Linux kernel, but they're certainly possible, because sometimes the kernel casts one structure to another with a similar layout and then does different things depending on which structure it is. So maybe this is something that makes sense, I don't know. Then it would certainly be interesting to make detectors for different kinds of logical bugs. For example, I noticed that Jann Horn has been finding quite a lot of missing TLB flushes lately, over the last couple of years. So maybe there is some way to build an automated detector for missing TLB flushes; I don't know what it would look like. And of course, if you build your own tools, you can take inspiration from the sanitizers. For example, the compiler instrumentation approach is a particularly good thing to use. And you don't even have to write your own compiler module: you can just take the instrumentation that's inserted by one of the sanitizers and reuse it. Okay, with that I'm done. Thank you for listening, and I'm ready to take any questions. So my question is about the intra-object overflows.
So I mean, you presented it as an example of an extension, but it's kind of crafty: in your use case it's very locally focused on one particular subsystem, one particular structure. So is there any reason you haven't implemented it as an option? Of course you don't want it enabled all the time, because it's probably gonna be a huge hit on both memory and everything, but why is it not supported by default? I mean, the problem with intra-object overflows is that they vary: you need to implement them case by case, right? There is no generic framework for this type of allocation, where you have one type of data in the first part of the object and another type of data in the second part. And this particular thing was not implemented just because nobody got around to it; there were other, better things to work on. Because you could also try, when you have a structure with many fields, to just insert red zones into the structure, like fake objects with some poisoning, and... Yeah, I get what you're saying. In this case it's not going to help, just because there is no single structure, but theoretically you could insert red zones between every two fields in a structure. I think this was tried in the early KASAN days, or the early AddressSanitizer days, but I believe this approach was discarded. I don't know the exact details; I'm pretty sure it was just a lot of performance impact.
It will certainly greatly impact both memory and performance, but if you do have a use case where you're ready to pay for it, and nowadays we have such huge servers, then for testing and fuzzing we can sacrifice that, because it's a very important part. And if we are relying on people to manually go after certain structures, this very manual approach is not gonna scale to the whole kernel. We wanted to enable it by default in something like syzkaller. So it's a generic approach I'm trying to sell here: I would advocate for generic approaches rather than telling people they can try to make their own solutions. It would be much more useful for everyone. Yeah, certainly. I mean, it's applicable in certain cases, and for sure people can implement that. There is another problem I can see with this approach: let's say you do have a structure and you insert some red zones; then whenever this structure is memcpy'd, these red zones are gonna trigger. So you either need to add annotations into the kernel code as well, or you need a specific structure that's never memcpy'd. But anyway, what you're suggesting is totally possible in some cases. You can hook into memcpy and unpoison, similar to what you did for the socket buffers. Yeah, yeah, for sure, it's possible to implement that; it's just that nobody has done it yet. You're suggesting to use KASAN and KFENCE in production, but I guess it's gonna leak some information that you would not like to leak to user space. Do you have any solution for that? So you mean it's gonna leak information through crashes? Yeah, in various places. So I'd say both of the things that I mentioned, using the sanitizers in production, are mostly related to Android.
In the case of Android, you usually don't have access to the kernel log, so whenever some information is printed to the kernel log, the user doesn't have access to it. But let's say the user has some kind of primitive to access the kernel log. Then indeed, right now, I don't know about KFENCE, but KASAN definitely leaks some information. I think there is even a bug filed for that: to get rid of all of the private information printed in KASAN reports, essentially to add a private mode that would maybe just hash the pointers. And yeah, I think it's safe to disclose stack traces, right? But it's probably not safe to disclose shadow memory, and it's not safe to disclose pointer values or register values. So you would need to add some kind of private mode before it's deployed in production, for sure. Could it be possible maybe to give a public key to the kernel at compilation time and then encrypt the reports? Or is that crazy? Sorry, say it again? Could it be possible to give a public key to the kernel to encrypt the reports? Yeah, it's an interesting idea, I guess. I mean, yeah, that could work. Thank you. Thank you. For the intra-object stuff, red zones in structures: a lot of the work that's been going on lately to harden memcpy and make sure we're not doing cross-member copies, that annotation has basically been constructed over the last couple of years. So we've actually marked in structures, hey, we know that these things need to be, you know, consecutively allocated. And we might be in a decent position to actually start adding red zones between structure members, which again sounds like an enormous memory overhead, but it still might be interesting. And then there was a talk earlier on finding speculative issues that was using the DataFlowSanitizer, and they had modified it for the kernel. So I'm hoping that gets added as another bug finder.
They went through and did a lot of marking for, you know, taint from user space and from everything else. So it seems like that could also be used for the USB-style data taint stuff. Did you ever look at the DataFlowSanitizer? Never. I mean, if you have any concrete proposals, you can definitely file a bug on the bugzilla, because that would be interesting. Or maybe just drop an email to kasan-dev. Yeah, they had some patches that did it all; they needed a little bit of cleanup, but it looked interesting. You listed UBSAN as not among the production sanitizers, and I'm curious about your opinion on that, because I know it is being used in production. Yeah, I guess they just enable some of the checks, right? Right. So I guess UBSAN can partially be used in production, just because there are no downsides. Yeah, some of the UBSAN checks are insanely expensive, and some of them are very low cost and find real bugs. And then you mentioned the per-CPU stuff. It sounds like just the instrumentation for marking the red zones is missing, like the poisoning and all that. Yeah, I guess. I mean, it should be fairly simple: you just add red zones, you add poisoning on freeing, and that should work. But I thought per-CPU variables were treated by the compiler like normal variables, and then magic happens to move them around, so I'd expect the red zones to already be there, but maybe the poisoning isn't. But I think, I mean, you're talking about per-CPU variables, right? I think this is a completely different thing, because per-CPU variables are allocated during boot or whatever, like global things, but this is a dynamic per-CPU allocator. Yeah. To be honest, I just noticed it maybe a year ago while I was working on some Demalux stuff; I didn't know it existed. Pretty weird. Is there any granularity-of-detection difference between generic KASAN and the MTE version? Yeah, sure.
Generic KASAN, first of all, is able to detect out-of-bounds bugs with one-byte precision, just because the shadow contains information about exactly how many bytes are allocated. But MTE is only able to detect them with 16-byte granularity. So if you have an overflow that stays within the 16-byte boundary, it's not gonna be able to detect that, yeah. But what about the number of different types that generic mode has? I know we have a pretty limited set of bits available for MTE, but that still works out okay for the most part? Sorry, I didn't get that. Oh, like, generic has all the different types: different locations of red zones, you know, oh, this is a use-after-free. But with MTE we've only got a couple of bits for specifying what type of memory it is. Yeah, for MTE, actually, yes. So for MTE we have four bits per tag, and right now we have only two different special tags. Essentially, we have the 0xFF tag, which is like the native kernel pointer tag: accesses through pointers marked with this tag are just not checked. And then we have the invalid tag. The slide that I had that lists all of the different types of shadow values, this one, essentially for MTE it's all 0xFE, all the same value, yeah. And right now for MTE there's actually no support for stack tagging and no support for globals tagging, that's still to be implemented; but all of the page_alloc memory and the slab memory is marked with the same tag, yeah. Okay, that's awesome. It makes MTE reports less useful, for sure, but what can we do? Awesome, thank you. Thank you. Just continuing with KMSAN: you mentioned USB and extending the tracking of uninitialized memory.
So now that we're starting to look into confidential computing, we are basically starting to treat this whole hardware layer as untrusted, and this is what we are fuzzing currently. So for us, KMSAN is currently not useful, because it mostly tracks uninitialized-memory leaks to user space, and we're not so much interested in that right now. But extensions like the one you mentioned, to make it track whatever goes back to hardware, or to private interfaces or something like that, would be useful. So I think there are many, many usages, because we are starting to have more attack surface than just, you know, user space, and we're starting to employ fuzzing not just for syzkaller-based syscall fuzzing but for all kinds of other areas. Yeah, for sure. I mean, USB was just an example; KMSAN also has a check for leaks across the network, I think. And it would probably be useful to add leak checks for other types of environments too. Thank you. All right. Thank you very much.