Hi everyone, my name is Alexander. I'm going to talk a bit about uninitialized memory.

Let's start with what uninitialized memory is. It's memory that is, well, actually uninitialized: if we create a local variable or a heap allocation and use it before assigning any value to it, we consider it uninitialized. It's also uninitialized if we use it after it has been freed. According to the C standards, starting from C89, this is undefined behavior, which means the compiler may do whatever it wants: optimize the code away, change it, and so on. Some compilers really do so, and even if they don't, the result is still indeterminate. This means attackers may use such bugs, by controlling this memory, to provoke crashes, information leaks, remote code execution and so on.

Right now I'm working on KMSAN, the Kernel Memory Sanitizer, a compiler-based tool that detects uses of uninitialized memory in the Linux kernel. It tracks every bit of kernel memory, recording whether it's initialized or not, and the compiler instrumentation propagates this state and checks whether uninitialized memory is used in conditions or pointer dereferences, or whether such values are copied to user space or, sorry, to hardware. Right now this code lives on GitHub; I've sent some RFC patches upstream, but it will take some time to actually land them.

We've integrated KMSAN with syzbot, a fuzzing infrastructure developed at Google. Within two years it has found more than 240 bugs in a very modest setup: we've been using about 10 machines for that, while KASAN, for example, uses 10 times more machines. 200 of those bugs are real; there are still false positives and a number of non-reproducible errors, but we don't report those to upstream developers. Out of those 200 bugs, 119 have been fixed already. There were 21 info leaks, 5 KVM bugs, and almost a hundred networking bugs.
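The bug class described above, a branch on a value that was never written, can be sketched in plain C. This is a hypothetical example, not code from the talk; all names are illustrative.

```c
#include <string.h>

/* A caller is supposed to fill in this request, but forgets `flags`. */
struct request {
    int type;
    int flags;
};

int handle(const struct request *r)
{
    /* If `flags` was never written, this branch reads uninitialized
     * memory: undefined behavior, and exactly what KMSAN reports when
     * an uninitialized value is used in a condition. */
    if (r->flags & 1)
        return -1;
    return r->type;
}

int handle_fixed(int type)
{
    struct request r;
    memset(&r, 0, sizeof(r));  /* zero-initialize everything first */
    r.type = type;
    return handle(&r);         /* now well-defined: flags == 0 */
}
```

With the `memset()` in place the branch is deterministic; without it, the compiler is free to do anything with the `flags` test.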
Yeah, almost a hundred networking bugs. Most of those bugs were never even reported upstream, because the networking people at Google just fixed them right away. 58 bugs are still open, and 61 are stuck in the pre-moderation queue; most of those are use-after-frees which have already been reported by KASAN, and some of them just don't have reproducers or aren't reproducible anymore. Three bugs have pending fixes, which means the fix has landed already, but syzbot is waiting for it to reach all the tracked trees.

Most of the bugs that we found are fixed within one week; some of them, however, take 10 months or more to fix. Here are the bugs that have been reported this year: on average, KMSAN reports about seven bugs a month. We don't have much data about bug lifetimes, but based on 53 Fixes: tags, we can see that the lifetimes of bugs are almost uniformly distributed between one year and 14 years.

The top anti-patterns are as follows. A lot of places in the kernel copy part of a struct sockaddr from user space, but then treat it as the whole struct. Also, a lot of people allocate a structure but forget to initialize some of its fields, or forget to initialize the padding, which is also critical, and then the structure gets copied to user space; some pointers may leak this way. Also, USB code initializing USB devices often doesn't check that the read from the device succeeds and actually read more than zero bytes.

So most certainly there are a lot of bugs still in the kernel. Right now syzbot covers only 12% of the kernel code on x86, and most of the attractive attack vectors are still uncovered. We have only basic support for networking, for IPv4 and IPv6.
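The padding anti-pattern mentioned above can be sketched like this. The struct and function names are made up for illustration; the example assumes an ABI where `long` is 8-byte aligned, so the compiler inserts padding bytes after `id`.

```c
#include <string.h>

/* `id` is followed by compiler-inserted padding before `count`. */
struct stats {
    char id;     /* offset 0, then (typically) 7 bytes of padding */
    long count;  /* offset 8 on a 64-bit ABI */
};

/* Buggy pattern: only the named fields are assigned, so the padding
 * bytes keep whatever stale stack/heap data was there before.  If the
 * struct is then copied to user space wholesale, those bytes leak. */
void fill_leaky(struct stats *s)
{
    s->id = 1;
    s->count = 42;
}

/* Fixed pattern: clear the whole object, padding included, before
 * assigning the fields. */
void fill_safe(struct stats *s)
{
    memset(s, 0, sizeof(*s));
    s->id = 1;
    s->count = 42;
}

/* Helper: returns the first padding byte after fill_safe() ran on a
 * buffer that previously held 0xAA garbage (simulating stale data). */
int padding_byte_after_fill_safe(void)
{
    struct stats s;
    memset(&s, 0xAA, sizeof(s));
    fill_safe(&s);
    return ((unsigned char *)&s)[1];
}
```

A struct-wide `memset()` (or a zeroing allocator) is the only portable way to clear padding; assigning every named field is not enough.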
We don't really generate packets that do anything complex, like initiating a handshake, for example. There is very limited support for USB and for KVM, and there is almost no support for the wireless networking stacks, which probably contain a lot of bugs, for example information leaks that don't require physical access to the machine. I also expect something like 200 bugs to be present in the remaining networking code.

It's unlikely that uninitialized-memory bugs will disappear anytime soon. Mateusz Jurczyk from Google Project Zero says that bugs related to information leaks are deeply rooted in the C programming language, and I actually believe that all bugs related to uninitialized memory are deeply rooted in C.

So what shall we do to never have to deal with uninitialized memory again? The answer is simple: we must initialize all the memory. There are several reasons to do so. First, if we initialize all the memory, there won't be any information leaks. Second, if we have code with branches that depend on uninitialized memory, it will execute deterministically. Third, if we initialize memory that has been freed, it complicates use-after-free exploitation. By the way, Microsoft has already been doing this for PODs on the stack since November last year, so, yeah, we need to catch up.

Let's start with stack initialization. Kees Cook has put a number of kernel configs under an umbrella. There is a number of GCC plugins that initialize parts of the local variables: there are structures marked as going to user space, structures that are copied to user space, structures that are passed by reference, and finally we can zero-initialize anything passed by reference. There is also a config called INIT_STACK_ALL, which initializes everything on the stack with a 0xAA byte pattern.
This is only supported by Clang, which is something we may want to change in the near future. Clang can also zero-initialize locals, but this mode is protected by a really lengthy flag, because the Clang developers don't want to introduce a new C++ dialect. Our goal is to converge to a situation where all the supported compilers can zero-initialize all the locals on the stack. In order to do so we must introduce a similar option to GCC (by the way, any GCC contributors here?) and make the Clang community support the zero-initialization option as a first-class citizen. There has also been a proposal from the Clang community to introduce yet another C standard mode for the compiler, which would just be a collection of such options.

We've measured the performance of locals initialization, and the numbers look pretty good: in most cases it's almost free. The problem is that it's really hard to benchmark such changes. If a benchmark like netperf spends most of its time in the kernel, it's not really representative; if we have an end-to-end benchmark, like the Android benchmarks, which use both kernel code and user-space code, then the variance is really big and it's really hard to tell whether anything slows down or not. So ideas are welcome, if anyone knows how to benchmark kernel slowdowns. The size impact of this instrumentation is pretty low, but certain hot functions still need an extra cache line or two.

So the question is: can we do better? First of all, we should use zero initialization, because the code is more compact and faster. Second, right now Clang is bad at dead store elimination: there are a lot of opportunities to do cross-basic-block DSE in both the middle-end and the back-end. Also, FDO and LTO, that is, whole-program analysis, can help remove redundant stores that come from inlined functions. And maybe GCC is actually doing a better job.
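As a sketch of what zero-initializing locals means in practice, here is the hand-written equivalent of what such a compiler mode would emit. The function names are made up for illustration.

```c
/* Without any mitigation, reading `result` on the have_value == 0 path
 * is undefined behavior: the compiler may return garbage or transform
 * the function arbitrarily. */
int parse(int have_value, int value)
{
    int result;
    if (have_value)
        result = value;
    return result;      /* uninitialized when have_value == 0 */
}

/* With compiler-enforced zero initialization, every local behaves as
 * if it had been written like this: */
int parse_zeroed(int have_value, int value)
{
    int result = 0;     /* inserted automatically by the compiler */
    if (have_value)
        result = value;
    return result;      /* deterministic: 0 when have_value == 0 */
}
```

The inserted store is exactly the kind of redundant write that a good dead-store-elimination pass can remove whenever the variable is provably assigned on every path, which is why DSE quality matters so much for the cost of this mitigation.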
So maybe we should just switch to GCC. For certain cases where the compilers cannot do their job well, there is `__attribute__((uninitialized))`, which simply prevents the initialization in certain hot functions.

Let's now move on to the heap. Linux 5.3 has two boot parameters, init_on_alloc and init_on_free, which initialize heap and page allocator memory. The first option is more cache-friendly, because it initializes memory chunks that are likely to be accessed soon. The second one is a bit slower, but it minimizes the lifetime of sensitive data. init_on_free works somewhat similarly to PAX_MEMORY_SANITIZE, which is unfortunately downstream. PAX_MEMORY_SANITIZE has the advantage of being able to disable initialization for certain caches, which we haven't done yet, just because we haven't measured the security and speed impact of those changes.

Initializing the heap is substantially slower than initializing the stack, although it's possible to reduce the cost by not initializing some places. We just need a better understanding of how this works in terms of security, whether we can trade speed for security in these cases. So yeah, it's a big deal to not initialize certain buffers that, for example, get initialized later anyway. One of the approaches to do so is introducing a special GFP flag for that. This will only work for allocations, because we don't pass GFP flags to the free functions. We've checked that this is an easy way to improve certain benchmarks by just fixing one or two allocation sites, but such GFP flags are really easy to abuse, because there are a lot of allocation sites in the kernel and they easily go out of control. On the other hand, there is a nice opportunity for optimizing this even further, by emitting a non-initializing kmalloc() plus a memset(), where the memset() can later be removed by the dead store elimination pass in the compiler. Another option is to introduce a slab flag that disables initialization altogether for a certain cache, which would work for both init_on_alloc and init_on_free. This is easier to set up and control; for example, we can create a list of uninitialized slabs at boot time, which is what PAX_MEMORY_SANITIZE does. Linus also thinks that opt-outs are inevitable; we just need to figure out which places need to be fixed and document them well.

Memory initialization is also related to the new Arm instruction set extension called the Memory Tagging Extension, or MTE. It was announced last year, but doesn't exist in hardware yet, to the best of my knowledge. The core idea is to assign a four-bit tag to every aligned 16 bytes of memory, as well as to every pointer in the kernel. Load and store instructions check that the pointer and the corresponding memory chunk have matching tags; if they don't, a hardware exception is thrown. One can think of this as a hardware ASan implementation, which is really fast and can be used in production, so we hope that people will actually use it. In order for MTE to work, we'll need to set tags for every stack and heap allocation, which is exactly what we need for initializing them, and MTE provides special instructions that perform both initialization and tagging of memory. This means we'll probably have a cheap way to both detect heap corruptions in production and kill uninitialized bugs altogether.

So, yeah, I'm out of slides at this point. If anyone has any questions, you're welcome.

Q: For the stack initialization, did you also look at the increase in stack frame size for the functions?

A: So the question is, did we look at the increase of stack frame sizes, right?
A: Well, it doesn't affect the stack frame size, because we only emit the initialization. Let me explain: if we initialize the local variables, we don't introduce any additional locals.

Q: Yeah, but the compiler tries to reuse stack slots between different variables, so it will use the same stack slot for different variables in different places in the function. And actually, when we did the GCC plugin, we found out that the stack frame size is affected. It could be a pathology of the plugin, or it could be a GCC problem. But I was wondering if you had any numbers for Clang on how the stack frame size is affected.

A: I would suppose that for a GCC plugin that doesn't initialize all the locals, it could be that the local variables cannot be reused. But in the case of LLVM, which initializes just everything, it's still possible to reuse the stack slots, and this is done at the intermediate representation level, so it's optimized pretty heavily. I myself have actually never seen a case where this cost any stack bloat, but it could probably be possible.

Q: Yeah, thanks for the talk. Is it possible to turn Memory Sanitizer into a runtime mitigation, not just a debugging technique?

A: Well, I don't think so, just because it's really costly. It requires twice as much memory just to store all the metadata, so twice as much memory altogether, plus it inserts really heavy instrumentation that affects every arithmetic operation and every load and store in the kernel. So it's a lot cheaper and easier to just initialize everything. On the other hand, you can use Memory Sanitizer for different things that require tracking certain values during their lifetime, for example for taint analysis.

Q: Thanks. Thank you for the talk. Do we have time for two questions? So the first question is: you've shown some statistics on how long it takes to fix a bug.
Do you have any thoughts on why some bugs are fixed earlier than others? Is it about the type of the bug, or the availability of a proof of concept, or is it something else?

A: I don't think this has anything to do with the type of the bug. By the way, the types of the bugs don't really tell us anything, because syzbot just bails out after the first report, and it could be that there is an innocent-looking bug which is immediately followed by a remote code execution; we just don't see it, and we report only the first bug. I think this depends heavily on the availability of maintainers for a certain subsystem. For example, the networking people are really responsive, and they've fixed most of the bugs within one or two days. And, yeah, maybe for some people the burden is bigger and they just don't have time for that.

Q: I see, thank you. The other question is about the code coverage, so that's also a really interesting statistic. Based on your experience with these tools, would you say that once we get the code covered, let's say we have these drivers at five percent, what would be the level of certainty that the code is bug-free at this point? Like, once we get it covered, is it executed millions of times, thousands of times? I'm just trying to get a feel for that.

A: Well, as far as I understand, this metric actually means that there exists at least one syzkaller program that executes a certain basic block. Of course, in most cases this is not enough to trigger a bug: for example, if a bug requires two tasks to be running at the same time, then you'll need some better metrics, some threading coverage for example.

Q: Yeah, thank you very much. More questions? If not, let's thank the speaker.

A: Thank you.