Hi, I'm Matt Helsley. I work for the VMware Open Source Technology Center. This talk is about objtool, a kernel build tool that takes in the object files produced by the compiler. It works on x86 right now, but it has the foundations for expanding to other architectures. It does certain checks on the object files: it performs stack validation, it checks for certain vulnerability patterns, like with Spectre, and more recently it's been modified to add some stack data to the kernel, which will be useful later. One of the interesting things about the kernel is that it uses the ELF file format, so it's not that different from regular executables. There is some magic that the kernel does, but a lot of it relies on the ELF format; for example, it builds tables using ELF sections. ELF is the Executable and Linkable Format. It's been a standard for a long time, and there are per-CPU-architecture additions to the standard so that it can work on multiple architectures. It's found in object files, the .o files you see when you compile C code; in executables, your regular programs; and in shared libraries. It describes the compiled code as well as how to link it when you're building executables, and then how to link it at runtime, with dynamic linking. What objtool does is use an external library called libelf. libelf is responsible for loading the file into memory using a set of data structures defined by the standard. Objtool takes those data structures, takes the offsets within the file, and converts them into pointer-based data structures that are easy to traverse. It will link things like the names of the various sections into the sections, it'll look up the symbols and build hash tables for them, that sort of thing.
On top of that, it has little flags for checking whether sections have changed, whether we need to write them out to disk, that sort of thing. libelf itself handles the differences between 32-bit and 64-bit, so you have different word sizes even on the same architecture. The same architecture can also have different endianness: you can have big-endian and little-endian on architectures like PowerPC and MIPS. What libelf does is normalize all of this when it puts it in memory, using more generic data structures. So for example, with the different word sizes, it'll take 32-bit fields and expand them into 64-bit, and when it's going to put them back on disk, it shrinks them back down. It also uses functions to attach data to the structures, so it has some representation of what was on disk and what's going to go back. This is all nice, but the problem with the libelf output is that it's not easily linked together. In my case, it's also got "elf" written everywhere, so it gets a little repetitive and your eyes kind of glaze over. Objtool, on the other hand, defines its own data structures, uses a lot of the kernel patterns you might be familiar with, and links everything together so you can easily walk over, say, the sections using a linked list, or look up symbols using the standard kernel hash tables. What objtool sees is the object files that come in during the kernel build process. This is before the object files get linked into the final kernel, so it doesn't have things like program header tables. It's mainly looking at the instructions present in the object file, and this is part of the checking process. This is one of the reasons it's specific to x86 right now.
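To make the normalization idea concrete, here's a small illustrative sketch in Python of the kind of work libelf does when it loads a file: look at the identification bytes at the front of an ELF file, figure out the word size and endianness, and then read the remaining fields accordingly, handing back uniform in-memory values. This is a toy, not the real libelf API (which is C); the header bytes and the `parse_ident` helper are invented for illustration.

```python
import struct

# A minimal 64-bit little-endian ELF header prefix (ET_REL, EM_X86_64),
# as found at the start of a .o file. Values are illustrative only.
elf_header = bytes([
    0x7F, ord('E'), ord('L'), ord('F'),  # magic number
    2,   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    1,   # EI_DATA:  1 = little-endian, 2 = big-endian
    1,   # EI_VERSION
]) + bytes(9)                            # pad out the 16-byte e_ident
elf_header += struct.pack('<HH', 1, 62)  # e_type = ET_REL, e_machine = EM_X86_64

def parse_ident(data):
    """Normalize header fields the way libelf does, whatever the on-disk layout."""
    assert data[:4] == b'\x7fELF', 'not an ELF file'
    bits = {1: 32, 2: 64}[data[4]]
    endian = {1: '<', 2: '>'}[data[5]]   # struct format prefix for later reads
    e_type, e_machine = struct.unpack_from(endian + 'HH', data, 16)
    return {'bits': bits, 'endian': endian, 'type': e_type, 'machine': e_machine}

info = parse_ident(elf_header)
```

The point is that after this step, everything downstream sees one uniform representation, which is exactly what lets objtool build its own linked, architecture-neutral structures on top.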
It has code for decoding x86 instructions, specifically looking at the way that x86 manages, or doesn't manage, the stack pointer and frame pointer. Informally, I'm just going to call these assembly or machine instructions. The other thing objtool has to deal with is not only compiler output: since kernel developers write assembly on a fairly regular basis, or at least deal with it on a fairly regular basis, it has to handle handwritten assembly too. This can make things a little more difficult, but at the same time it's a little easier to follow a human-written sequence of assembly instructions. The other problem is that a lot of times the human won't necessarily add frame pointers, for example, or might forget some aspect of the assembly code that they need to follow for the kernel to be secure. So objtool has an opportunity to recognize that and remind people: hey, you need to do this. It started out as a stack validator. It would go through the sequence of instructions, follow how the stack pointer changed and whether the frame pointer was updated according to those changes, and it could warn you: hey, you're writing this assembly and you didn't actually set up the frame pointer here. That's for CONFIG_FRAME_POINTER builds. But those builds cost some performance, so a lot of times people build the kernel without frame pointers, and that gets into the other aspects of objtool, what it's used for. The other thing it does is some Spectre checks. For a lot of the speculative execution problems you've seen in the last few years, it will check for certain patterns and say: okay, this is a problem, and you'll be able to go in there and fix it. It also does some uaccess checking.
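The stack-validation bookkeeping can be sketched very roughly like this: walk a function's instructions, track how far the stack pointer has moved since function entry, and flag a return that happens with the stack unbalanced. This is an invented toy model in Python; the real objtool decodes actual x86 machine code and tracks far more state (frame pointer setup, register saves, control flow).

```python
# Toy model of objtool-style stack validation: pushes and pops must balance
# by the time the function returns. Instruction names are made up.
def validate_stack(instructions):
    sp_offset = 0  # bytes pushed onto the stack since function entry
    for insn in instructions:
        if insn == 'push':
            sp_offset += 8
        elif insn == 'pop':
            sp_offset -= 8
        elif insn == 'ret':
            if sp_offset != 0:
                return f'return with unbalanced stack ({sp_offset} bytes)'
            return 'ok'
    return 'fell off the end without ret'

good = validate_stack(['push', 'push', 'pop', 'pop', 'ret'])
bad = validate_stack(['push', 'ret'])  # forgot to pop before returning
```

A warning like the `bad` case is exactly the kind of thing that catches a mistake in handwritten assembly before it ships.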
That's for when you've got code that accesses user space: you want to make sure the places the kernel can access user space from are very limited. Objtool will check for transitions in and out of those regions and find where you might be trying to access user space when you shouldn't. As a reminder on the Spectre stuff: this is indirect branches causing problems. You've got speculative execution, and an attacker can control the way the processor guesses which direction a branch is going to go. The attacker can then observe what happens, look at the timing, and determine the value at the branch. Basically, the attacker runs some code that fills the branch prediction state, I think it's the branch history buffer, and the processor will then consult that state to anticipate where it should go. You want to prevent the processor from using those heuristics, and instead have it use a prediction mechanism that isn't vulnerable to these kinds of attacks. The solution is a retpoline. It works sort of like a return, except the return address is actually the place you want to jump to, for either a call or a jump. The way it works is you have a setup sequence that calls a little thunk, and the thunk replaces the return address on the stack with the address you actually want to go to. Just after that call is where any sane processor decides you're most likely to return, so it doesn't consult the branch history; it just says, okay, you're going to come here next. And what sits there is a tiny infinite loop, so the processor speculatively executes that loop and doesn't do anything driven by the attacker, while real execution follows the replaced return address to the intended target.
And that's how retpolines fix the vulnerability. But there are some pitfalls, especially in handwritten assembly: if you hand-code this, you might actually fall into that infinite loop. So there are checks objtool does to verify you haven't done those things. It will look at the branches you're making and say: okay, you're making some indirect calls here and I don't see the retpoline sequence. It will ignore certain sections, because there are some small sections of kernel code that do use indirect branches and have been reviewed as safe, and there are annotations for those sections that it knows to ignore. There's also an excellent write-up by Google about Spectre, so I definitely recommend checking that out. The other thing objtool helps with is stack traversal. When you're doing a stack traversal, you're generally trying to provide some debug information to the user, most likely the kernel developer, and you want to see the sequence of function calls on the stack. What happens is the assembly code adjusts the stack pointer and sets up a call frame, and that call frame might include a frame pointer going back to the previous frame. But it might not, because, as I went over earlier, the human might have forgotten it, or the compiler may have omitted it to get some additional performance. So we want something to replace all of that; it's got to be more reliable, and it has to have no overhead while you're normally executing function calls. So we want to avoid setting up the frame pointer if possible.
Also, there's kind of a race between interrupts and exceptions and these stack frames: in between pushing something onto the stack or adjusting the stack pointer, there's a tiny window where you haven't yet updated the frame pointer in the stack frame, and you can take an exception right then. So you can't necessarily rely on frame pointers even when they're there, and that's one of the great things about what objtool does. What this shows up as is missed function calls in the stack trace: you'll see a call and it'll actually go back to a previous function, not the actual caller. Sometimes you'll see these strange transitions in the stack trace, and you'd have to understand that, okay, this assembly function omitted this particular frame pointer. That makes things challenging. So now objtool introduces the ORC format. What ORC does is look at the state of the stack at each instruction pointer value, look for the transitions, and build a table that lives outside the regular instruction flow. This table goes into an ELF section, and the table is indexed by the instruction pointer. So based only on the instruction pointer and the current stack pointer, you can find the backtrace without having to have the frame pointer in the stack frames. For a given instruction pointer, the table gives you the offset from the stack pointer to the beginning of the frame, and from there you can get the instruction pointer you're going to return to, and so on. [Audience:] So you're writing this information out to the object file, creating the ELF section. Who consumes this other than objtool? Anyone else? [Matt:] The kernel itself consumes this. It's one of those cases where it's useful for the kernel itself to be able to consume it.
The kernel has pointers to where the section's been loaded, and one of the first things it does is sort the entries by instruction pointer, because you can't guarantee the entries are sorted initially. At early boot it'll go through and sort that, and then it can search the table based on the instruction pointer. So if you have one kernel thread, for example, that wants to know what another kernel thread is doing, it can look at the other one's stack and follow it without having to worry about whether the frame pointers are perfectly set up or not. That's useful for a couple of things. So, the stack traversal: I mentioned that it sorted the table, and when you're actually doing the traversal, it looks at the current instruction pointer and the current stack pointer, both of which have to be maintained so they're always correct. It determines where the stack frame is by looking through the table and finding the offset, then it takes the return instruction pointer and goes back to the previous frame, and so on. There's no need to keep frame pointers, and you get to save all of those instructions that adjust the frame pointer during regular runtime. But there is a cost to ORC, and that is that you have a big table off to the side. It's not typically loaded in cache, and it doesn't have an impact on the registers at runtime, but it is big: I think the numbers I saw were typically a two-megabyte to eight-megabyte table. That's big compared to using frame pointers, but at the same time it's much smaller than the DWARF information; the DWARF information for the kernel is huge. So it's a little nicer in that respect, but you still have a bigger kernel.
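The lookup-and-walk described above can be sketched as follows. This is a deliberately simplified Python toy, assuming a table of invented `(ip_start, sp_offset)` pairs where `sp_offset` is the distance from the current stack pointer to the saved return address; the kernel's real `struct orc_entry` has more fields (register bases, frame-pointer offsets, an entry type) and is sorted once at early boot exactly so this binary search works.

```python
import bisect

# Each entry covers ips from its start up to the next entry's start.
orc_table = [
    (0x1000, 0),    # ips in [0x1000, 0x2000): return address at sp + 0
    (0x2000, 16),   # ips in [0x2000, ...):    return address at sp + 16
]
ips = [entry[0] for entry in orc_table]

def find_entry(ip):
    # Binary-search the sorted table by instruction pointer.
    return orc_table[bisect.bisect_right(ips, ip) - 1]

def unwind(ip, sp, stack, max_frames=16):
    """Walk the stack using only (ip, sp) and the ORC table; no frame pointers."""
    trace = [ip]
    for _ in range(max_frames):
        _, sp_offset = find_entry(ip)
        ret_addr = stack.get(sp + sp_offset)
        if ret_addr is None:
            break                      # ran off the recorded stack
        trace.append(ret_addr)
        sp = sp + sp_offset + 8        # pop the frame, incl. the return address
        ip = ret_addr
    return trace

# Fake stack memory: a function at 0x2000 was called from 0x1080.
stack = {0xff10 + 16: 0x1080}
trace = unwind(0x2010, 0xff10, stack)
```

Given only the interrupted `ip` and `sp`, the walk recovers `[0x2010, 0x1080]` without touching any frame pointer, which is the whole point of ORC.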
One of the things ORC helps with is that, because you have these reliable stack traces now, you can look into what another kernel thread, a task, a process, is doing and figure out: okay, it's currently executing these kernel functions. During live patching, we can say: okay, it's not executing any of the functions I'm patching right now, so I can patch that function for that task without the two stepping on each other's toes. So it helps with live patching, and the main thing is you want to avoid patching pieces that are being used by currently running processes. Live patching can run without ORC, but it's less reliable: it waits for user space to go all the way back out of the kernel, and then it can patch the functions user space was using. The problem is you can have multiple processes coming in and out of the kernel, so there's a chance the functions are always in use, or might always seem to be in use, because you can't look at the stack reliably. Once you can look at the stack reliably, you're much more likely to be able to patch the functions. There are also some more checks being considered on LKML; I saw one recently. And there are some other things I'm working on, where I'm trying to incorporate a tool called recordmcount into objtool. Recordmcount deals with function entry tracing: the standard compiler tools will generate a function call at the very beginning of each function, to help you do profile generation and develop a call graph.
What recordmcount does is turn all those calls into no-ops and record the locations of those calls, so that later on we can do dynamic tracing, where you enable tracing of certain functions in the kernel at runtime and don't have to constantly pay the overhead of recording the call graph. That's what recordmcount does, but it's not currently incorporated into objtool. It's its own ELF parser, with its own structures, and it has some very weird patterns that aren't easy to understand unless you really stare at the code for a long time. So I'm trying to incorporate it into objtool and use objtool's better ELF interfaces, which follow standard kernel patterns, to make it easier to understand and more maintainable. The one thing about recordmcount that does present some problems is that it's more widely supported on multiple architectures, whereas objtool right now is x86-specific, so one of the things I'm working on is testing and making sure I can build objtool for other architectures. Other things that might be in the future: replacing sortextable, which is very similar to recordmcount; it goes through, looks at the exception table entries for the kernel, and sorts those. And there's also one called kallsyms, which walks through the kernel symbol table, finds the symbol names, and makes those available to the kernel too. Those could all be incorporated into objtool as different little subcommands. Objtool has a check subcommand and a generate subcommand, for example, for checking and for generating ORC data respectively. And one of the things I think we can do is make it so you only have to run objtool once to do all those passes: a check pass, a generate pass, and the tracing tool pass as well. Then you'd only have to load the ELF data once.
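The nop-and-record step can be illustrated with a small toy: given the code bytes of an object file and the offsets where the compiler emitted calls to the profiling hook, patch each call to a no-op and record its location in a side table so tracing can be re-enabled at runtime. This Python sketch invents its layout for illustration; the real recordmcount works on actual relocations, and the kernel picks the NOP sequence appropriate to the CPU.

```python
# 0xE8 is the x86 opcode for a 5-byte relative call; this is one common
# 5-byte x86 NOP that can replace it.
CALL = 0xE8
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])

def record_mcount(code, call_sites):
    """Patch profiling calls to NOPs; return patched code plus a location table."""
    code = bytearray(code)
    mcount_loc = []                  # plays the role of the __mcount_loc section
    for off in call_sites:
        assert code[off] == CALL, f'expected a call at {off:#x}'
        code[off:off + 5] = NOP5     # disabled by default: no tracing overhead
        mcount_loc.append(off)
    return bytes(code), mcount_loc

# Two fake functions, each starting with a 5-byte call to the tracing hook.
code = bytes([CALL, 1, 0, 0, 0]) + bytes(3) + bytes([CALL, 2, 0, 0, 0])
patched, locs = record_mcount(code, [0, 8])
```

At runtime, ftrace can use the recorded locations to flip individual sites back from NOP to call, which is what makes per-function dynamic tracing cheap.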
You could process the whole file with the different passes and then return, whereas right now you reload the ELF file. All right, that's it. Does anybody have any questions? [Audience:] How is recordmcount different from ftrace? [Matt:] It's part of what ftrace uses. It builds the tables inside the kernel that ftrace uses, kind of like the way objtool builds the ORC tables; recordmcount is the corresponding part for ftrace. Any other questions? Thank you.