Remember, remember, security in November: Linux kernel tracing and GNU plot. I know of no reason why Linux kernel tracing should ever be forgot. Steven Rostedt and his companions did the scheme contrive to blow Linux kernel bugs up alive. 3,000 trace events laid below to prove the integer overflow. But by God's providence the bug they did catch: the regular expression had found a match. A bit in a byte to minister of all's delight. A bug sticks one, then you make two. Better for tracing and the worst for you. Enough sort, enough sort to crack the port. Denial of service to choke it. A pint of beer to wash it down. A jolly good Spectre to melt it down. Hello boys, hello boys, make the bells ring. Hello boys, hello boys, God save the king. Hip, hip! All right. It's always fun to give talks on Halloween. OK, I'm Steve Rostedt, and I work for VMware. This is a talk I've been wanting to do for an awfully long time. The Linux kernel does a lot of things with tracing that touch security, and there are a lot of things people need to know about, so that's why I called it "Tracing: the bane of us security folks." I always like to step back first. I see a lot of people in security, and sometimes I think people get too focused on things and don't take a step back to ask: what's your goal? What are you really trying to do? I think the number one thing is protecting privacy, protecting data. You don't want people to find anything on your computer that you don't want them to: bank accounts, whatnot, embarrassing things. So the whole idea of security, number one, is probably to protect the data. I think people agree with that. Number two, also about protecting data, is preventing destruction of the data. We've seen theft of data, and the other one is destruction of data; we know about crypto ransomware and all those wonderful things people love to do.
Keep services running: denial of service. We try to prevent that so people can get their work done. And we also want to protect people from malware. So here are the steps everyone here should know about, the ones we take. We have file permissions (read, write, execute), capabilities, passwords for logging in and for executing code, and sudo. Personally, I hate sudo. I never use sudo; I switch to root. I don't give any permission to anything but root. I really don't see the benefit of sudo. Keys: I love keys. I try to do as much as I can with keys where it makes sense, and protect the passphrases on those keys. We also have isolation. Obviously the page tables separate kernel memory space from user space, and when the hardware works, there's isolation there. Then there are memory sandboxes, virtual machines, containers, yada yada. Hardening: this is where I start getting into the fun things. We all know code is buggy. Yes, your code is too. Hardening, as I understand it, means we try to make the kernel still able to protect itself in case of a bug: even if someone manages to break into something, they still won't have the ability to cause harm or to see data they shouldn't be seeing. So we do various things. You have mandatory access control, and, what's it called, randomizing addresses (KASLR). When the kernel boots up, it randomizes where it actually loads. If you know where the kernel is located and there's a vulnerability somewhere (and we know there are bugs in the kernel), it's much harder to exploit if you don't know exactly where that code is. And then, of course, we have the lockdown features. I'm all for locking down modules. Once you load all your modules, boom, turn it off; don't ever let another module be loaded.
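The address-randomization idea above can be made concrete with a minimal Python sketch of how a randomized, aligned load address might be chosen. The window, image size, and alignment are hypothetical values, not the kernel's actual KASLR parameters. Real KASLR also has to skip memory that is already in use.

```python
import secrets

# Hypothetical layout values chosen for illustration; they are NOT the
# real x86-64 KASLR parameters.
REGION_START = 0xFFFFFFFF80000000  # lowest address the image may start at
REGION_SIZE = 1 << 30              # 1 GiB window the image must fit in
IMAGE_SIZE = 64 << 20              # 64 MiB kernel image
ALIGN = 2 << 20                    # load address must be 2 MiB aligned


def slot_count():
    """Number of aligned start addresses where the whole image still fits."""
    return (REGION_SIZE - IMAGE_SIZE) // ALIGN + 1


def random_load_address(randbelow=secrets.randbelow):
    """Pick one slot uniformly at random and return its base address."""
    return REGION_START + randbelow(slot_count()) * ALIGN


if __name__ == "__main__":
    base = random_load_address()
    print(f"{slot_count()} slots; kernel text at {base:#x}")
```

With these made-up numbers there are 481 possible slots. A single blind guess at the base address therefore succeeds roughly one time in 481.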
Because if anyone gets to the point where they can load a module, they can do anything they want. It's very easy. We also prevent code modification, although we do have a lot of runtime code modification; I'll talk about that. Sometimes the lockdown features won't let you see any addresses, even as root. And recently, the lockdown features disabled tracing. What happens if your lockdown is broken? This is one of my fears: how do you know if your machine is compromised? Once you lock down everything, you've locked yourself down too. You're tying your hands behind your back. A couple of weeks ago, one CPU on one of my main machines, my workstation, started running at 100% usage. Just out of the blue, boom, the CPU started going crazy. I did a ps, and it was a kworker thread. I'm like, OK, why is a kworker thread at 100%? How do you tell that? If I did not have tracing, how would I know? I rebooted the box, everything was fine, and then suddenly, boom, it went up to 100% again. Well, I had tracing, so I enabled tracing, and sure enough, it pointed me to a USB device. It turned out that when I did backups to my external USB hard drive, if you stressed it too much, it got to a point where a bit wouldn't flip. So it basically said, "Hey, do something," and when that was done, it said, "Do it again," and when that was done, "Do it again." The bit never cleared. It was a bug in the hardware, a faulty USB device: the hard drive was failing, and it was causing this spin. But without tracing, how would I know? How would I know that someone isn't on my machine doing something through a kworker, say by exploiting a bug in the kernel?
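One way to do the kind of digging described above is through the tracefs control files. The sketch below is an assumption about how you might script it, not what the speaker actually ran. It enables the workqueue_execute_start event, whose output names the work function each kworker is about to run. The helper function names are hypothetical. The paths follow the standard tracefs layout, and writing to them requires root.

```python
from pathlib import Path

# tracefs is usually mounted at /sys/kernel/tracing; older setups expose it
# at /sys/kernel/debug/tracing. Writing to it requires root.
TRACEFS = Path("/sys/kernel/tracing")


def enable_event(subsystem, event, tracefs=TRACEFS):
    """Enable one trace event by writing '1' to its 'enable' control file."""
    (tracefs / "events" / subsystem / event / "enable").write_text("1")


def set_tracing_on(on, tracefs=TRACEFS):
    """Start or stop recording into the ring buffer."""
    (tracefs / "tracing_on").write_text("1" if on else "0")


def read_trace(tracefs=TRACEFS):
    """Return the current contents of the trace buffer as text."""
    return (tracefs / "trace").read_text()


if __name__ == "__main__":
    try:
        # Each workqueue_execute_start event reports the work function a
        # kworker is about to run, which points at the responsible driver.
        enable_event("workqueue", "workqueue_execute_start")
        set_tracing_on(True)
        print(read_trace()[-2000:])
    except OSError as err:
        print(f"tracefs not usable here ({err}); run as root on Linux")
```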
So lockdown can be dangerous, if you ask me. I like to use Nmap as my example, because it's a great tool for admins: you can scan ports to see what's open. But it's also a great tool for attackers, because you can scan ports and see what's open. Conflicting agendas. I'm from the real-time world. My kernel development was in real-time, and it was always funny, because people used to think real-time meant real fast. And I said no, it's the opposite. I always say we have the fastest worst-case times. If you need a fast worst case, real-time is the way to go, because nothing else will beat it. But in general, people want throughput, and every time you optimize for throughput, you find that it kills determinism. And it's funny, because if you look at Spectre now, the same little tricks that kill determinism actually make it predictable enough to see where things are, or what's going on, which is how those attacks work. So you security folks here should also know the big balance between ease of use and security. The more secure you make something, usually the more complex it gets, and the more complex it becomes, the less secure it actually becomes. So I always say the best security system is the easiest one to use; but then again, when you make it really easy to use, it becomes much harder to handle those other cases. So I like to say the conflicting agenda is security versus security. Some things that are secure actually make other things insecure, and vice versa. We don't want others to have control, but we want control for ourselves, and we want to see what's happening. So the first thing I wanted to tell people, and I said this when talking to a colleague, is that security folks must know tracing. And at first they're like, well, wait, wait, I'm a security guy. I don't really know all this tracing.
And I'm like, no, that's because you don't know that you need to know. You need to know what capabilities are on the machine you're looking at, and there's a lot out there. One of the things we want to do is be able to monitor tracing and know what tracing is doing. The function tracer, my favorite of the tracers, is probably the most powerful of them. It's not just function tracing: it's actually the way to hook into any function. When I first introduced fentry (I did a talk, I don't remember exactly where), where you put a nop at the beginning of every single function in the kernel and patch it to jump to something else, I said you could actually hijack that function. That's great for two things: it will allow you to do live kernel patching, and it'll also be great for rootkits. Today it's used not just for function tracing; it's used by kprobes and perf, and more and more things are trying to use it. Like I said, it's used by live kernel patching. How many people didn't know this? Did everyone know that function tracing was used by live kernel patching? Here's how it works. Say you have a buggy function; I just picked schedule. There's a nop at the beginning of the function that we switch to a call to the ftrace trampoline. Because the trampoline calls into C code, it saves some registers, so that calling C code won't mess up the function it came from, although at the beginning of a function it doesn't really need to save that much. Then it calls the live patch function, because the live patching code registers with ftrace, and ftrace says, OK, I'm going to attach to schedule and call you. So schedule jumps to the trampoline, and the trampoline calls the live patch handler directly. For live patching, the trampoline saves all the registers and passes them, the pt_regs, to the handler.
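The flow above can be shown with a toy Python model. It uses a function whose entry nop, once handlers are registered, acts like a call into a trampoline. The trampoline saves a pt_regs-like structure, calls each handler, and resumes at whatever instruction pointer the handlers leave behind. PatchableFunction, PtRegs, and the schedule functions are hypothetical names for illustration. The live-patch handler rewrites the saved instruction pointer so that execution continues in the fixed function.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PtRegs:
    """Stand-in for struct pt_regs; only the saved instruction pointer."""
    ip: Callable


class PatchableFunction:
    """Toy model of a traced function whose entry nop can be patched
    into a call to an ftrace-style trampoline."""

    def __init__(self, body: Callable):
        self.body = body
        self.handlers: List[Callable[[PtRegs], None]] = []

    def register(self, handler: Callable[[PtRegs], None]) -> None:
        # In the kernel, the first registration is when the nop gets patched.
        self.handlers.append(handler)

    def __call__(self, *args):
        if not self.handlers:          # still a nop: run the body directly
            return self.body(*args)
        regs = PtRegs(ip=self.body)    # trampoline saves the registers
        for handler in self.handlers:  # and calls every registered handler
            handler(regs)
        return regs.ip(*args)          # restore regs, continue at regs.ip


def buggy_schedule(n):
    return n - 1                       # pretend this is the bug


def fixed_schedule(n):
    return n + 1                       # and this is the fix


schedule = PatchableFunction(buggy_schedule)
hits = []


def tracer(regs):
    hits.append(regs.ip.__name__)      # function tracer: record the call


def livepatch(regs):
    regs.ip = fixed_schedule           # live patch: rewrite the saved IP
```

Registering the tracer does not change any results. Registering the live patch makes every later call land in fixed_schedule, even though callers still call schedule.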
And one of those registers is the instruction pointer. So the live patch handler can change the instruction pointer and then return to the ftrace trampoline, which restores everything, including the instruction pointer. If the handler hadn't modified it, the return would have gone back into the normal schedule function. Instead, it jumps to the new, fixed schedule. So every time the buggy schedule is hit, it goes to the ftrace trampoline, which sends it to the new fix, and we run good, fixed code. There was a little problem with this, though. At Kernel Recipes last year (I think it was last year, yeah), Jiri Kosina was talking about kGraft, SUSE's version of live kernel patching. At the end of his talk I raised my hand and said, "What do you do about ftrace_enabled?" And he said, "What?" This is why I say you really need to know about these things. It only dawned on me as I said it: we have this global switch you can flip that turns off function tracing. The trampoline will never be called, and the call site becomes a nop. (Actually, the slide has a bug: it should show a nop up there.) That means you're calling the buggy function again. I can have my kernel all patched with these updates and fixes, then write 0 to that switch, and now it's back to the bugs. It's all buggy again. Now that we know about this, we're working on a permanent flag. If something registered with the permanent flag is attached, you won't be able to set the switch to zero. And if the switch is already zero and you register a function with the permanent flag, you'll get an error: tracing is not enabled right now, so turn it on first. The patches are out there right now. I think they'll go in during the next merge window and probably be backported everywhere. But like I said, this is something we need to look at.
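The ftrace_enabled hole can be sketched the same way. In this hypothetical Python model, a live patch is a redirect that only takes effect while the global switch is on, so flipping the switch to 0 silently brings the buggy code back. The permanent flag refuses both the switch-off and a new permanent registration while tracing is off. The -EBUSY return values are meant to mirror the upstream FTRACE_OPS_FL_PERMANENT behavior. The class and method names are invented for illustration.

```python
import errno


class FtraceModel:
    """Toy model of the global ftrace_enabled switch and a 'permanent'
    registration flag (FTRACE_OPS_FL_PERMANENT in later kernels)."""

    def __init__(self):
        self.enabled = True   # like /proc/sys/kernel/ftrace_enabled == 1
        self.redirects = {}   # original function -> (replacement, permanent)

    def register_redirect(self, func, replacement, permanent=False):
        # A permanent user cannot register while ftrace is switched off.
        if permanent and not self.enabled:
            return -errno.EBUSY
        self.redirects[func] = (replacement, permanent)
        return 0

    def set_ftrace_enabled(self, value):
        # Switching off is refused while any permanent user is registered.
        if not value and any(perm for _, perm in self.redirects.values()):
            return -errno.EBUSY
        self.enabled = bool(value)
        return 0

    def call(self, func, *args):
        # With the switch off, every patched site acts like a nop again,
        # so the original (possibly buggy) code runs.
        if self.enabled and func in self.redirects:
            return self.redirects[func][0](*args)
        return func(*args)


def buggy():
    return "buggy"


def fixed():
    return "fixed"


if __name__ == "__main__":
    ft = FtraceModel()
    ft.register_redirect(buggy, fixed)
    print(ft.call(buggy))      # the live patch is in effect
    ft.set_ftrace_enabled(0)
    print(ft.call(buggy))      # the fix silently disappears
```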
Live kernel patching is something you might look at and say, whoa, I might want to turn it off, because it means people could change your machine to do something bad without you knowing it. But it can also fix your machine when it has a bug and you can't reboot. So if you can't reboot your machine, you might want live kernel patching to be able to fix things. But then again, when you have live kernel patching on, someone might be able to break things. text_poke: how many people are familiar with text_poke? A few. OK. How many people here focus on kernel security? Are those the same people? OK. text_poke is the way we modify kernel text for kprobes and other things. It came in after ftrace. ftrace was the first to do code modification, and it doesn't use text_poke yet; I'll talk about that. text_poke is used by kprobes and jump labels, and I'll talk about jump labels right now. Is anyone here familiar with static_branch_likely and static_branch_unlikely? Yes, good. Here's how it goes. In the kernel you'll see static_branch_likely() and static_branch_unlikely(), and they're really great, because they're unaffected by branch prediction: there's no conditional jump at all. Every time you put an if statement in code, you've slowed it down slightly. If there's any reason branch prediction exists, it's because branches are slow, and if statements are branches. Every time you add a branch you've slowed things down, because when speculative execution hits a branch it has to pick one way or the other, and that whole process is expensive.
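Static keys can be modeled as well, although Python can only show the mechanism, not the speed benefit. In this model, each static_branch_unlikely() site holds an instruction that is either a nop (fall through to the fast path) or a jump (to the slow path). Enabling the key repatches every site instead of having each call test a flag. StaticKey, UnlikelySite, and do_work are hypothetical stand-ins for the kernel's DEFINE_STATIC_KEY_FALSE, static_branch_unlikely(), and static_branch_enable()/static_branch_disable().

```python
NOP, JMP = "nop", "jmp"


class StaticKey:
    """Toy model of a static key declared with DEFINE_STATIC_KEY_FALSE."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.sites = []   # every code location that tests this key

    def _repatch(self):
        # The kernel rewrites each site's instruction (via text_poke);
        # here we just overwrite a field.
        for site in self.sites:
            site.insn = JMP if self.enabled else NOP

    def enable(self):
        self.enabled = True
        self._repatch()

    def disable(self):
        self.enabled = False
        self._repatch()


class UnlikelySite:
    """One 'if (static_branch_unlikely(&key))' location.

    While the key is off the site is a nop and execution falls through to
    the fast path; enabling the key patches in a jump to the slow path.
    """

    def __init__(self, key):
        self.insn = JMP if key.enabled else NOP
        key.sites.append(self)

    def taken(self):
        # In real code no flag is loaded or compared here; the patched
        # instruction alone decides which way execution goes.
        return self.insn == JMP


tracing_key = StaticKey()
trace_site = UnlikelySite(tracing_key)
trace_log = []


def do_work(x):
    if trace_site.taken():           # slow path only when tracing is on
        trace_log.append(("do_work", x))
    return x * 2
```

The design point is that the cost moves from every call to the rare moment the key is flipped. That is why tracing, which is either on or off for long stretches, is such a good fit.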
So static_branch_likely and static_branch_unlikely are for cases where the code always goes one way until somebody switches it. Tracing is a perfect example, and tracing uses these all the time: either you're tracing or you're not, so the code shouldn't be constantly making that decision. When you enable tracing, we flip the switch. Here's how it works when I disassemble it. It's funny, because usually everything is static_branch_unlikely, since that's the faster path, and static_branch_likely is not very likely to be seen. The reason I put it on the slide, by the way, is that this is encapsulated, so this is the exact code in the kernel: the static_branch_likely can come out as unlikely as well, depending on configuration, and in my configuration that's what I saw. If you disassemble the code, there is no branch and there is no check. In one case the very first thing it does when it comes in is a jump, and in the other you'll see a nop. So it either always jumps or always does a nop, and switching between the two is text_poke's job. Here I have the jump and the nop off to the right, and then you have the page tables. For those who aren't kernel or systems developers (I'm assuming not everyone here is), page tables are how virtual addresses get translated. When you access an address, the hardware uses the first few bits of your address to index into one table, then the next few bits to index into the next table, and the next few bits into the next one, until it finally gets to the page. The last 12 or so bits, depending on how big your pages are, are the offset into that page. That's why we have TLBs, to make this faster. But let's say the whole kernel is mapped with large pages, so it's all covered by one page table, and we want to modify something, but it's read-only
and executable. We don't want to make executable code read-write, because that could be bad. So what we do is create another mapping in the fixmap area, with the protection bits set to read-write and non-executable, pointing at the exact same physical location we want to touch. We make our modification and then remove that mapping. That's how text_poke works. Now, the function tracer. I'm using someone else's laptop because mine broke on Thursday, but on my old laptop that I brought here, I have over 50,000 functions that can be traced. When I enable function tracing, it modifies over 50,000 locations in the kernel to make them trace. You can see this yourself with the command on the slide: go to /sys/kernel/tracing or /sys/kernel/debug/tracing, and you'll see a file called available_filter_functions, which lists all the functions that can be traced. This machine had 53,777. But all kernel text is read-only, so how does ftrace do this? It could create page tables all over the place. If you look at the actual code, you'll see ftrace_arch_code_modify_prepare(), then arch_ftrace_update_code(), and then ftrace_arch_code_modify_post_process(). And if you go to the x86 code, you'll see that we convert all the text in the entire kernel to read-write, make our modifications, and then make it read-only again. Peter Zijlstra, who's not here today, has made it his goal to make sure this never happens, and he's trying to get rid of it. The problem with text_poke, like I said, was that it works one page at a time. So Daniel Bristot (I'm not sure whether he came here, since they're doing the Real-Time Summit, which I'll mention at the end) did the text_poke batching, where you can pass a large number of addresses and it uses fewer mappings to modify those locations, which makes this easier. Peter Zijlstra is
working to change all of ftrace over to it. It's still in progress, because we found a lot of bugs and a lot of things we're still working on, as you'd expect. But the goal is to make sure executable code never becomes writable: once it's executable, it's read-only, ever. OK, the Berkeley Packet Filter. It was made for high-speed custom packet filtering, and it uses a just-in-time compiler. It's actually a pretty awesome utility; the concept of BPF is really amazing. DTrace does some things like that as well, and so did dprobes, and there's been a lot of other work on things like this. From a security point of view, classic BPF was written for network filtering and didn't raise many security concerns. In fact, it made things better: filtering became really, really fast, and you could filter on much more complex things at speed. So the better it got at filtering, the better it became as a security feature. But then you extend it, and suddenly it's maybe a little bit worse. The extended Berkeley Packet Filter allows just-in-time compiling for more than packet filtering: you can attach it to a tracepoint, to actual code; you can monitor stuff and look at things. Everyone wants to figure out how to use BPF for non-privileged users, and that just keeps me up at night. So during the text_poke work, Alexei asked me when Peter Zijlstra's patches would come in, because he was working on something on top of them. I asked what exactly he was working on. He said he had tried to use my infrastructure the way kprobes does, because he wants to get the parameters of any function in the kernel, and to do that you have that saving of registers, and you save a lot of
registers, and when they did their benchmarks it was way too slow, because it saves a lot more than they need, all the stuff the ftrace handler does. So basically he was going to circumvent ftrace and take over that nop. Here's how he would do it with the registers: you'll notice ftrace_caller comes in, the function tracer saves all the registers, and then he gets the parameters. But what he actually wants is to call his function straight from there, without the full ftrace trampoline: just take the parameters and pass them along. If the BPF function has the same parameters as do_sys_open, you get everything, do whatever you want, and go back. Of course you need some trampoline; you might have to save the parameters inside a trampoline or something. But he said he wanted to circumvent ftrace, and that scared the heck out of me. It was night, we were going back and forth, and I went to bed thinking: you want to circumvent ftrace? That scared me, because it means there would be another entire implementation of code modification in the kernel. I'm sure you guys would love to hear that: the more of them you have, the more likely something goes wrong, the more there is that could be used to hack, and there's no tracking of what could be modified. But does ftrace do that?
Yes. When live kernel patching came out, they wanted to do something similar. They wanted to circumvent ftrace and asked, why can't you just do a direct jump? I said no, and a big reason is that I have monitoring: I like to know what's been modified. I've never really told anyone this, because it has mostly been for debugging; I've always had it for debugging purposes. But now I realize it's a great way to know what is happening on your computer. If you go into /sys/kernel/tracing, you'll see a file called enabled_functions. Like I said, I created it for debugging. The function tracer is handled by a huge table that you'll see allocated at boot (I'm working on shrinking it even more). Every one of those 50,000 functions has a 16-byte descriptor with state, flags, and so on. If something goes wrong, ftrace can crash and your machine can crash too, but usually ftrace will just stop. To debug this I had the enabled_functions file, and I found it's really, really powerful. Say I go into /sys/kernel/tracing and echo do_IRQ into set_ftrace_filter, so I only trace the do_IRQ function, and then enable the function tracer. You'll see this. The first field is the function name, and the next tells you how many callbacks are attached to it. For this one, ftrace creates a dynamic trampoline; it actually allocates the code. So you'll see "tramp", which means there's a trampoline for that function. If the architecture supports it, ftrace will also scan that trampoline, actually look in memory to see what it calls, and tell you that it calls function_trace_call. Then it looks at the descriptor registered to that function, and it's the
same thing, which it should be. If it's not, you might want to investigate. If I enable a kprobe, which as I said saves the full registers and can modify the IP as well, you'll notice an R and an I. R means this function saves all the registers every time. I means that something attached to this function may change the instruction pointer: a flag is set saying, I might change the instruction pointer on you. All live kernel patches would have that I set, and so do kprobes. Right now kprobes set it by default, because we said kprobes should, but we're going to make them ask for it, so it'll only be set if they ask. The reason we're doing that is that if you have live kernel patching on a function, you can't put a kprobe on it, because only one user can modify the IP. Now I have a kprobe and function tracing both on schedule, so you'll see two callbacks registered for it. The call goes to the default ftrace trampoline, which means it jumps straight to code that's compiled into the kernel. It's not dynamic, so you won't see a trampoline listed; it's just right there. And it calls ftrace_ops_list_func, which iterates over all the callbacks, going through the two of them and asking each, do you want to be called? So what happened was, I talked with Alexei and had this fight, and I went to bed. I woke up at 6 a.m.
and I turned to my wife and said, honey, don't bother me, I'm going downstairs and I have to code. I had solved something in my head; it was one of those things where I woke up and said, I have a solution for Alexei's problem. So I ran downstairs, and from 6 to 12 I didn't eat (I did get coffee), and I came up with register_ftrace_direct(). It takes the address of the function and the address to call, and it only allows one: if a direct call is already registered there, it won't accept another, and you can still attach function probes. With Alexei's approach, if he took over that mcount location, you couldn't use ftrace on that function and you couldn't put kprobes on it; it would be eBPF only. He also said, I'll add another nop for you, and I'm like, no. With my version, it's still tracked. So I went to test it. My code's buggy; I only did this in six hours. I just wanted to get it out before Alexei did any more work, to ask: can this work? And he actually said yes, this works for me, so thank you. There are things I still have to fix, and it's still buggy, but it does this. I went into the trace events sample code and added this to it. do_raw_spin_lock isn't exported, so I went into the kernel and added an EXPORT_SYMBOL_GPL for it to make it available. Then I modified the sample: I added my direct function and created a little trampoline. All my trampoline does is set up a little stack frame and save the first parameter, since I know my direct function only cares about the first parameter (do_raw_spin_lock takes the lock as its first parameter). Then it calls my direct function and pops the first parameter back, so when it goes back to run do_raw_spin_lock, it still gets the same lock. And I just
put that in there, and this is all I did to register it: return register_ftrace_direct() with the address of do_raw_spin_lock and my trampoline, and then unregister it on exit. I NACKed the live kernel patching version, and I'm pushing (I hope I have support here) for eBPF to use this facility. What I like about it is that when you use it, it shows up in enabled_functions: I added a new letter there, D for direct. So, in conclusion: all tools can be good and bad, and sometimes shutting them off can do more harm than good. The key is to always stay one step ahead: look around, ask people, talk to people, and say, hey, is there anything here that could mess me up? Like I said, the live kernel patching folks didn't know about that switch, and we're working on the fix. I was like, you didn't know about this? I figured someone knew about it. That's also why I'm bringing this up: I don't think anyone knows about this enabled_functions file. So really, the key is knowing what's there, and since the landscape is so huge, you need help. Thank you. Questions? Oh, I did do my selfie. Yes? "Thanks for the talk. Is it possible to use ftrace to modify the function that shows you these statistics of what is traced?" That function is part of the ftrace directory, which has all the nops turned off. There are basically three ways of stopping ftrace from touching a function. First, there's an annotation called notrace. If you go into the kernel you'll see notrace on functions, and that means ftrace will never see them. The second way is per file: you add a CFLAGS_REMOVE_ line for that file with the ftrace flags, and that removes it for the whole file. The third way is to remove ftrace from an entire directory, by changing the build flags (KBUILD_CFLAGS) for that directory. The ftrace directory has all tracing turned off. That's why I have my logdev, which LWN wrote about. I actually
used my logdev to debug the tracing, but that's out of tree and I have to compile it in. "Thanks, good idea. Second question: what do you do if you have too many parameters and not enough registers, so some of them are on the stack?" We do have ways of handling that; you have to use the stack. If you want to do something like this, the direct one is up to you. It doesn't affect me; it affects you. I just give you the option of jumping to a trampoline. If you need to get, say, ten parameters, you have to find them on the stack and load them. They will be on the stack; you just have to make sure they're saved there and restored. Any other questions? If not, let's thank the speaker again. Thank you very much.
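As a follow-up to the enabled_functions file the talk keeps returning to, here is a hedged Python sketch. It scans that file for functions carrying the I (may modify the IP) or D (direct call) flags. The line layout it assumes is approximated from the talk's description: the name, the callback count in parentheses, flag letters, then optional tramp: and -> fields. The real format differs between kernel versions, so treat the parser as a starting point. The function names are hypothetical.

```python
import re

# Flag letters as described in the talk: R = saves all registers,
# I = a callback may modify the instruction pointer, D = direct call.
FLAG_MEANINGS = {"R": "saves all regs", "I": "may modify IP", "D": "direct call"}

HEAD_RE = re.compile(r"^(\S+) \((\d+)\)(.*)$")


def parse_enabled_line(line):
    """Parse one enabled_functions line; return None for continuation lines."""
    m = HEAD_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    name, count, rest = m.group(1), int(m.group(2)), m.group(3)
    # Flag letters come before the first tab or '->'.
    head = re.split(r"\t|->", rest, maxsplit=1)[0]
    flags = {tok for tok in head.split() if tok in FLAG_MEANINGS}
    return {
        "name": name,
        "count": count,
        "flags": flags,
        "tramp": "tramp:" in rest,
        "calls": re.findall(r"->\s*(\S+)", rest),
    }


def audit(lines):
    """Names of functions something may redirect (I) or call directly (D)."""
    hits = []
    for line in lines:
        entry = parse_enabled_line(line)
        if entry and entry["flags"] & {"I", "D"}:
            hits.append(entry["name"])
    return hits


if __name__ == "__main__":
    try:
        with open("/sys/kernel/tracing/enabled_functions") as f:
            for name in audit(f):
                print("check who is attached to", name)
    except OSError as err:
        print(f"cannot read enabled_functions: {err}")
```

Reading the real file needs root. A function showing up here is not proof of compromise, because kprobes and live patches legitimately set these flags. The point, as the talk argues, is knowing who is attached.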