Thank you, everybody. You get a little bit nervous when you're the third talk on a subject in a row; you start worrying about which of your slides you have to throw out or modify or correct. And turn on the clicker. There we go. Hi, I'm Casey Schaufler. I've been doing kernel development since 1978, so I've seen all kinds of interesting things go into the code base.

Why isn't the kernel hard? Anybody here a kernel developer who does not work in security? Let's see. Okay, good. So this won't be completely wasted. First thing, it's too easy to cause damage. We all know about buffer overflows, stacks, string functions. There are so many different ways that you can cause the kernel to do things you don't want it to do, either through bugs or through malice. It's still too easy. And at this point we've got a bunch of people who want to do damage who are really clever. Not only are they clever, they're motivated: they're making money off of it. Exploits, stealing credit cards, all kinds of good things. We even have a title which has evolved over the past few years: security researcher. They used to be called hackers. Now you've got to differentiate between the good guys and the bad, so now they're security researchers. If they've come out in public and done something that they're getting paid for as part of their job, they get called a security researcher. Otherwise they're a miscreant of some sort. But that's not new; we've known about this for some time.

So what's the base of the problem? How old is this problem, really? It's as old as the C compiler. If you've ever seen the original Kernighan and Ritchie C book, it's about yea big, in a small format, and it's really easy to read. I used it as part of my educational process in learning how to program, which explains a lot about my code if you've ever seen it.
But the C language was written in a time when operating systems were written in assembler. In fact, text formatters were written in assembler; virtually everything was written in assembler. And for systems programming, C is a really, really good language, because it allows you to do the things you can do in assembler while actually giving you some rational structure to your program. You can organize your memory with data structures, and you can follow your control flow because of the constructs the language provides. But it's not strongly typed. It's more a suggestion or a guideline than an actual rule set.

This is efficient and convenient; you can do a lot of really interesting things efficiently. For example, my favorite is the one where you have a data structure which defines a header and then a one-element array. You can allocate memory for your data structure and say, hey, this thing's actually got 74 things in it. So you allocate enough memory for 74 things plus the header, and you're good: you can dereference it appropriately. You understand what you're doing, even though you haven't constrained yourself by the data type.

You can be clever and you can be precise, the way you sometimes have to be. For example, if you have an Internet protocol header, you don't know at the beginning whether it's IPv6 or IPv4, and the structure is different depending on that. So you can go look: you can have a data structure that defines what this is, and if you say, oh, that's the wrong one, you can switch it. That's pretty convenient. You can't do that in a strongly typed language.

So with all that good stuff to say about it, why would I want to give it up? And the answer is: you probably don't. We all know that strong typing is for weak minds. And strongly typed languages have their own issues.
For example, if you have a piece of data which legitimately might be an IPv6 header or an IPv4 header, how do you declare that in a strongly typed language? The answer is: you can't. You need some way to circumvent the strong typing in order to deal with that kind of data. Now, you can talk about object-oriented programming; that's always one of my favorites. Let's do garbage collection during an interrupt handler. The real-time impacts are awfully good on that.

And every now and then I'll hear somebody say, hey, let's rewrite the kernel in Rust, or the language du jour. And I say, great. How are you going to do that? Well, we'll automatically convert it. We'll write some awk scripts. We don't use awk anymore, do we? We're past that. Perl? No, wait a minute, we're past that too. We'll use some interesting language and we're just going to auto-convert. And that will work for about 90% of the system, at which point we hit the 90-90 rule: the first 90% of the code converts in 90% of the time, and the last 10% takes the other 90% of the time.

But there are things we can do. We can use the typing that's available; that makes things a little bit easier. We can fix what we know is dangerous. Kees talked about this (sorry, Kees, I've been getting your name wrong for years, and only in public). And we can prepare for failure, because we know it's going to happen anyway.

So typing: how does that help? Our good example here, the one Kees referenced earlier, is refcount_t. A reference count is a very, very specific kind of behavior. It really is an integer, but you are using it in a very specific way. It's got a couple of properties that make it very interesting. One is that it should never be zero. And you should never assign to it; you should only ever increment it, decrement it, or look at it in the case of freeing it.
So if we use a refcount_t in a place where it's appropriate, we can control its behavior. And we can put checks into the handling of refcount_t so that when we find a problem, we can assume that something bad has happened. Either it's been attacked or there's a bug; at that point we don't know and we don't really care. We can just say, oh, a bad thing happened in a refcount_t. And that'll find a lot of bugs and prevent a lot of attacks. Good thing.

So what do we know can be dangerous? Well, string functions. Anybody notice what's wrong with the strncpy example here? Somebody must be able to figure that out. Actually, it shouldn't be a strlen of anything. If it were the strlen of dest, that would be whatever's already there; strlen of the source means you're saying, oh yeah, we'll just copy everything. And actually, there are two bugs in the second one, because if the source is exactly the length of dest, you're going to lose the terminating null. So there are two bugs in that. The thing with string functions is that if you use them correctly, it's okay. The problem is that we've got a lot of people who've never learned to use them right. Oh, well, maybe they shouldn't be programming in the kernel. We've got Node.js these days. Okay, okay, sorry.

And then automatic arrays. Kees was talking about this earlier. Why is this a problem? Well, you've got a function where you say, I need an array here, and it needs to be big enough for whatever the caller is using, so the caller is going to tell me how big it is. We've got two choices here if we want to actually make things safe: either I can check in my function to make sure that that number is appropriate, or I can check all the places that call it and make sure that that number is appropriate. But both of those approaches have their problems.
Then there are casts. You can say, I want to take a pointer to this object over here and treat it like a cred structure. Okay, but if it's actually an integer, I've got problems, because that's clearly not enough space for the credential. On the other hand, if it's something bigger than the credential, I may not be setting all the information that I need, because the cred is only so big; if what I've really got is a cred and then some stuff after it, I'm not setting that, and I may have trouble there. So casts are kind of dangerous.

And then my second example here: I'll leave it to you to figure out what that is supposed to do. But really, what should have happened is that temp should have been declared as an unsigned int, and then there wouldn't have been any problems at all. By the way, 80% of casts (and I'm making that statistic up on the fly) are incorrect or unnecessary.

Now, it's not that these things can't be used safely; they can be. But doing checks on your parameters can be expensive. If you make the check in the callee, you may be making a whole lot of checks you don't need to. And if you make it from all the places that are calling it, well, what about out-of-tree modules that you can't see? Are they doing it correctly? So we've got an issue here: you have to figure out which is more important. Do you find all the places that call it and do the check there? Or do you put the check here, where everybody's going to go through it even though all but one of your callers is doing it correctly? It can be a balancing act.

Stacks. Why can't we get rid of stacks? There were machines that didn't used to have them. But anyway, here's a picture of the guy who invented them; you can blame him.
Stacks are convenient: you push stuff on, you pop stuff off, they can be hardware accelerated. They're really great. But they're also convenient for mucking up. Why is that? Well, if you're in a function, you know that the parameters for the functions in the call trace that got you there are on the stack; you could go look at them. No need to actually think about passing everything, you just go look on the stack to see what's there. And you know that the last function you called is still on the stack below you, so you can just go look there, right? If you made a call a while ago, you know it had an intermediate result you could use; you know where it is on the stack, you've read the code, so you just go look for it. Or if, heaven forbid, you're an attacker, you can use that same mindset to find information you probably shouldn't have. There are intermediate states from functions you're calling that you probably shouldn't be seeing.

Now, you can make it harder to do that. You can put gaps between the stack pages; that's a big help. If you know what the gaps are you can still work your way around them, or if you're cleverer than I am, you can come up with other ways. You can erase what's no longer needed; Kees was talking about this earlier. You come out of a function, you erase the stack, and that solves that half of the problem right there.

Now, I just had a random thought: let's randomize everything, so that nobody knows where anything is. That'll take those hackers and whack them upside the head, because they won't be able to find anything anymore. Well, attackers and developers hate randomization, and they both hate it for the same reason: it makes it really hard to find where things are when things aren't working the way the code is supposed to.
Sometimes you really need the real address of something; and if you have the real address, it's much easier to exploit. If you're looking at logs trying to figure out where your bug is, or you're looking at logs to figure out where the data is that you want to get at, and you don't have the address in the log, it's really tough. And of course, any time you've written a debugger: say your debugger is out, released, you haven't supported it for two years, and all of a sudden, bam, we've got hashed addresses. What am I going to do? I'm going to rewrite my tool to deal with the fact that the address isn't really the address; it's an arbitrary representation of the address which may or may not be accurate relative to the other addresses. Yeah, my tools get buggered.

We can randomize data structures. This is always fun. Someone has spent months getting this data structure in the right order so that on that arm64 box they've got, the caches never miss, ever. And now you put a little thing into the compiler that says, oh, just swap the fields around into a different order. Now not only are you missing your cache lines, but the structure actually got smaller, because the random order happens to be better from a size viewpoint than your careful cache-line layout. And so now your system actually runs slower, and this is all bad, and it's in the networking stack, and everybody knows that nothing is more critical than cache lines in the networking stack. So we have this in now: you can say you want to randomize your layout, or not randomize your layout, or let the system decide for you. One of the big benefits is that it makes it hard for somebody who's running a program to find a particular data structure in memory, because if they don't know what order things are in, they can't predict.
Stack pages. Stack pages are just pages, right? There's no reason we can't just shuffle them around and put them wherever we want; no reason they have to be in any particular place. We'll leave this as an exercise for the reader.

Functions. There's no reason that functions in the kernel should be in any particular order. There are optimizations that are a little bit trickier if you're going to randomize the order of functions, but not that much, right? And again, this makes it a lot harder for somebody who is looking at the system to find where things are, so that they can work out how they're going to go about exploiting it. This is a lot harder if you want to do it on every boot than if you want to do it at build time. But again, there are a lot of clever people out there who know compilers, which I fortunately don't.

Now, do I have to worry about performance in all this, when I'm looking at making the kernel harder, or just doing things in general? It's like asking, does the sun set in the west? Yeah, actually it does. Why do I have to be particularly concerned about performance? Well, true story; this happened to me. Hey, we want to put this security code into the networking stack. No. Well, why not? We can't measure any performance impact. Well, then your benchmarks aren't good enough. Oh, okay. Work, work, work. Hey, we've got benchmarks now. Yep: they still can't find anything. Still can't find anything. Finally we got in a new piece of networking hardware that was ten times the performance of anything we'd had previously. Yes: in this one error case, under these circumstances, we have a 2 percent performance degradation. Okay, great. You can't check that in, because it has a performance impact. Okay. Fix, fix, fix, get clever, get clever. Hey look, we fixed that performance impact, so now there's no performance impact. Can we check it in?
No, your benchmarks aren't good enough. Okay. So performance is always going to be critical, and there's a reason for this. Damn, I hate it when my slides show up before they're supposed to. Okay. Performance trumps security more often than not. Unless you're in an environment that has explicitly made the decision to take security over performance, performance will win the argument. And the reason for this is really pretty simple: performance is objective and quantitative. You can come up with a performance number. You can say 2 percent degradation, or 50 percent degradation, or I ran this benchmark and I got a 17 instead of a 32. If you're a performance guy, you can come in with a number, and numbers are really easy to use when you're making an argument. Whereas security is qualitative. If you don't know of any security problems with some code, well, nobody cares about security in it. If you've identified a possible vulnerability, there are a few people who are interested, but in general nobody cares; you're not going to win an argument with that. If you've demonstrated a vulnerability, people start to notice, but until it's been exploited and has a name, nobody gives a rat's anatomy. Until it's exploited, you can't go into an argument with a performance guy, because the performance guy is going to come in with a number.

So, at the end: is it worth the bother? Some of the stuff that we're doing in kernel hardening really is minutiae, or really looks like it. We introduce code churn: the refcount_t work touched 180 files, with 500 instances of refcount_t in those files, and there's still more to do. The variable-length arrays: we're churning a bunch of code here, code that hasn't been touched in years, in order to address this hypothetical, possible, maybe-someday security problem.
And we're introducing runtime overhead when we do it. Not always; some of the removals of the variable-length arrays have actually sped things up. Hardened usercopy, on the other hand, has real performance impact. There were cases where people were copying information from inappropriate places to user space, and vice versa. You certainly don't want to copy directly from user space into a DMA area, for example; that's usually a bad thing to do. And we introduce a lot of checks in places. When audit went in, for example, there was a lot of concern about how much impact it would have on every system call. These are completely legitimate concerns.

The other thing you have to worry about is the developer experience. There's the user experience, and there's the developer experience. The Linux community has been very big on the developer experience. Linus Torvalds is very big on the developer experience, because if you don't have a good developer experience, you don't have developers. Developers are really pretty important, especially in a situation where not everybody who's working on the code is getting paid to do it. Some of the things that we do are as simple as checkpatch. You put something in checkpatch that says, hey, you're using this interface, but it's deprecated, you shouldn't be using that. Or: you're using this function in a way that's really not appropriate. And you think, okay, I haven't checked the code in yet, I can fix that before I do. On the other hand, it can be pretty picky, like the whole thing with %p, where %p is how you print a pointer. There was a lot of debate about how that should be handled.
Eventually it ended up being handled the simplest way, but before that there were a lot of proposals: you report uses of %p, you warn about %p, you only allow people to use it under these circumstances with these modifiers. Finally, just dealing with it took care of it, but that was a choice in the developer experience, a choice of Linus's.

And we see a lot of compiler warnings. How many people remember lint? Okay, for those of you who are under 35: lint was the -W before there was a -W, back when the compiler just said, yeah, I can do that. The more compiler warnings we generate, the slower things can be. But compiler warnings about casts and about data structure usage really should be paid attention to, I think, and that's one of the big reasons why we have the policy in Linux that we don't introduce warnings and that we get rid of warnings: we put them there for a reason, because the stuff you're being warned about really can be dangerous.

Finally, "harder" is subjective. Is the kernel actually harder when I make these changes? Does it really make it harder for people to develop the kernel if I make these changes? Yes, sometimes it does. We're making it harder here, but are we making it more hardened over there? So, the answer to the original question: yes, it is harder. We are making it harder to develop the kernel. But the community is buying into it. Ten years ago they wouldn't have. Now, why is this? Well, part of it is that we're working in the open. Previous efforts, which I won't mention by name, that were done off on the side in their own patch stream for direct commercial exploitation weren't going in, and the reason for it is pretty clear: that's not a community effort.
Community effort makes a huge difference, whether you're talking about performance, security, functionality, or support for obscure and bizarre and unnatural hardware. Being involved in the community, getting feedback, giving feedback, working with everybody to make everything better, gets you some slack. The amount of help we've been getting has been awesome. Every now and then you think, oh god, I'm working so hard on this, and then you look at the people who have contributed to the work you're doing and you say, I have no idea who this person is. They popped out of the woodwork and made this little comment here, and all of a sudden my code runs in half the time. Thank you! Or: I was going to do it this way, we looked at the 45 different interfaces involved, and somebody said, dude, why don't you just do this? And you say, oh, duh. Or even better: oh yeah, somebody else had a patch for that, but then they got sick; maybe you should look at that. Cool, okay.

So again, the amount of help we're getting is just incredible. The community really is buying into this. But at the same time, we're still learning where the bounds are. As Kees commented earlier, when you do your pull request, you pull on your asbestos underwear and hold your breath until you get a response. Sometimes it goes in when you don't expect it to; sometimes you get feedback. But on the whole, we're making the kernel a little bit harder for people to develop in general, and we're making it easier for people to maintain, because with fewer bugs and fewer exploits, people can go work on new stuff instead. Thank you.

So, questions? Any questions?

[Audience] Just a small comment. I wouldn't completely discard Rust as a potential language to use in Linux, and the reason is that it probably works in an environment where you have both C code and Rust code. I mean, it's used in Firefox that way,
so that portions are written in Rust.

[Casey] Well, generally speaking, most of the kernel developers that I know love C, so it has much stronger support in the community than, well, for example, C++. Okay, quick poll: how many kernel developers here know Rust? How many kernel developers do not know Rust? It looks like about three to one against. Other questions? Next question, and I'll be nicer to you than I was to Jarkko.

[Audience] That's all right. I'm Guerney Hunt. Your comment about performance versus security is certainly true historically, but at least on the hardware side of the house, for which the kernel is the sibling of the hardware, that's shifting, because people are beginning to realize they've got to think about the security implications of their microarchitectures, or they get screwed in the long run through exploitation. So don't you have to rethink, or shouldn't we figure out how to balance this thing between performance and security, rather than saying one always wins and the other always loses?

[Casey] I'm trying to balance this off, but I think I am safer, career-wise, not saying anything about hardware development processes. Within the kernel: first off, as I said, we're getting better about being able to do hardening work, proactive things, even if there is some impact. But we still have to come in and say, here is the kind of exploit that we expect this to be a problem for, regardless of what the hardware does. And we are seeing a change: new hardware features for security that make more sense than what we've seen in the past. In the past we've seen whizzy hardware features intended to help security get introduced, and people look at them and say, gee, how are we supposed to use that? Ring architectures are a perfect example. You have a machine with 17 rings for your security architecture, and everybody looks
and says, huh, cool: ring zero, ring... something else. The other example (Tom Lyon had a paper on this in the 1970s) was about a chip he got for a Unix machine they were building. It was a time chip, and it would give you daylight savings time, time in three different zones, all kinds of wonderful things. And they said, well, great, but how do I make it give me seconds, you know, seconds since January 1st, 1970? Because that's the only thing I care about. So: spiffy hardware. Anything about hardware, I don't think we can count on in the performance-versus-security argument.

Speaking to the balance between performance and security where there is a trade-off (because there isn't always one: you can get security features in where there's no problem, and everyone says, that sounds great, let's do it; those are the easy wins), for the ones that are harder, I've seen a shift in the culture of the kernel. It used to be that nothing would win against performance, and as you start demonstrating this long history of attacks against the kernel, that has begun to move. And I would say that there is a bigger shift in getting rid of bug classes: accepting a performance hit for killing a bug class is easier for maintainers to accept than killing exploitation classes, because that's too far away from what people are thinking about. The old style was: why do we need to kill the bug class, let's just fix the bugs. But that's not the right approach, and I think the needle has moved enough that we now say, okay, I guess we can get rid of bug classes; but why kill exploitation methods, when they need a whole series of bugs to work? So I'm hoping there will be some balance point that we reach, but I'm still pushing to get as far towards killing everything as we can. Just my thought on the balance and how we'll reach it.
A proposal: continue the discussion over the break, which is now. So let's break. And just a note: there is now a board here for session proposals; I know there is already one planned on TPM for this afternoon, so please, if you have more, put them on the right side of the board. And let's get back at twenty past.