Hello everybody, and thanks for joining. Today we'll talk about security-minded development. Presenting remotely is a new experience for me, so bear with me. Let's go ahead and start. My name is Shuah. I'm a Linux Kernel Fellow at the Linux Foundation. I have been looking more into security-minded development, and I want to share my thoughts on that today. I'm going to jump right in, and I hope you like my little graphic with the COVID safety face covering; it's in line with what we're talking about anyway. So, what is security- and safety-minded development all about? We want our systems to be available and dependable. For example, we are chatting today, and a lot of infrastructure is helping us do so. We want our laptops and our phones to work and to be dependable. When we pick up our phone, we want to be able to make a phone call, and not just that: we want to read news, connect to the Internet, take pictures, and upload them from wherever we are. All of that happens because we have systems we can rely on and depend on, and that are responsive. We also want our systems to be resilient to remote and local attacks. All of this combined gives us the experience of being able to do what we do in our daily lives. Furthermore, we want our data to be secure. We want it to be accessible wherever we are; for example, we want to reach our pictures or other data when we are traveling. We want to be able to share it with trusted entities and, at the same time, keep it safe: shareable, but safe from corruption and secure from unwanted intrusions. With all of that in mind, what stops us from achieving these goals? Our concern here is software.
This talk focuses on the software-related issues we encounter that prevent us from achieving our goals of keeping our systems safe and available and doing everything we want to do with them. So let's briefly look at some of the vulnerabilities we encounter: heap overflows, integer overflows, and stack overflows. We also have privileged information leaks, for example kernel addresses, which we can leak in messages (dmesg) and through APIs; we might inadvertently expose them in sysfs interfaces, for instance. Another source of vulnerabilities is that, as we write our code, we might not be doing sufficient error and boundary checking on the data we have or on user data coming in, and we might not be range checking some of it, which leads to out-of-bounds accesses. If we have these vulnerabilities, we are susceptible to someone exploiting them, intentionally or not; it's not always intentional. More vulnerabilities: memory leaks and use-after-free. We might use variables without initializing them properly. We could also see unsafe arguments: data from user space in the form of input arguments to system calls and ioctls, and data in network packets, coming in through the protocol layers, USB, and the network. All of these stand in the way of keeping our systems available, dependable, and safe, and our data as well. We want to make sure our systems are not vulnerable to unauthorized access, for example, which could lead to losing data, losing access to our systems, and so on.
Some of the ways we counter these problems are fuzzing, regression testing, some form of static analysis, and vulnerability scanning. We do all of those things, find and fix regressions, and identify vulnerabilities; these methods keep us in the mode of looking for what could go wrong. We try to harden kernel code paths, and when we introduce new features, we use these methods to make sure we haven't regressed anything or introduced new vulnerabilities, bugs, or problems. However, everything we've talked about so far (fuzzing, regression testing, vulnerability scanning, continuous and integration testing) falls under cure-type activities: we are reacting to problems after we release our software. Could we do more? Could we focus a bit more on prevention? Fuzzers give us some idea of what kinds of bugs are escaping us, and vulnerability scanning gives us some indication too. Could we use that information to understand the common design and coding mistakes we make, and what activities we can add to the development process itself to avoid or minimize new bugs and vulnerabilities? In other words, can we take precautions and be proactive rather than reactive? Let me talk about some of the things we could be doing during the various phases of development. For example, as we write our code, we sometimes add debug messages.
During development, those messages might give away more than we intend. Once we are done testing and our feature works correctly, do we go back, scan our code, and ask, "Did I leak anything?" We now have some helpers for this: with %p, kernel addresses no longer actually get printed. That is a lesson we learned, and we want to put those kinds of common helpers in place as we design our software, learning from our mistakes and from the bugs that fuzzers and similar tools show us, so we know what proactive steps to take in the future. Checking input arguments is a best practice anyway; error checking is part of the code path. The problem is that during feature testing, more often than not, we focus on whether the code works in the sunny-day scenario. We should also focus on error checks, input argument validation, boundary checks, and so on. More importantly, when data comes in from user space, we want to sanitize those input arguments. As I said, if we are spending 90% of our effort on sunny-day scenarios and 10% on error paths, we want to balance that better. Error paths, and especially cleanup paths, tend to get less attention. I'll walk through some examples of what I found, but in general, we don't seem to give as much weight to error and cleanup paths when we test our software.
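To make the point about balancing sunny-day and error-path testing concrete, here is a minimal userspace C sketch of sanitizing an ioctl-style argument block before using it. The struct, flag names, and limits (`struct demo_args`, `DEMO_FLAG_*`, `DEMO_MAX_LEVEL`) are hypothetical and not taken from any kernel interface; each rejection branch is an error path that deserves its own test.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block passed in by a user-space caller. */
struct demo_args {
	uint32_t flags;     /* only DEMO_FLAG_* bits are defined */
	uint32_t reserved;  /* must be zero so the field can be reused later */
	uint32_t level;     /* valid range: 0..DEMO_MAX_LEVEL */
};

#define DEMO_FLAG_A      0x1u
#define DEMO_FLAG_B      0x2u
#define DEMO_VALID_FLAGS (DEMO_FLAG_A | DEMO_FLAG_B)
#define DEMO_MAX_LEVEL   10u

/* Returns 0 if the arguments are safe to use, or a negative errno. */
int demo_check_args(const struct demo_args *args)
{
	if (!args)
		return -EFAULT;          /* nothing to read */
	if (args->flags & ~DEMO_VALID_FLAGS)
		return -EINVAL;          /* unknown flag bits set */
	if (args->reserved != 0)
		return -EINVAL;          /* reserved field not zeroed */
	if (args->level > DEMO_MAX_LEVEL)
		return -ERANGE;          /* value outside the allowed range */
	return 0;
}
```

A test suite for this function should hit every `return` above, not just the final `return 0`.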
Avoid repeated mistakes: for example, if there is a helper that sanitizes input arguments, use it. Look for common helpers you can use, and consider adding new common helpers if they are missing. Now I want to focus a little on unsafe data, from three aspects: possible causes, failures, and mitigation. What are the possible causes? I alluded to them already: unsafe data can come in through sysfs, ioctls, system calls, network packets, user space, and USB packets. It could be malicious, but it's not always malicious; it could be an unintended application bug, for example. What could unsafe data do? It could force the kernel to allocate large amounts of memory, and it could panic the kernel. Short of that, we might see responsiveness issues, or it could push us off deterministic paths into other undesirable behaviors. A large memory allocation, for example, could lead to performance problems. And we have seen potential exploitation of Spectre variant 1 in the past. If you run out of memory, you won't be able to allocate memory, and you are in a system out-of-memory condition that could lead to performance problems and so on. Unsafe data could also lead to service interruptions, data corruption, and loss of execution control, meaning your system won't be available. So unsafe data hits all of these, preventing us from meeting our goals of systems that are available, dependable, reliable, and resilient to attacks, and it puts our data at risk.
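As an illustration of the large-allocation failure mode, the following sketch validates a caller-controlled length before it reaches the allocator. It is plain userspace C with `malloc()` standing in for a kernel allocator, and `DEMO_MAX_XFER` is an assumed limit; a real driver would derive its bound from the protocol or hardware.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request cap (assumption for illustration only). */
#define DEMO_MAX_XFER (64u * 1024u)

/*
 * Copies a caller-controlled payload into a freshly allocated buffer.
 * The length is validated before it is ever handed to the allocator.
 */
int demo_copy_payload(const void *src, size_t len, void **out)
{
	void *buf;

	*out = NULL;
	if (len == 0 || len > DEMO_MAX_XFER)
		return -EINVAL;   /* reject zero and absurdly large lengths */
	if (!src)
		return -EFAULT;   /* a length with no data behind it */

	buf = malloc(len);
	if (!buf)
		return -ENOMEM;   /* allocation can still fail; report it */
	memcpy(buf, src, len);
	*out = buf;
	return 0;
}
```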
So, just looking at this one class of vulnerability, we could be compromising our systems. Let's look more closely. I picked some examples to show: these are code fixes that went in, and they illustrate what could have been done earlier. Let me share my screen; let's see if screen sharing works well enough. All right, if you are seeing the "usbip: fix stub" slide, my sharing worked. In this first example, data comes in from user space as an endpoint number, and missing error and boundary checks on these endpoints resulted in a BUG_ON and a panic. If you look at the fix that went in, it is a simple fix that could have been done up front: check the endpoint values and return an error if they are invalid, and only then handle the incoming data. Let's look at the next error. I want to go through a few of these to give you an idea of what could happen, and then we can talk about how to mitigate them. In this one, as the highlighted diff shows, the code did not check the length of the buffer to be allocated. If you miss the range check here, you could allocate large amounts of memory. This is another vulnerability that was found and fixed. These two particular problems were found by modifying a user-space tool to inject errors into the data coming in; with such a tool, one can easily inject errors. The next one is a NULL transfer buffer issue; let me share my screen again. Here, the transfer buffer could be NULL, and if you do not check this variable, you will run into problems later on, at a different point in time.
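The three checks described above (endpoint number, buffer length, NULL transfer buffer) can be sketched as a single validation step that runs before any of the data is used. This is a hypothetical userspace example, not the actual usbip code: `struct demo_hdr` and both limits are assumptions, although the 0 to 15 endpoint range matches USB's 4-bit endpoint numbers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded request header; not the real usbip wire format. */
struct demo_hdr {
	uint32_t ep;        /* endpoint number supplied by the peer */
	uint32_t xfer_len;  /* transfer length supplied by the peer */
};

#define DEMO_NUM_EPS  16u          /* USB endpoint numbers are 0..15 */
#define DEMO_MAX_XFER (1u << 20)   /* assumed 1 MiB cap */

/* Returns 0 if the header can be acted on, or -EINVAL. */
int demo_validate_hdr(const struct demo_hdr *h, const void *xfer_buf)
{
	if (h->ep >= DEMO_NUM_EPS)
		return -EINVAL;  /* would index past the endpoint table */
	if (h->xfer_len > DEMO_MAX_XFER)
		return -EINVAL;  /* would make us allocate a huge buffer */
	if (h->xfer_len > 0 && xfer_buf == NULL)
		return -EINVAL;  /* length claims data but there is no buffer */
	return 0;
}
```

Doing all of the checks in one place, before the data is handed to the rest of the driver, means a bad packet is rejected at the boundary instead of failing later "at a different point in time".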
The fix here recognizes that this could be a malicious packet: without the check, we would dereference a NULL buffer at some point down the road. By adding the checks, we prevent that kind of error. What I'm emphasizing is that there are precautions we can take while developing the code: paying attention to the error legs and thinking through what could go wrong as part of the development process itself. The next one is another example of unsafe input argument use. This one was found by syzkaller; let me share my screen again to show you. This problem could result in an interesting issue: if you do not check the FAT length of a FAT file system, the root directory can overlap with the FAT entries. This is an invalid entry, so we want to make sure we do not allow mounting a file system with an invalid format. Otherwise, the overlapping root directory could lead to nondeterministic behavior, data corruption, and so on. So you really want to check for this error before you mount. It's great that syzkaller was able to find this problem, but we could potentially have avoided it by using static analysis tools, perhaps. Let's see the next one. I started examining a few of these with a focus on prevention: looking at the bugs we are finding with syzkaller and examining them.
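A rough sketch of the kind of mount-time sanity check described above follows. It assumes a simplified, hypothetical geometry struct rather than the kernel's real FAT boot-sector layout, and shows two checks: reject a zero FAT length, and reject a root directory that would start inside the FAT area.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical, simplified on-disk geometry (all values in sectors).
 * This is NOT the kernel's fat_boot_sector layout.
 */
struct demo_geom {
	uint32_t reserved_sectors;  /* the FATs start right after these */
	uint32_t num_fats;
	uint32_t fat_length;        /* sectors per FAT */
	uint32_t root_start;        /* first sector of the root directory */
	uint32_t total_sectors;
};

/* Returns 0 if the geometry is consistent enough to mount, else -EINVAL. */
int demo_check_geometry(const struct demo_geom *g)
{
	uint64_t fat_end;

	if (g->num_fats == 0 || g->fat_length == 0)
		return -EINVAL;  /* a zero-length FAT is invalid */

	/* 64-bit arithmetic so a huge fat_length cannot wrap around. */
	fat_end = (uint64_t)g->reserved_sectors +
		  (uint64_t)g->num_fats * g->fat_length;

	if (g->root_start < fat_end)
		return -EINVAL;  /* root directory would overlap the FATs */
	if (g->root_start >= g->total_sectors)
		return -EINVAL;  /* root directory past the end of the device */
	return 0;
}
```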
Can we feed that knowledge back into the development process and use what we are learning? Next is sanitizing input arguments. This is another problem that was found and fixed; I think this kind of fix went into various subsystems, including this one in the USB/IP driver. Sometimes range checking alone is not enough: we want the range check to happen, and we also want to make sure the CPU doesn't speculate ahead of the range check. array_index_nospec() is the kind of helper I was referring to; it ensures that we don't speculate ahead of the range check. This again falls into the category of boundary checking. This one was found using Smatch, which is one example of the tools we already have for the kernel that we can use to avoid introducing these kinds of bugs in the first place, finding them before we have to react to them. Next slide. This one was found using Coccinelle; let me share my screen again to show you this example involving sizeof(struct ...). Instead of leaving these kinds of open-coded size calculations, which are prone to mistakes, we can use struct_size(). There has been an effort to go fix these problems: Gustavo has been fixing several of them proactively, by running Coccinelle, finding these problems, and fixing them so that we are not vulnerable to them. This is one example of a proactive effort to fix problems before they happen. Let me see how I'm doing on time.
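Both helpers mentioned above can be approximated in userspace to show what they compute. This sketch is an illustration under stated assumptions, not the kernel implementation: `demo_index_mask()` reproduces the arithmetic of the generic `array_index_mask_nospec()` (all ones when `index < size`, zero otherwise), but without the compiler barriers the kernel uses it is not a real Spectre mitigation; `demo_struct_size()` mirrors `struct_size()` by saturating to `SIZE_MAX` on overflow, using the GCC/Clang builtins `__builtin_mul_overflow()` and `__builtin_add_overflow()`.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace imitation of the generic array_index_mask_nospec() arithmetic:
 * ~0UL when index < size, 0 otherwise, computed without a branch.
 * Assumes size <= LONG_MAX and that >> on a negative long is an arithmetic
 * shift (true for GCC and Clang). Not a real Spectre v1 mitigation.
 */
static unsigned long demo_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-check first, then clamp the index so it stays in range. */
static int demo_lookup(const int *table, unsigned long nr,
		       unsigned long idx, int *out)
{
	if (idx >= nr)
		return -EINVAL;
	idx &= demo_index_mask(idx, nr);
	*out = table[idx];
	return 0;
}

/*
 * Like struct_size(): header + count * elem bytes, saturating to SIZE_MAX
 * on overflow so a following allocation fails instead of coming up short.
 */
static size_t demo_struct_size(size_t header, size_t elem, size_t count)
{
	size_t bytes;

	if (__builtin_mul_overflow(elem, count, &bytes) ||
	    __builtin_add_overflow(bytes, header, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

The saturating design matters: an open-coded `sizeof(hdr) + n * sizeof(elem)` can silently wrap to a small number, while `SIZE_MAX` guarantees the allocation fails loudly.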
I want to leave some time for questions as well. This next one was found using syzkaller and fixed recently, in 5.8 I think. Let me share the screen again to talk about that problem. This is an error-check example: alloc_percpu() may return NULL on failure, so a variable might end up set to NULL. If you do not check for the error, you will trip over it at some point, as syzkaller did. What I'm finding as I look at these is that sometimes we optimize the code: we look at the error paths and say, "We can make these simpler," so we simplify them and use common routines. As a result, we change the code path slightly, and those changes can introduce problems. If you look at what this fix addresses, it was either an optimization of this type or a new feature that introduced an error path we should have been checking and missed. So this is a good example: if we paid more attention to the error legs and thought them through as we write code, we might be able to avoid some of these problems. We cannot guarantee our software will ship without problems; software always has problems, and there is no scenario where it won't. But that's where security-minded development comes in: if we think about what could go wrong and look for ways to mitigate it proactively, we will be better off. Let's go to the next one.
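The missing check in the alloc_percpu() case comes down to a pattern like the one below: test every allocation result and unwind only what has already succeeded. This is a userspace sketch with hypothetical names (`struct demo_dev`, `demo_dev_init()`); the allocator is passed in as a function pointer purely so a test can simulate an out-of-memory condition, and it is assumed to be `malloc()`-compatible so `free()` can release its results.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state with two separately allocated parts. */
struct demo_dev {
	long *stats;  /* stands in for a per-CPU statistics block */
	char *buf;
};

/* Injectable allocator; must return memory that free() can release. */
typedef void *(*demo_alloc_fn)(size_t);

int demo_dev_init(struct demo_dev *d, demo_alloc_fn alloc)
{
	d->buf = NULL;

	d->stats = alloc(64 * sizeof(*d->stats));
	if (!d->stats)
		return -ENOMEM;       /* the check that was missing in the bug */

	d->buf = alloc(4096);
	if (!d->buf)
		goto err_free_stats;  /* undo only what already succeeded */

	return 0;

err_free_stats:
	free(d->stats);
	d->stats = NULL;
	return -ENOMEM;
}

void demo_dev_exit(struct demo_dev *d)
{
	free(d->buf);
	free(d->stats);
}

/* Simulates an allocator that is out of memory. */
void *demo_alloc_none(size_t n)
{
	(void)n;
	return NULL;
}
```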
This is another example involving error and cleanup paths; I'm sharing what I found there. Let me share the screen. In this one, the problem is again in an error path: there is a memory leak when kobject_init_and_add() returns an error, because the caller is missing a kobject_put(), so a resource is never released. I have been thinking about how we can prevent these. In some cases we might be able to scan for them with existing tools, but we found this one with syzkaller, which means we found it after it had been applied to a tree. Could we have found it earlier, during feature testing? As we test, can we look at what the error paths are? Going back to my example of 90% sunny-day scenarios versus 10% error scenarios, it would help to focus on that balance. Next one: this is a kernel panic found by syzkaller. Let me share my tab again. These are all fixed, by the way; they come from fixes that went into the 5.8 release, I think. This one is an integer overflow situation. You'll see that the maximum logical block value is defined incorrectly. It's an interesting example: you are range checking, but you are checking against the wrong value, and the request has to stay within the range of logical blocks without overflowing.
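The overflow in this kind of range check typically comes from writing `start + len > max`, which wraps around for a huge caller-supplied `len` and then passes; with 32-bit unsigned arithmetic, for example, 10 + 0xffffffff wraps to 9. A minimal sketch of a wrap-free check, assuming 32-bit block numbers and a hypothetical function name, rearranges the comparison so that no addition is performed on untrusted values. It does not model the other half of the bug, the incorrectly defined limit.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check that blocks [start, start + len) fit in a device
 * with max_blocks blocks. Comparing len against the remaining space
 * avoids computing start + len, which could wrap for a huge len.
 */
int demo_check_range(uint32_t start, uint32_t len, uint32_t max_blocks)
{
	if (start > max_blocks)
		return -EINVAL;  /* start itself is out of range */
	if (len > max_blocks - start)
		return -EINVAL;  /* cannot wrap: start <= max_blocks here */
	return 0;
}
```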
You have to watch for someone passing in an arbitrarily huge length that can overflow: you are range checking, but you still overflow the buffer. This again falls into the same category of unsafe data: not properly validating the data we get from user space. Before we use values from user space, we want to make sure they can't send us down a path that compromises our systems. So far we have seen cases of error paths and of boundaries not being checked correctly, which is what happens when we don't think through these error legs. Now some takeaways. I want to encourage investing time in defensive design. I am doing that more and more as I write code, looking at what could go wrong. This applies especially when you start using common routines: sometimes we change a piece of code and say, "Hey, there's a common helper I can use here." When we do that, we have to think through the implications of using that common routine and how it changes the error paths. That's what I mean by defensive design. When data comes in, whether user data, USB packets, network packets, or ioctls (any kind of interface), ask whether it is safe to use, or whether it could send us down a path of panics, memory leaks, or allocating large buffers. Think through how you can design defensively. Then there's static analysis. I am starting to use it when I accept and review patches. In one case, I found a duplicate flag definition that static analysis caught.
I don't think I could have caught it with code review alone, because it was a large set of defines: two sections of struct defines, with a duplicate flag in there, which I was able to catch by running coccicheck. So I'm starting to do some of that, and there are various other efforts going on. Dan Carpenter does a lot of work with Smatch and Sparse errors, and Julia Lawall does Coccinelle. I'm now seeing a lot of fixes coming in for Coccinelle findings; if you search for problems found by Coccinelle, you will see a large number of them. I'm focusing more on the prevention aspect, so I'm putting effort into running coccicheck myself on the parts of the kernel I maintain, as well as other parts, to get a feel for what we could do proactively to avoid problems like these. Next: detection and testing during development. Ask yourself: do you have a test for the new subsystem or feature you are adding, or for the fix or whatever part of the kernel code you're changing? Look for a test; if there is one, see whether you can enhance it, and if not, see how you can write a new one. Error paths are tricky, because we have to inject errors proactively. Sometimes I force my code paths by introducing errors myself. We should also look into error injection: you can force errors and write error-injection tests. Fuzz testing is a great example of this; that's essentially what it does, it injects errors. For some error paths, you can probably inject errors with something more lightweight than syzbot.
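One lightweight way to exercise error paths, in the spirit of the error injection mentioned above, is to make the Nth allocation fail and check that nothing leaks, for every N. The hook below (`demo_malloc()`, `demo_fail_nth`) is a hypothetical userspace stand-in and not the kernel's fault-injection API; the kernel has its own fault-injection framework for this purpose.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical hook: when demo_fail_nth is N > 0, the Nth call fails. */
static int demo_fail_nth;
static int demo_calls;
static int demo_live;  /* allocations not yet freed */

static void *demo_malloc(size_t n)
{
	void *p;

	if (demo_fail_nth > 0 && ++demo_calls == demo_fail_nth)
		return NULL;  /* injected failure */
	p = malloc(n);
	if (p)
		demo_live++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live--;
		free(p);
	}
}

/* Code under test: three allocations with goto-based unwinding. */
struct demo_ctx { void *a, *b, *c; };

static int demo_ctx_init(struct demo_ctx *x)
{
	x->a = demo_malloc(16);
	if (!x->a)
		return -ENOMEM;
	x->b = demo_malloc(32);
	if (!x->b)
		goto free_a;
	x->c = demo_malloc(64);
	if (!x->c)
		goto free_b;
	return 0;
free_b:
	demo_free(x->b);
free_a:
	demo_free(x->a);
	return -ENOMEM;
}

static void demo_ctx_exit(struct demo_ctx *x)
{
	demo_free(x->c);
	demo_free(x->b);
	demo_free(x->a);
}

/*
 * Fails each allocation site in turn (plus one run where nothing fails)
 * and returns how many runs left memory behind; 0 means no leaks.
 */
static int demo_run_injection(int max_sites)
{
	int leaks = 0;

	for (int n = 1; n <= max_sites + 1; n++) {
		struct demo_ctx x;
		int before = demo_live;

		demo_fail_nth = n;
		demo_calls = 0;
		if (demo_ctx_init(&x) == 0)
			demo_ctx_exit(&x);  /* success path when n is past the last site */
		if (demo_live != before)
			leaks++;
	}
	demo_fail_nth = 0;
	return leaks;
}
```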
So think through what you can do to test your code, error paths especially. Testing error paths also means testing how your code handles unsafe data. I'm getting close to the end, so I'll leave you with this: be mindful of error and cleanup paths especially. Initialization and runtime paths can be easy to test; error paths are the ones prone to problems. For example, you might have memory leaks because you are not releasing resources correctly during cleanup. Say initializing the hardware in your driver failed, or loading your module failed, or initializing your driver failed: you have to unwind through the cleanup path and release all the memory. Are you doing that correctly? Do you have unbalanced lock acquire and release logic in your error paths? Pay attention to that. When testing, also make sure you turn on the awesome debug options we have in the kernel; there are tons of them. For example, enable CONFIG_DEBUG_SPINLOCK and CONFIG_PROVE_LOCKING for testing, so you go into detection mode. I'm not listing all of the options; there are many. Another example is CONFIG_KASAN, which I enable even when testing stable release kernels, to make sure the paths are correct. With those options enabled, you will be able to detect problems while testing your kernels. This is also a good example of connecting the dots for effective testing: we have the syzbot and syzkaller efforts, and recent work added KCOV hooks for collecting coverage to facilitate coverage-guided fuzzing with syzkaller.
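The unbalanced-lock question can be checked mechanically in a test. In this sketch the lock is a hypothetical toy that only counts nesting depth (in the kernel, the real checking comes from options like CONFIG_PROVE_LOCKING); the pattern being shown is routing every exit after the acquire through a single unlock label, so error returns cannot skip the release.

```c
#include <errno.h>
#include <stddef.h>

/* Toy lock that only tracks nesting depth (illustration only). */
struct demo_lock { int depth; };

static void demo_lock_acquire(struct demo_lock *l) { l->depth++; }
static void demo_lock_release(struct demo_lock *l) { l->depth--; }

/* Hypothetical fixed-size queue protected by the toy lock. */
struct demo_queue {
	struct demo_lock lock;
	int items[8];
	size_t count;
};

static int demo_queue_push(struct demo_queue *q, int item)
{
	int ret = 0;

	demo_lock_acquire(&q->lock);
	if (item < 0) {
		ret = -EINVAL;      /* error path: must still unlock */
		goto out_unlock;
	}
	if (q->count == sizeof(q->items) / sizeof(q->items[0])) {
		ret = -ENOSPC;      /* queue full: must still unlock */
		goto out_unlock;
	}
	q->items[q->count++] = item;
out_unlock:
	demo_lock_release(&q->lock);  /* single exit keeps lock/unlock balanced */
	return ret;
}
```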
So there are opportunities here. I want to thank Andrey for doing this work. The idea is: I have syzkaller; how do I connect my module to it? Let's find these kinds of ways to increase our test coverage, which also improves our chances of finding problems proactively. We have several resources for regression testing. There are the kernel selftests, and we are adding more and more tests with every release. Then there are the syzbot reproducers: I get them from Dmitry and upload them into linux-arts, the auto-generated regression test repo, which I maintain. It is up to date as of, I think, March of this year: it has all the syzkaller/syzbot reproducers that came in until then. I update it periodically by getting new ones from Dmitry. Use those as regression tests. There is a run script that Dmitry wrote, and more work is happening in that space: LTP downloads the linux-arts git repo and runs the reproducers as part of its testing. So there is a lot of activity here. Think about this as you write code, and contribute to these activities as well; that will help. And all of us kernel developers should keep adding tests to the kernel selftests. Use them, expand them, and keep a secure, defensive, proactive approach to writing code. With that, I'm done with my talk, and I have about 10 minutes for questions. If I can't answer all the questions, I also have a session tomorrow afternoon. Let me see. I have a few questions; I think this is the first one: "How is kernel.org planning to use static analyzer online documents?" Let's see, it's a long question.
Bear with me; I'm increasing the window size so I can read the question correctly. "How is kernel.org planning to use static analyzer options?" Specifically, I don't know that this falls into the kernel.org category; this sounds like a static analyzer that is outside the kernel. I'll have to get back to you. This question came from Joy; I'll get back to you on that, Joy. Sorry, I'm reading these questions as I go. Next: "Would you recommend running, for example, syzkaller in a nightly suite? Should we run it 800 times, and then that's enough until we add more patches? What does thorough due diligence look like?" Yes and no. What I am saying is that this is about testing and also about development: what can you do during development? Yes, running syzkaller will help with regression testing, for sure, and with making sure we are not introducing new problems. But this talk focuses more on how we can prevent problems, based on what we learn from the fixes we are putting in. Can we learn from what syzkaller is finding and how we fix it? So yes, as due diligence, running syzkaller will definitely help find problems. Let me see. "Sparse is great. It can even indirectly find security issues, like complaining about global variables." Right, that's true. That's really my message in this talk: let's use what we have. We have Sparse and Coccinelle, and we can write more Coccinelle scripts to find more problems. We can use our existing tools, and improve them, to look for problems proactively. Next: "How do you see security being increased by compiling the kernel with Clang?" I do not know; I don't use Clang myself.
It's been on my list of things to look into. I don't know much about how compiling the kernel with Clang increases security, and I would definitely like to understand that better. Also, I'm not focusing only on security in this talk; I'm also focusing on overall dependability. As I mentioned at one point in the talk, unsafe data coming in from user space could be malicious, or it could be an unintended application bug. So I'm focusing more on the dependability of the software, looking at both security and safety. Even though the outcome might be the same, the same set of problems can fall into both categories, as the unsafe-data example shows: it could be malicious or an unintended error. I have one more question here: "For checking security in testing, what about using scanner tools?" If you noticed, my last slide doesn't list any scanner tools, because I haven't compiled a list of them yet; I'm focusing on what we have in the kernel at the moment. That's a good question; I haven't done enough research on what scanners are available for vulnerability scanning. Let me see. Somebody commented: "Using Clang basically gives you a different interpretation of the C standard. So if you compile something with GCC and Clang and one of them gives a compiler warning, you may want to fix it so both compilers are happy." Yes, definitely. Using Clang and GCC will give you two different views of your code. I'm glad we now have enough kernel developers using Clang, alongside GCC, so it's good that we are focusing on both. It is on my to-do list.
One day, depending on my bandwidth, I'll start experimenting more with Clang; I haven't been able to yet. That's mostly all the questions I have, and we are almost out of time. Can I get access to these questions so I can answer them? I'll try to get them and address some of them in my AMA session tomorrow. There is also a comment on Sparse: "Sparse is great. It can even indirectly find security issues, like complaining about global variables not having static attached to them." Yes. I think I covered the Clang question and the scanner question. There is one question about Truby; I'm not familiar with that. I think I've answered all the questions specific to this talk for now. Thank you very much, everybody, for joining, and thanks for all the good questions.