All right. Well, welcome everyone. Good afternoon. My name is David Brown. Flavio was unable to be with us here; he helped with the content, and I get to share all the words. So today I'm going to talk about the Zephyr project and the security aspect of it: kind of an overview, progress, status, where we're at. Hopefully we'll have a little bit of time for questions and answers, and ideas people might have.

So what we're going to do today: I'll give a brief introduction. I think most of you probably know what Zephyr is at this point, and what security is and how that affects things. But I wanted to go into a little bit of detail about vulnerabilities, because I think there's a bit of a black box there. These security vulnerabilities come in, and then maybe you notice in the release notes it says the CVE is under embargo, no other details, and then a couple of months later you notice, oh, there are details. Well, what happened in between? What was going on? Then the current status of the security aspect of the project, and then hopefully a few minutes for a bit of discussion.

So, introduction: Zephyr, as I said, the importance of security, and then what the security committee and the security working group are. What is Zephyr? Great question for a conference like this, where we're all here for Zephyr. The things that really matter: it's an open source project. The source code is out there, and that affects how we deal with security issues and security vulnerabilities. Our patch process is public. We review our code on GitHub, and some of the normal things people do with keeping things secret until they're ready to be revealed don't quite work that way. Beyond that, Zephyr supports a lot of architectures, a lot of boards; it has a lot of features. I just did a SLOC count last week and got 1.3 million lines of C code in the Zephyr repo and 20 million lines of code in the modules directory.
And I probably missed some, because a few have moved out of the modules directory. But that's a lot of code.

So why does security matter so much? There are a lot of answers to that question. People are building products with this, and vulnerabilities affect them. But there are also external requirements. We've listed a couple of these security standards here — that last one's not actually a standard — which apply depending on what jurisdiction you're trying to sell a product under. These things have requirements on the process that we have to go through, to try to ensure that IoT products have some semblance of security.

So what do we do in Zephyr about this? The first thing we have is something known as the security committee, and this is actually defined by the charter of the Zephyr project from way back in 2016. Every Platinum member of the project gets a representative on this committee, and then there's an architect and a chair, which are currently Flavio and myself. The original idea was that we would meet every two weeks and go over security issues, discuss things that weren't public. I'll just give the current status — the history will come in a second, which I realize is kind of backwards. There's also a security working group, which is open to any participants. This wasn't originally in the charter; we added it later and started meeting regularly. The idea was to then have the committee meet on demand, and that didn't really happen, so now they're kind of interleaved. But a lot of useful discussions have resulted from this, around various security standards, security processes, code analysis tools. Since this was split into the private and the public group, the safety team has also split in a similar manner.
We find this a pretty useful way to work — this kind of hybrid model where Platinum members are contributing lots of resources and money, and we can give them something for that, but it's also a project that's open, that everyone can participate in. So we basically leave the committee to deal with vulnerabilities and sensitive information. That's kind of an overview; we'll get into a little more detail in a minute.

So, as far as what a vulnerability is: it's a big word. What does it mean in the context of Zephyr? Why is it treated differently than just bugs? And what process happens when a vulnerability report comes in? Let's ask ChatGPT what a vulnerability is. I was actually surprised — this came back as probably the most coherent answer I've found to this question in one place: "A software vulnerability is a flaw or weakness in a software system that could be exploited to compromise the system's security or functionality. It could be a bug, design flaw, or configuration oversight in the software's code, design, architecture, or user interface. If a vulnerability is exploited, it could potentially lead to unauthorized access, data loss, or data theft. It could allow an attacker to install malicious software, gain access to sensitive data, or gain control over the system and its resources."

There's a lot in there. Every single one of those little phrases separated by commas is a whole area of what this means. Bugs are the most obvious thing people think of: you have a buffer overflow, or an integer overflow or underflow in your code, and that results in something somebody can do that they shouldn't be able to do. But there are also vulnerabilities that are design flaws, or where the way the system is architected makes it vulnerable. We've had security issues that were in a standard we implemented — when implemented correctly, it was still vulnerable.
That had to get fixed at the standard level, and then obviously the code had to be corrected. User interface doesn't quite apply as much to the bare Zephyr code, since we don't really provide a user interface, but people build devices with it, so it really does come up. And then there are all these ways a vulnerability can affect things. You can get access to data you shouldn't. You can steal things. You can let attackers do things with the device they shouldn't be able to do. Think of the distributed denial-of-service attacks that periodically come up with IoT devices: there's an exploit installed on the device that doesn't do anything at first, and then once it spreads to enough devices, they all turn on at once and start sending requests to one server to bring that server down. There are just a lot of things in all of these aspects.

When we process a vulnerability, there's a severity field: how important is this vulnerability? And it starts by asking basically these questions. What can be done with it? How hard is the exploit to make? This is all part of the notion of how important a vulnerability is. So for Zephyr, we have a lot of these things — bugs in the code, flaws in the design, the things I mentioned. The big difference between a bug and a vulnerability is that a vulnerability can be exploited. And that's not always a question you can answer — can this be exploited? Sometimes there's no clear answer. But you generally want to make the assumption that if I can think of some circuitous way that maybe this could be exploited, even if it would be hard, then somebody will find a way to do it.
This really hit me when there was a bug found in the Linux kernel: a leaked semaphore, where a certain device call would take the semaphore and then not release it when it was done. It leaked in such a way that the count was just never getting back to zero. And then somebody figured out that if you did four billion of these requests, it would suddenly be zero again. And they got root with that. So you think, well, that's kind of weird and obscure — but no, with enough motivation, there's a way to exploit a lot of things.

In order to understand how important an issue is to the project, to our code, you have to have a threat model, because fixing these vulnerabilities has a cost, and the security problem itself has a cost, and if you don't understand what those costs are, it's very hard to make the trade-offs. And it turns out threat models are really hard in Zephyr. We've written up a couple, but a threat model is generally based around a specific application: I'm building this specific device. One of our example threat models is an IoT sensor device that collects data and sends it up to the cloud. Which is fine — people build that — but there are a lot of other things that happen with Zephyr too. There isn't a single answer to the question, and that makes it hard to figure out how important one thing is versus another, and where we should be putting our resources, because it's a general-purpose system that's used in a lot of places. As I mentioned, the threats also change over time.

So how do these get treated differently than just bugs? Right now, our software development process is built around GitHub.
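The Linux semaphore leak described a moment ago comes down to unsigned integer wraparound. A minimal Python sketch of the mechanism, emulating a 32-bit kernel counter with masking (the names and widths here are illustrative, not taken from the actual kernel code):

```python
MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned counter, as in C kernel code

def leaked_acquire(count):
    """Take the semaphore without a matching release: the count only grows."""
    return (count + 1) & MASK

count = 0
count = leaked_acquire(count)
assert count == 1  # one leaked acquire: looks like a harmless off-by-one

# But after 2**32 leaked acquires in total, the counter wraps back around.
# (Computed arithmetically here rather than looping four billion times.)
count = (count + 2**32 - 1) & MASK
assert count == 0  # "suddenly zero again" – the exploitable state
```

The point is that "it would take four billion requests" is not a defense; a patient attacker can simply make four billion requests.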
If you find a bug, you go to the project, slash, issues, click on the green button that says New Issue, and type in your description. People see it, it shows up in meetings, people get reports about it. There are big bar graphs with numbers that show, oh, there's another medium-priority issue.

What happens with a vulnerability? They're kept private. And why is that the case? Well, the description of a vulnerability is usually also exactly the kind of information you need in order to exploit it. We want to give the thing a chance to get fixed, and for the fix to get to the people who care about it and can apply it in devices, before we tell the people who might be able to exploit it that there's a problem. But the fixes still happen publicly: code gets checked in, PRs are made, the code is reviewed and merged. We kind of follow the same process as the Linux kernel — you go make the bug fix and you don't really talk about the vulnerability. "Oh, I fixed this little thing here in the code," explaining what it does, but not how you might exploit it. It's admittedly some hand-waving to try to keep the fix itself from becoming the way it gets exploited.

So the theory is that this gives people time to migrate. In reality, it hasn't worked that well. We're supposed to have this thing where people can register to be notified of security issues — we'd verify that you're actually building a product with Zephyr, so that makes sense — but the whole process of actually sending mail to that notification list doesn't really happen very well, and there's a lot to that. We'll get into it in a second.

So what does happen? What works? Vulnerabilities are reported in one of two ways, generally. There are other things, like pulling someone aside at a conference and telling them about it, but mainly: there's a PSIRT mailing list, and people send email to it saying "I found a vulnerability."
Or there's a big green button on the security page in GitHub that says "Report a vulnerability." And we've been trying to push people towards that, because it turns out to be a whole lot easier process-wise. The second step, when a vulnerability comes in, is that we go into GitHub and create this thing called a security advisory. That's the thing that will be published when the vulnerability is published. It has all the data about what the problem is and how to reproduce it, and then it has fields that get filled in over time: how was it fixed, where was the fix, what releases contain the fix. If you use "Report a vulnerability," it fills out that template and asks the reporter the right questions. For severity, it guides you through a set of questions: is this a network exploit? Is it a local exploit? What privileges do you have to have to provoke the vulnerability? What privileges do you gain from it? What information do you get? It's part of the CVE process; there's a template. There's a soup of acronyms here, so I just won't shout them out — I'd probably get them wrong anyway — but one four-letter acronym is the name of the severity category, and then there's another three-letter acronym which is the name of the enumeration of the type of exploit, and I think people gather statistics about all of this over time.

So the first thing that has to happen once we've created this is figuring out who's going to fix it. As I mentioned, it's a big project, so that sometimes means diving into the code to find a maintainer. The maintainer is notified — look, we found a vulnerability, here's the information — and then they get visibility into this thing in GitHub so they can monitor what's going on. So the maintainers gain visibility, usually along with the additional developers who are actually going to do the fix.
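The acronym soup here is presumably CVSS for the severity score and CWE for the weakness enumeration — treat those names as my gloss, since the speaker deliberately didn't spell them out. The severity questions the form walks you through correspond to the metrics packed into a CVSS v3.1 vector string, which a small sketch can pull apart:

```python
def parse_cvss_vector(vector):
    """Split a CVSS v3.1 vector string into its metric/value pairs."""
    prefix, _, metrics = vector.partition("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError("not a CVSS vector")
    return dict(item.split(":") for item in metrics.split("/"))

# Network-reachable, low complexity, no privileges or user interaction needed,
# high impact on confidentiality, integrity, and availability.
v = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
assert v["AV"] == "N"  # attack vector: network ("is this a network exploit?")
assert v["PR"] == "N"  # privileges required: none ("what privileges to provoke it?")
```

Each metric maps to one of the questions the GitHub form asks; the full base score is computed from these values by the CVSS formula.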
And then eventually one or more PRs are created in GitHub to actually fix the issue. They get created, reviewed, and merged. We try to track all of this in the advisory. It turns out the advisory mostly doesn't have fields for this information — there's just a kind of description field — so we're in the process of experimenting with this thing called Projects in GitHub, which is a table, basically a spreadsheet, to track the state of each step, and then we correlate them. Regardless of the tracking issues, that's an attempt to fix the process so the right people get notified. Once the fix gets merged, it makes it into a release.

Unrelated to the fixing — in theory; we'd like them to be related — there's this thing called the embargo, which is the hiding of the information about the issue. And it's a timer: a set time after we receive the vulnerability report, it's supposed to become public. That happens whether it's fixed or not. Sometimes, if the fix doesn't happen or isn't getting through, the report ends up saying: here's the vulnerability, and it's not fixed yet. Sometimes you can put in there: if this affects you, you can disable this feature to reduce the risk. Sometimes it's: well, here's the patch that's in progress, you can try applying it. The idea is to report the state. Ideally the fix is done, so the answer is "this is fixed," but that doesn't always happen.

This gets published to the MITRE database, the CVE database, and you'll see it in multiple places. If you search for CVEs, they're of the form CVE, dash, year, dash, and then a number that just increments through the year as IDs are assigned. We allocate the number at the beginning so we have a handle to refer to the issue through the process until it's made public, and then the data appears in that database if you go look it up.
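The identifier format just described can be checked mechanically; a quick sketch (the example IDs are for format illustration only, not real Zephyr advisories):

```python
import re

# CVE IDs are "CVE-<4-digit year>-<sequence number>"; since 2014 the sequence
# part may be longer than four digits, hence the open-ended quantifier.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")

assert CVE_ID.match("CVE-2024-12345") is not None
assert CVE_ID.match("CVE-23-17") is None  # year must be four digits
```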
Then this gets into the release notes, and there's some interesting complexity there: it's always in the current release notes, so if things change after a release, the current master or main version of the code will have the current status, even for things that happened since the release. And once that happens, we have all these long-term stability versions of Zephyr where we actually promise that the fixes will happen. I think we're currently at two LTSs plus the last version of Zephyr — yeah, something like that. There are about three older versions of Zephyr that then need to get these fixes.

The process in Zephyr actually has some really good tooling behind this. There's a little bot: you add a tag that says "I need this in that branch" — the tag has the version number in it — to cherry-pick the fix into that branch. It runs all of the tests, and if all of that works, it merges and it's done. If you get merge conflicts or whatever, now you have this PR, and we as security people go look at it and ask: is it trivial, is it pretty easy to fix? Or did the code get rewritten in between, so someone has to rethink it? Usually at that point it gets kicked back to the maintainer. But the idea is that the release notes then get updated, the CVE gets updated, all this stuff, to say: if you have this version, it gets fixed in that one. There's a whole table in the CVE of which versions are affected and where it gets fixed. And that's the process. It kind of sounds like it's largely just a manual process — a bunch of steps.
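To illustrate the bookkeeping behind that affected-versions table — this is not Zephyr's actual tooling, and the version numbers are made up — here is a sketch of working out which maintained branches still need the cherry-pick:

```python
def branches_needing_backport(introduced, maintained, already_fixed):
    """Maintained release branches that carry the bug and still need the fix.
    Versions are (major, minor) tuples; purely illustrative, not Zephyr tooling."""
    return [b for b in maintained
            if b >= introduced and b not in already_fixed]

# Hypothetical: bug introduced in 2.7; fix already merged to the 3.6 branch;
# the 2.7 LTS and 3.5 branches still need cherry-picks.
todo = branches_needing_backport(
    introduced=(2, 7),
    maintained=[(2, 7), (3, 5), (3, 6)],
    already_fixed={(3, 6)},
)
assert todo == [(2, 7), (3, 5)]
```

The real process adds the hard part this sketch skips: deciding per branch whether the cherry-pick applies cleanly or needs a rethink by the maintainer.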
We spent a while on this. A couple of years ago we got one report that had 26 vulnerabilities in it, all at once, across multiple subsystems. They were really well detailed and very significant — many of them high severity. And we tried automating this: let's go generate these reports and so on. I think we'll still want to get there, but we found some things, like the security advisories in GitHub not having an API yet, so we were essentially web scraping to get that information, and that's fragile — it breaks if you haven't run it for a while.

One of the things we really figured out is that there are two kinds of tasks here. There are the expert tasks, where somebody actually needs to understand security and how it affects the system, and have a good understanding of Zephyr as a whole: how do you find a maintainer, how do you go back through the code to figure out when the vulnerability was introduced. And then there's a lot of other stuff that's just notifications: I get a message, I need to send an email to the list of people who are interested, saying look, there's a new CVE and this is what we can tell you about it so far. So we're going to try to split this up and get some help with it. A lot of it is just about documenting, and we've started writing down documentation of what we do with a vulnerability — the past couple of slides are kind of a summary of where that documentation is.

Ultimately we have a security committee, ideally one person from each Platinum member, to pick up parts of this process. But some of these tasks are fungible — I did look that up to make sure you can use that word for people, or at least for job tasks — meaning somebody can just volunteer for a task or be assigned it: you need to do this. But there's a lot of
these tasks where, if a member company just assigns someone, maybe they have a security background, maybe they don't. So we're looking at adding a couple of people — not every single maintainer or member company.

So that's the process side. The mechanics of where this happens: as I said, the security committee is the thing defined by the charter, and it met every two weeks with a limited audience, and we would discuss security. We started realizing there were a lot of things we were talking about that weren't about vulnerabilities, that were a lot more general than these specific security issues — so why not open those up? So we created this thing called the security working group. "Security committee" and "security working group" are probably confusing terms, but I'm not sure how to do better, unless we called one the member-only security group that meets every now and then and the other the open security working group. Regardless, we started meeting back in March, a little over a year ago, and the idea was that the working group would be the one that meets every two weeks, and the committee would meet on demand as needed. It turns out "on demand" meant it didn't meet, and things would build up, so we decided to alternate the meetings.

The working group was the general thing, and it turns out a lot more people showed up and we covered a lot more topics. We started discussing incoming standards, things like coding guidelines — and it turns out you get a lot of opinions about that kind of thing and which aspects of it are helpful. We spent a while talking about Annex K of the C language standard, which is the replacements for the string functions, and it's amazing the number of opinions on that. The basic consensus — I'll give my opinion on this — is that we have problems because the string functions are insecure, so let's replace them with other
functions that are insecure and have more complicated ways of not solving the problem. So we're not supporting Annex K. The actual question was whether to bring in a library that adds these functions, and our position was: we need a library for some things, so let's bring in the functions that will actually help.

We talked about analysis tools — static analysis, things that can look at the code. We talked about security-labeled issues: how do you find the things that nobody reported, where something just got fixed as a regular issue but is actually a vulnerability and we really should be creating a CVE for it? This working group has worked out pretty well, which leaves the committee to focus on the vulnerabilities and the things that involve budget and the mechanics of the actual organization. So: trying to improve the vulnerability process, the things I talked about before, and then budget items — do we want to bring in a third party, what about an analysis tool that usually costs money (sometimes they get donated, sometimes donated with strings attached), sponsoring security conferences, paying for people to attend — those are the kinds of things that get discussed there. So we'll set that aside; that's the summary and status.

I wanted to cover a couple of things we've since talked about in this working group, issues affecting security. The first interesting one is cryptographic libraries. Zephyr uses multiple cryptographic libraries. To summarize: there's Mbed TLS, which is the complete TLS stack with cryptographic operations — for a while they tried to split the crypto out into an Mbed Crypto library, then brought it back in, and it's going to get pulled out again, and we have to keep track of what's happening there. There's also a library called TinyCrypt, which has a few cryptographic primitives and an implementation of elliptic curves. It's kind of a cool library; it's significantly smaller — both smaller and faster than the code in
Mbed TLS. But it doesn't have a maintainer, and that's not a good thing for a cryptographic library. We really debated what to do about that; we're not really in a position to become maintainers of that project ourselves. So what we ended up deciding: it's going to go away eventually — but not all of it. It turns out TinyCrypt was just pulling in its elliptic curve implementation from a different library, and that one still has a maintainer, so I think we need to just go back to that one.

While all of this was happening, and these discussions were happening, the Arm PSA — Platform Security Architecture, I think is what it stands for — defined, among other things, a security API. And it's not bad. Interestingly, Mbed TLS is being massaged to support this API, and eventually that API should be able to support other solutions — probably with some overhead — so that we could plug in a different back end and still use the same API. We're in process; this hasn't actually happened yet. Sometimes these things go like: someone says "I can work on that," and then it turns out they don't actually get time to work on it at their company, and it doesn't happen. So that's one thing — cryptographic libraries, where we ended up a little better off. And then I found out a few days ago that somebody said, "oh yeah, we changed our whole library to use the PSA Crypto API," and it's nice to know there are other people working on and improving things. Part of this is just sharing the issues, so other people know they may need to be worked on.

So, some concerns that we still have. Zephyr is built using West as part of the ecosystem. "Third-party code" is kind of a weird term to describe this, but we have the Zephyr Git repo itself, which is the core code, and then there's a bunch of modules, which contain everything: Mbed TLS is in there as one of the modules, Trusted Firmware-M is a module, there are HALs, there's libraries that
implement various features, and all of this stuff. What happens when vulnerabilities are found in a module? This is an open question; we're still not sure of the best way to handle it. Do we pull the reports into Zephyr? Do we just forward them to the project? Do we track them, do we monitor them? Can we come up with tools that will monitor this? GitHub kind of does this dependency tracking if you're using one of the systems it supports — npm or Cargo or a handful of other dependency systems — but it doesn't support West. Does it need to? Do we need to have more information in the software bill of materials so that these tools can work?

And then there's this big mess of module interdependencies. This came up very specifically: right now, the Trusted Firmware-M project has a dependency on another module, Mbed TLS, and they run in lockstep. There's no long-term-support release for TF-M yet, so if you want a specific version of TF-M, you need a specific version of Mbed TLS. But if somebody doesn't want TF-M, then the version of Mbed TLS could be more flexible, and maybe they actually want to move to the LTS for better support. There's no good way to handle that. We actually ended up basically disabling TF-M in one of the LTS releases and just declaring: by default you get the newer version of Mbed TLS that fixes these vulnerabilities, but if you need TF-M, then you need to revert that and figure out what you want to do — because it turns out the fixes were very non-trivial backports. The ultimate goal is to get to where TF-M and Mbed TLS have LTS releases and are doing backports as well, but we're not there yet. I had some ideas, though I don't know how to get there: is there something we could do with West where the configuration depends on what you're building, so if you're enabling some feature, you get different dependencies depending
on whether you enabled it or not? I don't have a good answer; it's just an issue.

So those are all the things I had, and I wanted to leave a few minutes here. Does anybody have any questions about security on the project, or ideas or suggestions on things we should maybe do differently? With that, I'll leave it open for questions and discussion — or if nobody has questions, we can finish early. Go ahead.

So the question is: do we have any information on the cost of a rootkit — what is a rootkit worth — and do we know of the existence of anything like this? No, I really don't know. "Rootkit" is kind of a funny term, and I think that's one of the things that's challenging with Zephyr: what does that even mean? Because we don't have the notion of users; we have user space and kernel space. This is an example of where the threat model is so important, because when user space was implemented in Zephyr, we developed a threat model: here's what we're going to do, we're going to have these system calls and protect against these things. It turns out that's actually not a common scenario on Zephyr. We tend not to run arbitrarily provided code on a Zephyr system, and even when the code is arbitrary, it goes through something like MicroPython or some other interpreter that has a bit of a barrier of protection around it, and we have that separation in there. As far as we can tell from the past couple of years, not a single vulnerability we've had reported has been thwarted by that mechanism — the code that has the vulnerability is running in system mode and can do anything. Now, aside from that, it is protection: if somebody writes a large application that runs largely in user space, there would be protection against vulnerabilities found in that code. So it's not to say it's worthless; it's just not readily visible to us. As far as a rootkit goes, it's kind of hard to say, and I think one of the
advantages we have — which isn't really an advantage — is that everybody's Zephyr system is customized for what they need, so something like a rootkit is going to be specific to one device. That does help us. It doesn't help answer questions like how valuable this would be, or whether these things exist out in the wild. I'm certain there have been systems that have been exploited, but I don't have any visibility into anything at this point. So hopefully that answers your question. Any other questions or discussion? Yes.

[Audience question:] On crypto and random number generators — I remember that many years ago it was discovered that under Linux, I think even in the silicon of certain MIPS processors, somebody went along and just wrote a "return zero" in there to make something work, because some instruction was supposed to return a random number, and that turned out to be a flaw. Are there any kind of guidelines for random number generator drivers — for example, when they access hardware generators, like in a TPM — to basically self-audit, or at least check that they aren't just returning zero?

So the question is about random numbers, with a mention of exploits of a system where the random number generator could basically be set up to return zero, that kind of thing. Yes, there's plenty of information about that, and it is something that's readily overlooked. It's pretty easy: there's a random number generator in Zephyr that produces a predictable stream of numbers instead of actually random numbers. There's a trade-off there. There are some situations — like QEMU, where we don't have an entropy source — that need to be fixed on that side. And some people have the idea that we want our tests to be predictable, so you end up with these configurations that aren't random in the slightest, and people misunderstand how insecure that is in a real system. The best comment I was given on this was at the international conference on cryptographic modules, where somebody made a
comment that TLS with poor entropy is worse than no TLS at all — because at least if you don't use TLS, you know that it's not secure. It's kind of hard to grasp, but with a predictable random number source it's trivial to see the plain text of your transmission. As far as standards go: yes, FIPS 140-3 (or 140-2, depending), and a whole set of deeper standards under that.

Let me summarize the question really quickly, because it doesn't make it onto the stream: we have two random number generators inside of Zephyr. One is non-cryptographic, just a simple pseudo-random generator, and one is ideally a cryptographic pseudo-random number source. Even finding that separation — here's somebody calling random; does it need to be cryptographic here? — the answer is not always clear. Network backoff timers don't really need to be cryptographic, but they need to be unpredictable, or they could be used in an attack.

The FIPS standard I was mentioning describes a whole series of tests that a device can run on itself — kind of a sanity check — and they're pretty strict about the requirements: you call these crypto functions, then you check the results to see what they return, and if they don't return the right results, you panic. The correct answer is that you stop and don't ever do anything again. These are for things like the cards that go in ATMs; that's where these modules get used. Most of us don't have to live with those kinds of standards, but they're really good resources for understanding what's behind them. Part of the difficulty is that it's very hard to comply with FIPS 140-3 if you have a hardware entropy source, because you have to convince the vendor to tell you how it works — and how it works is part of the justification needed to get the certification. And vendors often won't share that: "this is how it's implemented, I'm not going to share that with other people." But you kind of
need to. That usually ends up being... let's see which card I'm being shown — I guess we're out of time. So thank you for your time. I'll be around all week, so if you have any other questions, feel free to ask. Thank you.