Hello. So I get the last slot of the day at the Linux Security Summit, where I'm going to not be talking about Linux, so hopefully this is still relevant to us. I'm going to start with a bit of an overview of what Zephyr is, because I think that's not necessarily well known. It is a Linux Foundation project, but let's start with the official marketing blurb: the Zephyr Project is a scalable real-time operating system (RTOS) supporting multiple hardware architectures, optimized for resource-constrained devices, and built with safety and security in mind. OK, so what is Zephyr? Some of those words make sense. It's a real-time operating system; if you're not familiar with that, it typically means focusing on deadlines rather than on multiple users, that kind of thing. But I'm coming at this from a security perspective. It's open source, Apache licensed, not quite the same license as the Linux kernel, and it's hosted in Git. We actually host on GitHub, and we work with the pull request model. There are maintainers, mailing lists, and lots of meetings. We use Kconfig and devicetree. We used to use Kbuild; that moved to CMake. Just to give you an idea, here's a little comparison between the Zephyr source tree and the Linux tree as of about an hour and a half ago. There are a few more lines of code in the Linux kernel; it's a bit bigger. Unfortunately, no, there isn't actually any Haskell code in the Zephyr project; there's a .hs file in there for some reason, which the line counter counts. But you get the idea: it's a C project, and it's open source.
A lot of people came from the Linux world, and we brought a lot of the development mindset and methodology with us. To give you an idea, though, it's a really busy project. GitHub gives these nice statistics: 107 authors, 764 commits to the main branch, 907 commits to all branches, thousands of files, about 50,000 additions and 18,000 deletions, and that's just last month. As a comparison, and I'm not sure how well GitHub does this, because the Linux kernel doesn't use GitHub's model but GitHub still sees the same Git tree, you actually get pretty similar numbers out of that. There's a lot more churn in the Zephyr code; there are a bunch of places where we're taking code and moving it into separate modules, and that's causing a lot of the changes, additions, and deletions. But the idea is that there's a lot happening with the code. Just to give you an idea, and I don't expect these to be readable or identifiable, there are about a hundred and seventy little boards of various types. There's another picture that felt too much like marketing for me to include, with shoes and hard hats and other interesting things that are running Zephyr. But what I want to go over is what's different between the Linux kernel and Zephyr. Another way to put this: this is a Linux Foundation project, so why are we even doing this when we already have the Linux kernel? Key differences: Zephyr typically runs in a single address space. There's a thing called a memory protection unit (MPU); it's like an MMU light. It doesn't do address remapping; the only thing you get out of it is protection. You can say "this block is protected," and it's usually really restrictive: it might be power-of-two alignment on boundaries of that same size, and you get eight of them or sixteen of them, depending on the controller. And there's typically no dynamic code.
All of that has a radical impact on the threat model. We're not running code that an arbitrary user on a machine runs; we're running code that somebody building a product wanted in that product. At least, we hopefully are. It turns out you really do need to account for arbitrary code running, because there are bugs: buffer overflows, incoming data, that kind of thing. It can still be possible to run code, but it's a different environment in terms of how that comes about and how it's run. Another really big difference is that we tend to do a lot of things at compile time that would normally be dynamic on a Linux system. How many threads do you have? What kind of devices do you have in your system? How much memory are you configured for? A lot of that would be a compile-time choice. The idea is to keep the code small. Think about what we're targeting here: microcontrollers, where you think of hundreds of kilobytes of code and maybe tens or hundreds of kilobytes of RAM, and that's it. So it's not a dynamic system. Well, I could show my age: it's the kind of computer that I started programming on, except really, really tiny, and it costs 50 cents. With all that aside about what Zephyr is, let's talk about security, and just a little bit of background. I work at Linaro; I've been there just under four years, hired onto the security working group team, and for about three years I've had this focus on IoT. I came from actual Linux kernel security before that. Just this past, I guess it was about six months ago, I was elected as the security architect for the Zephyr project. It was an exciting election to the end: I ran and no one else did, so as long as someone voted, I would have won. But I want to go over what we are doing about security in this project. Linux gets a lot of focus; Greg gives his talks about CVEs. What are we doing in this project for security?
I'm going to break time down into really easy divisions of past, present, and future: what we've done, what we're doing, and what our kind of grandiose plans are. So now we get slides with lots of words on them. The big thing is memory protection, and this is fairly new; most RTOSes kind of don't do this. They assume you have a simple address space and everything just runs. We started out with: well, we have this memory protection unit, what can we do? The obvious thing is to look at what Linux does. You have user space and kernel space, and these processors typically have kind of the same division between a protected privileged mode and a non-privileged mode. If you look, each processor uses different words for them, just to help keep everything confusing. So we added this memory protection. It turns out it's actually quite useful; there are a lot of things it protects against. But as I mentioned before, with that model of what code is running on the system, it's not necessarily protecting against the right things. It turns out it's pretty easy to have a lot of things on the wrong side of that boundary. You can get a system where a small amount of application code is running in user mode, and then most of your system, and most of the code that's vulnerable, is still running in privileged mode. So despite the fact that we have this done, there's a lot of work left to do for memory protection. A lot of that is that, in order to move something to the outside of this boundary, you need to come up with system calls, and it's not really POSIX here. It's system calls specific to drivers, or pieces of drivers, and we have to figure out what those are. So we've done that.
It's kind of a past thing, but there's a lot more to do. I wanted to start with that to make sure it sounds like we've done practical things, because a lot of these points sound like things you would put in a conference talk instead of things you'd actually do to improve security. And that's because they are, but they're also important things; they just don't immediately, directly translate into code. The rest of the things we've done: we have a security subcommittee. It meets bi-weekly, and that little star means there are more slides coming. We've created documentation on secure coding practices. If you've ever tried to do that, it turns out it's really hard to do. What it really boils down to, and it's actually worded pretty much this way, is that you need to have people on your team who know about security and know what to look for in code. So we've created this documentation. It's public; it's on the zephyrproject.org website; there's a little security button. I'm going to talk in a minute about what we've done with our Git repositories to try to focus a little bit on security as well as safety. Then some of the more possibly controversial things: we've registered with MITRE as a CVE Numbering Authority, and as you can imagine, these are some interesting meetings we get to go to, discussing rules that essentially amount to how you allocate integers. When does a particular vulnerability become one CVE, or does it become multiple CVEs? How do they correspond with patches? It's about trying to get some homogeneity and make the database useful. It's been about a year now that we've had this, and I think we've done about four official CVEs. That doesn't mean we haven't fixed bugs; we do fix bugs, things that have been reported against previously Jira, now GitHub.
When I last checked, there were 6,514 issues that have been closed as fixed. There's no real easy notion of whether those are all security fixes; sometimes that's hard to know. We joked about this at our last security meeting. Somebody said, well, we just need something to tell us what's a security bug, and so I said, "Alexa, is this a security bug?" and was immediately greeted by someone's Alexa in the meeting defining "bug." So it didn't even help when it did hear me. A couple of other things: there's this thing called the Core Infrastructure Initiative (CII) that has this notion of best practices. They're not things specific to our software, but things like: when you have a website, you have to use HTTPS, and there's a big list of those. They have different levels you can qualify for, and we are now passing as gold. It's a pretty small subset of the projects using the CII badge program that reach that, so that's kind of a nice thing. At least it tells us that our source code isn't going to change easily without us being aware of it, that kind of thing. And then lastly, one of the things we've done is use automation to prevent regressions: the CI that was set up pretty early in the project. We have a lot of targets, about 170 of them. We build them, and we make sure that all of the sample applications and all the tests build. Not everything can be run; we don't have all of this hardware, but at least we can build it and make sure it keeps building. All right, so what is this subcommittee?
It's defined by the charter. When the Zephyr project was created at the Linux Foundation, Zephyr became a project that companies join. The funds are used to run the project, and there are different levels at which companies can contribute. The way it's defined, each platinum member gets a seat, and then we can invite other people by invitation from there. And then there are two roles. There's a security chair, elected by the rest of the subcommittee, who is basically responsible for running the meetings, taking notes, and making sure stuff happens. And then there's the security architect; that's me. That role is basically defined to be responsible for overall project security. The idea is that significant changes that are at least theoretically supposed to affect security are supposed to go through the security architect before they're made to the project. So, what do we do about our repositories? I'm sure most people are familiar with the long-term stable releases of the Linux kernel, and we do something quite similar. We've had two of these, or no, I think we've only had one; I'd have to think back on that. We have zero, one, or two of these long-term stable (LTS) releases. It's a fairly recent thing for us to have them, though they've been in planning for quite some time. And we have this other branch that we call auditable, which is kind of like the long-term stable, but more so. The idea of LTS is a little different than it might be in a lot of other projects.
I mean, it's product focused. It's current code with the latest security updates. Interestingly, we also want to bring in compatibility with new hardware. People add boards very frequently to Zephyr, and usually they want to use them fairly quickly, so we don't really have a prohibition against bringing the brand-new code for a new board into this long-term stable release. Those patches are generally pulled back into the long-term stable release; they tend to be isolated changes to just a group of files. The main thing is that this code is more tested: the development cycle is extended, and it's supposed to be stable for the long term. It's what you would choose to bring into a product if you're going to build an IoT device: a sensor, a shoe, any of these interesting things that people come up with for devices. This is a good starting point for that; that's the idea. It's feature-based, focusing on hardening the functionality of what's there. It's not intended to be cutting edge, bringing in the latest new stuff. Now, that's the one in the middle. The one on the right, auditable, is a branch off of the long-term stable release, and significantly, it is a subset of the code. The idea here is that there are a lot of types of certification. Many of them involve safety, but there are a handful that involve security issues. FIPS 140-2, and now 140-3, is a big one.
People want to build products and then get these certifications. The idea is to have a part of the code that's a starting point for that, where we can address the issues, even work with the labs to get maybe, call it pre-certification, or "certifiable," a term that we've used. The idea is that if you're building a product that needs some kind of certification, or multiple kinds of certification, this is a good starting point for you. This is just starting; we haven't done any of these certifications yet. This is the beginning of something we're kind of seeing that we need to do. So that's where we're at now. As far as where we're going: the first point there is that we're trying to be open about this. You have the project documentation; we publish what it is and what our goals are, and then there's this list of things that are in that documentation. We're working on coding guidelines. That's a link; when these slides are sent out, you can click on it, and it will take you to our current coding guidelines. How to report vulnerabilities is a big one with the CVE process: people need to know, "I've found a vulnerability, it may be sensitive; who do I send it to? Can I get a PGP key to encrypt it to, so that it isn't just flying out over the internet freely? What happens when I send that vulnerability report in?" So that's documented. And then we currently have a Jira instance to manage bugs during any kind of embargo process that's needed. This is still fluid. We're using Jira because it has a richer permission model than GitHub did at the time. GitHub is adding support for security advisories.
We may evaluate, over time, just moving to GitHub so that we don't have two different places where bugs are stored, because right now we have all of our issues in GitHub except for these security issues, which get reported to Jira. That database is mostly not visible. Once something has been published, and once release notes refer to it, the link will work: we change the permissions on it, and you can go look at that particular issue to find out information about it. All right, so I talked about the coding guidelines. The Linux kernel has a document, which I believe Linus originally wrote, about how you write code for the kernel. We started with that; one of the first things we did in the Zephyr project was go through and make the code look like Linux code. Some of that was using tabs to indent, and proper spacing on different things. But especially when you get to safety certification, and security certifications look at these a little bit too, there are some documents, like MISRA C:2012, which has an amendment, and a couple of other documents, to use as a reference. The thing about these documents is that they're kind of a mixture of some really good ideas and "what were they thinking?" So what we're trying to do is incorporate these, but we realize you can't just say, "oh well, all your code must comply with this document that is not publicly available; you must comply."
Good luck with that. So we are looking into tooling that can selectively enforce these different kinds of requirements. These vary from basic things, like you can't have global variables that aren't used, to things that are about being able, when you write security-sensitive code, to audit that code, all of it, which typically involves looking at the assembly output of the compiler and making sure the compiler did the right thing. There's a lot of stuff in there about writing code that fits with that, but there are also things in there like: no dynamic memory allocation at all. So if you're building something with Zephyr that uses TCP, or rather, not TCP, but TLS, well, that has allocation in it: there's allocation in, as far as I know, all of the TLS libraries, including the mbed TLS that we use. That's why we have that auditable code base, so that you can have a smaller subset of the code that's able to comply with these things. So a secure coding guideline doesn't magically make all of your code better, but it's a starting point. It gives us things we can point to, saying you shouldn't do this in your code, and a tool that will flag when you do it, so that we can at least analyze: well, should you be able to do that, and do we need to write up an exception, that kind of thing. So, another example in our code. This is kind of what we're doing now. This is an open PR for updating our entropy and random-number framework.
We got a couple of reports of vulnerabilities about a subsystem, or another subsystem, that was using what turned out not to be a cryptographic random number generator for part of a protocol that needed cryptographic random numbers to be secure. There are a couple of those. So what we're doing is actually going through the randomness and entropy code, separating them out, separating the randomness API, so that it is very clear what you're asking for. If you just have a function that says "give me random data," it doesn't really tell you: is that cryptographically secure random data? Is that just kind of random, which might be good for a backoff timer? What's it good for? This is the kind of thing we discuss in the security subcommittee. In this case, I was actually the somebody on the team who worked on the issue. The goal here is to clean up our API. We're in kind of a neat position of being able to change our APIs and document things, so that it's easier for people to do the right thing: these things are documented, even just named better, so that you know what you need to call if you need to do something. That includes the entropy API, where it's made clear that you probably don't want to call it: you probably don't need raw entropy unless you're implementing your own deterministic random bit generator. Make sure that it's clear: no, you don't want to call entropy; you want the output of a random bit generator that itself uses entropy to seed itself. So what's our goal? I mean, we want to make Zephyr more secure, and what does that mean? As a security subcommittee, we've kind of had to sit down and decide what are some things that we want to do, and then we have to work with the Technical Steering Committee to decide which of those things we're going to do. So, just a couple of slides here on our goals for the upcoming year. Right now, we have this kind of mishmash of crypto drivers.
We include both mbed TLS, which is pulled in through our brand-new module system (it's not a Git submodule, it's not a repo thing; it's a west thing, our own tool that was written to pull in dependencies), and TinyCrypt, which is, as you can imagine, a small cryptographic library. Different parts of the code call different ones of these, so there's an ongoing discussion: do we need a more unified API, so that these can be plugged in and you can use different implementations? There's a thing by Arm called the Platform Security Architecture (PSA). It's large and has many parts, but one of those is a crypto API, which basically started as the crypto library underneath mbed TLS with some name changes, and they've actually changed mbed TLS to use this API as defined. So that's kind of one obvious choice; it's targeted at embedded devices, the same as we are targeting. There are other people pushing for something like PKCS #11, which is used, for example, by Amazon FreeRTOS as its official crypto API. These are ongoing discussions; we want to evaluate them. What do we do? Another thing we want to look into is FIPS 140-3. When I first wrote these slides, the text of the standard wasn't accessible; it is now, though I haven't actually taken the time to read it. But people build to these. It's intended for cryptographic modules, so the first thing to think of is the little module that sits inside an ATM and performs the cryptographic operations, isolated from the rest of the system. There's a move to use this for things that are parts of systems, and for separate devices, but a lot of people demand the certificate, or at least compliance with it. A lot of specs, the Bluetooth spec for example, say things like: you need such-and-such cryptography, and it needs to be FIPS 140-2 compliant. And the focus here is really on the crypto operations.
They want to make sure you do the right thing: that you have the right primitives available, that you've implemented them correctly, and that they continue to do the right thing. They have a set of test suites that you have to run on boot in order to be compliant. The idea is that this is a common way for people to get some assurance that a thing providing cryptographic operations maybe does them right. So we're looking into, for people who want to use Zephyr to build something that is going to comply with FIPS 140-2 or -3, how do we help them? Generally these labs certify products. We're an OS, not a product, but there are still things we can do so that if somebody pulls this code in, the certification lab maybe knows it all came from the Zephyr auditable tree, which has been pre-certified, and it just takes less work and less money for them. Another big issue is secure boot. We've been working on this for a couple of years now with Zephyr. There's a bootloader called MCUboot, and caveat: I'm one of the maintainers of that project. It started out as the bootloader for one OS, Mynewt. We've generalized it so that it now supports at least Zephyr and Mynewt, and there are some other things being worked on for it. What it is, is a small bootloader. It has a lot less functionality than something like U-Boot or what you might find in a UEFI BIOS. It does upgrades, it can revert those upgrades, and it checks signatures against public keys that are kept in ROM on the device. As an example, the Trusted Firmware project for M-class architectures, the microcontrollers, is using MCUboot as its bootloader. And for Zephyr, we can build with this bootloader; we can use it. Things that need to be done: it's not a clean setup right now to build an application that uses the bootloader. You kind of have to go build that yourself, assemble the pieces, and try to make the image out of it. We also don't have a real good upgrade story. How do you do upgrades over the air?
Where do they come from? This is in process. We have a couple of pull requests open, and one, I believe, just merged, to implement various standards that have been developed for distributing firmware over the air, with signatures. There's a thing called SUIT, the IETF working group for Software Updates for IoT. This is an attempt to build a manifest format; it's a draft RFC describing a manifest format for what's in a firmware image, so that there's a standard format rather than the ad hoc one we made for MCUboot. And then there are things like richer key infrastructure. Right now it's one public key that somebody had to use to sign every firmware image on every device, and clearly there's a need for something richer than that, whether that be X.509 certificates with chaining or something else. And then there's an interest in fuzzing. Fuzzing gets a lot of news; you hear about vulnerabilities that are found by fuzzing the Linux kernel, different parts of it, or fuzzing different programs. In case you're not familiar with it, the idea of fuzzing is that it's a tool that basically generates garbage as inputs to something, to try to exercise all of the edge cases, looking for vulnerabilities. Most fuzzing work is done on larger systems than Zephyr typically targets. The typical fuzzer is a library you link in with your application, while you build the application with additional profiling information, so that the fuzzing tool can determine what paths the code is taking and direct the fuzzing data that's passed in, because otherwise you have a problem of just too much data.
Otherwise it's just too hard to find the edge cases. There is a research project, I forget which university, on a QEMU-based fuzzer, where the rich part of the fuzzing is done on, say, a Linux machine, and the application runs in emulation; it watches that and feeds it the data. The thing is that existing fuzzers often assume lots of memory. We do have a POSIX native port of Zephyr, but not everything works there. You don't have network devices; you typically will have sockets available, since it runs as a user-space task, but there's a lot of the code that doesn't get exercised in that environment. The big thing is that this is an open area for research. If anyone finds this fascinating, it would be a very useful place to put effort. Let's leave it at that. Lastly, documentation: we want to improve our documentation. I wrote some threat models for Zephyr a couple of years back. I've since learned what a threat model actually is, and it'd be nice to go back and take some time to rework what I wrote so that it reads like a threat model, and also to figure out whether it's pertinent anymore, and maybe to find some more applicable environments or configurations, other applications, other contexts where that threat model would apply. So that's all I have for the Zephyr security update. I guess I've got a couple of minutes if anyone has any questions.

[Audience] Hey, so I have two questions, if time allows. The first is: have you considered a more safe language instead of C?

I'd love to. Which one are you thinking of?

[Audience] Well, I've just looked it up, and it turns out that it's possible to run Go or Rust or whatever on things like my micro:bit.

Yeah, I think that's a wonderful idea. I don't think that's going to happen on the Zephyr project for quite some time. The projects doing the safe languages on embedded targets are pretty immature right now, but it's definitely a great direction to go. I don't know how long it's going to be before we want to move there.
I mean, there's kind of a joke on the Rust subreddit that people say, "oh, we'll rewrite it in Rust," and that's not usually a good answer for something. But there's definitely a place for new applications to at least start by looking at that, and to see if we can get these tools to a state where that's useful.

[Audience] I see. And since you're stuck with C, what is your testing strategy, and do you have any tools like ASan?

One of the difficulties is that these are such constrained devices that this code runs on. We do have some static analysis that we're running right now, which can detect some cases of buffer overflows, memory usage issues, that kind of thing. A lot of it is that we test a lot: we build all of the code, we run it, and we hope we find the problems. But no, there's a lot of need for something like that. The challenge is: how do you do that on a device that has a hundred kilobytes of RAM? You know, we've added memory protection, and that's a significant thing. Can we partition things more fine-grained, so that the things that shouldn't be accessing data don't, that kind of thing? But if people have ideas, that would be really helpful, too.

[Audience] Thank you. Thanks a lot for your talk. Could you elaborate on how much code the different platforms share? I mean, I know Zephyr can run on x86 and on all those small microcontrollers as well. How much code do these platforms share in common?

So, without opening a terminal window, I can't give you exact numbers. There is a lot of common code. As for the code that's dependent on the devices: you will have architecture-specific code for Arm, for Cortex-M, for Cortex-A, for a specific processor. It's not that much code.
The whole system is much smaller than something like Linux, but most of the code that's not common is drivers. The core code for scheduling, that kind of thing, is shared between everything. But again, I can't really give you numbers here; if you want to pull up the code afterwards, I could get more detailed numbers. If you just look through the tree, it's pretty cleanly divided.

[Audience] I just asked because I wanted to understand: if I, for example, fuzz Zephyr on one platform, are the bugs which I find applicable to other platforms?

That's going to depend on where the bug is found, obviously, but a lot of the core stuff, I mean, the networking stack is not going to be specific to a platform, and the Bluetooth stack is not going to be specific to a platform. So there's a lot of that code. If what you find is a bug in your platform, say the flash driver, or the driver for the Bluetooth hardware, that's probably only going to apply there. But if you find a bug in how memory protection and semaphores work, that's probably general.

[Audience] And one small question: as the security architect, what do you think about a bug bounty for Zephyr?

It's kind of beyond my scope to come up with where the funds for that would come from. We do get reports from researchers; a bounty might increase that, I don't know. You know what, I'll bring it up with the TSC. The people who make the decisions about money can certainly set aside some for something like that. Any other questions? No? Let's thank the speaker.