Cool. Yeah, so I'm Jessie Frazelle. Thanks for having me here, by the way. I always love coming to Chicago because my sister lives here, and Chicago is nice in the summer, other than yesterday when it rained, and I feel like that was my fault for being here. So, yeah, I'm going to be talking about why open source firmware is important. I actually hate giving the same talk twice, so this one's going to be a little bit different and also go into roots of trust and a bunch of other stuff.

So, if you think about the layers of software today, you have a bunch of software on top, like your app, and then maybe something that controls the app. Then you have your operating system kernel, then firmware, and then the hardware. So it's all software and then hardware, just to make it super, super general. And if we make it even more general, it's like everything is shit. I think everyone can agree on that, because everything has bugs and everything is horrible.

So if we look at the privilege levels for this kind of stack, you have ring 3, which is user space. Ring 2 doesn't really exist anymore; it was for drivers, same with ring 1. Ring 0 is your kernel space. Then there's ring -1. The negative rings are kind of made up, but if you really think about it, all the rings are made up. So ring -1 is your hypervisor, like Xen or KVM or bhyve, whatever you choose to use. Then there's ring -2, which is system management mode and the UEFI kernel, which we'll get into the details of. And then there's ring -3 at the very bottom, which is the management engine if you're on x86, and there are equivalents on other processors. So the code that we don't know about, because you can use open source software for all the others, is system management mode, the UEFI kernel, and the management engine.
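To make the ring picture concrete, here is a minimal sketch in Python of the layers as just described. The `Layer` type, the `inspectable` field, and the `opaque_layers` helper are hypothetical names made up for illustration; the ring numbers and layer names are the ones from the talk, and the flags assume a stock x86 machine.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    ring: int          # privilege ring; a lower number means more privilege
    name: str
    inspectable: bool  # assumption: True if open source code can run at this layer


# Ring numbers and names as described in the talk (rings 1 and 2 are unused today).
LAYERS = [
    Layer(3, "user space", True),
    Layer(0, "kernel", True),
    Layer(-1, "hypervisor (Xen, KVM, bhyve)", True),
    Layer(-2, "system management mode + UEFI kernel", False),
    Layer(-3, "management engine", False),
]


def opaque_layers(layers: List[Layer]) -> List[Layer]:
    """Return the layers that cannot be inspected, most privileged first."""
    closed = [layer for layer in layers if not layer.inspectable]
    return sorted(closed, key=lambda layer: layer.ring)


if __name__ == "__main__":
    for layer in opaque_layers(LAYERS):
        print(f"ring {layer.ring}: {layer.name}")
```

Sorting by ring number puts the most privileged code first, and every closed layer sits below every open one.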
And so that's pretty scary, that our most privileged software is in the layers that we don't know about. So let's go over what these are.

System management mode was originally for power management, and then people shoved a bunch of other shit in there. So now it's hardware control and proprietary designed code: vendors will add a lot of new features and just throw them into system management mode, when maybe they would be better somewhere else. It handles system events like memory or chipset errors. There are runtime kinds of things there as well, like correctable errors, stuff like that. And it's like a half kernel.

Then we have the UEFI kernel, which is extremely complex. I don't know if anyone's looked at the code for the UEFI kernel, but if you ask people who work on it, they say it takes forever to fix a bug because it's ungreppable. It's unreadable code. It's just way too much code. Someone was telling me that they tried to make a one-line change to the UEFI kernel and it took forever to even find where the bug was, which is horrifying. UEFI applications are active after boot. And there's security through obscurity here, because no one can really wrap their head around it unless you hire the author of UEFI. That's the one reaction people have when you say UEFI is really bad: "Oh no, but we hired the author." And it's like, okay, cool, you got the one person who knows that. There are a bajillion other features in there, and you can look them up if you really want to, but no one can wrap their head around it, so honestly it's not worth it.

Then there's the Intel Management Engine, and this has networking management, KVM management, and Intel proprietary features.
Of course, you can look up all of the crazy things that the management engine does as well. It can actually re-image your device even if it's powered off, which is incredibly terrifying. Also terrifying is that there's a web server in there, so it can re-image your device and it's also hooked up to the internet. Great. It can turn on the node invisibly. And it runs Minix, which no one knew for the longest time, so the most popular Unix-like runtime is Minix, which is crazy. In 2017 there ended up being a critical bug in all of Intel's processors that happened because of this Minix layer, and that's when it came out that they were using Minix: there was a bug in the network server, the web server in Minix, in the Intel Management Engine. And it's like, why is that even there? It's a really good question, and why is it still there is an even better question.

So that's just one example of a bad attack, but if you Google any sort of firmware bugs you can easily find others. One that was kind of recent was the Bloomberg article on modchips on Supermicro boards. While this probably didn't happen, it was still entirely possible, and you can find a great talk on it from Trammell Hudson. I think it was at 35C3, and it's called "Modchips of the State". It's amazing because he actually goes through exactly how you would do this. It's incredibly complex, of course, and you have to hack the supply chain, but why would you hack the supply chain if you could just walk right through the firmware, right?
It seems a lot easier. So this is bad: one, you could actually do modchips, but two, the firmware is already very easy to get through, so just use that.

But it gets even worse. There's a feature called Intel Boot Guard, and what this allows you to do is guard the firmware on your box, which sounds great, right, in theory. With Boot Guard you are locked into always booting the same firmware, and you get to sign it, and Intel verifies that this firmware has been signed. But you can't then modify the firmware. Say you want to run coreboot or something instead: you can't, because Intel owns the keys and you can't sign it. But what you actually can do, and this is what Trammell figured out, and this is a different talk of his: there was a bug where basically, if you get rid of the screen that says you can't boot this firmware, and you don't replace it, it will be like, "Oh, I can't find the image," and it will boot anyway. So this isn't even doing its job at the end of the day. It's also making it really hard for people to actually run open source firmware (I'll get into what those options are), and it's causing Intel to just own the software process that you're running on your computers and on their chips.

So if we add up all the things that are in these lower stacks, you get two and a half other kernels. Already you're probably running Linux or Windows or Solaris or whatever, your own OS and own kernel, but then you also have these two and a half other ones that no one has really vetted and no one really knows what's going on in. And each of them has its own networking stack and web server, which makes no fucking sense. And the code can modify itself and persist. So you're connected to the internet, and the code can modify itself, and we have no idea what the code even looks like because we
can't see it, right? So that's horrifying. All of them have exploits, and they're all incredibly complex.

So my hypothesis is that once you need to deal with the firmware, it becomes pain. This comes from a survey that I sent out online, and also from talking to people. I asked this on the internet and there were a bunch of replies, and I'll go over some really funny ones, but basically the pain is astronomical just at the firmware level. I mean, in general computers are shit, and I'll go over that as well, just because it's funny.

So: a Supermicro BIOS and IPMI bug where trying to load the IPMI module would freeze up your SSH session and the machine would drop traffic, but updating the firmware would 50-50 brick the server. This is actually super common. In talking to people, bricking your server with firmware is super common. Even in this second one: this is one time when we almost bricked a thousand machines with a bad BIOS; also, this time we actually bricked 3,000 machines at the same time. That's horrifying.

Then: a firmware bug in NVMe flash causing sporadic PCIe bus resets, which recovered in a fraction of a second but caused network loss because the NIC buffer overflowed in that time. And it's like, there's a troubleshooting horror film buried in that statement, and I would actually love to watch that movie because it would be really good. I'd watch a movie for any of these, honestly.

Then: Dell C-series servers. The C6100 IPMI board only survives attempts at updating it 50% of the time, and the process for reviving them was bad enough that Dell asked to send a tech out to do it manually, because once it's failed it's not addressable for a second try and it has to be done locally. Here's another Dell one: after that, for a while, they came up with a bug in the fan firmware that made it think the server was overheating and randomly reboot the whole chassis. The solution suggested was to increase the threshold for that sensor. It was great.
Bug in the BMC controller on IBM HS20 blades (so it's both IBM and Dell): IBM support transferred my call to the chip manufacturer, they sent a new controller board, and asked that I not tell IBM. Absolutely horrifying. And I've talked to so many people who say Dell can't ship the same SKU twice: the SKU of Dell that you get is a made-up SKU, and then you open up the box and it's a bajillion other SKUs, and that just seems to be a problem with the chip itself.

So outages in general are unavoidable, and I'm only including this because some of these are really funny. The data link between our DR site and main site was having rhythmic packet loss (so, like, ooh, I can't do it right). It was fiber, and nothing could explain it, except the lines had fallen from a pole accident and were lying across the road. The packet loss was cars driving over it. That is crazy. I was looking at this one like, wow, that's crazy, it's weight related, but this one is as well: an S/390 box kept powering down. Could not find a fault, everything looked totally normal. Eventually sat next to the box all night. Nothing happens. About 4 a.m. I get up to get coffee and the box powers down. It was a loose floor tile that was wobbling the power cable. Absolutely insane. Also, I'd be so pissed if that was me, because I would be like, you've got to be fucking kidding me, I just stayed up all night.

So, yeah, another theory that I have for this, which also comes from talking to a lot of people about these issues, is that it's a form of Conway's law. In talking to a few firmware hackers: when they find a vulnerability in Dell's firmware, or, you know, a vendor's firmware, they obviously have to tell that vendor. And in multiple accounts, that vendor, one, doesn't know what to do with it right away, but two, the teams internally cannot talk to each other. So if it's in between two interfaces of the Dell firmware and those two teams can't talk to each
other, then the hacker has to do the communication between the two teams. It's like, that is actually your problem, that these two teams don't talk to each other.

From the perspective of hardware engineers as well, they tend to think you'd be crazy to think hardware was ever intended to be used for isolating multiple users safely. So the hardware engineers are one layer of the stack, and they're like, those software people are fucking nuts, they're over here trying to do multi-tenant, and we're all like, please stop. Spectre and Meltdown kind of proved this to be true, because if you ask any hardware engineer about Spectre and Meltdown, they're like, no, those people were crazy to begin with. So that's super interesting.

Then from the perspective of that lower software stack on top of hardware, they want the vendors, the chip manufacturers and so on, to make their firmware do less: SMM, stop doing runtime services, because it's just fucked up anyway and it doesn't work well. They want the vendors to give them the control. So these communication channels aren't working out, since no one seems to know the other side's opinion, or they just don't give a shit: they keep saying that, but we don't care. So vendors can't really debug firmware issues, like I said, or the hackers have to do the communication between the two teams, and this oversight and lack of communication leads to bad shit.

I don't know if you all recall, but there was this hack on IBM's bare metal cloud, SoftLayer, where the BMC of those servers was exposed, and the hackers were able to distribute malware through the exposed BMC. So if you were to get one of those bare metal nodes, do this, and then delete your node, you would basically own any other customer who comes onto that node. And that's the whole problem, because with the cloud that's a promise that they should be guaranteeing: obviously the next customer who uses these resources should not be able to have their data
read or anything like that. So how did no one, when building this out, think about the BMC, or protecting it, or making sure it wasn't exposed? These miscommunications happen when the teams aren't allowed to talk, or the team has such blinders on that they don't even think about what else could happen. Or maybe they didn't know that the BMC was even a vector, because "our job is orchestration, we don't care about that." But if no one cares, then no one's going to deal with it, and if no one communicates that they should know about it, no one's going to deal with it. Someone almost needs to own that high-level vision so that it goes up to them.

But I've also seen these miscommunications between layers of the stack happen in the container ecosystem. Kubernetes has a couple of security features that are more like window dressing. One place this occurs is exec'ing into other containers. Kubernetes has a security feature that says you can't exec into other containers, but all it does is block the API endpoint for that, so you get a 502 or whatever (I forget the actual response code for "not allowed"). If you were actually on the node itself, you could exec in through multiple other means, and a lot of people use kubectl on the nodes themselves. So you get into a situation where you're like, actually, I could just combine all the file descriptors for the container and exec in myself, because it's not that fucking hard. Even if you blocked the docker exec command, which it doesn't do, even if you tried to block all the layers, you could still combine all the file descriptors, right? So it's one of those things where I don't think people necessarily look at the full picture, and it's also being advertised as something that it's not.
So it looks like this, basically, because you're like, I can just walk around this thing. It's a common pattern: most of these vulnerabilities people find are actually just, oh, I accidentally walked around it. So the point is that miscommunication between the various layers of the stack leads to bugs in these intersecting layers. That's always where bugs seem to lie, based on people having incorrect assumptions.

So it leads back to our "everything is poop", which it is, and everything has bugs. So how do we fix these things? Well, we do it with open source firmware, which is the point of this talk and why it's important. Obviously that doesn't fix all the layers of the stack, but I just think it's an interesting point.

When the open source firmware effort started, it was first called NERF, which stands for Non-Extensible Reduced Firmware. They ended up getting rid of the name, but I still think it's cool, because it's the same thing; it's still doing everything that the name entails. They're trying to make firmware less capable of doing harm and make its actions more visible, which is great, and remove all the runtime components. With the management engine you can't remove all of it, but you can take away the web server and the IP stack. Who fucking needs this? Then you remove the UEFI IP stack, and you remove the ability to self-reflash, because that also seems super harmful, and you let Linux manage all the flash updates instead. Most people run Linux anyway, so that kernel should already be vetted for you.

Going back, just to remind you of this visual: we have user space on top, then the kernel, hypervisor, SMM and UEFI, and then the management engine. So you're kind of just getting rid of those. You can't entirely get rid of UEFI, but you can make it super minimal, and the management engine you can't entirely get rid of either, but you can make it super minimal. So zooming in on these, you have SMM
disabled, UEFI is minimized, and then you have your Linux kernel with a minimal userland. Since it's a Linux kernel, what's cool about this is that your userland is tools that you know. You could use fucking bash if you wanted to. Maybe don't, but you know. And then you have your minimized management engine, and the way you do that part is with a tool on GitHub called me_cleaner.

The stack on top for ring -2 looks something like this. You can use coreboot; there's actually been a lot of production usage of it that has recently made the press, so that's really cool, in production, super dope. That's what handles silicon and DRAM initialization. Then it passes off to LinuxBoot to do your device drivers, network stack, and multi-user support. And then, if you want, you can use this project from Google for your user space. It's entirely written in Go, it's a single binary, and it's super nice. You have all these nice user space tools, and since it's written in Go you can make patches very easily, and that handles your initramfs.

So why Linux? A single kernel works for several boards. Linux has a ton of drivers, it's already quite vetted, it has a lot of eyes on it, and it's used pretty extensively. People have their naysaying about Linux, but that is probably some of the most highly looked-at code. So it's a single open source kernel versus the two and a half other shit shows that were mostly closed off. It also improves your reliability of booting, because these firmware drivers have actually already been hardened, versus, you know, the other shit shows. And since it's Linux, you can build in tools that you already know: you can use Go, Python, whatever language you want. And when you need to write logic for anything that you would do at boot, it's
easily auditable and in a modern stack, which is cool. It also allows you to hire devs who are not necessarily firmware devs, who just know other languages, which is also cool because the logic is, like, whatever. And you get memory safety wins if you use an actual memory-safe language. It also turns out that this made boot time 20 times faster, which is amazing, so that's another great win.

Now moving on to roots of trust, and these two things are tied together. What you can actually do by having all open source software is verify at boot that the software you're running is the software that should be running. Today, a lot of the open source firmware things that I went over, like LinuxBoot, wrap a proprietary binary, a very, very, very minimal proprietary binary, but it's still cool in terms of checking integrity on boot.

There are a few examples of this in the wild today. Google has Titan, which is custom silicon that they built. They've actually given a lot of talks on it, which is great. It would be cool if it was open source, but there's actually a lot of documentation on it if you want to look further into it. They have on-chip verified boot, a cryptographic identity, boot firmware signature checks, physical security monitoring, and transparent development, so that's really nice. There's a really great paper on this too.

Amazon has Nitro, which is part of their whole Nitro stack, but there's a Nitro chip, and if I recall correctly it's an FPGA, but I'm sure someone from Amazon will correct me. FPGAs, if you don't know, are programmable logic arrays, which seems like it would fit pretty nicely for just verifying hashes, stuff like that.

Apple has the T2, and if you're familiar with Apple, you're probably familiar with the fact that it's not easy to get information on this. There was a talk at Black Hat by some people from Duo that tried to reverse
engineer it, but there is also this paper online with information, which is nice. I mean, it must have taken forever for someone at Apple to even release this, so thank you to whoever that was. What it ends up looking like is: your boot ROM evaluates the iBoot signature, then iBoot evaluates the T2 kernel cache signature, then T2 evaluates the UEFI firmware signature, and that all happens on the T2 chip. Then, over SPI, the UEFI firmware evaluates the boot.efi signature, and boot.efi evaluates the macOS kernel signature. That's pretty clean, and it's also easier to wrap your head around what's happening. It's more like a relay, where one thing leads to another down the line. Honestly, if you were to read anything after this talk, this is the best paper. It also goes over the secure enclaves for your iPhone, which is really well thought out. So yeah, there's that.

Then Microsoft has Cerberus. The specs for Cerberus are actually open source on GitHub and part of the Open Compute Project, but there's no code. If you ask someone at Microsoft if it's open source, they're like, "Oh, it's on GitHub," and you're like, "No, that's just a spec." I've had that conversation numerous times; I actually brought up the repo to someone to be like, it's literally just a spec. So no idea if they're going to open source the rest of it. That would be cool. But it goes over a spec for a root of trust, so if you're interested in that, there's another option.

The catch is, though, eventually you will need to wrap a proprietary binary for your firmware. It would be cool if the vendors actually gave you all open source firmware, because then you could verify without a doubt that even the stuff in the proprietary binary was not backdoored or something. Say you're a government agency or whatever, you would definitely care about that. Maybe not everyone else would care about it, obviously, but, you
know, some people do. If all firmware from manufacturers was open source, you could actually guarantee the entire integrity of the hardware and firmware against backdoors as well, but today you can basically just guarantee that it's what you say it is. It's still just a pipe dream to have that, though.

So, wrap-up: through open source, visibility, minimalism, and open communication, you can push computing to a better, more secure place from the hardware up. We can't keep building on top of shit. Someone needs to really care about these base layers. And how can you help? Push back on vendors to open source their firmware. If the organization that you work in buys a lot of stuff from them, you're going to have more leverage than the rest of us. So yeah, anyone can honestly help in this space.

Huge thanks to the firmware community and all their work on this, because they've helped me understand a lot of this space; I knew nothing going into this. That would be Ron Minnich; Trammell Hudson, who has great talks online if you want to dive deeper (actually, all these people have great talks online); Rick Altherr; and Zaolin (I don't know how to say that). And thank you for having me.
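As a footnote to the roots-of-trust section, the relay-style boot chain described for the T2 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: real verified boot checks public-key signatures anchored in hardware, while this sketch swaps in plain SHA-256 digests, and the `Stage`, `build_chain`, and `first_untrusted` names and the placeholder image bytes are all hypothetical. Only the stage names come from the talk.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


def digest(image: bytes) -> str:
    """SHA-256 of a stage image, standing in for a real signature check."""
    return hashlib.sha256(image).hexdigest()


@dataclass
class Stage:
    name: str
    image: bytes
    expected_next: Optional[str] = None  # digest this stage trusts for the next one


def build_chain(images: List[Tuple[str, bytes]]) -> List[Stage]:
    """Provision each stage with the digest of the stage it hands off to."""
    stages = [Stage(name, image) for name, image in images]
    for current, following in zip(stages, stages[1:]):
        current.expected_next = digest(following.image)
    return stages


def first_untrusted(stages: List[Stage]) -> Optional[str]:
    """Walk the relay; return the first stage that fails its check, or None."""
    for current, following in zip(stages, stages[1:]):
        if digest(following.image) != current.expected_next:
            return following.name
    return None


# Stage names follow the talk; the image bytes are placeholders.
CHAIN = build_chain([
    ("boot ROM", b"rom"),
    ("iBoot", b"iboot"),
    ("T2 kernel cache", b"kernelcache"),
    ("UEFI firmware", b"uefi"),
    ("boot.efi", b"bootefi"),
    ("macOS kernel", b"kernel"),
])
```

Each stage only vouches for the one after it, so the whole chain is only as trustworthy as the first link, which is why the first link lives in a boot ROM.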