Thank you. Hi, everyone. My name is David Hendricks, and I'm a firmware engineer over at Facebook. And this is my colleague, Andrea Barberio. We're going to talk today about open source firmware at Facebook. I'll start by giving a quick overview, and then Andrea will finish with a whole bunch of demos.

So when you think about open source at Facebook, you might think of our involvement with a lot of software projects, like the Linux kernel, CentOS, Chef. And in fact, in 2018, Facebook was the fifth most active contributor to open source projects on GitHub, the fifth most active corporate contributor there is. However, we're also involved in open source hardware. A few years ago we started the Open Compute Project, which Steve talked about yesterday during the lightning talk, and that aims to bring the same kind of innovation that we see in the software world to the hardware world. We also started the Telecom Infra Project to similarly accelerate the pace of innovation in the telecom industry. Today the Open Compute Project has over 100 members and the Telecom Infra Project has over 500 members, so these are big industry-wide initiatives.

And a few years ago (there you go, thank you) we also started working with open source firmware. Facebook engineers identified proprietary BMC firmware as a pain point for our operations. The BMC, or Baseboard Management Controller, is an independent microcontroller present in many server and networking platforms that typically performs monitoring and management functions. Our engineers were unsatisfied with the proprietary solutions at the time and decided to implement one based on Yocto, which was released as OpenBMC in 2015. So with the BMC firmware finally opened up and OpenBMC becoming the standard on our equipment, system firmware became the next target.

So what is system firmware? System firmware is the first bit of code that runs when your processor is turned on. It's sometimes referred to as BIOS, but that's kind of a legacy term. It has knowledge of the system as a whole, so that it can initialize all the cores on all the sockets, find all the devices on all the buses, and set everything up so that you can run sane, normal code and actually move into your operating system. Code running this early has to deal with a few interesting constraints that application developers typically don't have to worry about. For example, when your processor starts up, you don't have caches, you don't have DRAM, you don't have a stack, you don't have a heap. All of this has to be set up, and once you do get it all set up, then there's the fun part of initializing all the devices, your SSDs, networking devices, and so on, so that you can actually load what you want to load and execute your target operating system. Simple, right?

So, a couple of examples of how this has gotten more complex over the years. Take the simple act of booting from a local storage device. A couple of decades ago, we might have had one de facto standard device. These days the sheer quantity has exploded: rather than just one type of storage interface, you have SSDs, eMMC devices, portable devices, and so on. We have many generations of interfaces and protocols, controllers and devices from countless vendors that we need to support.
We have high-speed links these days. It used to be that you'd flip on the power, and that was about all the training you had to do. These days you turn on the power, probe the link's capabilities, and ramp up the speed, and you may need to configure power and clock trees along the way; that involves a lot more knowledge of the system as a whole. And finally, once you get to the point where you can actually communicate with the device, you need to load something and execute it. The way you did that with the master boot record scheme was to take the first thing you could find at cylinder 0, head 0, sector 1 and blindly execute it. That's pretty terrible by today's standards. These days, you might want to go through a whole bunch of different devices and pick a partition more intelligently. You might have to open up an encrypted partition. You might want to verify the thing you're going to execute, you know, just some basic sanity checks.

Then there's network booting. Network booting used to mean that you would load an OS image from a server across the room, probably on the same LAN; the way network booting evolved was built around trusted networks. These days, however, we want to be able to load an operating system image over a network that might reach out to a whole other data center, or perhaps somewhere else over the open web, completely untrusted networks. So, like storage, there are many more devices, interfaces, and protocols with varying levels of complexity, robustness, and security. Current network booting technology was mostly designed in the '80s and '90s and hasn't really changed a whole lot, in terms of things like PXE boot. PXE boot relies on TFTP, which has basically no security built in. You certainly wouldn't want to send your credit card information over the open web using TFTP, and you definitely don't want to boot your infrastructure that way. So it's time to use protocols that were designed with security and robustness in mind, things like HTTPS and even torrents.

So, long story short, booting has gotten pretty complex, and consequently the system firmware has gotten much more complex as well. System firmware these days has its own drivers, network stack, crypto libraries, shell, applications, graphics drivers, and so on. And our philosophy is that if we're going to have an OS in firmware, let's make it Linux. For us at Facebook, at least, Linux is very familiar, and I'm sure it's very familiar to a lot of you as well. We have teams of engineers supporting Linux at all levels: kernel, tools, services, etc. We want the debuggability and traceability of using open source software. And keep in mind that this is code that runs at the highest privilege level and has unlimited access to your storage and networking resources, so we really need to know what's going on behind the scenes.

For tooling, we want to go from vendor-specific, and sometimes product-specific, tools to open tools. A good example of this: think about how you update the firmware on your laptop. You might have to run some vendor-specific utility that, if you ran it on someone else's laptop, would probably result in a brick or worse; or maybe it just wouldn't run. We want open, generic tools. And we want firmware that can be ported across a wide range of systems, including servers, networking appliances, and embedded platforms.
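As an aside, one example of the kind of open, generic tooling we mean is flashrom, which isn't tied to a single vendor or product (flashrom isn't named in the talk itself; the file names below are purely illustrative):

```sh
# Read back the current firmware image as a backup, using the internal programmer.
flashrom -p internal -r backup.rom

# Write a new image; the same tool and flags work across many boards and flash chips.
flashrom -p internal -w coreboot.rom

# The same workflow with an external programmer, e.g. a CH341A-based SPI flasher.
flashrom -p ch341a_spi -w coreboot.rom
```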
We don't want to have to support ten different firmwares on ten different devices that go into our data centers. So, long story short, we want to let Linux do as much as possible, because Linux, of course, runs everywhere, and it can be common across a wide variety of platforms.

The approach we're taking is called LinuxBoot, and here's a brief overview. The main idea of LinuxBoot is to put Linux, with an embedded environment in an initramfs, in the boot ROM, and jump to it as soon as you've got the CPU and DRAM initialized and you're ready to run sane, normal code. In our case, we're using coreboot to do the silicon initialization; some of you may have heard of coreboot. We try to keep that part as minimal as possible, because we want to offload as much as possible to Linux. Once we have Linux loaded, we let Linux initialize storage and networking resources, and we reuse tools from user space to carry out the remaining steps. By doing this, we're able to get rid of a whole lot of stuff that used to ship in firmware and replace it with code that we're already using in user space and in kernel space. This gives us the same production quality of drivers, apps, and networking facilities that we're used to in the runtime environment. And of course there are many other benefits that we don't really have time to get into, but we'll be happy to discuss later on.

So, just to drive the point home, this is what Facebook infrastructure looks like today. Some of you dropped by the table yesterday over in building AW; thank you very much for visiting. But some people were asking, hey, why does Facebook care about this? Well, this is why. Even though our data is in the cloud, this is our cloud. We physically have at least a dozen of these sites, and they're packed full of servers, and we actually have to maintain this stuff. So, yeah, that's a lot of servers and switches.

And we're not just building data centers. Here's some of the OpenCellular hardware that we're producing and shipping out to various parts of the world. OpenCellular is part of the Telecom Infra Project, with the goal of improving communications infrastructure. So, yes, the data is in the cloud, but it's our cloud, and at the end of the day it's our responsibility. The cloud consists of many data centers packed with servers, packed with equipment, and, you know, we have a pretty large infrastructure, but so do Google, Microsoft, Amazon, and all the other hyperscalers. At the end of the day, everything that runs on our servers is our problem, including firmware, and we need the ability to debug it, audit it, patch it, secure it, and deploy it.

So, in 2018, the Open Compute Project officially picked up the Open System Firmware initiative within OCP, with the goal of opening up the firmware that goes into our data centers, similarly to the way we're using open source software and open source hardware. As part of this effort, even Microsoft has been working with us, and they have a slightly different workstream around open EDK2, so that's more on the pure UEFI path, if that's your thing, while Google and Facebook are working more on LinuxBoot. And of course we have many other partners in this ecosystem. In both cases, open source firmware enables us to support our companies' use cases with open, portable, and auditable firmware and tools that are well understood and actively developed within our respective communities and companies.
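To make that concrete: coreboot can carry a Linux kernel and an initramfs directly as its payload. Here is a minimal sketch, assuming coreboot's standard Kconfig options for a Linux payload (the symbols are real coreboot options, but the paths and values are illustrative; check your coreboot tree for exact names):

```sh
# Sketch: build coreboot with a Linux kernel + initramfs as the payload.
cd coreboot
make menuconfig          # Payload -> "A Linux payload", then set the paths below
grep -E 'PAYLOAD|LINUX' .config
#   CONFIG_PAYLOAD_LINUX=y
#   CONFIG_PAYLOAD_FILE="../linux/arch/x86/boot/bzImage"
#   CONFIG_LINUX_INITRD="../initramfs.cpio.xz"
#   CONFIG_LINUX_COMMAND_LINE="console=ttyS0,115200"
make                     # produces build/coreboot.rom with Linux in the boot ROM
```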
And here to demonstrate how we're going about this is Andrea.

I'm going to show a hands-on LinuxBoot demo. It's a live demo, so it can go well or fail in some way; that's a good beginning. A short introduction: I work at Facebook as a production engineer. We work on the reliability, performance, and security of the infrastructure, and I work specifically on firmware.

Before we start, I'm going to show you the backend and the architecture we use for LinuxBoot at Facebook. On the left side, you can see what runs on the flash chip (there's actually a picture of the flash chip on top), and on the right side is what runs from the hard drive. Everything on the left is physically soldered on the motherboard. What we are doing, as the name suggests, is replacing most of the firmware with an operating system, and that operating system is Linux. On the left side, the first component is coreboot, which does the very basic hardware initialization: CPU, DRAM, and so on. Then we have Linux, which provides us with the usual native drivers, file systems, all the familiar things we know, and the separation between kernel space and user space. The last component is the user space. This user space is based on u-root, which is an environment for embedded systems written entirely in Go, started at Google, and which we have contributed to. And we have added some additional tools, which we call systemboot, that implement bootloader behavior on top of u-root.

So, before I show you how it works, I will show you how we build it. I've prepared a bunch of build scripts and run scripts, of course, and I hope you can see them correctly. As per the diagram before, we're going to build u-root, Linux, and coreboot. u-root is the first component we're going to build; it's going to be embedded within the Linux kernel, which in turn is going to be embedded into coreboot, so it's a chain of components embedded into one another.

u-root is actually the one I'm most interested in for this demo. As I said, it's an embedded environment written in Go. Go has, among other benefits, a very fast compiler, so it can build images very quickly. It's very easy to understand, it's a memory-safe language, and it cross-compiles very easily: you can just change an environment variable and build for, for instance, x86 or ARM or whatever other architecture it supports. As I said, it's very fast, so I'm going to show you live how fast it is to build, just so we can fit this demo in the time we have. I'll explain the parameters I'm passing in a bit more detail later. The main point here is that we are embedding the core programs, like ps, ls, and so on, all the things that a base system needs, and we're also importing an external package, the systemboot tools I mentioned, which behave as a bootloader. I'm pressing enter, and this is going to build the whole system, compile everything, and produce something that can be used directly in the Linux kernel as an initramfs. It's pretty fast: as you can see, with a bunch of tools it takes about 15 seconds, which, if you're familiar with other build systems, is pretty fast. And this generated the initramfs.
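The build command looks roughly like this. This is a sketch, assuming the u-root flags and the systemboot package paths from around the time of this talk (check the u-root and systemboot repositories for the current ones):

```sh
# Sketch: build a busybox-mode initramfs containing u-root's core tools
# plus the systemboot bootloader commands.
go get -u github.com/u-root/u-root
u-root -build=bb -o initramfs.cpio \
    core \
    github.com/systemboot/systemboot/localboot \
    github.com/systemboot/systemboot/netboot \
    github.com/systemboot/systemboot/uinit

# Compress it for embedding into the kernel.
xz --check=crc32 initramfs.cpio
```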
Since we don't have much time, I'm going to go quickly through it. The next script will compress the initramfs, build the kernel, and build coreboot. This is going to take a bit, so I'll give a bit more detail about what it builds.

So u-root, as I said, is a project started at Google to provide an environment for embedded systems. It's written entirely in Go, for the benefits that I mentioned above. And it has two interesting modes: one is called busybox mode, which is the main one, and the other one is source mode. Busybox mode, if you're familiar with BusyBox, takes all the source code of the tools, rewrites everything into one single source tree, and compiles it into one single binary. This is great for saving space. Source mode is even more interesting. Instead of having one binary in your firmware (and remember that here we're talking about firmware, not a hard drive; the target is the flash chip on your motherboard), source mode embeds the source code of your programs, like ls, ps, and so on, plus a Go compiler, and everything is compiled on the fly on your machine. For example, when you boot your firmware, interrupt it, and enter the shell, then when you run ls the Go compiler builds it on the fly and executes it. The next time it will be cached, so it will be fast. If you make a modification to a program because you want to do some debugging, because you want to understand why something isn't working in the field, it will get rebuilt, and you can troubleshoot and verify whether the binary was broken and whether your patch works. When you reboot the machine, everything goes back to the original state.

Alright, so now I'm going to run all of this. We're going to do the demo on QEMU, because it wasn't practical to bring real switches and servers here, but this works the same way. The QEMU system has a firmware, which you can pass with -bios, as you can see here. It has hard drives, and it has a network. In this case, I have set up the demo with a network that has another machine providing DHCPv6, an HTTP server, router advertisements, all the IPv6 environment that you need for network booting. So we'll spin up a virtual machine with our firmware, the one that we just built a few moments ago. I'm going to run this, and I'm going to stop it here.

So this is the firmware. It's ready to boot your machine and your operating system, but I interrupted it: you see "starting boot sequence, press Ctrl-C within 5 seconds to drop into a shell". And you can run commands like ls. You can see the network configuration. You can check your network neighbors. Well, there's no time for all of that, but you can basically have a proper Linux environment where you can run commands and troubleshoot whether, and why, your machine doesn't work.

So this was very fast; let me scroll back and show you what we booted. Here coreboot starts. coreboot does its job, initializing memory; you see these messages in the boot. Then the kernel scans the PCI bus, and you can see the network device, the disk, and so on. And then you get into the u-root init, which creates the structures it needs to operate. Eventually it hands control to what we call systemboot.
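The run script boils down to something like this. This is a sketch with typical QEMU flags; the disk and network settings here are illustrative, not the exact demo setup:

```sh
# Sketch: boot the image we just built, using the coreboot ROM as QEMU's firmware.
qemu-system-x86_64 \
    -M q35 -m 1G -nographic \
    -bios coreboot/build/coreboot.rom \
    -drive file=disk.img,format=raw \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device e1000,netdev=net0
# At the "press Ctrl-C to drop into a shell" prompt, interrupt it
# to get the u-root shell instead of continuing the boot sequence.
```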
systemboot is just a collection of bootloader programs that are able to boot the machine. I'm going to show you how the netboot program works. It's implemented as a Linux command that you run on the server. It's able to boot via DHCPv6, or via a static configuration without DHCP or DNS, et cetera; there are several different ways you can boot. I am going to simply run it in debug mode so you can see the messages. This is going to try to get a configuration via DHCPv6 and then fetch a network boot program. And I should be able to check whether we got a network boot program, and yes, there it is. I set up this demo to show something else, so I am running with a different config, but here you can see that we got the address, for example.

Anyway, I've now configured DHCP to boot from the web, and I'm going to try it. It's a live demo, right? So it gets the configuration, it gets a network boot program, and then it boots into our installer. Here it's just a dummy installer, and to prove the point it's not really installing anything on the machine, that would take too much time. But it shows that it booted via HTTPS, not via TFTP. And it can even boot via BitTorrent. We are working on that, but unfortunately I couldn't make it in time for the demo. You can literally pass a magnet link instead of the HTTP URL and boot your very large image that way. And yes, it did work.

So, yeah, I should move on to questions now. While we switch to questions, I'll just leave these pointers up: this is where to find the sources for more on open source firmware at Facebook. Again, this is all open. We encourage you to come along and join the community and download the code and try it out. And of course, try out FreeBSD boot or Plan 9 boot or whatever else you want. Alright, questions? Hands up?

"What is the status for ARM64? The original LinuxBoot site I saw was thinking about using U-Boot for starting ARM boards; what has happened in this area?" So there hasn't been a whole lot of movement there. But the good news is that Linux runs pretty well on ARM boards, and recompiling all of this for ARM64 is really easy: we just have to set GOARCH to arm64, and it all compiles trivially. We actually tested that beforehand and it works well. And if someone from the U-Boot community would like to work on this, you know, going straight into LinuxBoot, that would be awesome. Next question?

"Has this been tried on workstations, or just..." Well, we're targeting our production hardware, so switches and servers. The only requirement is that your platform has coreboot support and that you're able to flash it. For example, my laptop has coreboot support, and it's a perfect candidate for running LinuxBoot. As for firmware from normal PC motherboards that requires multiple additional pieces to boot, that's a more complicated question. One more question?

"Normally the firmware has runtime services, like fan management and everything. Who's doing that in those cases? How are you guys doing it?" So, the question was about certain runtime services that the firmware might be involved with, such as fan management. In our case, the fans are managed by the BMC; it's kind of a cop-out answer. For the laptop or workstation case you mentioned? I don't know. If you do need to implement things like System Management Mode routines in order to handle that stuff, Ron Minnich did a really cool talk called "let's move SMM into the kernel," or something like that.
So, check that out. Thank you. One more. Just to note, for the ARM64 case: quite a number of ARM64 platforms use U-Boot. Oh, yes, one more comment about ARM64 support. This gentleman raised the point that coreboot does support ARM64, and U-Boot also supports a lot of ARM64 platforms. So, I mean, they're both great open source projects, and it would be great to have them both able to boot into u-root.
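For reference, the ARM64 recompile mentioned in the Q&A is roughly just a matter of switching Go's target environment. A sketch, assuming the same u-root setup as in the demo:

```sh
# Sketch: cross-build the same u-root initramfs for ARM64.
# Go's built-in cross-compilation means no separate toolchain setup is needed.
GOOS=linux GOARCH=arm64 u-root -build=bb -o initramfs.arm64.cpio core
```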