So my name is Ryan Fairfax, I'm an engineering lead at Microsoft, where for the past three years or so I've been working on a Linux kernel based OS targeting IoT. And early on, we decided to bet on the Yocto Project. What I'm gonna talk about today is some of the lessons we learned, and how to use Yocto to build IoT targeted operating systems, specifically targeting crossover chips. We'll talk about what I mean when I say that in a few slides here. So the general agenda: we're gonna start by talking about what a crossover system on a chip is and why you should be paying attention to them, why they're relevant. We'll take a brief detour, and I'll use the product that I've been working on for the last three years as a use case to help show you the components in a system targeting these kinds of chips. Then really the meat of the talk is how to integrate this with Yocto. I have three different approaches that we've tried throughout the last couple of years that I'm gonna present in detail. And then lastly, I'm gonna spend some time at the end to talk about the lessons that I and my team took away as we've been doing this over time. So first things first, what is a crossover system on a chip? In general, embedded CPUs and systems on a chip fall into one of two broad categories today. The first is application processors; in the ARM world, these are Cortex-A class CPUs. They're basically highly capable CPUs with clock speeds in the hundreds of megahertz or gigahertz. They have a memory management unit so they can do virtual memory. They have a lot of the features that you would expect from a modern PC or smartphone platform. They tend to run Linux, Android, or other what are often known as high level OSs. They usually include networking integrated in some form, either on the module directly or on the die: things like 802.11 Wi-Fi, or LTE cellular for your phone chips, that kind of thing. So they're internet connected first.
And they're usually on the more expensive side for embedded applications, tens of dollars, though the prices there are starting to drop. The other class is microcontrollers, or what some people refer to as MCUs. These are your Cortex-M class CPUs in the ARM world. They're feature limited, and they're really designed for real time or highly predictable applications. They're what you put in your sensor devices, where you need to know that my code is running at a very fixed interval and I hit every clock tick with precision. These tend to run real time OSs, things like Zephyr, ThreadX, FreeRTOS, and many, many others. And today less than 1% of them include networking. Most of these are not connected devices, and if they are connected, it's via local network protocols rather than internet protocols. As a side effect, they tend to be very inexpensive, on the order of a couple dollars at most. So a crossover SoC, when I say that term, is something that combines the best of both worlds onto a single chip, usually via multiple CPU cores. You might have some set of application cores on the chip for your networking stack, for graphics, 3D GPUs, for your machine learning, wake word detection, that kind of thing. And you'll have some set of microcontroller cores for real time sensors, your motor code, maybe a simple LCD display. The idea is that rather than having to buy two chips, you can get the best of both worlds on one single chip that you buy for your platform. There are three examples I call out here. There's more on the market, but these are good examples to give you an idea of what these chips are capable of. At the very top we have the STM32MP1; this chip's about a year old. It's a dual core Cortex-A7 plus a Cortex-M4, with Gigabit Ethernet. You can see at the top there's a module that you might drop into your device.
There's the MediaTek MT3620; this is the chip my team has been working with for the last couple of years. It's a Cortex-A7 plus three different Cortex-M4s for different purposes, with integrated 802.11 Wi-Fi. The middle picture there shows it's designed to be a connectivity module type of thing, where you might drop it onto an existing PCB and integrate it into your solution. And then on the more advanced end, NXP has the i.MX 8 family of chips, which are quad core CPUs with integrated real time cores, and they have a 3D GPU on some models. They're really quite powerful, something closer to a chip that would traditionally run Android. So these span a pretty wide range of applications, from the lower end to something closer to a smartphone chip. Okay, so why do products use these? Why do we have to care about this if we're building an OS? I kind of hinted at this earlier: one of the major drivers is just a lower cost of ownership. If I've got a product that needs an application core and a microcontroller, it's generally going to be cheaper for me to buy this one system on a chip than two separate chips on my board. The other reason that we're seeing a lot of interest here is that people can leverage Linux without having to do real time app logic in Linux. So if I'm building a connected appliance, let's say a connected fridge, I want Linux for my 3D displays, since I'm drawing my 1080p graphics, and I want Linux for my networking stack, because it's going to be far more mature and far more capable than anything in a real time OS. But I don't necessarily want to have to drive the compressor purely in straight Linux code, because getting that predictable timing in the Linux scheduler is doable in a lot of cases, but it requires a lot of effort.
So I could put that motor code on the microcontroller part of this SoC, put my networking and my graphics code on Linux, and really let myself focus on my business logic rather than having to make everything fit into one bucket. And then lastly, a major reason that people use this is to add security and features. When you think about a networking stack, the Linux networking stack is very mature at this point, and it continues to take bug fixes, feature enhancements, and reactions to security issues almost daily. And so I want to be able to bet on that proven technology in my product if at all possible. We're entering a really interesting point in the market where the cost of these crossover chips is dropping pretty significantly. The first ones showed up on the market as early as five years ago, but they were pretty expensive; you were paying a premium. Now, if I remember right, every single one of those three chips I listed is available at less than $10, and that's before volume pricing, so the real cost to a manufacturer might be considerably less. They're starting to get to the point where you can consider dropping them into lower cost devices and machines without breaking the budget for your product. The second reason we're seeing adoption is that the ramp-up curve is much easier. If I have a team that is traditionally a microcontroller team and they've been writing RTOS code, let's say they're a FreeRTOS shop and they've been using FreeRTOS for years, teaching them Linux is going to take some time. This gives you a partial adoption curve, where you could keep certain parts of your code in your existing code base while starting to move over to Linux over time, for example. And then lastly, more silicon vendors are producing these chips. It seems like one gets announced all the time, and there's stuff coming that I've seen early information on that is getting more capable and cheaper.
And I think that trend will continue, simply because this is solving a real world problem. So before we dig into how to integrate this into your system, I wanted to show a use case here and talk about the components that you might build if you were building an OS targeting one of these crossover chips. This is really meant to be representative, not the one right solution, so I'm only going to spend a little bit of time on it. This is the product that I've been working on for the last three years, Azure Sphere. The marketing pitch is that we're trying to build a solution for secure IoT devices. I think a lot of people are saying that right now; it's certainly a problem that the industry is working on quite a bit. The idea is that we want to run a custom crossover OS on these chips that uses a combination of what most people would classify as firmware, bare metal code, a secure enclave (a TrustZone based secure enclave for certain sensitive operations), and the stock Linux kernel. If you think about the components in that system, there are many, many parts. We actually have a dedicated security core and security runtime we call Pluton. It runs on a dedicated Cortex-M4. It does things like secure boot verification, key management for hardware backed crypto, and device authentication, that kind of thing. We put it on a dedicated core at the time because we felt it was the right thing to do for security. Then Spectre happened, and we really felt good about it. I think a lot of people are starting to see the value in dedicated security cores and enclaves for the really sensitive parts of your operations. We have what we call the security monitor, which is a TrustZone based secure enclave running on an A7 core. We have Linux, of course, running on an A7 core. And we have customer written real time applications running on those other Cortex-M4s.
And so these are where customers put their business logic for driving motors, like I talked about earlier. Every single one of those bullet points there runs in a different execution environment, and most of them run on completely different CPU cores. So when we're building this OS, it's not just that we're building a kernel, a root FS, maybe an initrd, that kind of thing. We're actually building many different components targeting many different CPU architectures. That's where a lot of the complexity comes in, and it's what led us down the path of really digging into Yocto and figuring out how to get it to build these types of environments. So, challenges here. The major one is SMP. Linux development is focused on SMP. When you target a multi-core CPU, it's kind of assumed that all the cores are the same. That's not strictly true; there have been phones that have shipped, for example, with some cores clocked higher than others, but they're very close in capability. And it's assumed that you're going to run Linux on all your cores, you're going to run your high level OS, you're going to let the OS handle the scheduling of what program ends up where, and that at the end of the day any app can run on any core. All of those assumptions melt away when you start talking about crossover chips that have very heterogeneous sets of capabilities on their CPU cores. So you've got these different CPU cores with different capabilities. In some cases, certain peripherals are only tied to certain cores, so your GPU might only be available to one of the five CPUs on your chip, and it's really important that if you're writing an application that needs the GPU, you end up on the right CPU. These chips tend to run multiple OSs or firmware. As we talked about on the last slide, we have various firmware components and a secure enclave.
I've even seen setups where they're running two or three Linux kernels on different cores, designed for different things. You might have a kernel with a very minimal set of config options turned on, you might have a general purpose kernel, that kind of thing. And lastly, apps have to be targeted to specific CPU cores; we hinted at that earlier. So the real challenge here is that most of the build systems that I've worked with are really optimized for SMP. They assume you're building a traditional SMP environment: either a simple embedded chip with one type of CPU core, or a server or PC where SMP is the norm. They're not designed for AMP. And that is where most of the challenges in building come in. Okay, so let's switch to Yocto; let's bring this back to something practical. I'm not gonna spend a lot of time talking about what Yocto is. Their public documentation is fantastic, and I encourage people to read through it if they don't have the right background. But the thing I will say is that out of the box, there is really minimal support in the Yocto build system for non-Linux targets; that's not what they're going after. Similarly, for targeting multiple architectures there's some stuff there, but it's somewhat limited. The main thing, though, is that Yocto is very extensible. In fact, I haven't run into anything I can't fix through extensibility yet. It's one of the most malleable build systems I've ever used, and to its advantage. So when we made this bet very early on that we wanted to use Yocto, we did it knowing that there were going to be limitations, but that we would be able to work around them and extend, rather than have to go find an alternative technology. So I'm gonna talk in detail here about three main techniques for building the non-Linux parts of your crossover OS.
We're gonna start with recipe-level overrides, talk about custom classes in Yocto, and then lastly talk about multiconfig, which is their latest attempt to solve some of these problems. Okay, so a recipe-level override. This is the simplest thing you can do. For those not aware, a recipe is simply a component that you're going to build in your system: you're gonna fetch some Git code, run configure, run make, that kind of thing. The simplest approach is to just override your flags and tell Yocto: I know you're smart and you've figured out all the right flags for my Linux variant, but don't worry, I'm just gonna set CFLAGS and LDFLAGS to what I want. The thing that's really interesting is that Yocto actually does this in a number of places, and the best example is U-Boot. If you think about it, when you're building U-Boot for your system, it's targeting the same CPU architecture, the same ISA, but your flags are a little different. You can't go link libc.so into it. You may have some different flags to optimize for size to keep your boot loader small, that kind of thing. So there are good examples of this in the code base. And it works really well if your recipe is entirely self-contained, if it has no extra dependencies beyond your build toolchain. On the right I have an example that shows that literally what you're doing here is unsetting, basically erasing, the default compiler flags and overriding them with target specific values. We started with this, and it's how we built the very first code in our system. We built the boot loader for our Cortex-M4 by just saying: yeah, I know you're thinking of building for an A7, don't worry about it, target an M4 instead. It worked because the same GCC can target both of those architectures. And it does work; there are just some caveats you have to watch out for.
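To make that concrete, here's a minimal sketch of what a recipe-level override can look like. The recipe, the flag values, and the linker script name are illustrative, not taken from the talk's actual code:

```bitbake
# Hypothetical recipe: build a bare-metal Cortex-M4 bootloader inside a
# build that is otherwise configured for a Cortex-A7 Linux target.

SUMMARY = "Bare-metal bootloader for the Cortex-M4 core"
LICENSE = "CLOSED"

# Erase the flags Yocto computed for the Linux/Cortex-A target...
CFLAGS = ""
LDFLAGS = ""

# ...and substitute bare-metal M4 flags. This trick works because the
# same arm-*-gcc can emit code for both A-class and M-class cores.
CFLAGS += "-mcpu=cortex-m4 -mthumb -ffreestanding -Os"
LDFLAGS += "-nostdlib -static -T${S}/bootloader.ld"

do_compile() {
    oe_runmake CC="${CC}" CFLAGS="${CFLAGS}" LDFLAGS="${LDFLAGS}"
}
```

As the talk notes, the rest of Yocto still believes this output is an A7 artifact, so the package-architecture bookkeeping has to be watched carefully.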
So if you need to build for multiple target ISAs, it gets complicated. If you need to build for multiple targets, let's say you've got a shared library for data serialization that you're gonna use to communicate between your firmware and the Linux instance that's running: you wanna compile that once for Linux and once for your firmware. That gets a little complicated in this model. You have to have a lot of conditional logic, a lot of ifs and ifdef-like equivalents, in your recipes, which can get out of control. The second thing to watch out for is that the Yocto build system by default doesn't understand what you just did to it. It'll take this code that I have on the right, compile it up, and say: great, I built the A7 version of this, and stash it in the package cache targeted at that architecture, even though you just said to override the compiler flags and build for M4. Now you can fix this, but you've gotta watch out for all the places that make assumptions. And the last thing to watch out for with this approach is dependencies; they can get very complicated very quickly. If you end up in a situation where you produce the same file for two different architectures, you're likely going to break your build. That comes up, for example, if you're building a libc: you're building the standard C headers like stdio.h once for your firmware and once for Linux, and you'll end up with a conflict. Yocto tries to do the right thing and point out that you have a conflict, but in this case it's gonna create issues. So we started with this, and we stuck with it for about six months before we reached its limit. I think for a lot of projects it actually is probably gonna work fairly well. We ran into problems when we started getting more sophisticated, when we started building multiple binaries for the Cortex-M parts of the chip. And so what we moved to next is custom classes.
And this is taking advantage of some other features inside Yocto. For those who aren't aware, there's already a mechanism built into Yocto to take a recipe and build it multiple times for different sets of configurations. You commonly see this with the -native recipes, the recipes that build for your build server rather than your embedded target, or nativesdk, which is used if you ship the Yocto SDK. It's controlled via the BBCLASSEXTEND variable, which you can go look up in the documentation, and which says: for this recipe, I want to build with these classes. It's pretty fully extensible, but there are a few caveats I'll walk through. Now on the right here, I took an example from one of our existing pieces of code, one of our custom classes. It's a custom class called security-fw, for the firmware for the security core in our system. This is only a subset; the actual file is like 300 lines long. There are a lot of variables you have to set, or have to reason about whether the defaults are correct. For those interested (the slides will be posted after this), I've got a link to GitHub where you can see the whole thing, but I didn't try to jam it all onto the slide. The idea here is that you're setting compiler flags just like we talked about on the last slide: you're setting GCC flags, linker flags, what libc you use, what architecture, and some other variables that control your build. So to go down this route, you make a new class definition in a .bbclass file with the class name. In the class, you override a number of default flags to get your compiler to target the right architecture and to set your dependencies correctly. You set the class override, and you add a virtclass handler, which is a tiny bit of code that tells BitBake how to process this. And you can see more at that GitHub link there.
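A heavily trimmed sketch of such a class might look like the following. This is modeled on the pattern native.bbclass uses; the class name, tune values, and handler are illustrative, not the talk's actual 300-line file:

```bitbake
# security-fw.bbclass (illustrative sketch)

CLASSOVERRIDE = "class-security-fw"

# Retarget the toolchain at the microcontroller core and bare-metal runtime
TUNE_FEATURES = "armv7m cortexm4"
TARGET_CFLAGS = "-mcpu=cortex-m4 -mthumb -ffreestanding"
TARGET_LDFLAGS = "-nostdlib -static"
TCLIBC = "newlib"

# Rename the recipe variant so its output doesn't collide with the
# Linux-targeted build of the same source, following the native.bbclass
# handler pattern.
python security_fw_virtclass_handler () {
    pn = e.data.getVar("PN")
    if not pn.startswith("security-fw-"):
        e.data.setVar("PN", "security-fw-" + pn)
}
addhandler security_fw_virtclass_handler
security_fw_virtclass_handler[eventmask] = "bb.event.RecipePreFinalise"
```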
So now that you have your class defined, you actually have to leverage it inside your recipes, and there are a couple of things you may need to do along the way to really get this end to end. First, you may need to define new tunes; tunes are basically sets of compile flags. Yocto has tunes for all the common ARM Cortex-A CPUs, all the stuff that targets Linux, but they don't have them for microcontroller cores, so you may have to add a few. That's really simple to do, so it's no big deal, but it's something that's not in the box. Second, you may wanna build a custom GCC. We actually only started doing this about a year ago, and within a week it found four bugs; I'm kind of amazed our code even worked. The problem with using the Linux compiler is that you get Linux defines, like #define __linux__. If you're pulling in open source code that says "#ifdef __linux__, fall into this path", it'll go down that route even though it's not appropriate for your environment. So reasoning about whether your compiler flags are correct, and whether you wanna build a custom GCC that is closer to the ARM bare metal compiler than the ARM Linux compiler, for example, is really important. And if you're targeting completely different architectures, you may need a new GCC anyway. If one of your chip's cores is ARM, say an A7, and your other CPU core is RISC-V, a completely different architecture, you're gonna need a new compiler regardless, unless you're fortunate enough to use Clang and can figure out how to pass the right target flags there. But you'll probably have to do some tweaks either way. The link I have here shows how we defined a custom GCC and binutils for our firmware, taking advantage of the existing work already in Yocto and extending it to produce a firmware targeting compiler.
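A minimal microcontroller tune, following the pattern of the Cortex-A tune files that ship with Yocto, might look like this. The file path and feature names are illustrative, and older Yocto releases spell the overrides with underscores (`TUNE_FEATURES_tune-cortexm4`) instead of colons:

```bitbake
# conf/machine/include/tune-cortexm4.inc (hypothetical path)

DEFAULTTUNE ?= "cortexm4"

# Map the tune feature to the actual GCC flags for the M4 core
TUNEVALID[cortexm4] = "Enable Cortex-M4 specific processor optimizations"
TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'cortexm4', ' -mcpu=cortex-m4 -mthumb', '', d)}"

AVAILTUNES += "cortexm4"
ARMPKGARCH:tune-cortexm4 = "cortexm4"
TUNE_FEATURES:tune-cortexm4 = "armv7m cortexm4"
PACKAGE_EXTRA_ARCHS:tune-cortexm4 = "cortexm4"
```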
And then lastly, to actually use this in your recipes, you just add BBCLASSEXTEND to your recipe to say which targets you wanna build for. On the right we have an example where we're targeting four different classes: native, nativesdk, and then the classes we defined, our security firmware class and our high level OS firmware class. There's also an example there where you can see how we change the dependencies based on which class we're executing in, so you get some control to customize your recipes as appropriate. Then you can just build it with BitBake like you would any other recipe, or add it to DEPENDS and have the dependency chain pick it up. So, the pros and cons of this approach. The main pro is that you can build a recipe for multiple targets. I talked about that scenario where we've got a serialization library; that came up a lot for us, where we wanted to build the exact same Git repository multiple times for multiple CPUs, and this makes it really easy to do so. The other thing that I really like about this approach is that it's opt in. If I say that this recipe can build with these classes in my BBCLASSEXTEND and I try to build it for another target, I get a pretty clean error message saying this doesn't match up. We had some problems when we were doing recipe-level overrides where you'd try to build something for a core where we knew it wasn't gonna work, it didn't make any sense, but it would start the build anyway, you'd get a cryptic GCC error, and we'd lose a lot of developer time with people trying to understand that the configuration didn't make sense. This does a better job of expressing that. And the rest of the system tends to just work; for dependencies and packaging, for example, we didn't really have to do a ton of work. There was one bug fix I'll talk about in a second, but for the most part the rest of the system just picked up on this because it's a first class feature.
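Used in a recipe, that pattern might look like the following sketch. The recipe name and dependency names are made up; the class names mirror the talk's examples:

```bitbake
# myserializer_1.0.bb (illustrative): one Git repo built for four targets

SUMMARY = "Serialization library shared between Linux and firmware"

# Build for the host tools, the SDK, and both firmware environments
BBCLASSEXTEND = "native nativesdk security-fw hlos-fw"

# Dependencies differ per class: the firmware variants link against a
# bare-metal runtime instead of the Linux one
DEPENDS:class-target = "glibc"
DEPENDS:class-security-fw = "newlib"
```

With something like this in place, `bitbake myserializer` builds the Linux variant and `bitbake security-fw-myserializer` builds the firmware variant, while asking for a class the recipe doesn't list fails with a clean error.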
On the negative side, it's a little bit harder to reason about if you're not familiar with the product, because you have to know: should I be building my-recipe, or should I be building security-fw-my-recipe? If your developers are coming in new to your product, they have a little bit more cognitive load to get up to speed on which targets they're building. You can solve that a bit by having a master-level recipe; we have a recipe that just builds the entire OS and, through dependencies, pulls everything else in. So for the first few weeks on the project we can teach developers to just build the OS target, and over time, as they get more familiar and want to do more incremental builds, they can get into specific recipes. The last con is that there are a few caveats around complex dependency graphs. If you have a dependency graph that crosses between architectures, so something in Linux depends on something in your firmware that depends on something in Linux: well, first of all, try not to do that, trust me, it's a lot of wasted debugging time. But sometimes it comes up; you may wanna embed your firmware into a Linux driver as a binary, for example, so that it can be loaded during startup. We had to do a few tweaks to the sstate bbclass, and there'll be a link at the end with those changes as well, for people who are curious. This is code that already handles native, so I just followed the pattern, and I've been thinking recently about whether there's a way to generalize this and upstream something to remove the need to specialize it in the future.
Okay, the last option I'm gonna talk about is multiconfig. This is the newest way to do this; it's been around for three or four releases of Yocto now, so a couple of years, but when we started our project this wasn't an option, and that's part of why we went so far down the class approach. The idea with multiconfig is that you tell Yocto you wanna build more than one machine at the same time, where a machine, in this parlance, for the purposes of what we're talking about, is a logical CPU. So you have one machine that targets your Linux environment, and one machine that targets your security firmware environment with your Cortex-M. On the right there's an example, from the public Yocto documentation, that shows how this works: you create config files for the various targets, and then the BitBake engine knows how to apply those, looping over the configurations rather than handling one at a time. To use multiconfig, you create a multiconfig folder with a .conf file for each machine target; that's what those target1.conf and target2.conf are on the right. You update your local.conf to list the names of the machines that you have, in this case machine1 and machine2 in the example. Then you have this somewhat verbose syntax, let's say: multiconfig, colon, the machine target, colon, the recipe; and you can also do multiconfig colon star as well. They have dependencies too, because you sometimes need dependencies across machines, and so there's this mcdepends syntax. This one took me five or six reads to make sure I understood all the parts correctly, but it does make sense, especially when you see some real world examples. You're basically specifying both directions of the dependency, from this target to this target, and it can go down to the task level, so you don't have to say "I depend on the build output"; you might say "I just depend on the do_create_firmware step" to get the final package, one
that often comes up when you're building your final images, for example if you're building a single flash image that contains all your firmware. So the pros of this approach: you need pretty minimal modification, and for the most part you don't have to touch your recipes, which is great. It's pretty easy to build recipes for multiple machine targets; you just invoke BitBake with the different targets. And the major pro of this approach is that there's continued investment; I've seen improvement here just about every release. In the first release this came out, I was looking at it right before it shipped to figure out if we should try it, and the release notes had like 10 or 12 caveats, things like it's not quite as performant and you might build a few more things than you need to, and over time those have all melted away. I think that shows the continued investment, which is always great to see in an open source project. The cons: the BitBake syntax is pretty verbose. That threw our team for a while before they wrapped their minds around it. It's very logically consistent, which is the most important thing, but we had some hard times with people trying to intuit the syntax at first. The second con is that you can attempt to build any recipe for any machine, even if it's not applicable. This goes back to what I talked about earlier: you can try to build for a target that makes no sense, and you'll just get GCC throwing a bunch of errors. There are some ways to deal with that, by limiting compatible machines and things like that, but by default it just assumes that you know what you're doing when you're typing the BitBake command line, so there are no guard rails there. Also, the multiconfig dependencies are a bit limited in what they can share. This will not be a problem for most people; we had some interesting scenarios where the output of one of our firmware builds generated a header file that we wanted to include in a Linux build, and so we
needed to pull items from another architecture into our sysroot. That's pretty dangerous, and I think if I were doing it again I'd be really careful and try to break that dependency a different way, but it's hard to do that with the multiconfig approach. Okay, so those are the three approaches. Let's talk about some key lessons to extract here that I think are generic regardless of how you're doing this. First of all, the most important thing I can say is: design for debugging. I don't know how many hours I lost staring at GCC failures or configure errors before I found the one word that said, oh wait, I'm building for the firmware, all my assumptions are wrong. Anything you can do to have your logs make it really clear which target architecture and which logical target you're building for is really important for your team; if they have to go look at compile flags, you're gonna lose a lot of time and a lot of wasted energy. Second, keep your targets as isolated as possible. We had a bug where we were accidentally sharing the standard library headers, assert.h in particular, between our firmware and Linux, and it was trying to use the POSIX version of assert in our firmware, which didn't compile. So we fixed that bug, and then our firmware assert ended up in Linux, and our programs just started infinitely looping when they hit an assert. I went on a very angry refactoring tear to solve that problem and make sure everything's isolated, but it's really easy to subtly leak libraries and headers between your targets, and if they're the same CPU architecture the linker will just do it, and you might get really tough to reason about behavior. The third one: don't assume one compiler can target all your cores. I'm seeing a lot of scenarios, even all-ARM ones, where you might have a chip with a 64 bit Linux CPU, so you wanna build an AArch64 compiler, but your microcontroller is 32 bit and it's gonna stay 32 bit for the foreseeable future, so you might need multiple compilers there. And
I'm seeing a lot of interest in hybrid scenarios with ARM plus RISC-V, or ARM plus a specialized DSP, that kind of thing, and I think that trend is going to increase over time. So you have to reason about whether you have the right compiler toolchain for each target. And the last one: try to keep as much in Yocto as possible. When we originally did this, we had a bunch of Python scripts that wrapped BitBake; I think a lot of people have done this at various points in time, to do the final step of the build, to gather all the compiled binary images and build a flash.bin that we're gonna ship off to our partner to put on the flash chips. It got hard to debug, and it ended up slowing down our build, because we weren't taking advantage of all the optimization in Yocto; we had to run BitBake twice, so you'd get to a point where you had idle cores, that kind of thing. When we pulled it all into a single final recipe in the Yocto system, it not only made our build time better, it made our developers' lives a lot better, because I could just teach them: hey, run BitBake on the OS target and you will get a fully bootable image. And if something went wrong, they just had to understand how to debug Yocto, rather than having to debug Yocto plus the custom scripts that I wrote on a whim one day. We've really benefited from keeping as much in Yocto as possible, which is why I think it's so important to look at extensions like this and pull this stuff into Yocto rather than build your firmware somewhere else and link it in later.
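Putting the multiconfig pieces from a few slides back together with that last lesson, a minimal sketch of an all-in-Yocto flash image build might look like this. The machine names, recipe names, and the do_create_firmware task are all illustrative, and older releases spell the prefix `multiconfig:` rather than `mc:`:

```bitbake
# conf/multiconfig/linux.conf
MACHINE = "my-cortex-a7-machine"

# conf/multiconfig/firmware.conf
MACHINE = "my-cortex-m4-machine"

# conf/local.conf: enable both configurations
BBMULTICONFIG = "linux firmware"

# flash-image.bb: assemble the final flash.bin inside Yocto instead of in
# wrapper scripts. The mcdepends spells out both sides of the dependency
# (the multiconfig this task runs in, then the one it depends on), down
# to the task level.
do_image[mcdepends] = "mc:linux:firmware:m4-bootloader:do_create_firmware"
```

A whole-system build then becomes one command, something like `bitbake mc:linux:flash-image`, with BitBake pulling the firmware pieces in through the dependency chain.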
Okay, so I've got some links here on the slides, and again, like I said, the slides will be posted to the schedule site, so you can get these links rather than having to quickly type down the randomly generated hex strings. This shows a lot of the changes that we did, and the full examples that I showed on the right hand side of the screen in some of the earlier slides, so you'll see the full real world example code for some of the custom classes and multiconfig stuff. That last link is actually our entire open source OS drop. It's not everything we produce in our OS, but it is enough to show off a lot of these concepts. It's updated monthly, that's the link to the latest one, and it includes the entire Yocto toolchain that we use, so that's a real world end to end example if you wanna see something concrete. Okay, with that I think we've got a few minutes for any questions. [Audience question: if you set these up correctly, will a single build produce the whole system, all the different operating systems?] Absolutely. [Audience follow-up: can you go back a slide? One of the cons said it'll try to build a recipe for any machine; could you explain how you ran into that?] So we ran into a couple of problems there. A good example: we had a static library that was for firmware only, so it was some bare metal code, and we ran into a problem where people would run BitBake by default targeting the default machine, which was Linux in our case, and it would try to build it for the Linux environment and fail to build, and the reality was that it was just never even applicable.
It was a misunderstanding by our developers, but there weren't guard rails to stop them and give them a clean warning that this isn't applicable, versus here's a GCC error because your libc doesn't link correctly. Sure, okay, other questions? [Audience question about software updates.] Yes, and I could probably give an entire talk on how we did update, because as I'm sure everyone's aware, it's a very simple topic with one clear solution. We ended up building a custom update solution that we've talked about in a few public talks previous to this conference, and we ended up having to do some custom stuff specifically around bootloaders, to get our bootloaders to update, that kind of thing. It's a big part of the product offering that we built on top of Linux, but the biggest challenge in update for us was mostly about redundancy, making sure that if someone pulled the power cord at any moment in time it kept working, because on an IoT device people do that all the time. Other questions? Okay, thank you very much.