I apologize for that. I'm going to have to go fast here to make up time. This talk is about building a network operating system using Linux and Yocto. My name is John Mahaffey; I'm a Linux kernel architect at Hewlett Packard Enterprise. It's not about how to build a network switch per se, but about how we used Linux and Yocto to build our ArubaOS-CX. That's the marketing name; I'm going to use the internal name, Halon, in this presentation.

A network operating system is basically a command set. Ours looks a lot like a Cisco switch: to turn off a port you say "shut", and to turn a port on you say "no shut". You get used to it after a while. It's based on OpenSwitch, and it's based on Linux underneath it all. The kernel, the userland, and all the applications are built on Linux and GCC, with Yocto as the build system. This talk will discuss the practical challenges of combining those open source technologies.

The Linux kernel is very configurable. The trick is finding the right combination of kernel configuration options, tables, and the various pieces that go together to make it all work. In the switch we use a RAM-based system; it's completely memory-based. That includes the kernel, all the routing tables and those sorts of things, and the userland is a RAM disk.

A trick here is to start small. You don't really want to start with the SDK you get from your chip vendor. That's got everything in it but the kitchen sink, and it's hard to pare that down. If you start small and build your way up, you end up with a much more manageable size. Use the Yocto layers mechanism to add platform-dependent features; we'll talk more about layers and how they work, but they really help you configure different platforms differently. If you have secondary kernels, for instance a kexec kernel, or in our case also a boot kernel, trim those even more aggressively. It's easy to say "here's the OS we already have, we'll just reuse that", but that wastes a lot of space and time.
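To make the start-small approach concrete, here is a sketch of what a minimal, RAM-based image recipe can look like. This is my illustration, not Halon's actual recipe; the recipe and the nos-cli package name are invented.

```
# nos-image-minimal.bb - illustrative sketch, not the actual Halon recipe
SUMMARY = "Minimal RAM-based NOS image"
LICENSE = "MIT"

inherit core-image

# Start from the smallest bootable package set and add packages one at
# a time as the NOS actually needs them ("nos-cli" is a made-up example).
IMAGE_INSTALL = "packagegroup-core-boot nos-cli"
IMAGE_FEATURES = ""

# Everything runs from RAM: build the rootfs as a compressed cpio
# archive that the kernel unpacks as an initramfs.
IMAGE_FSTYPES = "cpio.gz"
```

Growing an image package by package like this keeps you honest about what is actually on the switch, which is much easier than pruning a vendor image that starts with everything enabled.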
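The same applies on the kernel side. The linux-yocto tooling lets you express the trimming as small configuration fragments, so each kernel (main, boot, kexec) assembles only the fragments it needs. A hypothetical fragment, plus the .scc feature file that pulls it in, might look like this; the specific options are illustrative:

```
# ram-system.cfg - illustrative fragment for a fully memory-based system
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
# Trim subsystems a switch CPU never uses:
# CONFIG_SOUND is not set
# CONFIG_USB_STORAGE is not set
```

```
# nos-base.scc - ties the fragment into the linux-yocto kernel tooling
define KFEATURE_DESCRIPTION "NOS base kernel configuration"
kconf non-hardware ram-system.cfg
```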
Use BusyBox; start with BusyBox. When you need real features that aren't in BusyBox, don't try to configure BusyBox down. We just used BusyBox as it came. You don't really want to go down that path: if you start configuring BusyBox you run into more problems, and you just go down that rat hole. We didn't do that. We used the real packages when BusyBox wasn't enough.

If you've got a small RAM footprint on a smaller network switch, you can share RAM between functions that are needed at different times. For instance, when your redundancy management says "there's a problem with this module, we need to reboot it", you want to gather your logs and core dumps together. You really want to fail the functionality over to another CPU or another switch first, and then use that time to collect all your pieces. At that point you no longer need network buffers and so on, so you can use that space differently.

Kernel updates. If you saw the keynote, Greg says you really want to keep up to date with your kernels, and I agree with that. We started with kernel 4.4 in 2015 and then migrated to the LTS kernels available in Yocto along the way. Those kernel updates provide critical bug fixes, security updates, and new features that your newer hardware platforms may need.

But they also provide a convenient excuse for engineers: "my code was working fine, and after the kernel upgrade it doesn't work anymore." When we upgraded from 4.4 to 4.9, we got 18 different bugs all blamed on the kernel. Half of those were actually code or design flaws that were exposed by the upgrade. The kernel didn't used to care, then it started being a little more careful with parameters, and the code had to be fixed. Yes, your code worked before and it doesn't work now, and it's your code that's bad. Only two of those were actual kernel issues; the rest were misunderstandings.

Management perception is guided by its initial reports and by people they trust, master-architect-level people. They listen to them, and to us saying, no, it's really not the kernel. We had to spend a lot of time doing triage and analysis and putting together presentations to management. It helped a lot that our management at HPE is pretty much all from the ranks, technical people who understand this sort of thing, but it had to be explained in words they could understand. Once we did that, they understood the issues and why we needed to upgrade kernels. The upgrade from 4.9 to 4.14 went a lot more smoothly. Maybe we had just flushed out all the really bad bugs, or maybe it wasn't as difficult a transition, but it went a lot more smoothly, and management expectations were properly set, so we didn't have the same issue.

Yocto layers: we use them extensively in Halon. We've got the Poky layers for the basics. There are four of them in our stack: meta, meta-poky, and meta-yocto-bsp, plus a layer we added called poky-patches. Any time an engineer wanted to change something in one of the Poky layers, we said, no, you can't change the layer there; you have to put the change into poky-patches. That helps us upgrade Yocto: we keep our Poky pristine, so we can simply remove it, put in a new one, and reapply the patches that are still necessary.

We've got an OpenEmbedded layer. There's a lot of stuff in OpenEmbedded, but it's very large and cost a lot of space when we brought in the whole thing as a layer, so now we pick and choose and bring only what we need into our OpenEmbedded layer. We have a meta-foss layer for open source packages that aren't part of Poky or OpenEmbedded: dev tools, real-time database stuff, math packages, various things we bring in from open source. We have a halon-common layer that is shared between all of the bootloaders and the Halon OS. Then we have the Halon distro layer, which holds the custom NOS commands common to all Halon platforms. And finally we have platform-dependent layers for features that only some switches need: a big chassis, a core switch, has a different feature set than a small edge box that is very limited in memory and features.

How many of you are familiar with Yocto and Yocto layers? Okay, so you get it.
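For those who aren't: the mechanics are simple. A layer stack like the one I just described would be expressed in conf/bblayers.conf roughly like this. The paths and most of the layer names here are illustrative, not our exact tree:

```
# conf/bblayers.conf - illustrative layer stack, names are approximate
BBLAYERS ?= " \
    ${TOPDIR}/../poky/meta \
    ${TOPDIR}/../poky/meta-poky \
    ${TOPDIR}/../poky/meta-yocto-bsp \
    ${TOPDIR}/../layers/poky-patches \
    ${TOPDIR}/../layers/meta-oe-subset \
    ${TOPDIR}/../layers/meta-foss \
    ${TOPDIR}/../layers/halon-common \
    ${TOPDIR}/../layers/halon-distro \
    ${TOPDIR}/../layers/halon-platform-chassis \
"
```

Each layer's own conf/layer.conf sets its priority, so the distro and platform layers can override anything below them.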
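The poky-patches layer works through ordinary bbappend files. For example, to carry a local fix on top of a Poky recipe without touching Poky itself (the recipe and patch name here are invented for illustration):

```
# poky-patches/recipes-connectivity/openssh/openssh_%.bbappend
# Carry our fix on top of the pristine Poky recipe. On a Yocto upgrade,
# Poky is replaced wholesale and this patch is rebased, re-tested, or
# dropped. (Older releases spell the first line FILESEXTRAPATHS_prepend.)
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
SRC_URI += "file://0001-hypothetical-local-fix.patch"
```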
Yocto upgrades have the same issues as kernel upgrades, but they're a lot harder to justify, because you don't have quite so many CVEs, and when you do, you can pick and choose fixes. And management really doesn't want any trouble: we're getting close to release, it's only a few months away, we don't want any of these changes. So you try to do your Yocto upgrades at the beginning of a release, but often the beginning of a release is the end of the previous release, and key personnel are needed elsewhere. Customers are finding issues with the release that just shipped, and it's all hands on deck to fix the problem. The takeaway is that you have to be persistent. If you see a release coming up, say you're releasing in six months and then moving on to the next release, that's the time to branch and get your prep work done. Even if you've got it pretty well done, you're still going to have maybe a month of transition, but that month is okay. It's the three months it takes to bring an upgrade in cold that puts you behind the eight ball again.

Subsystem upgrades have the same issues. Upgrades of large subsystems such as the ASIC SDK have the same issues, plus one more. We don't normally allow merge commits in our Git repository; they cause issues. But for a large subsystem, it's easiest if you branch, remove the old code, bring in a pristine copy of the new code, and then replay all of the commits that were on top of the old code. If you follow that as a methodology, it works pretty well. The commits come in one at a time: some aren't needed anymore, some are minor touch-ups, some you might have to rewrite. Once your upgrade is ready to go, you push it into your repository. Make sure you save the SHA from before that push and the SHA that comes out after it, because you are for sure going to have people coming to you saying "your SDK upgrade broke my code". Sometimes that happens months later: "I've got this issue, it only happens once in a while, and I've traced it down to the SDK upgrade." It's also useful to have actual system images that can be booted from those before-and-after SHAs, and to save those for months. Like I said, three or five months later, people will ask, "do you have an image I can just try?"
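Spelled out as commands, that flow looks roughly like this. The repository layout, SDK name, version, and commit range are invented for illustration:

```
# Illustrative subsystem-upgrade flow; all names here are made up.
git checkout -b asic-sdk-6.5-upgrade

# Remove the old vendor drop and import a pristine copy of the new one.
git rm -r --quiet vendor/asic-sdk
mkdir -p vendor/asic-sdk
tar -xf asic-sdk-6.5.tar.gz --strip-components=1 -C vendor/asic-sdk
git add vendor/asic-sdk
git commit -m "ASIC SDK: import pristine 6.5 sources"

# Replay the local commits that sat on top of the old drop, one at a
# time; some no longer apply, some need rewriting, some are obsolete.
git cherry-pick OLD_SDK_BASE..OLD_SDK_TIP

# Record the SHAs on either side of the upgrade, and archive bootable
# images built from each; the "your upgrade broke my code" reports can
# arrive months later.
git rev-parse origin/master > sha-before-upgrade.txt
git push origin asic-sdk-6.5-upgrade
git rev-parse HEAD > sha-after-upgrade.txt
```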
Halon uses 527 OE packages, 122 FOSS packages, and 276 kernel modules, some of them custom, plus 518 custom userland packages that we wrote for the NOS. We currently support two architectures and six variants, so we're using all of the power of Yocto.

Build times varied widely throughout development, sometimes as much as four hours. We tried a lot of experiments to get that down: ccache, Icecream, Electric Accelerator. For the ASIC SDK we said, well, if you're an SDK developer, you've got to live with building it; but once you're done, you check in the library that comes out of that build process, and everybody else is spared building the SDK every time. We've got it down to a reasonable 20 to 40 minutes per build, depending on what you change, which is still too much, but we're working to get it lower.

The biggest remaining issue we have is dependency chains. Our userland software is based on CMake, and CMake needs all of the packages it uses present in the sysroot, so every recipe has to declare what it depends on. If you make a minor change to a package, say you add a check for a null pointer, nothing downstream really changes and you shouldn't need to rebuild everything. But Yocto sees that a dependency was rebuilt and rebuilds everything that depends on it, and the blast radius from a change that's high up in the dependency chain can be quite large. I think I might have some ideas on how to mitigate that, and I'm going to take them to the Yocto BOF that's after this session.

And finally, we ended up with some PCI issues that turned out to be one of our biggest headaches, primarily due to the high-availability requirements of our chassis products, the core switches. The HA architecture basically requires that the modular line cards plugged into the backplane remain up and keep passing traffic during a failover event. If the active CPU fails over to the standby, you want the cards to continue passing traffic. Packets being punted to the CPU will of course get dropped, and there's recovery for that, but the failover happens fairly rapidly.

Here's a very schematic view of the PCI architecture. On the backplane you have an active CPU and a standby CPU, which is powered up and kept in sync with all of the changes going on on the active CPU. A non-transparent bridge fans out to all the line cards on the backplane. Each line card actually has another non-transparent bridge, which fans out to that card's CPU, the ASIC, and all the PCI peripherals on it. A lot of our problems stemmed from the fact that it isn't tested this way in the open source community. The usual PCI architecture has the root complex on the CPU talking to a non-transparent bridge, which then talks to endpoints. A non-transparent bridge talking to another non-transparent bridge isn't well tested.

The line cards have a fairly underpowered CPU running a transport layer. What we found was that when a lot of packets were being punted to the CPU, a lot of traffic over the PCI bus, the transport layer would starve due to the interrupts from the DMA traffic. It wasn't so much a PCI bandwidth problem as a CPU bandwidth problem, dealing with the interrupts of that traffic. Since the interrupts are in the kernel, they run at higher priority; the transport layer would time out, and the management software would decide that line card wasn't working and reboot it. The answer to that problem was a fair-share algorithm, so that the transport layer always gets some cycles even under high DMA traffic loads.
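The fair-share idea itself is simple. Here is a conceptual sketch in C; this is my illustration of the technique, not the actual Halon driver code, and all of the helper functions are assumed:

```c
/*
 * Conceptual fair-share sketch: service DMA completions with a fixed
 * budget per pass so the transport layer is never starved, even under
 * sustained punt traffic. Illustration only; the helpers below
 * (dma_ring_has_work, etc.) are assumed, not real driver APIs.
 */
#include <stdbool.h>

#define DMA_BUDGET 64 /* max DMA descriptors serviced per pass */

bool dma_ring_has_work(void);          /* assumed hardware helpers */
void dma_service_one_descriptor(void);
void transport_run_once(void);         /* heartbeats, ACKs, keepalives */

void io_fair_share_loop(void)
{
    for (;;) {
        int quota = DMA_BUDGET;

        /* Service punted-packet DMA work, but only up to the budget... */
        while (quota-- > 0 && dma_ring_has_work())
            dma_service_one_descriptor();

        /*
         * ...then unconditionally give the transport layer a turn.
         * Without the budget, the higher-priority in-kernel DMA path
         * runs until the transport times out and management decides
         * the line card is dead and reboots it.
         */
        transport_run_once();
    }
}
```

This is the same shape as the kernel's NAPI polling budget: bounded batches of high-rate work, with guaranteed slots for everything else.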
The second problem we ran into was that the PCI drivers on the main CPU would miss interrupts, get confused, and lock up. This is a known problem that was fixed in later kernels, but not well enough for our particular PCI architecture, so we went back to just using polling rather than interrupts, and that fixed most of those issues. The third issue was that the active-to-standby failover would cause trouble: the non-transparent bridges on the line cards have to switch from one CPU to the other, and the PCI drivers just weren't up to that. The answer, basically, is that on failover we reset all the PCI links to the CPU.

One of the changes that made the biggest difference was moving to an Ethernet-over-PCI driver. We heavily modified the one that's available in open source, but it made a big difference: the Ethernet protocol is much more forgiving of things like changes in link status and switching over to a different CPU. The KISS principle, keep it simple, definitely applied here. This stuff is very complicated, and if you get very complicated in your own software on top of it, you're just asking for trouble.

I apologize for the speed of that, but that's the end of my talk. I think we have five minutes or so for questions, if there are any.

Yes. The question is how difficult it is to port this to another platform. We're not finding it terribly difficult; it depends on how different the platform is. Like I said, we've recently started using a different architecture, which caused a whole raft of issues, but it's manageable. Bringing up a new platform, aside from actually designing the hardware, is just a few months of software changes.

Yes? That's not too terribly difficult in our experience. There are some interface issues, but it's fairly straightforward. If you have a working, booting hardware platform and kernel, the Halon distro layer carries all of the kernel configuration for the NOS. We use CFG files and SCC files, so the Halon configuration comes in on top of your layers: you concentrate your layer on the architecture and the things your peripherals and hardware need, and the Halon layer brings in the other pieces that are needed for the NOS. From a kernel perspective it's fairly straightforward. There's testing, of course; there are always issues integrating new tests into the CI test infrastructure. Anybody else?

You're asking about kernel-style configuration? We use the Kconfig mechanism for configuring the Halon NOS itself, and that's very well understood. All of the network configuration options are encapsulated in a menuconfig-style system: you bring in the default configs, and you can layer custom configs on top of those Halon configs. When the software is built, it takes all of those config settings into consideration. Other questions? Well, thank you, and sorry about the delay.