Thank you. Welcome, everybody, to the first talk this morning. It's a great pleasure to introduce Arun Thomas, who is going to talk about ARM kernel internals.

Thanks. OK, so I'm Arun Thomas, and I'm going to talk about BSD ARM kernel internals. This is my second talk at a EuroBSDCon. I gave a talk at EuroBSDCon 2011 in Holland, and it was a great time. The EuroBSDCon events are always a lot of fun, with a lot of interesting talks and interesting people. So let's get started.

We're going to start off with a quick demo. Yep, cool. So what you're seeing now is FreeBSD booting on ARM, specifically on the BeagleBone Black. A couple of versions of the bootloader ran, and now the FreeBSD U-Boot loader is running. A lot of stuff flew by, so I have some screenshots where we can actually look at what's going on. The U-Boot loader is going to load the kernel shortly. And here we go. OK, so now the kernel is running, and we're doing a bunch of device initialization.

What I'm going to do throughout the talk is go through the process of how FreeBSD boots on ARM. We'll focus on the machine-dependent parts, the early kernel initialization. I won't talk at all about the machine-independent initialization. There are a lot of other good resources out there for that.

OK, so now we're in userland. We're going to run a bunch of the init scripts and eventually get to a prompt. As I said, this is a BeagleBone Black. It's a pretty convenient platform. It's pretty cheap. It's pretty powerful. It's supported by all the BSDs except DragonFly, which doesn't have an ARM port yet, as far as I know. And it's going to keep booting. We're almost there. I have it hooked up via serial cable, so we can see all this stuff. So now I can log in. So log in as root. No password. It's very secure. So that's the message.
And then if I run uname -m, it says arm, which is what you'd expect. So let's go back to the output.

As I said, there are a couple of versions of U-Boot that run. There's SPL, and then there's the real U-Boot. Then there's the U-Boot loader, which loads the kernel. It uses something called a DTB, a device tree blob, which we'll talk about. And then you get to the copyright notice, so the kernel has booted. Before this happens, there's a lot of ARM-specific, machine-dependent initialization that needs to happen, and we'll go through what the code that achieves this looks like.

If you look at the logs a little more carefully, it's FreeBSD 11-CURRENT. It's got a weird revision because they use git. I used crochet to build the image, and we'll talk about what crochet is. I used the standard BeagleBone config. And it's built for arm. FreeBSD is kind of interesting in that it builds the system with Clang, even on ARM. And here's the CPU that the BeagleBone Black has. It's a Cortex-A8, and I'll talk about the different CPUs that ARM has. There's a little bit about the CPU features, so it has Thumb-2 support, and the output has more about the cache hierarchy and stuff like that. Down here, you see something about the Texas Instruments AM3358. That's the SoC that is in the BeagleBone Black, and we'll talk about SoCs, systems on chip. There's a little bit more about device tree, and we'll talk about what that is. Down here, on the last line, you see the mapping for the serial port. This is basically how the serial port gets configured. It's a 16550. It's mapped at this address, 0x44E09000, and it's at IRQ 72. We'll talk about how this information actually gets to the kernel.

So my goal with this talk is to get you all hacking BSD on ARM. How many of you have done little or no hacking on BSD ARM? Anybody? Excellent.
So this talk is designed for all of you. My goal is to get you all hacking, or at least interested in hacking, BSD on ARM.

The talk has three parts. I'll start off with a little ARM 101: a little bit about the instruction set architecture, some of the hardware that's out there, some of the SoCs that are out there, and where to go to find the useful ARM documentation. After that, we'll look at some kernel code from FreeBSD and NetBSD. We'll focus on the machine-dependent, early kernel initialization code. And then there's a short section on BSD ARM tips and resources: how to set up your development environment, where the good debug tools are, and other good resources so you can continue your ARM study.

I gave a version of this talk at BSDCan a few months ago, and I can honestly say that someone actually submitted a patch because of this talk. One of the FreeBSD committers noticed a typo in my slides, and it was actually a typo in one of the FreeBSD source files, locore.S. They fixed the typo an hour after my talk and made a commit for it. So that was actually kind of cool. I'm hoping all of you will also make some patches against BSD ARM.

OK, so let's get into ARM. ARM is, as you know, hugely popular in embedded systems: your smartphones, your smartwatches, all that stuff. You probably have several ARM devices in your pocket right now. It's moving into general-purpose computing, so your laptops, desktops, and netbooks. The Samsung Chromebooks have ARM chips in them. It's also moving into servers. AMD is making ARM servers now, which is kind of cool and kind of crazy. It's also moving into high-performance computing. NVIDIA has some pretty cool GPGPU platforms. There's the Jetson board, and I saw some posts on the FreeBSD mailing list that people are actually working on porting FreeBSD to that, so that's pretty cool. ARM has an interesting business model.
It doesn't manufacture chips. It licenses its architecture and its processor designs to other vendors like TI and Samsung. They'll take the core, add some extra logic, and then fabricate and sell that. So it's kind of an interesting business model.

OK, so let's get into the ARM architecture. ARM stands for Advanced RISC Machines. Formerly, it was Acorn RISC Machine. A RISC machine is a reduced instruction set computer, so it has simpler instructions and simpler addressing modes. It's a load/store architecture. If you want to operate on memory, you need to load the word from memory into a register, operate on it there, and store it back out. There are no memory-to-memory instructions like on x86. It can be big-endian or little-endian. Little-endian is far more common, especially in BSD.

There have been several versions of the ISA over the years. The current ones are ARMv7 and ARMv8. ARMv7 is a 32-bit ISA, and ARMv8 is a 64-bit ISA. The ARMv8 architecture is actually pretty cool. ARM cleaned up a lot of things in the architecture, but I won't talk about it at all here. We'll stick to ARMv7. ARM also has several architecture profiles. There's the application profile, the real-time profile, and the microcontroller profile. We'll only talk about the application profile. The real-time and microcontroller profiles are for embedded systems that don't support virtual memory.

So as I said, we'll focus on ARMv7-A, the 32-bit architecture. If you're looking at CPU models, these are the Cortex-A CPUs, so Cortex-A5 to A15, and we'll get into the different CPUs that are out there. These are CPUs with full MMU support, and they're designed for what ARM calls full-featured operating systems, so things like FreeBSD, NetBSD, iOS, that kind of thing. ARMv7-A actually has two instruction sets. There's the ARM instruction set and the Thumb instruction set. The ARM instruction set is 32-bit.
It was the original instruction set. Later, ARM introduced the Thumb instruction set, which is a mix of 16-bit and 32-bit instructions. Originally, it was 16-bit only, but with the introduction of Thumb-2, they also added 32-bit instructions. The reason they added the Thumb instruction set is code density. Better code density is good for your caches, which is good for performance. That's the main reason.

OK, this is a term that you'll see a lot, especially in the ARM world: system on chip, SoC. So what's an SoC? Essentially, it's an ARM CPU packaged up with a bunch of other logic. Typically on these SoCs, the ARM CPU is just a small piece of the full system. Here are some examples of things that you'll find in your SoC. You've got interrupt controllers, timers, UARTs, SD/MMC controllers, SATA controllers, USB controllers, GPUs, all kinds of peripherals. Your GPS controller, all that functionality that's in your phone, probably has some sort of corresponding block in the SoC.

Here's an example of an SoC. This is the AM335x that's in the BeagleBone Black over there. As you'll see, the core is actually a fairly small portion of it. That's the Cortex-A8. If you look at it, there's a lot of other logic. You've got a GPU in there. You've got a bunch of buses. You've got UARTs, SPI, I2C, timers, a watchdog timer, a real-time clock, JTAG, ADCs. You've got your MMC/SD, GPIOs, USB, Ethernet MAC, and memory controllers. So there's a lot more that goes into an SoC than just the ARM core.

This is the board that I've been using for the demo. It's a popular hobbyist board called the BeagleBone Black. It's 55 US dollars. It used to be 45.
When they released the Rev C version, which is the current version, they increased the price by 10 bucks because they increased the onboard flash from, I think, two gigs to four gigs. I think in Europe you can get it for around 53 euros from one of the European distributors. It's supported by FreeBSD, NetBSD, and OpenBSD. If DragonFly had an ARM port, I'm pretty sure it would support this board as well. It's a pretty nice platform. It's pretty powerful. It's pretty compact. You can power it from USB alone, so that's actually kind of cool. And it's nice because when you're packing for a transatlantic trip, it doesn't take up a lot of room.

OK, so here are a bunch of the different BSD-supported ARMv7 SoCs and boards. If you grab one of these boards, you can run BSD on it. I'll start off with the Texas Instruments family of SoCs. They're really popular, and they're in a lot of really popular boards. The TI OMAP3 and DaVinci are found in the BeagleBoard and the BeagleBoard-xM. The Sitara SoCs are found in the BeagleBone White and the BeagleBone Black. And the OMAP4 is found in the PandaBoard and the PandaBoard ES.

Another popular set of SoCs is from Allwinner: the Allwinner A10 and A20. These are found in the Cubieboard and the Cubietruck, respectively. The Freescale i.MX6 is found in the Wandboard. It's also found in the Novena laptop, an open source laptop from bunnie Huang and company that's getting released soon. It was like a Kickstarter campaign, so that's actually kind of cool. The Samsung Exynos 5 is a really high-end SoC. You'll find this in the Chromebook and the Arndale board. The Xilinx Zynq is actually a pretty interesting platform. It pairs a Cortex CPU with a bunch of FPGA fabric, and you can find it on the ZedBoard and the MicroZed. I've got a couple of the ZedBoards at work, and it's a pretty cool platform if you're doing any hardware work.
So here's a list of the various Cortex CPUs that are out there, the A5 through the A17. At the low end, you've got the Cortex-A5. You'll find that in the Freescale Vybrid, for your really embedded stuff. Also at the low end, you've got the Cortex-A8, so the TI OMAP3, DaVinci, and Sitara. Those are the ones that you find in the BeagleBone and the BeagleBoard. The Allwinner A10, which is found in the Cubieboard, also has the Cortex-A8. The Cortex-A9 is the mid-range CPU. That'll be found in your OMAP4, which is in the PandaBoard, the Freescale i.MX6, which is in the Wandboard, and the Zynq. The A8 and the A9 are what you'll see in the majority of these hobbyist boards.

The Cortex-A15 is the high end of the ARM CPU models. It's found in the Samsung Exynos 5, which you'll find in a lot of your high-end smartphones, as well as the Chromebook and the Arndale board. The Cortex-A7 is one of the newer CPUs. It's a replacement for the A8. It's a faster A8, so it's going for the lower end. You'll find this in the Exynos 5 too. The Exynos 5 is kind of interesting because it has the Cortex-A15 and A7 in what they call a big.LITTLE configuration. The A7 is also found in the Allwinner A20, which is in the Cubietruck. The A12 and the A17 are newer processors. They're supposed to fill in the mid-range niche, so they're A9 replacements. I haven't seen any SoCs that they're in yet, but I'm pretty sure they'll be in a bunch of boards pretty soon.

OK, so we talked about a bunch of the hardware that's out there. Let's talk about the software now. We'll talk about ABIs. So what's an ABI? An ABI is an application binary interface. If you read the ARM docs, they say these are the rules that an ARM executable must adhere to. These are things like executable formats, calling conventions, alignments, and what system calls look like. ARM has several ABIs.
There is the ARM embedded ABI and the ARM embedded ABI with hardware floating point: ARM EABI and EABIHF. These are the two current ABIs. There's an older ABI, ARM OABI. I don't actually know what the O stands for. It might be original or old or obsolete. I don't know. But it's not really used as much now. NetBSD and FreeBSD both support EABI and EABIHF, and depending on which ABI you want to use, you'll build your toolchain for that ABI.

OK, so let's look at the instruction set a little bit more. ARM has 16 general-purpose registers, R0 through R15. Some of them have designated uses. R11 is the frame pointer. R13 is the stack pointer. R14 is the link register. When you do a call instruction on ARM, it'll save your current PC in the link register, so you have something to return to. That's what the link register is used for. R15 is used as the program counter, the PC. ARM also has some program status registers. It also has floating point registers, which are called VFP registers on ARM, and SIMD registers, which are called NEON. I won't talk at all about the VFP and NEON stuff, but if you're doing numerical code or vector code, you should look into that more.

There are two program status registers: the current program status register, the CPSR, and the saved program status register, the SPSR. The SPSR is used for exceptions. The CPSR holds some important bits for the systems hacker. It has the processor mode, for instance SVC mode. It has the interrupt mask bits, like the IRQ bit. If you want to disable interrupts, you'll set the IRQ bit in this register. It also tracks the state, so whether you're in ARM state or Thumb state, as well as the endianness, so whether the processor is operating in big-endian or little-endian mode, as well as your standard condition flags: negative, zero, carry, overflow.
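To make the CPSR fields described above a little more concrete, here is a minimal C sketch that picks them apart. The bit positions and mode encodings are the ARMv7-A ones; the macro and function names are made up for this illustration and are not taken from FreeBSD or NetBSD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-A CPSR bit positions. Names are illustrative only. */
#define PSR_N         (1u << 31)  /* negative flag */
#define PSR_Z         (1u << 30)  /* zero flag */
#define PSR_C         (1u << 29)  /* carry flag */
#define PSR_V         (1u << 28)  /* overflow flag */
#define PSR_E         (1u << 9)   /* 1 = big-endian data accesses */
#define PSR_I         (1u << 7)   /* 1 = IRQs masked */
#define PSR_F         (1u << 6)   /* 1 = FIQs masked */
#define PSR_T         (1u << 5)   /* 1 = Thumb state, 0 = ARM state */
#define PSR_MODE_MASK 0x1fu

#define PSR_MODE_USR  0x10u
#define PSR_MODE_IRQ  0x12u
#define PSR_MODE_SVC  0x13u
#define PSR_MODE_ABT  0x17u
#define PSR_MODE_UND  0x1bu

static inline uint32_t psr_mode(uint32_t psr)        { return psr & PSR_MODE_MASK; }
static inline bool     psr_irq_masked(uint32_t psr)  { return (psr & PSR_I) != 0; }
static inline bool     psr_thumb_state(uint32_t psr) { return (psr & PSR_T) != 0; }
static inline bool     psr_big_endian(uint32_t psr)  { return (psr & PSR_E) != 0; }

/* Value to write back in order to mask both IRQs and FIQs. */
static inline uint32_t psr_mask_interrupts(uint32_t psr)
{
    return psr | PSR_I | PSR_F;
}

/* Replace the mode field, leaving flags and mask bits alone. */
static inline uint32_t psr_with_mode(uint32_t psr, uint32_t mode)
{
    assert(mode <= PSR_MODE_MASK);
    return (psr & ~PSR_MODE_MASK) | mode;
}
```

On real hardware the value would be read with MRS and written back with MSR; this sketch only models the bit manipulation, so it can be compiled and tried on any host.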
OK, since we'll be doing some kernel hacking, it's good to have an idea of what the assembly syntax looks like. You'll at least be reading some assembly when you're doing kernel hacking, probably, if you're looking at some objdump output and stuff like that.

So here's a fairly simple program. It just adds the numbers 1 and 2 together. It'll load the immediate value 1 into register R1, and then load the immediate value 2 into register R2. Then it'll add R1 and R2 and put the result into register R3. The destination is on the left side. So assuming the processor is implemented correctly and you don't get any weird cosmic bit flips, you should have 3 in R3.

As I said, ARM is a load/store architecture, so if you want to work on memory, you have to load and store. Here are examples of the load and store instructions, LDR and STR. This will load the value that R1 points to into R0, and this will store the value that R0 holds into the memory that R1 points to. You can also push and pop things to and from the stack. This is how you push multiple values onto the stack: this will push R0 through R2. And then this pops those values back into R0 through R2. These are actually aliases. ARM has load multiple and store multiple instructions, so you can load multiple words from memory and store multiple words to memory.

ARM also has control flow instructions, as you might expect. This is how you do a branch: branch if zero to the label loop. The call instruction is the branch and link instruction, BL. This will jump to func and save the current PC to the link register. If you want to return, use the branch and exchange instruction, so BX LR. This will jump to the link register, your saved PC. In older versions of the ISA, you would just directly move the link register into the PC. That's deprecated in ARMv7.

OK, so now we went through some of the user-level stuff.
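The instructions walked through above can be modeled with a toy register machine in C. This is only a sketch for building intuition: the `struct cpu` type, the `op_*` helpers, and the word-array memory are all invented for this example, and the model treats the PC as the address of the current instruction, which is a simplification of how a real ARM core reads the PC.

```c
#include <assert.h>
#include <stdint.h>

#define REG_SP 13  /* stack pointer */
#define REG_LR 14  /* link register */
#define REG_PC 15  /* program counter */

struct cpu { uint32_t r[16]; };

/* mov rd, #imm */
static void op_mov(struct cpu *c, int rd, uint32_t imm) { c->r[rd] = imm; }

/* add rd, rn, rm  (destination on the left) */
static void op_add(struct cpu *c, int rd, int rn, int rm) { c->r[rd] = c->r[rn] + c->r[rm]; }

/* ldr rd, [rn]: load the word that rn points to */
static void op_ldr(struct cpu *c, const uint32_t *mem, int rd, int rn) { c->r[rd] = mem[c->r[rn] / 4]; }

/* str rd, [rn]: store rd to the word that rn points to */
static void op_str(struct cpu *c, uint32_t *mem, int rd, int rn) { mem[c->r[rn] / 4] = c->r[rd]; }

/* push {r<first>-r<last>}: the stack grows down, lowest register at lowest address */
static void op_push(struct cpu *c, uint32_t *mem, int first, int last)
{
    assert(first >= 0 && first <= last && last < 16);
    c->r[REG_SP] -= 4u * (uint32_t)(last - first + 1);
    for (int i = first; i <= last; i++)
        mem[c->r[REG_SP] / 4 + (uint32_t)(i - first)] = c->r[i];
}

/* pop {r<first>-r<last>}: the inverse of op_push */
static void op_pop(struct cpu *c, const uint32_t *mem, int first, int last)
{
    assert(first >= 0 && first <= last && last < 16);
    for (int i = first; i <= last; i++)
        c->r[i] = mem[c->r[REG_SP] / 4 + (uint32_t)(i - first)];
    c->r[REG_SP] += 4u * (uint32_t)(last - first + 1);
}

/* bl target: remember where to return to, then jump */
static void op_bl(struct cpu *c, uint32_t target)
{
    c->r[REG_LR] = c->r[REG_PC] + 4;  /* address of the next instruction */
    c->r[REG_PC] = target;
}

/* bx lr: return by jumping to the saved address */
static void op_bx_lr(struct cpu *c) { c->r[REG_PC] = c->r[REG_LR]; }
```

Running `op_mov(&c, 1, 1); op_mov(&c, 2, 2); op_add(&c, 3, 1, 2);` on a zeroed `struct cpu` leaves 3 in `c.r[3]`, which mirrors the add example from the slides.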
Let's look at the OS-relevant stuff, because that's what we're doing. We're trying to get an OS running on ARM. ARM has several privilege levels. We'll only talk about two of them in this talk. PL0 is used for unprivileged user code. PL1 is used for privileged kernel code. ARM also has several operating modes. This stuff is a little bit complicated, but it's good to know when you're doing exception handling. There's one unprivileged mode, as I said, that runs at PL0. Then there are eight privileged modes that run at PL1 and above. Supervisor and IRQ are some examples of those privileged modes. The privileged modes are used primarily for interrupt and exception handling.

So here are all the modes that the processor can be in. Supervisor mode is used for system calls. ARM has an instruction, SVC, or supervisor call, and that's how you trap into the kernel. In earlier versions of the ISA, this was called SWI, or software interrupt, and that's actually what you'll see in the BSD code. Supervisor mode is also the initial mode that the processor starts up in. There's also interrupt mode, which is used for normal interrupts. Fast interrupt mode is used for higher-priority interrupts, and the processing of those interrupts is a bit faster, as you might expect from the name. Abort mode is used for memory faults. Undefined mode is used for illegal instructions, or also to emulate instructions. System mode is a privileged mode that allows you to access user mode registers. It's not really used much. Hypervisor mode is used for virtual machine monitor support. And monitor mode is used for ARM's TrustZone stuff. The most important modes, the ones you'll see when we look at the BSD code examples, are supervisor mode, interrupt mode, abort mode, and undefined mode.

So ARM has an interesting feature that I'm just going to mention.
I don't really have the time to go into it, but the feature is called banked registers. Most registers are shared amongst the various modes, so you have one PC across all of those modes that I mentioned earlier. But banked registers are dedicated registers for each mode, so they're actually duplicated. ARM allows you to have separate stacks for each mode, so you have separate stack pointer registers. There's a stack pointer register for user mode and a stack pointer register for supervisor mode. I won't really talk about this. I just want you guys to know that it exists, and it's important for exception handling. If you want to know more, the ARM docs will tell you all the details.

So let's briefly go over ARM virtual memory. On ARMv7, you're looking at a 32-bit address if you're not using LPAE, the Large Physical Address Extension. With LPAE, you're looking at a 40-bit address. But we're just going to focus on the stock ARMv7 virtual memory architecture. With a 32-bit address, you're looking at a 4-gigabyte virtual address space. ARM has paging support. There are two levels of page tables. ARM calls them translation tables in the documents. And there's a hardware-managed TLB, so the MMU does the page table walk on a TLB miss. The commonly used page sizes are 4-kilobyte small pages and 1-megabyte sections.

One important part of the architecture is coprocessor 15, which is the system control coprocessor. It's heavily used for systems programming, so if you're doing any kernel hacking, it's good to know that it exists. I call it the kernel hacker's best friend. It's used to set up the process's page table, and this is the instruction that you use to do it. This will write to the translation table base register. The instruction is MCR, move to coprocessor from register. And these numbers, basically, you'll find in the docs.
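The two-level lookup described above can be sketched in C as plain address arithmetic. This assumes the stock ARMv7 short-descriptor format with a full 4096-entry first-level table; the helper names are invented for this example. The TTBR write itself is the CP15 operation `mcr p15, 0, <Rt>, c2, c0, 0` in the ARM documentation, which is only mentioned here in a comment because it cannot run on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * First level:  VA[31:20] indexes 4096 entries, each covering 1 MB.
 * Second level: VA[19:12] indexes 256 entries, each covering a 4 KB small page.
 * The first-level table base is installed in TTBR0 via
 *   mcr p15, 0, <Rt>, c2, c0, 0
 */
static inline uint32_t l1_index(uint32_t va)       { return va >> 20; }
static inline uint32_t l2_index(uint32_t va)       { return (va >> 12) & 0xffu; }
static inline uint32_t page_offset(uint32_t va)    { return va & 0xfffu; }
static inline uint32_t section_offset(uint32_t va) { return va & 0xfffffu; }

/*
 * Translate a VA through a 1 MB section descriptor (low bits 0b10).
 * Supersections and access checks are ignored in this sketch.
 */
static inline uint32_t section_translate(uint32_t desc, uint32_t va)
{
    assert((desc & 0x3u) == 0x2u);
    return (desc & 0xfff00000u) | section_offset(va);
}
```

For example, the UART address 0x44E09004 splits into first-level index 0x44E, second-level index 0x09, and page offset 0x004.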
But this is how you install the page table. Another thing that coprocessor 15 holds is the system control register. That's pretty important because this is how you enable the MMU. It's also how you enable branch prediction and caching, and how you tell the processor where the exception vector table is going to be. You use it with these instructions. If you want to read from it, it's MRC, move to register from coprocessor. And to write, it's MCR, move to coprocessor from register. Again, you'll look at the ARM docs to figure out which numbers to use, but these are the numbers that you need for the coprocessor stuff.

OK, so that's a quick introduction to the ARM architecture. If you want all the details, these are the guides that you should be looking at. This is a snippet from the NetBSD source code, cpufunc.c, and I'll read it out: "And thus spake the ARM ARM." So what's the ARM ARM? The ARM ARM is the ARM Architecture Reference Manual. It has all the details that you could possibly want to know about ARM. It's about 2,000 pages, so you're probably not going to read it cover to cover, but it's a great reference. For our purposes, we'll want the ARMv7-A and ARMv7-R edition of the manual. They constantly update it with errata. There was a 2014 version, so that's the one you should grab.

The Cortex-A Series Programmer's Guide is also great. It's a much lighter, quicker introduction to ARM, so I would definitely grab that if you're new to ARM. ARM released it originally in 2012, and they've been continuously updating it. It's now at version 4, which just came out this year. These are both free. You can grab the PDFs from ARM's website. You have to register on the website, but the documents are free. So you should definitely grab these right after the talk, or even now. The other guide that's good is the ARM System Developer's Guide.
It's a decade old. It was published in 2004, and ARM has come a long way in the last decade, so some of the stuff is dated. But in terms of the system-level aspects, especially exception handling, this is still a really great resource.

OK, so in addition to those manuals, you'll want to get some manuals for your specific board, in this case the BeagleBone Black. You've got a Cortex-A8, so you want to grab the Cortex-A8 Technical Reference Manual. You want to grab the AM335x Technical Reference Manual, because that's the SoC that's on this board. And then you want to grab the BeagleBone System Reference Manual as well. The SoC's TRM has a lot of useful information, like the memory map. If you want to write to the serial port, UART0 is mapped to this address. We saw it earlier: 0x44E09000. That's where the transmit register is, so if you just write to this address, you can get output to serial that way. The interrupt controller is mapped to this address. DMTimer1, which is used for the clock, is at this address. And the DRAM is mapped to this address. So the SoC TRM has a lot of useful information.

ARM also has a lot of useful migration guides. If you're coming from MIPS, PowerPC, or x86, these are good. They're kind of a quick start guide. The IA-32 guide has useful information, like the fact that on ARM chars are unsigned by default, which can cause problems. So you should definitely grab these if you're coming from one of these other architectures.

OK, so now that we've gone over the basics of the ARM architecture, and I've shown you where to get more documentation, we can start digging into the code. The vast majority of your OS code is going to be machine-independent, so it's going to be the same across all the architectures. A small portion of it is machine-dependent. That's the ARM-specific code.
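As an illustration of the "just write to this address" point above, here is a minimal C sketch of a polled 16550-style transmit routine. The register numbers and the status bit are the standard 16550 ones; the 4-byte register spacing is an assumption chosen to match the register shift of 2 that shows up in the board's configuration, and the function names are invented for this example. The base address is passed in as a parameter so the code can be exercised against an ordinary array on a host; on the board it would be the UART0 address, 0x44E09000, while that address is identity-mapped or the MMU is off.

```c
#include <assert.h>
#include <stdint.h>

#define UART_REG_THR   0      /* transmit holding register */
#define UART_REG_LSR   5      /* line status register */
#define UART_LSR_THRE  0x20u  /* transmit holding register empty */
#define UART_REG_SHIFT 2      /* registers are 1 << 2 = 4 bytes apart (assumed) */

/* Address of a UART register, honoring the register shift. */
static inline volatile uint32_t *uart_reg(uintptr_t base, unsigned reg)
{
    return (volatile uint32_t *)(base + ((uintptr_t)reg << UART_REG_SHIFT));
}

/* Busy-wait until the transmitter can take a byte, then send it. */
static void uart_putc(uintptr_t base, char ch)
{
    assert(base != 0);
    while ((*uart_reg(base, UART_REG_LSR) & UART_LSR_THRE) == 0)
        ;  /* spin until the holding register is empty */
    *uart_reg(base, UART_REG_THR) = (uint8_t)ch;
}
```

A real early-console routine would also need the UART clocks and pin muxing to be set up already, which the bootloader normally takes care of.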
And this is usually a mix of C, some assembly, and inline assembly. We're going to look at some examples from FreeBSD and NetBSD and go through the machine-dependent initialization of the kernel. FreeBSD and NetBSD both have really great ARM support, and there are some notable differences. NetBSD's build.sh, which is what you use to build the system, allows for cross-OS building, so you can build on Linux or Mac or whatever. That's actually kind of cool. FreeBSD uses Clang to build the system, even on ARM, which is kind of interesting. FreeBSD uses device tree for hardware configuration, while NetBSD uses the autoconf framework. And FreeBSD has an extra bootloader stage, the U-Boot loader, which we'll look at when we get into booting.

So here are the paths that you want to know when you start digging into the code for NetBSD. All of the ARM-specific code is under sys/arch/arm. The headers are under include and include/arm32, and the .c files are under arm, arm32, and cortex. These are the paths for the core ARM architecture support. If you want to look at the SoC and BeagleBone-specific code, you'll look under omap and evbarm. evbarm stands for evaluation board ARM, and that's where all the platform-specific code goes. include has the header files, and beagle has your source files. Then there are the configuration files. If you want to figure out which files get built for the core ARM support, you'll look in this files.arm file, and std.arm has the various build options that are used to build ARM. You'll also want to look at files.cortex. For the BeagleBone and the AM335x stuff, you'll look at the OMAP2 file, the evbarm file, and the Beagle file. And this file, BEAGLEBONE, is the top-level BeagleBone kernel config.

For FreeBSD, here are the key directories for the core ARM support. There are fewer directories. All the architecture-specific code goes under sys/arm.
So include has your include files, as you might expect, and arm has all the .c files. For your SoC and for the BeagleBone, you can find that stuff under ti. That's the directory that has all the shared code for all the TI SoCs, and then am335x has the code for the BeagleBone-specific stuff. You look at files.arm to figure out which files get built, and there are corresponding files for the SoC and for the BeagleBone, so you look at the TI one and the AM335x files. The BeagleBone file and the top-level kernel config can be found here. If you want to modify the kernel config, you'll copy that, and then you can tweak it there.

OK, so let's talk a little bit about booting on ARM, and about how the BeagleBone boots. The bootloader has a few main responsibilities. It'll do your basic hardware initialization, so it'll initialize DRAM and serial. It'll pass some boot parameters, some information, to the kernel. And then it'll load the kernel. Those are the main responsibilities.

When you boot up, there are several bootloaders that run. The first thing that runs is the reset handler in the SoC ROM. Then the first-stage bootloader will run, so SPL or MLO on the BeagleBone Black. It's a stripped-down version of U-Boot, and that's needed since your DRAM is not initialized yet. Here's some output from SPL. After that, your second-stage bootloader runs, which can be a full version of U-Boot. This will read the configuration file, uEnv.txt. You can see that here. If you want to tweak the booting stuff, this is the file that you'll modify. So if you want to set up net booting, you'll modify this file. Then it reads your DTB file, and I'm going to talk about the device tree blob shortly. And then it'll start the U-Boot loader. This is an extra bootloader stage that only occurs on FreeBSD. NetBSD doesn't have it.
It's an implementation of the standard loader that's on the other FreeBSD platforms, and you can find the sources here if you're interested. Here's the output from the FreeBSD U-Boot loader. You can tell that it's going to boot the kernel, and it's going to use the DTB provided.

So, device tree. What's device tree? Device tree is used to capture hardware configuration, and it's used by FreeBSD on several platforms. I think it originally comes from PowerPC, but it's used on MIPS and ARM as well. You can find the source files for the device tree blobs in this directory, dts/arm. For the BeagleBone Black, you'll look at the beaglebone-black.dts file. Most of the logic is actually in this included file, am335x.dtsi. That's the SoC's DTS file.

Here's a snippet from that file. The SoC is the AM335x, and this is UART0. It tells you what kind of UART it is: it's an NS16550. It tells you the mapping. There's that 0x44E09000 address. It's mapped for 0x1000, so 4K. The register shift is 2, so it's going to do four-byte accesses. And it uses interrupt 72. That DTS file gets turned into a device tree blob by the device tree compiler. So beaglebone-black.dts becomes the beaglebone-black.dtb that we saw getting loaded earlier. The device tree blob is stored in a compact binary format called a flattened device tree, FDT. It can either be compiled into the kernel or loaded separately. In our case, it was loaded separately. The kernel parses the DTB to learn the board's hardware configuration. If you want to know the details of how that works, libfdt has all the FDT parsing code.

OK, so NetBSD does not use device tree. It uses autoconf instead. That's the device autoconfiguration framework. The hardware config info is generated by the kernel configuration process, so when you run config, basically. Here's a snippet from the top-level BeagleBone config file in NetBSD.
And you can tell that it has basically the same information. It has the same address, 0x44E09000, and it's mapped for 4K. It's using interrupt 72, and it's accessing things at four-byte boundaries.

OK, so now that we've talked a little bit about booting, let's go through the kernel initialization process. The first thing that happens is the early kernel initialization. The first thing the kernel will do is save off the boot parameters. It'll set up the initial page table and then enable the MMU. It'll set up the exception vector table, the exception handlers, and the exception stacks. And then it'll do some device initialization, so it'll initialize the serial port, the interrupt controller, and the timers for the clock tick. Then we get into the machine-independent initialization that's common across all of the architectures. This will initialize your kernel subsystems and do some more device initialization. Then it'll enable interrupts, switch to user mode, and run init, which is the first user code. I won't talk about this stuff at all. There's a good talk at a BSD con that goes over these details. Or you could also read the new edition of Kirk's book. The booting chapter has a bunch of useful stuff in there that's been updated.

So here are the first steps for FreeBSD on ARM. If you look at locore.S, the entry point is _start. These are the very first instructions that execute. FreeBSD uses the Linux boot ABI, since it's using U-Boot. So register R0 will have the value 0 in it. R1 will have the machine type. And R2, in this case, will have a pointer to the DTB image. If you read this comment, all of this actually gets passed to initarm in the struct arm_boot_params structure. The instructions down here are basically just saving off these registers, the information that U-Boot gives you.
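To tie the boot registers and the device tree blob together, here is a small C sketch of the kind of sanity check a kernel could make on what the bootloader hands over. The `struct boot_regs` type and the function names are hypothetical and are not FreeBSD's `struct arm_boot_params`; on the real machine R2 is a 32-bit register, but it is modeled as a pointer here so the sketch runs on any host. The only format facts relied on are that a flattened device tree starts with the big-endian magic number 0xd00dfeed, followed by a big-endian 32-bit total size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu

/* Hypothetical holder for the registers saved at kernel entry. */
struct boot_regs {
    uint32_t    r0;  /* expected to be 0 */
    uint32_t    r1;  /* machine type */
    const void *r2;  /* pointer to the DTB image */
};

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the blob's total size if R2 looks like a DTB, otherwise 0. */
static uint32_t dtb_size_from_boot_regs(const struct boot_regs *br)
{
    assert(br != NULL);
    const uint8_t *p = br->r2;
    if (br->r0 != 0 || p == NULL)
        return 0;
    if (be32(p) != FDT_MAGIC)
        return 0;
    return be32(p + 4);  /* totalsize field of the FDT header */
}
```

Real code would go on to hand the blob to an FDT parser such as libfdt rather than reading header fields by hand.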
Then the next thing FreeBSD does is make sure that interrupts are disabled. U-Boot will do this for you, but the kernel programmers are kind of paranoid. They want to make sure that interrupts are disabled because you're not ready to handle interrupts yet; you haven't set up any of the exception handlers or the interrupt handlers. So this is the code to do that. Earlier I mentioned the current program status register has an interrupt disable bit. So basically it's just going to read the current program status register, OR in the I bit and the F bit to disable interrupts and fast interrupts, and then it uses MSR, move to status register from register, to write that value back.

In FreeBSD, the start routine is actually common across all of the SoCs. NetBSD actually has different start routines for each board. So the beagle_start.S file has the entry point for the Beagle board, which is beagle_start. It's pretty similar. What you'll see is that it will try to switch to SVC mode and disable interrupts, and that's what the cpsid instruction does. This isn't actually necessary, but it's good to do just in case. And then it does the same thing, basically, as FreeBSD: it saves off the arguments that the bootloader gave us into this location, uboot_args.

So we'll continue on with the NetBSD initialization. beagle_start will create an initial page table. It calls arm_boot_l1pt_init to do the work. This page table is an L1 page table with 1-megabyte sections. It has mappings for the kernel, and it's just an identity mapping, so virtual address equals physical address. And it has a mapping for serial, so we can get debug output out to the serial. After that, beagle_start will continue to run, and it needs to enable the MMU. It calls arm_cpuinit to do the work. So if you read this comment, it says turn on the MMU and caches.
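To make the temporary page table described above concrete, here is a minimal C sketch of an L1 table built from 1 MiB section entries with an identity mapping. It is not NetBSD's arm_boot_l1pt_init, which is written in assembly; the macro and function names are illustrative, and the descriptor uses only the section type bits and a privileged read/write permission, leaving out the cache and domain attributes a real kernel would set. The kernel address in the usage note is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES      4096u          /* 4 GiB of address space / 1 MiB */
#define L1_S_SHIFT      20
#define L1_S_SIZE       (1u << L1_S_SHIFT)
#define L1_TYPE_SECTION 0x2u           /* descriptor bits [1:0] = 0b10 */
#define L1_AP_KRW       (0x1u << 10)   /* AP[1:0] = 01: privileged read/write */

/* A real table must be 16 KiB aligned in physical memory. */
static uint32_t l1_table[L1_ENTRIES];

/* Map one 1 MiB section: the VA's top 12 bits index the table, and the
 * entry holds the PA's top 12 bits plus attribute bits. */
static void
map_section(uint32_t *l1, uint32_t va, uint32_t pa)
{
    assert((va & (L1_S_SIZE - 1)) == 0 && (pa & (L1_S_SIZE - 1)) == 0);
    l1[va >> L1_S_SHIFT] = (pa & 0xfff00000u) | L1_AP_KRW | L1_TYPE_SECTION;
}

/* Identity-map [start, start + size): virtual address == physical address. */
static void
identity_map(uint32_t *l1, uint32_t start, uint32_t size)
{
    for (uint32_t off = 0; off < size; off += L1_S_SIZE)
        map_section(l1, start + off, start + off);
}
```

For example, `identity_map(l1_table, 0x80000000u, 4 * L1_S_SIZE)` would cover four megabytes of kernel starting at an assumed load address, and `map_section(l1_table, 0x44e00000u, 0x44e00000u)` would cover the section that contains the UART at 0x44e09000.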
So it passes in that temporary L1 page table that we mentioned earlier to arm_cpuinit. The location of arm_cpuinit is a little bit misleading. It's in this file that has a9 in its name, but it also applies to the Cortex-A8 as well, which is the processor on the BeagleBone. So the first thing this does, it'll invalidate your caches and your TLBs. Then it'll enable the caches. And it'll set the TTBR, which is the translation table base register. That's how you install the page table. And then it'll enable the MMU.

The rest of the machine-dependent initialization is handled by start. This is a common routine across all of the SoCs. You can find it in locore.S. And this is basically the jump to that common start routine. What start does is set up the environment for C code, so you can do the rest of the initialization in C, which is a lot more convenient than writing everything in assembly. The first C function that'll run is initarm, which has a helper function, initarm_common. And that'll do the rest of the machine-dependent initialization for you. There's a lot of stuff that happens in there. After that's done, then you can call main, which is the first machine-independent code. On FreeBSD, this main is called mi_startup, for machine-independent startup. But the flow is pretty similar between FreeBSD and NetBSD.

So initarm, as I mentioned, is SoC-specific. There's one for the BeagleBone. And then initarm_common is the ARM-generic one that's shared across all the SoCs. These two functions together do the following things. They'll map the devices and initialize the console. They set up the real page table. We had kind of a dummy page table initially; this will set up the real page table and switch to it. They'll set up the exception vectors and stacks, and they'll parse the boot arguments. After initarm is done, main can run.
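The MMU bring-up order described at the start of this passage (invalidate, enable caches, set the TTBR, enable the MMU) can be modeled in C as plain register-value bookkeeping. This is a sketch, not the arm_cpuinit code: the real routine uses coprocessor instructions in assembly, `mmu_regs` and `model_cpu_init` are hypothetical names, and a real TTBR value also carries attribute bits that are left out here. The bit positions are the ARMv7-A system control register's M, C, and I bits.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Hypothetical model of the two registers involved. */
struct mmu_regs {
    uint32_t sctlr;  /* system control register */
    uint32_t ttbr0;  /* translation table base register 0 */
};

static void
model_cpu_init(struct mmu_regs *regs, uint32_t l1_table_pa)
{
    /* The L1 table must be 16 KiB aligned. */
    assert((l1_table_pa & 0x3fffu) == 0);

    /* Step 1: invalidate caches and TLBs (no state in this model). */

    /* Step 2: enable the caches. */
    regs->sctlr |= SCTLR_C | SCTLR_I;

    /* Step 3: install the page table. */
    regs->ttbr0 = l1_table_pa;

    /* Step 4: turn on the MMU last, once a valid table is installed. */
    regs->sctlr |= SCTLR_M;
}
```

The ordering is the point of the sketch: the M bit is set only after the table base is in place, because the first instruction fetch after enabling the MMU already goes through translation.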
So we'll briefly talk about exception handling for the kernel hacker. The things that you'll have to do: you'll have to set up the vector table and the exception stack pointers, and you'll have to write handlers for each exception. The exceptions are, briefly: reset; undefined instruction; supervisor call, used for system calls; prefetch abort and data abort, which are used for memory faults on instruction and data accesses, respectively; interrupts; fast interrupts; and hypervisor calls.

The exception vector table is a jump table with eight entries, one for each exception type. Each entry holds one ARM instruction. So you can either make it a branch to an exception handler or a PC load of an exception handler's address. That's what this looks like. They're basically equivalent. So this is what FreeBSD's exception vector table looks like. You can find it in exception.S. Here's the entry for the system call, swi_entry.

You also have to tell the processor where the vector table is found. There are a few options. The low vectors location is zero. The high vectors location is here. You'll use the system control register's V bit to choose between them. Another option is to use the Vector Base Address Register, which allows you to put the vector table at an arbitrary address.

So here's FreeBSD's exception setup code. It's in initarm. It'll allocate stacks for each of the modes that we saw earlier, so IRQ, abort, undefined. And this is the kernel stack that's used for SVC mode. Then this function will go in and modify all these banked registers that I talked about for the stack pointer. And FreeBSD uses the high vectors location, so the 0xFFFF0000 address.

OK, so we'll briefly go over developing BSD on ARM. BSD has really great cross-compilation support. You can cross-build the entire system. So it'll build U-Boot, the kernel, the toolchain, libraries, userland, and all that stuff.
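The vector table layout discussed above, eight one-instruction entries at either the low or the high location, can be sketched in C together with the two entry styles: a branch and a PC load. This is an illustration, not FreeBSD's exception.S; the helper names are hypothetical. The encodings follow the ARM instruction set, where the PC reads as the instruction's own address plus 8.

```c
#include <assert.h>
#include <stdint.h>

#define VECTORS_LOW   0x00000000u
#define VECTORS_HIGH  0xffff0000u

/* Slot order in the vector table; each slot is 4 bytes. */
enum arm_vector {
    VEC_RESET, VEC_UNDEFINED, VEC_SVC, VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT, VEC_HYP, VEC_IRQ, VEC_FIQ
};

/* Address the CPU jumps to for a given exception. */
static uint32_t
vector_address(int high, enum arm_vector v)
{
    return (high ? VECTORS_HIGH : VECTORS_LOW) + 4u * (uint32_t)v;
}

/* Encode "b handler" for an instruction located at `entry`. */
static uint32_t
encode_branch(uint32_t entry, uint32_t handler)
{
    int32_t offset = (int32_t)(handler - (entry + 8u));
    assert((offset & 3) == 0);                            /* word aligned */
    assert(offset >= -(1 << 25) && offset < (1 << 25));   /* +/- 32 MiB */
    return 0xea000000u | (((uint32_t)offset >> 2) & 0x00ffffffu);
}

/* Encode "ldr pc, [pc, #imm]", where the handler's address is stored
 * in a literal word at `literal`, after the instruction. */
static uint32_t
encode_ldr_pc(uint32_t entry, uint32_t literal)
{
    uint32_t imm = literal - (entry + 8u);
    assert(literal >= entry + 8u && imm < 4096u);
    return 0xe59ff000u | imm;
}
```

The branch form is limited to targets within about 32 MiB of the table, which is one reason the PC-load form is common when the table sits at the high vectors location far from the handlers.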
Then you can also create bootable SD images really easily, just with one command. On FreeBSD, you'll use crochet for that. On NetBSD, you'll use build.sh. So this is actually pretty convenient. If you're doing a lot of development, you'll probably want to set up netbooting, so you can TFTP-boot the kernel and NFS-mount the root file system. This will really shorten your development cycle. And if you don't feel like making your own image, you can grab one from the NetBSD and FreeBSD websites.

So, debugging BSD on ARM. A lot of what you'll be doing is printf debugging. U-Boot sets up the serial for you. And if you want early debug output, you can set VERBOSE_INIT_ARM on NetBSD or turn DEBUG on in FreeBSD. JTAG debuggers are handy. Some are relatively inexpensive. The Flyswatter is one, and it supports the BeagleBone Black if you solder a header on there. Kernel debuggers are useful. QEMU is also really useful, so you can hack on ARM without actually having any hardware.

There are a lot of useful talks. I won't actually go into all of them, in the interest of time. The FreeBSD on BeagleBone Black one is really good. It was in the first edition of the FreeBSD Journal. You should definitely check that out. And then this talk, How FreeBSD Boots: a soft-core MIPS perspective, is really good. It goes into all of the initialization, including machine-dependent stuff. This NetBSD talk is good for learning about modern SoCs. And then this stuff's good for learning about booting. And this guide, the Porting NetBSD guide, is a guide on the NetBSD website. It's a little dated, but it's probably the most complete information on how to get an SoC up.

OK, so in summary, we discussed the basics of the ARM architecture, what the instructions look like, and where to go to get more documentation. We looked at some of the machine-dependent BSD code, in FreeBSD and NetBSD.
And I showed a few tips on how to do some debugging and set up your development environment on ARM. And then all those talks are really good. Also, the FreeBSD and NetBSD developers have a lot of really good blogs that are really informative. So if you type a term into Google, probably one of their guides or one of their blog posts will pop up. It was really useful when I was getting up to speed.

So I presented a lot of information, and I gave you a lot of resources that you can check out. I'm hoping that I've given you at least the inclination to maybe grab a BSD and start hacking. There's a lot of cool hardware out there. So grab a BSD, install it, and then start hacking. There are a lot of things you can do. There's a lot of new hardware coming out. You can port to a new board. You can add drivers. You can fix bugs. You can optimize code. So I'm hoping you'll at least grab one of these boards and do a little bit of hacking, or at least think about it. I'll be around. You can definitely come up and talk to me if you want to talk about ARM. And here's my email address. And I'm happy to take maybe a question. Thanks.

Thanks, Arun. Any questions?

Hi. You mentioned the device tree support. Is there any chance to have the same device tree definitions used by Linux on BSD?

So that's kind of a complicated question. I'm probably not the right person to answer that. The problem with the device tree stuff is that it's GPL, I think. I talked to Grant Likely, one of the Linux kernel developers, a long time ago, and he seemed interested in maybe switching that stuff over to BSD licensing. So maybe it's possible, but I don't actually know. Someone official from FreeBSD would probably have to go talk to the Linux guys to see that happen. But that would be great, actually.

Because I think there is a problem like the naming of the properties or something like this. Is it done in a compatible way?

Yeah, I don't actually know.
I think some people may be starting to use some of the Linux device trees. I don't really know, actually. I think if you ask on freebsd-arm, they'll probably know. Ian Lepore would probably be the guy to talk to about that stuff.

OK, thank you.

So I don't know enough about that, unfortunately. But it would be great if we could use the same device trees, and it would be great if they were BSD licensed. Thanks. Other questions?

I'm pretty sure they are compatible.

OK.

We have a lot which we use.

Richard, yeah.

Just a small comment. If you port to your own board, make sure to set the alignment flags very, very early. Otherwise, you can get some interesting surprises from C code.

That's a good comment.

We've got a question about what you said about the early boot. You said you use the identity mapping, virtual address equals physical address. Is it only for the early stage?

Yeah, it's just for the early stage.

OK. And then what do you do? You map the kernel at the high addresses? Like a 1 gigabyte, 3 gigabyte separation?

Yeah, I forget the exact details. But yeah, something like that.

OK, thanks.

Other questions? Cool. Thanks a lot. Appreciate it.

Thank you.