Hello and welcome everyone to my presentation on the Zephyr Project RTOS startup and initialization flow. My name is David Leach and I'm a software engineer here at NXP Semiconductors. I've been involved with the Zephyr Project for the last several years, primarily supporting internal groups here at NXP to enable our various SoCs. I also support a number of the Zephyr Project working groups, including the Technical Steering Committee, Release Management, and the Security and Safety Working Groups. If you're familiar with the Zephyr Project, you have likely seen this diagram. Zephyr is a highly configurable, modular RTOS targeted at memory- and resource-constrained platforms. It provides a full set of capabilities and features that you would expect from an RTOS, including cooperative and preemptive multi-threading, integrated device driver support, memory protection, and various networking stacks, as well as a growing ecosystem of additional OS and application services. There are numerous presentations out there that have covered various aspects of this diagram, and I would encourage you to seek these out. But my presentation is about what happens to get you to this point of using Zephyr for your application, or more specifically, the initialization and boot flow that takes the system from a reset to running the application. I found myself in a situation of having to debug some driver code, which forced me down the path of trying to understand how the system started up. It wasn't intuitively obvious just staring at the code, so I started down the rabbit hole of how we go from a reset signal to the point of an application being up and running. The initial result was a diagram that I put together for the ARM Cortex-M. This is what I ended up with. I know this diagram is a bit too big for a single slide; you can get a closer look at it when you download the slides from the collateral.
But what this represents is the basic boot and initialization flow for the Cortex-M architecture. I created this diagram the way you would expect: through visual inspection and with the debugger. Initially it was a text-based Microsoft OneNote document, but because I'm a visual person, I decided to put it together as a flow diagram. I find these kinds of diagrams helpful for understanding the larger flow and the relationships of the objects inside the system. In this diagram, and throughout this presentation, I color-coded the items to convey additional information. The blue boxes are architecture-dependent code unique to the architecture. The green boxes are Zephyr common code that all architectures use. And whenever a Kconfig variable is referenced, it is shown in bold italics. Also note that the Zephyr RTOS supports running without multi-threading enabled. I'm not going to get into that in this presentation, because multi-threaded use of Zephyr is the primary usage and it's more interesting to explore. If you're interested in single-threaded operation, you're welcome to dig into the code based on this presentation. When you look at this flow, though, there are some natural compressions of the boot and initialization phases that can be extracted to help simplify the understanding of what's going on. If you compress this down, you end up with a diagram similar to this. The system resets into the early startup code, where things are being prepared for C code operation. It then transitions to a kernel initialization phase. Then it prepares the scheduler for actual operation and does a context switch to the main thread to wrap up system initialization and get the main application running. So let's dig into these phases in a little more detail to understand what's going on under the hood.
The Zephyr early startup phase is analogous to the startup code that you would find in normal applications and that most people probably never look at. The primary goal of this early startup code is to get your system up and ready to run actual C code. Unless you are porting Zephyr to a new architecture, you likely never need to touch this code either, but there are some points where your platform may need some customization support. You'll also note from this diagram that a large portion of this phase is architecture-dependent code, because it is focused on getting the hardware on the platform ready to run C code. Also note that, if the architecture supports it, the processor will be in a privileged level for the initialization flow, likely thread mode. If your application supports user mode, the kernel will later manage the transitions between user and privileged modes. So if we start at the top, at reset, the system starts at the kernel entry point. The name of this function is configurable via the KERNEL_ENTRY Kconfig option, but the default value is __start. Some architectures use a different entry point: for example, Intel's Xtensa DSP platform uses __main_entry, and some RISC-V systems use just entry, but the vast majority of platforms use the default __start. You can typically find this code in arch/<arch>/core/reset.S, though it may move around; if you care about a particular architecture, just search for the entry name and you'll find it. The first customizable hook, in this particular architecture, is a call to z_arm_platform_init. If the platform has specified the PLATFORM_SPECIFIC_INIT Kconfig option, this function is called and is expected to be provided by the port. This code is the customization needed outside of the generic ARM initialization.
So if you search the source tree, you can find some examples, including the Infineon XMC4xxx platform. The Nordic nRF platforms tend to use this customization extensively. You'll see them implement z_arm_platform_init, but it really just calls a Nordic HAL-provided SystemInit function to deal with a variety of items specific to their SoCs. You can search modules/hal/nordic for SystemInit to see all the different usages. The NXP LPC54114 also uses it to handle the dual-core operation of that platform. The platform has an M0+ and an M4 core, and both cores run the same shared startup code but with different vector tables. The supplied z_arm_platform_init function determines which core is running and handles the core state accordingly. There are a few other examples in there if you want to search for them. Next we have the INIT_ARCH_HW_AT_BOOT Kconfig option, which gates a call to z_arm_init_arch_hw_at_boot. This resets the system control block and the core registers to an initial state. The option is useful when firmware is chain-loaded by a debugger or bootloader and there is a need to guarantee that the internal states of the core blocks are consistent. Then, if a watchdog is enabled, the watchdog timer is initialized, followed by the initialization of the interrupt stack. Now we transition to more specific initialization to prepare for C code: first there is the vector table relocation, and then, if floating point support is configured, the CPU floating-point hardware is initialized as needed. This is followed by zeroing out the BSS area of your application and then copying the initialization data into the data section. Those two are fairly common steps you'll see in a lot of other normal applications. Finally, before calling the z_cstart function, the interrupts are initialized.
For the Cortex-M, the interrupts are initialized to a default priority that is not zero. The reason is that Cortex-M based platforms use a BASEPRI style of interrupt blocking, which requires that interrupt priorities not be initialized to zero. BASEPRI is a register in the ARMv7-M/ARMv8-M architecture that defines the minimum priority for exception processing. When it is set to a non-zero value, it prevents the activation of all exceptions with the same or lower priority than the BASEPRI value. For other architectures there may be custom hooks to initialize the interrupt controller based on the needs of that platform, and you can explore some of those by looking at the code. Here's the RISC-V early startup flow. Note that there is no hook yet for platform/SoC-specific initialization, but you can see the general flow is similar to the Cortex-M. The RISC-V startup flow doesn't have any custom hardware initialization function, likely because there just hasn't been a need for one yet. The Zephyr Project has a ton of ARM-based SoCs supported in the tree; for RISC-V there's really just a small number, so it may be that at some point the RISC-V architecture will need a hook for SoC-specific initialization in the early startup stage, but right now they just haven't come across a target that needed it yet. RISC-V does have a Kconfig option to call SoC-specific interrupt initialization. Like the ARM function, this allows initialization of an interrupt controller that's unique to the SoC. And again, common to all architectures, there's a contract here to ensure that the platform is ready to run C code and complete the rest of the initialization flow in the Zephyr common z_cstart function. This is where what we call the kernel initialization begins: the system is now ready to run normal C code.
The z_cstart routine is called, and the first thing we see in here is the arch_kernel_init function call, which is architecture-specific. This is the only architecture-specific part of this phase, for additional hardware initialization. Note that this function may get deprecated in the future, as most architectures tend to handle all special hardware and platform initialization requirements in the early startup phase. The only architecture currently really using this is the ARM 32-bit architecture, where it does additional hardware configuration: setting up the interrupt stack, setting exception priorities to conform with the BASEPRI locking mechanism (with PendSV priority set to the lowest possible level), turning on all the fault handling hardware, and initializing the ARM core for idle events by setting SEVONPEND (set event on interrupt pending) for WFE support. If available, it will also initialize the MPU and configure the static memory regions. But note that all of this could have been done in the early startup phase. Moving on, we next create a dummy thread and make it the current thread. The scheduler isn't running yet, but this is needed because later in the initialization flow there's going to be a forced context switch to move execution over to the main thread, and the kernel context switching code needs some place to properly store the state information when the context switch occurs. You see, the assumption is that there's always an active thread, whether the idle thread or some other application or system thread. By creating a dummy thread, the context switching code doesn't have to special-case startup when there isn't a current thread. This keeps the code and the logic simple: it always assumes there is an active thread to be swapped out on a context switch.
Next, we initialize the state of the static devices. Static devices have a fairly important role in the initialization and startup of the system. These are the device driver and system driver objects that are statically defined using Zephyr APIs. Device drivers are things like GPIO drivers, UART drivers, network drivers, and various other drivers in the system. These static objects have their states initialized before the respective object initialization functions are called. The order of initialization is managed by statically assigning them to one of multiple run levels. The function z_sys_init_run_level is used to iterate through the list of static objects and call their initialization functions. The function is invoked with a run level parameter, and only those devices assigned to that run level will be initialized. We'll cover these run levels and devices in more detail later in the presentation, but be aware that there are dependencies between drivers and system services that require the initialization order to be managed appropriately. At this kernel initialization phase, the kernel is not up and running yet, so what you see are two run levels, called PRE_KERNEL_1 and PRE_KERNEL_2. These levels help coordinate bringing up devices that may have dependencies on each other before the kernel is up and running. Finally, before moving on to preparing the scheduler, if the STACK_CANARIES Kconfig option is enabled and the compiler supports it, a canary value is pulled from the random subsystem. When enabled, the compiler will emit extra code to insert a canary value into the stack frame when a function is entered, and will then validate this value on exit for stack corruption detection. As you would expect, enabling this option can result in a significant increase in code size and reduced performance, but it is useful when debugging your application. So if you have the resources, it's probably good to turn it on.
At this point, all the pre-kernel devices are initialized and ready, so we can transition to starting up the kernel scheduler. To prepare for the main thread running and completing the initialization, the kernel scheduler needs to get set up, and this is the job of the prepare_multithreading function. Preparing the kernel scheduler includes initializing the ready queues and initializing the z_main_thread, which entails marking it as ready and started. This is the only thread in the system currently set with the ready and started state, so when the thread swap is initiated, the scheduler will pick z_main_thread as the first thread to run. Additionally, the idle threads are initialized; note that there is a per-CPU idle thread that is initialized and marked as ready in this flow. After preparing the main and idle threads, a context switch is initiated. Again, remember there's only one thread in the system that is ready to run, and that is the main thread, as it is marked as ready and started. So the kernel will select this thread to run, transitioning the initialization flow to the main thread entry routine, bg_thread_main. As you can see in this diagram, the context switch is architecture-specific, as shown by the blue box. A side note: the type of RTOS scheduling is configurable. The current list of scheduling algorithms supported is in the kernel Kconfig under the SCHED algorithms, and they include SCHED_DUMB, SCHED_SCALABLE, and SCHED_MULTIQ, as well as time slicing support between preemptible threads. Scheduling is outside the scope of this presentation, but there is an excellent presentation by Andy Ross from Intel on scheduler details that was given at the 2021 Zephyr Developer Summit, and I would encourage watching it. You can find a link at the end of this slide deck. So now the kernel is up and running, and all kernel services are fully available.
This means that all the static objects in the POST_KERNEL and APPLICATION run level initialization are allowed to call any kernel functions. If you have a serial terminal attached to your target, you'll see the boot banner print out, a debugging clue that your application has at least made it through the POST_KERNEL initialization phase. A good sidebar in debugging tips here: if you're trying to bring up an application and you don't get any boot banner, you can quickly determine whether a static object initialization hung your system by stepping around the various run level calls from the previous slides. If one doesn't return, then you'll need to walk through that particular level's iteration of device objects to narrow down the problem device. If your application uses C++ components, the C++ global object constructors and initialization routines are executed after the POST_KERNEL but before the APPLICATION run level. If your application has any static threads defined, they are now initialized for the scheduler to run. If your system is an SMP system, it is initialized and the SMP run level initialization is executed against the SMP-tagged device objects. And finally, the actual main function is called. Zephyr defines a weak main function, which is just a no-op and returns; this will cause the main thread to move to termination. You can also provide your own main function to do whatever else your application needs to do. But suffice it to say, when the main function ends, you probably have a number of other threads that have been initiated, either static or programmatic threads that were configured. So at this point, the startup initialization of Zephyr is basically complete. The application is hopefully running. There may still be application-level initialization that is needed, but from the kernel perspective, this initialization and startup flow is complete.
I want to cover some details about the static initialization objects and static threads, since they're really important to the initialization flow. Zephyr provides APIs to let you instantiate device driver objects and system drivers that need initialization functions called during the kernel initialization flow we just talked about. The APIs define a level and a priority, where the level is the run level at which the init function is called, and the priority is the initialization priority of the object relative to other objects in the same run level. It's an integer value from 0 to 99, with 0 being the highest priority. You can see the reference APIs here, and you can look at the Zephyr documentation to get more details about these, but you have device driver APIs and you have system driver APIs. The initialization level must be one of the five symbols below: PRE_KERNEL_1, PRE_KERNEL_2, POST_KERNEL, APPLICATION, and SMP. If you try to type in something else, the build will fail, so you won't get past that. As we previously discussed, PRE_KERNEL_1 and PRE_KERNEL_2 are used before the kernel is up and running. The primary rule is that objects initialized here cannot make any kernel service calls during their configuration, since the kernel is not yet up. POST_KERNEL, APPLICATION, and SMP objects, though, are free to make kernel calls because they are run from the main thread's entry routine. The key takeaway is that between the run level and the initialization priority, the user has significant control over the sequencing of initialization of device drivers and system drivers. As previously described in the kernel initialization flow, drivers using the run levels are assumed to understand the distinction between pre-kernel and post-kernel services being available.
If you have a dependency between drivers, you will need to use different run levels, or different initialization priorities, or a combination of the two, to control the order. Also note, and I want to reiterate this, the order of initialization of multiple objects with the same priority in a specific run level is undefined and cannot be counted on by your application. If you have some requirement for the order in which they come up, use different priority levels between the related objects, or different run levels. So let's take a look at some driver examples. Here I'm showing two different samples that come from the source tree. The first one is a devicetree-defined driver for the MCUX LPUART. You can see in the macro being called in the code that it tells the system its init routine should be called at the PRE_KERNEL_1 run level, and it wants to use the CONFIG_KERNEL_INIT_PRIORITY_DEVICE Kconfig value for the priority level. The other example is the i.MX RT init system driver, which is also at PRE_KERNEL_1 with a priority level of zero. What happens is that the underlying support macro will define a struct init_entry object, applying a section attribute constructed from the run level (PRE_KERNEL_1, PRE_KERNEL_2, etc.) and the initialization priority. The link stage will then collect all of these and put them into an init level section. So let's take a look at the link map, because it's useful for determining what's going on in your program. Here's a snippet of the build map highlighting the init level section from the previous examples. The init level section is populated by the Zephyr linker script collecting these device objects by their constructed section attribute name. The section names are highlighted in green from the two previous examples, and the first highlighted one is .z_init_PRE_KERNEL_1_0 (one-zero, more appropriately).
And it is from the SYS_INIT instance, where the object has been set up for the PRE_KERNEL_1 run level using zero as the priority level within that run level, which is the highest priority. The .z_init_PRE_KERNEL_1_50 entry is for the DT device instance of the MCUX LPUART; in that example, the priority level came from the Kconfig value, and you can see that its value is 50. The init level section is then assembled by collecting the symbols in sorted order based on the section name. Here you can see the linker directives highlighted in yellow. Note that the order of the symbols attributed to the same constructed section name, i.e., the same run level and priority, is not defined and cannot be depended on, as the sorting only uses the section name attribute. This can cause you some grief if you're not careful, and let me explain, because there can be some fragility in the order of initialization. At NXP we recently ran into a problem where the initialization of one of our platforms had a dependency that we weren't aware of. The device object's initialization function was using the k_busy_wait API, which depends on the system clock driver being initialized. Both drivers were being initialized at the PRE_KERNEL_2 level at a priority of zero, and for the longest time everything had been working fine. Then an unrelated commit was merged into the source tree that modified a CMake file, which ended up affecting the order in which the clock driver and the platform initialization drivers were called. The actual dependency between these drivers was unknown to us and had just happened to work because, by luck, the build system ordered the two correctly. But when the build changed the order, the platform boot would hang in PRE_KERNEL_2, because the k_busy_wait call would hang due to the clock driver not being initialized. Fixing this is not that difficult.
One quick and dirty way to fix this was to change the priority value of the two drivers to ensure the order of initialization. But what this bug also exposed was the fact that one of the drivers was using a kernel function in the pre-kernel boot sequence, which is not supposed to be allowed. Note that there are no runtime checks to enforce this; it is really a programmatic rule. And so sometimes, like in this case, it worked because we lucked out on the ordering, but then it broke when things were changed around. With the combination of run level and priority, there's a good bit of flexibility in controlling the order of initialization of these objects, so we should be able to fix this; in fact, in this case we'll probably move the problem driver to POST_KERNEL or figure out a way to remove the kernel function call from this driver. There's also been some thought about exploring whether we can encode dependency ordering into the DTS file, so we'll see if we can make these dependencies a little more explicit. Let's also briefly discuss static threads, which are initialized in bg_thread_main after the APPLICATION run level devices are initialized. In Zephyr you can create threads programmatically with k_thread_create, but you can also statically create threads using the K_THREAD_DEFINE macro, and there are numerous references in the code base for both of these. Like device objects, the K_THREAD_DEFINE macro will instantiate a static thread with a section attribute to allow the linker stage to collate all the static threads into a common section called _static_thread_data_area. What this does is allow the z_init_static_threads function to iterate through the list of static threads and initialize each of them, passing them to the scheduler.
Remember, this is happening while the main thread is running, and so once you start setting up these threads with the scheduler, there may be some cases where the scheduler will swap to those threads. But it's generally safe, and it's just the way things work. So if you look at this snippet of the Zephyr map, you can see an example of some static threads that are created by the sample net sockets packet app. Here the packet app is creating two different threads: one is a receiver thread and the other is a sender thread. The green highlight shows the constructed name that was created by these macros: they always start with _static_thread_data.static. and then a name based on the name ID. What's useful about this is that you can go into the map file, find the static thread data area, and see what static threads your application is going to be creating. So if you're unsure, you can just look at the map file and see the list. A similar type of debugging can be done with the device objects, except that the device object name is not really part of the section name like it is with the static threads. But at this point, the threads are all collated within this section, and the init static threads function will be able to iterate through all of them and get them up and running and scheduled. So this brings me to the end of my presentation. I hope the information contained here has been useful for you at some level. There are some links at the end of this slide deck to help you get oriented with the Zephyr Project, and I'll try to make sure that the collateral for these diagrams is available in a form that is a little more usable and readable for you. But thank you very much for watching, and have a good day.