All right. So this presentation is shared logging with the Linux kernel. If you were there in Dublin, I did an initial presentation on this, so this is actually a follow-up. First, last time I managed to skip this and was reprimanded by my company, so I have to say Mentor Graphics; I have to wear this. I'm an embedded Linux architect at Mentor Graphics. I've been involved in Linux for a while now, and in the Yocto Project for a while now. I also happen to be on the OpenEmbedded board. So thank you.
So I figured I would dive in a little bit as to why I wanted to do a follow-up on this one. The obvious reason is to provide an update on the talk that I did in Dublin. Real quick, how many people here were at that talk? Oh, fair number. Okay. So I used my talk from Dublin as a skeleton for this one and tried to change it up so that I'm not doing exactly the same talk with a little bit added at the end. Hopefully I achieved that balance. But for those who hadn't seen it, I left some things in there to give some background. This is where the slides for my previous presentation are. They're out there on elinux.org, and of course the video is available on YouTube. And of course I used "Part Deux," for those who are familiar with American culture. It's a silly cultural reference. If you are curious about it, I'll tell you later.
So during this talk, I'm going to describe the what and the why of shared logging. I'm also going to give a little bit of history about where this feature came from. I'm not the original author; this isn't my brainchild. What I did was take something that existed and try to revive it. I'm going to talk a little bit about the history of the kernel logging structures. For those of you who were in the talk in Dublin, this is pretty much the same outline. As I said, I'm trying to keep it a little bit different.
And then I'll talk about the design and implementation, which has evolved over time and has evolved since I gave the talk in Dublin. And then I'm going to do Q&A. If you were looking for a live demo, unfortunately my live demo suffered a live death, so that's not going to happen today. I apologize for that.
So what is shared logging? It is very basic. The bootloader and the kernel are going to read and write log entries for themselves like they normally do. And then they're also going to read log entries from each other. They're also going to be able to read multiple boot cycles. So as each one goes back and forth, they're going to be able to read multiple boot cycles from the other. And the bootloader also gets to dynamically specify a shared memory location that the kernel will use to allow for this exchange of log entries. This is one that not everybody, I think, really grokked in the first presentation, so there are a few things about it that I want to emphasize. The idea is that this is as dynamic as possible. Since this is a debug feature, or intended to be a debug feature, we want this to be able to move around as much as possible. In the final implementation, I envision this being something that you set with an environment variable at the U-Boot command line, and it will automatically go through from there. I'll talk a little bit more about that as we get into it.
So one of the other things that hopefully is fairly self-evident is that in order for a bootloader to read kernel entries and to read multiple boot cycles, this is going to have to stay persistent in some way. Now, I'm not actually prescribing nonvolatile RAM. In fact, I'm not requiring anything other than a location in memory. So for multiple boot cycles, or for being able to read entries back and forth, it just needs to survive a reboot cycle. So a warm boot is fine.
Most of the time, if you're hitting the reset button on your target or you trigger a CPU exception that causes a reset, you're still going to have the contents of RAM. Now, it's not 100% guaranteed. So if you really need this to survive, then you might consider using something nonvolatile. And one of the things that has come up a couple of times is pstore, and I will talk a little bit about that. I didn't in the last presentation. That does provide a way to perhaps integrate with this. Right now, this feature doesn't integrate with it.
So this is another question that I get a lot: why would you want to have shared logging? Anybody have any ideas? The response is the debug case, where the kernel has gone south and you don't have a JTAG port to get some sort of information back. And so, yeah, that is one of the primary use cases. I personally have a strong preference for JTAG debugging, just because my degrees are in computer engineering. But I have come around to the idea that logging is ubiquitous. Everybody uses it. They're very comfortable with it. And so I put in a little tongue-in-cheek here: imagine your life without logging and how much you would scramble. So the idea is that we want to extend that.
So the most common use case is a post-mortem analysis of a failed boot. That could be a failed boot because the kernel crashes. That could be a failed boot because the bootloader is doing something wonky. Your definition of failed boot is going to be your own. I've seen people define it as "my peripherals didn't all come up." But this just gives you a tool.
So some other useful cases, and the ones that actually generated this, came up when I was working on a project where we were looking at improving boot times, and we wanted to look very closely at what was happening during the boot.
So that covers performance tweaking, boot timing analysis, and also looking at sequencing, when we had things that we could reorder so that the long poles got started sooner. And of course, just general boot and system debugging. It's not a silver bullet, though. I'm not up here to tell you that you should abandon all other techniques, or that this is going to be the thing that fixes everything for you. It's really just another tool. Put it in your toolbox, pull it out when you need it. This is actually kind of important, though: it's the "when you need it." And I'll talk about that one again in the design.
But first, I had one or two people actually ask me about this. This has been done before. This has been seen before. Again, I didn't come up with the original idea. If you go back in the Git history, you can see that in late 2002 there was support added by Klaus Heydeck (that was the commit that I found) in U-Boot that gave support for a shared memory buffer that could be passed to the kernel. And this was going to be used for, among other things, shared logging. As far as I could tell from doing a little bit of kernel and U-Boot sleuthing, this was only supported on DENX's kernel. It was never ported to mainline. And it only, as far as I could tell, worked for PowerPC. This seems to have really been meant to allow the kernel to see the bootloader entries. There was no round trip. It was really a one-way thing that allowed the bootloader to write something that the kernel could display. So a very simple use case.
And I asked this in Dublin; I'll ask you here. This does not appear to have been widely used. Has anybody who wasn't in my talk in Dublin heard of this before now? See, that's sort of the proof of that. So unfortunately, as things that aren't used tend to do, this feature suffered bit rot. It didn't really receive a lot of adoption. It didn't really have many updates after that point.
And then the kernel logging structures changed and made it a much more complicated affair. And so essentially this capability died.
So this is that question that I mentioned before. I've been asked about pstore and ramoops several times in this context. This came up in Dublin. I wasn't really aware of pstore and ramoops at the time, so I went and did a little bit of digging. These do seem to be significantly different in terms of the design goals. Is that a question? No. Feel free to raise your hand and ask a question during the session. I tend to like to take them during the session rather than try to collect them all at the end. So they do appear to serve slightly different purposes. They both rely on very small pre-allocated regions of memory, which makes them quite fixed. However, I have not done the analysis to figure out how well we could integrate them together. One suggestion that has been made was that perhaps pstore could be extended to understand the logging structures and be used to provide the actual nonvolatile capability that I mentioned before. That sounds like a good idea, but without having done a lot of looking at it I can't say for sure. And as I say here, this is certainly an area for future exploration.
One other thing related to this was that several people had pointed out that there didn't seem to be a well-documented or accepted method for passing chunks of memory between the bootloader and the kernel with a specific semantic attached to them. There are mechanisms that pass generic buffers, but with no real context applied to them. So this is maybe an additional extension of that. In other words, doing future exploration to maybe provide that capability as we go forward.
So this is, as I should probably have led off with, sort of a side project for me. My employer tolerates it to some extent, but they expect me to keep doing my regular job. So this has not made huge leaps since last year.
It has made some progress. So I plan to continue to toil on this for a while. This is my opportunity to ask you guys to do some talking. Does anybody know of any other features that I should be considering as I'm looking at this and looking to continue to extend it? mtdoops? Sure, sure. So I'll have to take a look at that. If it's mtdoops, it's probably similar to ramoops. There's perhaps some opportunity here to go and rationalize a few of these together and make them a little more consistent. Just for reference, for folks who are interested in learning a little bit more about pstore, this is what I did a quick scan of, and for ramoops, the kernel documentation of course.
So this is sort of the historical section. This is important. I did go and spend a little bit more time than I did in Dublin to figure out exactly how far back the original kernel structure, the one that changed, went. And I went back to literally the first Git commit in my tree. I think it was either Linus or Greg KH who did an import of 2.6.11. And it had the same byte-indexed array. So it's a very simple structure. It's basically a simple array of characters, and then it's got a couple of indices into it: one for start, one for stop, and one for console. All of it was contained in printk.c, which is generic code. It's non-arch-specific. This becomes important later. The buffer is declared just as a static global inside printk.c. So that means that as soon as the C runtime is up, it's available for use. So the first message that you usually see, the kernel banner, is making use of the fact that the buffer is available as soon as the runtime is available. I already covered the indices. It's very simple, which made it very simple for a bootloader to support, right? All you needed was a memory location and a couple of indices. And this is what it looked like. I made a mistake in Dublin.
I tried to pull this up live and that didn't work for a while. This right here is the actual allocation of the static buffer. And you can see down here the start and the end. There's a little counter for logged characters. And then the console start, which gives you an indication of where the actual console is at that point in time. Pretty straightforward, right?
So then, in May of 2012, Kay Sievers posted a patch (I should say, it was merged) that changed the structure to a variable-length record with a fixed header. I'll show what that looks like in a sec. Everything was still contained in printk.c. In fact, there are comments to the effect of "never put this into a header," because we never want it exposed to anybody else. This is only internal, which is ironic, because then there's some other stuff that isn't in printk.c and isn't in the printk.h header. It is still declared as a static global. Again, it gets all those nice benefits, so as soon as the C runtime is up, you're good to go. It does include a timestamp, and the header is fixed. I'll look at that. Unfortunately, it's more complex. So there are more pointers for tracking. There's a sequence and an index for first, next, clear, and syslog. That gives some additional functionality, which is useful, but you pay for it with complexity.
So this right here, this `printk_log`, is the actual header structure. You see that the u64 is the timestamp. It has the length of the entire record. It has the length of the text buffer, and it has this provision for a dictionary of key-value pairs. I've never seen anything using this yet, but I haven't gone looking very hard for it. It's there. Maybe that's useful. Maybe it's not. It's a little bit of future-proofing. So all of this stuff is in here. You notice it's packed and aligned. This will go away in my patch. In fact, it does. And I'll talk about why.
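
To make the variable-length record idea concrete, here is a minimal userspace sketch. The header fields follow the description above (timestamp, record length, text length, dictionary length); the `rec_hdr`, `rec_append`, and `rec_count` names, the 8-byte rounding, and the flat (non-wrapping) buffer are assumptions made for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_ALIGN 8u /* assumed: records are rounded up so headers stay aligned */

/* Fixed header that precedes every variable-length record (illustrative mirror). */
struct rec_hdr {
    uint64_t ts_nsec;     /* timestamp in nanoseconds */
    uint16_t len;         /* length of the entire record, header included */
    uint16_t text_len;    /* length of the message text */
    uint16_t dict_len;    /* length of the key/value dictionary */
    uint8_t  facility;    /* syslog facility */
    uint8_t  flags_level; /* flags and level packed into one byte (simplified) */
};

/* Hypothetical helper: append one record at byte offset `off`.
 * Returns the offset of the next record, or 0 if the record does not fit. */
static size_t rec_append(uint8_t *buf, size_t size, size_t off,
                         uint64_t ts_nsec, const char *text)
{
    struct rec_hdr h;
    size_t text_len, len;

    assert(buf != NULL && text != NULL);
    text_len = strlen(text);
    len = sizeof(h) + text_len;
    len = (len + REC_ALIGN - 1) & ~(size_t)(REC_ALIGN - 1);
    if (len > UINT16_MAX || off > size || len > size - off)
        return 0;

    memset(&h, 0, sizeof(h));
    h.ts_nsec = ts_nsec;
    h.len = (uint16_t)len;
    h.text_len = (uint16_t)text_len;
    memcpy(buf + off, &h, sizeof(h));
    memcpy(buf + off + sizeof(h), text, text_len);
    return off + len;
}

/* Hypothetical helper: walk the records in [0, end) by hopping from header
 * to header using each record's own length field. */
static size_t rec_count(const uint8_t *buf, size_t end)
{
    size_t off = 0, n = 0;

    while (off + sizeof(struct rec_hdr) <= end) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > end - off)
            break; /* implausible length: stop rather than wander off */
        n++;
        off += h.len;
    }
    return n;
}
```

The point of the sketch is that a reader needs nothing but the buffer and a starting offset: each header says how far to jump to reach the next one, which is why both sides must agree exactly on the header layout.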
This is the set of indices that I was talking about. You notice they're all statically declared, so they're limited to file scope in this C compilation unit. This is in printk.c. There are several of them. As I said, there's a first. There's syslog. There's console. There's clear. The most important ones for our purposes, for the sharing, are actually the first and the next. The bootloader doesn't really have any concept right now, and I don't intend to add one, of the console in the same sense that the kernel does. And it has no idea of syslog or clear. I'm a little fuzzy on what clear does anyway, to tell you the truth. So essentially, we really use the bare minimum in the bootloader. But again, we want to make sure it agrees. Pretty dry so far, right?
So this is what it changed to. And then I come along. Well, a few observations. In order to allow a clean handoff, basically the introduction of additional pointers made this more complicated. That's really what it comes down to. I've already covered this point, which is that the global declarations, the static declarations, make the buffer available right away. Unfortunately, that also comes at a cost for sharing. Because those are just a bag of variables inside printk.c, it makes them quite difficult to replace. So one of the first things I have to do when I come into this is replace that. And I'll cover that in the design goals.
But these are the goals that I had coming in, and I've modified them slightly since last time. I should be clear: the original focus of the feature was to get the bootloader to write the entries in such a way that the kernel could see them. But it was not focused on a general mechanism for sharing. So I come along and I said, I want to do things a little bit differently.
I want this to be available all the time, because this is one of those tools that you don't know when you're going to need. I want it to be in there and have almost no impact on boot if you're not actively using it. I also wanted it to be very, very portable. printk itself is very portable because it relies only on post-C-runtime initialization. It's pretty much agnostic to whatever architecture it's running on, which is great. So I wanted to preserve that. I had forgotten to state that in Dublin, which is why I bolded it here. And I also wanted it to be portable across bootloaders, because you're talking about a really clean divide between a bootloader and a kernel. There's a clean handoff. So as long as you have a well-defined mechanism, there's no reason why coreboot, U-Boot, take whatever flavor of bootloader you like, shouldn't be able to write this. And that also makes it, again, more portable across architectures. So I decided I would use U-Boot, since I was most familiar with it, as a proof of concept.
And then I also wanted, and this was another key point, to take some dynamic, arbitrary location during runtime and assign it, essentially. But that has some complexity involved with it, because some things have to shift dynamically. I was really kind of hung up on this one. I don't know about anybody else, but I come from a pretty long history in the embedded space. So leaving memory on the table was always kind of verboten for me. It was something that made me very itchy. So the fact that we had these global static allocations was great from the perspective of being able to dump things early, but it wasn't great from the perspective that now I'm leaving memory on the table. And granted, it was, I think, like 16K. But in my mind, 16K is still a lot of memory. So I really spent a lot of time and effort trying to get rid of this one. And every time I did, I ran smack into a wall.
I'm sort of getting ahead of myself, but basically you're fighting against portability when you do that.
Another very important thing for me, and something that I'm very glad that I did early on, was this: because you're crossing a boundary between a bootloader and a kernel, you really need to make sure that this thing has a robust amount of forward error correction. It's got to be able to receive something, look at it, tell that it is not what it was expecting, and be able to ditch it. I said forward error correction; it's actually just error detection. The idea is that the bootloader may be out of date with your kernel or vice versa. One side might write a format that was perfectly okay at some point in time but is no longer okay when you put these two together. That creates a real opportunity for crashes. So rather than have that happen, I put in a fair amount of additional data that would allow me to at least do some sanity checking.
I also wanted this to be very much opt-in. I wanted it to be latent but compiled in, just a code path that wasn't taken until it was necessary. Vice versa, I also wanted it to be something that was very easy to disable, so that it just disappeared. So hopefully I accomplished those goals.
I talked about this already. In order to enable the kernel to replace all those indices, I had to create a structure that was easy to replace. So I created this control block structure, which essentially was just a wrapper around all those static variables. Then I redefined it with a pointer that I can then replace. So as it says there, it takes all of these nice little pieces of logging information and collects them into a single struct, which is then much easier to replace. So it allows a single pointer to replace it. And it also has the benefit of allowing the bootloader to pass a single parameter to the kernel.
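
The wrapper-plus-pointer idea can be sketched as follows. This is an illustration only: the `log_cb` type, its field names, and the two helpers are hypothetical, the 16K default size is the figure recalled earlier in the talk, and the wrap handling is deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control block: the former file-scope statics gathered
 * into one struct so that they can be swapped as a unit. */
struct log_cb {
    char    *buf;       /* start of the record buffer */
    uint32_t buf_len;   /* size of the record buffer in bytes */
    uint64_t first_seq; /* sequence number of the oldest record */
    uint32_t first_idx; /* byte index of the oldest record */
    uint64_t next_seq;  /* sequence number of the next record to write */
    uint32_t next_idx;  /* byte index of the next record to write */
};

/* Statically allocated default, usable as soon as the C runtime is up. */
static char default_buf[16 * 1024];
static struct log_cb default_cb = {
    .buf = default_buf,
    .buf_len = sizeof(default_buf),
};

/* All logging code goes through this one pointer. */
static struct log_cb *lcb = &default_cb;

/* Repoint logging at an externally supplied control block. */
static void log_switch_cb(struct log_cb *ext)
{
    assert(ext != NULL && ext->buf != NULL);
    lcb = ext;
}

/* Account for one stored record of rec_len bytes (simplified wrap handling). */
static void log_note_record(uint32_t rec_len)
{
    lcb->next_idx = (lcb->next_idx + rec_len) % lcb->buf_len;
    lcb->next_seq++;
}
```

Because every accessor dereferences `lcb`, switching to a buffer supplied by the bootloader is a single pointer assignment, and the bootloader only has to communicate one address.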
The other option, instead of doing this, would be to pollute the command line with every single one of these values that the bootloader and the kernel both needed to be aware of. That would work, but I found it to be icky, for lack of a better term. So I consolidated this down to a single pointer, a single memory location that gets passed. The nice thing about that is that in theory this would allow the kernel to simply replace one pointer, pick up, and start running. So it's literally an O(1) operation. The reality, unfortunately, is that there are some wrinkles to that. And I'll talk about those. Still awake?
All right, so this is what the first part of the structure looks like when modified. So `CONFIG_LOGBUFFER` is the Kconfig option that I added to control this. You'll notice that all I did in this case was add this magic number. That was the sanity check. So when you're spinning through memory, if this doesn't check out, then you know that something went wrong and you can basically reset. You'll throw away log entries, but it's better than wandering off into memory and crashing your kernel.
This is the log control block that I talked about. You'll notice up at the top here the pointer to the buffer itself and the length of the buffer, and then all of these should look familiar, because they are just cut and paste from earlier. And then this section right here is a little bit of that self-checking information that I was talking about. In particular: the log version; the padded size of this log control block, because it can differ between the bootloader and the kernel due to padding and other issues; the size of the structure itself, which again is to help self-check, it's just redundant data; and the size of the header, which is basically the previous structure here.
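
A receiver-side sanity check along these lines might look like the sketch below. It covers only the self-check fields named above (magic number, version, padded size, structure size, header size); the `ext_lcb` layout, the constant values, and the function name are assumptions for illustration and are not taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_LCB_MAGIC   0x4C4F4742u /* hypothetical magic value */
#define EXT_LCB_VERSION 1u          /* hypothetical format version */

/* Hypothetical subset of a shared control block: self-check fields only. */
struct ext_lcb {
    uint32_t magic;          /* marks the block as a log control block */
    uint32_t version;        /* log format version written by the producer */
    uint32_t lcb_padded_len; /* size of the block including padding */
    uint32_t lcb_size;       /* size of the structure itself */
    uint32_t hdr_size;       /* size of the per-record header */
};

/* Return true only if every redundant field matches what this side expects.
 * On failure the caller would discard the shared log and start fresh. */
static bool ext_lcb_valid(const struct ext_lcb *cb, uint32_t expected_hdr_size)
{
    if (cb == NULL)
        return false;
    if (cb->magic != EXT_LCB_MAGIC)
        return false; /* not a control block at all */
    if (cb->version != EXT_LCB_VERSION)
        return false; /* written by an incompatible producer */
    if (cb->lcb_size != sizeof(*cb))
        return false; /* structure layouts disagree */
    if (cb->lcb_padded_len < cb->lcb_size)
        return false; /* padded size can never be smaller than the struct */
    if (cb->hdr_size != expected_hdr_size)
        return false; /* record headers would be misparsed */
    return true;
}
```

Rejecting the block loses the earlier entries, but, as noted above, that is preferable to parsing records with a mismatched layout.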
So if any of these don't align, then when the kernel starts trying to parse this information that came from the bootloader, it's going to wander off into memory. Then there's a physical address, because, again, it's redundant information, but also because the bootloader basically only looks at the physical address, while the kernel has, at various points in time, a physical or a virtual one. And then another little magic number. So that's the structure. And from that, all other good things come. That's as funny as this talk is going to get. I'm sorry. But this can be a bit of a dry topic.
So then the question, and this was probably the key technical challenge in this whole thing, was how to actually pass this control block into the kernel. So the first pass at this was essentially DENX's approach, which used a fixed, well-known location. When I picked this back up, I just started porting that forward to work with the new log structure. It sort of worked. But unfortunately, it was quite brittle, and I found out that, depending on which platform I was trying to do this on, the calculation didn't always work. This relied on a calculation from the top of memory, which could differ in the bootloader versus the kernel. The reason it would differ in some cases was that as RAM sizes have gotten bigger and memory addresses have become more scarce, you would sometimes have the kernel truncate its address space. So then when it calculated the location of the buffer, it calculated based on the truncated RAM space. So maybe you had four gig of RAM and the kernel would truncate it by some amount, and the bootloader wouldn't care. It would take four gig and set the buffer right at the top. So this basically didn't always work, and that made me unhappy.
So the next approach was to use the command line. This is stuff I covered in Dublin. So I used a command line variable, and I tried to pass almost everything on there.
It is of course very flexible and allows for the dynamic setting that I was looking for. There is a small performance hit that's going to occur because of how and when the command line is parsed. It's an order-n operation, because it's tied to the number of entries that exist in the bootloader and the kernel at the time that you do this coalescing operation. I kind of preferred this approach, except that I kept running into certain issues with the reservation of memory, and that also drove me to some things that were not completely arch-portable. There was some concern, from conversations at ELC two years ago, when I talked to a couple of maintainers about this because I was curious whether or not they would take that approach. They weren't super enthralled with the idea, both because there were multiple parameters and because they don't like things being thrown on the command line.
So there was some question about moving forward. Based on that, I then moved on to the device tree. This was version 2. I tried to shove everything into the DT itself. At the time, I was not as familiar with the dynamic fdt command inside U-Boot that you can use to modify a DT. Also, I was working against a slightly older kernel when I presented this in Dublin, so we're getting kind of an intermediate step here. Essentially, this used the Open Firmware functions to extract information from the DT. I personally found this a bit difficult to work with. There's not as much information about using OF in early init, and again, it sort of made me itchy. I didn't like what was happening when I went down this path. You still have the log coalescing. It seemed, though I never did get hard numbers on this, that it happened a little bit sooner, so there were fewer entries and less of a hit. But ultimately I still wasn't overly happy with this. It was, however, perhaps a little bit more acceptable. So then we move to today.
What I have found is that when I did the port to mainline, I saw that I could make use of the DT and the command line in a combined way that allowed me to leverage both better. In particular, I'm relying on the bootloader now. It is implicit in this that any other bootloader that doesn't have the dynamic capability for the FDT may have to fall back to a fixed DT to do this. I'm now relying on the reserved-memory areas in the DT to leverage the infrastructure inside the kernel to just reserve that memory and do the right thing, so that you don't have to worry about it. The nice thing about U-Boot is that when you specify this in U-Boot on the command line, it also reserves it for U-Boot. So now you've got both the kernel and U-Boot reserving this memory, so you don't have to worry about it. So that took care of one of the biggest thorns in my side. It also allowed me to avoid any platform-specific code at all, which primarily was around memory reservation, certainly in the kernel. As I say here, in the U-Boot case this utilizes the FDT features that are available today. I don't know how far back those go, but if you're looking, this could potentially be backported, or you could again fall back to just writing a DT manually. You're getting less of the dynamic nature of it. I've already covered this point: there still is going to be some log coalescing, and it does rely on the command line parameter to specify just the single memory location for the control buffer, which I think is acceptable.
It's a good question. So the question was, does this not work at all on architectures that don't use DT? I'm hemming and hawing up here because technically the implementation doesn't care how the reservation occurs, as long as the kernel doesn't step on itself trying to access memory, and the bootloader doesn't either. I'm not as familiar with EFI and x86 as far as how they can do memory reservation, but I believe there's capability
there. I did have a conversation with somebody about coreboot, and they do say that there's a way to do this as well. It would be very similar. So the answer is, I think that it should still work; it just may have to take a slightly different path in order to do the memory reservation. What I found was that I ran into more problems trying to do a portable version of memory reservation than anything else. I do have some ideas on ways that I might go back to using the Open Firmware functions early in boot to extract a couple of small values that might make that go away. That's in my future-looking slide at the end. But it's a good question. Again, my whole goal with this is to try to make it as portable as possible, but there is unfortunately, implicit in this, a fair amount of work imposed on the bootloader to make it work. The kernel already had the infrastructure in place to display the messages, a much more robust log capability, so I didn't want to perturb that, and I perceived that it would be easier to modify the bootloader. That may not be the case; we'll see.
But moving on, then. One of the nicest things that I found about U-Boot: well, first of all, it basically had a very different, very simplified version of the logging. When I looked at what was present in the kernel after 2012 with Kay's logging patch and what was in U-Boot at that point in time, they were quite different. But U-Boot implicitly already has the concept of a versioned log format. It had a v1 and a v2, which made it very simple for me to just extend that capability to add a new version. That also made it very easy for me to make it a divertible path, so if you didn't want to use the v3 structure, it just simply took the v2. So this is why I introduced a new format, and it just happened to be the same as the kernel's. Convenient for me.
So in Dublin I talked about using several U-Boot environment variables, maybe too many, to try to control this. I've dropped most of these for a couple of reasons, and I'm sort of cheating in some of my slides
ahead here, but in particular what I found was that during boot some of these environment variables weren't quite as reliable as I had thought they would be. They were subject to initial boot conditions, such as when you had a clean system that had never had an environment before and it was actually writing a default environment. I also found that on occasion, if I read a value too early I would get the default, and if I then went back and read it again I would get an alternate, which meant that I couldn't rely on them as much as I had. So I pulled most of these out for now, and I'll talk a little bit more about what I hope to do with that moving forward.
So, the bootloader upstream status. Since last time I did port it to mainline, which was actually a lot easier code-wise than I expected. There is unfortunately still some cleanup and refactoring to do. There are some additional features in here that kind of crept in. A co-worker of mine suggested improvements and gave me some patches that I'm going to have to rework to make them acceptable, which means that unfortunately it's not submitted upstream as of yet for the bootloader.
On the kernel side, I did relocate all of the stuff into that control block, as I said, and I also added support for repointing that control block based on the inbound command line parameter. This one's kind of obvious here. Basically, during command line processing the values are pulled in and captured. There are not really that many of them. And then this is kind of important, and this is where maybe I'll take a moment to describe why there is coalescing occurring. During the initialization of the system, memory initialization occurs approximately around `mm_init`. The actual call that sets up the external buffer is `setup_ext_logbuff()`. It actually halts temporarily at that point in time, because, due to the fact that that global static existed, the kernel has been free to write a whole bunch of entries. I say a whole bunch; a fair number of
entries, some tens, fewer than 100 I think in most cases, into the log. And then when we hit this point in time, you have an unknown number of entries that may or may not have been written from the bootloader, and you have an unknown number of entries that may have been written by the kernel. And if you've done multiple boot cycles, then that problem gets a little bit worse over time. So what's going to happen is that the logic is going to halt at that point and combine those into the combined region. Everybody following me on that point? Essentially, yeah. So I can't rely on having access to that buffer until this function runs, and this occurs late enough that there are some log entries in there already.
One thing I don't know if I made clear: if you've got that chunk of RAM that you've been writing multiple boot cycles' entries into, that's going to get longer and longer. That length isn't going to impact the amount of time it takes to copy over. It's only the number of entries that have gone into the kernel-specific global static, which is going to be generally pretty fixed. I can't think of a case in which this wouldn't be true. It's pretty much going to be fixed every time. You're going to have roughly 10 to 100, somewhere in that range.
Yes, unfortunately it's not. I did look at that, and I thought, well, I should be able to just do two copies of these things, and there may be a more algorithmically sound way to do this. I went for correctness in this case. Because these are variable-length records and because they can wrap (it's a ring buffer), there's no telling exactly where you are at any given point in the bootloader or the shared buffer when you come to the point where you're going to copy over the kernel entries. There's also no restriction right now on the size, so if the shared buffer is smaller than the kernel buffer, then you're almost certainly going to have wrapped; you have
multiple boot entries and things like this. So I think in the back of my head that there should be a way for me to simplify it. Right now it does a record-by-record copy to ensure correctness. I suspect that I could probably do something that would be maybe two or three copies. I make a big deal out of this because it just offends my sense of efficiency. I would have preferred to be able to get to that pointer early enough so that the first print from the kernel lands in that shared buffer, in which case there's no need to do any copy; it just takes that buffer and runs. I haven't gotten to that point yet. So, the upstream status. With the refactoring of the code since last time, I managed to factor out all the arch-specific code, which was nice. All of that was really related to the memory reservation and sharing. Pretty much all of the changes are located in printk.h or printk.c, which is kind of nice from just a cleanliness perspective. There are basically two exceptions to that: there's Kconfig, in order to add the option, and then the actual setup ext log buf call in main.c. I submitted these to LKML. For those who can read dates, that was not that long ago. If you're looking for these, I forgot to add the date there, so this is sort of a cautionary tale. I meant to put the date up. I think I submitted this, as I said, on 10/04, and I think that I resubmitted v2, without any prompting from anybody, on the 8th, and that was because I had to go and check some things. What happened was, as soon as I sent it in, about 10 minutes later I got a very polite email from kernelci.org that it had failed. I was so pleased: my first patch, and I had managed to fail in 15 minutes. What had happened was I had tested very carefully turning my feature on and off, on and off, and made sure it worked on the different targets that I was building for, anything else. But what I did not do was turn off printk itself, and as it turns out I had moved something outside of one of those #ifdefs, and appropriately I had to
relocate it inside. But yeah, I was a little embarrassed by that. So v2 came out like three or four days later. I meant to put the date up here. And you'll notice that on my little GitHub link here I've got a v2; there's also a v1. So, good cautionary tale. I don't know why that one came out of order. This got ported: I decided just arbitrarily to port this to the mainline kernel, which at that point in time was 4.8. To be honest, part of the impetus for this, and for finally making me get off my duff and do this, was that I knew I was going to present on this, so I should actually have this submitted upstream. So I've talked through some of these already, but again I want to emphasize some of them. Physical versus virtual addressing: most of us are aware of this, it's something that we all kind of keep in the back of our head. However, the bootloader uses physical; the kernel uses both, depending on where you are in the code. Even though I knew this, I managed to hurt myself several times on this. So that's part of the reason why I had that additional information in the log structures and that LCB block, because I use that to tell myself: am I really using the right address? And ironically I was using logging itself to help me with this. This one was another really key pain point for me: mapped memory versus unmapped memory. When you first get into the kernel C runtime, you think everything's hunky-dory, all your memory is present. That's not true. You actually have to map stuff in. Some things happen automatically; the command line is one of those. When I went back and I was using that DT approach, the pure DT approach: hey, I had my command line, so I'm up and running, I've got all my memory, right? Nope. So this one actually cost me quite a bit. Good news is I learned a fair amount out of it. But just be aware of this whenever you're looking at stuff that's going across these boundaries, certainly early kernel: make absolutely sure that you know where your memory is and that it's mapped in when
you go to attempt to address it, because the most common behavior that I found when you get into this situation is the kernel just stops. Nothing, no oops, nothing else. It just goes, and it's gone. That's the technical term. One of the other things, and again this is like 101: structure packing. For years I've known this: compilers are free to pack structures in unique ways, and there's no real reliable way to know how it's going to do it. Pay attention to your structure packing. As it turns out, I was using initially, I forget which version of GCC, and when I was doing it, it worked fine. The packing all agreed, everything was cool, the client was good. I migrated forward and updated my compiler toolchain. All of a sudden, the code hadn't changed, the patch applied cleanly, everything compiled, and nothing worked. As it turns out, the alignment of the structure, the packing itself of the structure, was bad. So that line that I highlighted before, that aligned-by-four and packed, that I pulled out, that was part of it, and also I had to manually pack the structure itself in order to make it happy. One other little tip on that: there's a lot of variability when you have a large u64, and so how that gets handled, in particular how subsequent fields get handled, can be different. Or if you have it at the end; that seemed to be the most problematic. No, that's exactly right. Basically, if you look in the C standard, and it says this quite explicitly in almost every book that you read about C programming, bit packing in a structure is terribly non-portable, and so is the actual packing of the structure itself. So just pay attention to that, especially in this case, where we're doing something that is conceptually crossing a boundary. Even though it's the same processor, you've got a bootloader that looks at it one way and then a kernel that looks at it another. It turned out to be very problematic for me. One of the other things was that messing around in early init, and init itself, is quite fraught with peril, and they are
just quiet failures. I kept finding myself going, what's it doing? There's a definition of insanity, that you keep doing the same thing over and over again expecting different results. So it would hang and I'd go reset, it would hang again and I'd go reset. I'd usually stop after three and go, damn it. And my first inclination was, if only I had logging around, then I could figure out what was going on. And then I would kick myself violently and go get something to drink. Just be aware that any time you do this, there's a whole lot of this stuff that is very fragile. People have sort of got it working, and then they've backed away from it, and they don't touch it if they don't have to. So have a strong constitution if you're going to do that. I need some chuckles; I'm getting you guys awake. Anyway, porting to mainline. As I said, the patches themselves actually ported pretty easily. Most of it really compiled pretty easily too. That was nice. That really isn't a gotcha. Unfortunately, this is a gotcha: when you go, great, I'm done, you're not done. So this can give you a false sense of success. Just keep that in mind. In my particular case, one other little surprising gotcha: all of that stuff about reserved memory, everything that I told you I had figured out. I had a solution that worked, I had a place that I could reserve memory and everything else. I passed in my reserved memory region, and it was working fine in the older kernel, and in the newer kernel things stopped working. And back to the quiet failure: if I only had logging, I would be able to figure it out. Eventually I did have to use logging to find this; ironically, I had my own feature. I turned on mminit debug and was able to find out that the memory region that I had selected was unfortunately somehow being reserved by something else, and it was just quietly dying because of that. So all I did, and this was where the dynamic side of things worked in my favor, all I did was change that one address, and things
started working magically. One of the other things, and I've talked a little bit about this: building, because of toolchains, can be a problem. I really, really wanted to have the working demo, so I spent a lot, a lot, a lot of time trying to get my x86 U-Boot to work, and that has been unfortunately non-trivial. I've got some folks that are swearing that they're going to help me after ELCE, so by the next time I post anything on this, I plan to have an x86 board and my one board working to show this across two different arches. This one, actually: I'm a Yocto guy, if you guys don't know that, and I'm OpenEmbedded, as I said. So this one was surprising to me, how much of a pain in the butt it was to try and use a mainline kernel and a mainline U-Boot, because every distribution that I was using inside Yocto was really trying to grab hold of the kernel and really trying to grab hold of the U-Boot that it was using. So this created a significant challenge for me. And as I said here, it was about 10 minutes, so clearly I didn't test well enough. It never occurred to me that I should disable printk itself entirely to see if my patch worked. But just a cautionary tale, and I already explained that part. So, planned and future, well, planned and possible future work. I need to obviously clean up the U-Boot patches so the proof of concept is complete and people can play with it. I expect to do that fairly soon. I say fairly soon because last time in Dublin I said I'll have this done by the end of the year, and here we are almost a year later. I really want to get this working for x86 so that we can see what it looks like. Instead of U-Boot, I may also, and I think I put this in here, look at coreboot. I was asked about this several times by people who were asking me about this session. This is going back to your question earlier: the OF extraction of the LCB pointer. Now that I'm more familiar and I've done this a few times, it might be possible, using that DT combination with the command line and the OF
extraction, to maybe do away with some of that need for coalescing. That's in the back of my head, and I need to go try it out and see what's going to happen there, because OF functions are available, actually, if you know what you're doing, before the C runtime comes up. So in order to ensure that we get the same behavior of the buffer before the C runtime comes up, you would need to do it in one of those early, early init sections. Doing that in a portable way is going to be a challenge. One of the other things that nobody has asked about is a time base. Right now the timer resets every time you reset, so we really want to find a way to have a single time base that is continuously incrementing. This is much more useful for tracing, instead of trying to find some marker in your trace file that says this is the start of a new boot. This one is one that I've really been thinking a lot about: the environment settings in U-Boot. Because of their instability in terms of being able to read them, I kind of moved away from them. But one thing that I also was hoping to be able to do was: I'm sitting there at my U-Boot console and I set a setting, the LCB physical address, and I wanted to actually relocate the U-Boot log itself. Currently I don't know of any example where setting an environment variable triggers that kind of a change, but that would be... I knew I came to the right place. And the good news is you live in my town, so I want to pick your brain about how to do that. Coreboot, as I said. And then we are into the tail end. So again, I apologize for no demo. I really wanted to have that one working, but sadly that was not going to happen. Questions? I think we are maybe a couple minutes over. Maybe not a question, but to extend on the coreboot stuff: the coreboot guys had the same problems getting the logs. There's also this payload concept, so they also want to get the logs from the payload. And so I think they didn't go the path of getting a shared kernel logging
buffer, but they just reserve memory themselves, called coreboot memory, cbmem, and in there they can store several pieces of information. On this cbmem console, they just have like a marker which they then look for in memory, and then they have a utility called cbmem console which you can run on the console, which finds this marker and then prints out the information. They can also store timestamp information and so on. And then, only in the Chromium repository, there's a Linux kernel driver which also exposes it over /sys. But maybe it's probably more interesting to also get into an approach where Linux can access it by default without any driver. But no idea why they didn't go this path. Probably because it was easier; maybe because there was a loadable module, they could do it that way. So for those who may not have heard it: coreboot has a similar functionality in terms of a reserved memory that has a kernel driver that they can use to access it, and also has a command to dump it. So that's certainly something that I'm going to need to look at, to try and combine with if I can, for a coreboot implementation. Any other questions? Thank you very much.