 All right, hello. How is everyone doing? If you're watching this after the fact, after I have streamed, make sure and check the description below for time codes, thanks to David. That's a great way to see everything we cover and what's going on. And not necessarily have to watch all two hours or so of this. So we'll get going here. I'm going to try it without headphones today, which is weird. I realize the only thing that I'm listening to in the headphones is myself, so now that I have the mic that's attached to me, so when I move, it should be OK. I'm going to try it without headphones. So let me know how things are going, and I'll say hello to folks. But first, I'm going to pause my playback while we get everything going. And I'll say hi in just a second. It's weirding me out not having headphones on, even though it's like there's no reason that I need headphones on. All right, so let me say hi to folks. Hi, Keith E.E. Hi, Linux 203. Hi, Bruce S. DCD. David was in the chat a little bit earlier. Hi, Love the Factory. Hi, Charles Bruneford from Discord. Hi, Unexpected Maker. Are you expecting any colors to drop in? No, I'm not. But I don't really plan ahead for that anyway. Oh, captioning is working today? That's good. It should be in low latency. Maybe it's not. Hi, Chadley Nerd. Hi, Beata. Is that where it was? Hi, Bruce. Hi, Michael. Hi, Andy Roberts. Hey, Dave Odessa. And I said hi to those folks already. Hi, from Morocco. Hi, Yusef. Thanks for tuning in. Hello, Mark. It says, I can imagine how weird it is. I feel like I live in my headphones. Yeah, I've been heads down the day making some good progress. And I usually listen to music when I'm working. But I think it keeps me hotter as well. So I was like, I'll try it. I'll try it without. Bruce says, got two eyes. Well, I lost track. Hi, VR. Welcome, welcome. End of the work week. It's true. How's everybody doing? Hi, Senkalp. Hopefully I said that right. Hi, Mototimo. I think we got some exciting stuff. 
I'm excited. I made some progress today, which is exciting. Oh, nice. Glasses legs hurt my ears. Yep, that makes sense. Band cape is awesome. All right, let's get going. I've got more debugging to do. I'm like on a roll. So let's kick it off. So hello, everyone. My name is Scott. If this is your first time joining me, welcome. This is a, hi, Anthony. This is a deep dive. It happens every week on Fridays at 2 p.m. Pacific. That's what time it is right now. In case you're in a different time zone, that gives you an idea. Although things will shift around a little bit. We're getting to the point where we do the daylight saving time switch soon. So keep an eye on that. These streams typically go for two hours or more. So make sure you check down in the comments for time codes, thanks to David. Oh, you like the setup behind me. Thank you. Questions are welcome. So I'm happy to answer questions. And if you wanna join us on the Discord, that's the middle box here. You can go to adafru.it/discord. Check it out there. That's a chat that lasts all the time, which is great. So you don't have to wait just for a stream to join the chat. And I'm kinda doing housekeeping in reverse. So if you don't know who I am, like I said before, my name is Scott. I go by tannewt online. It's all run together. I work for Adafruit on CircuitPython. So Adafruit is an open source hardware and software company based in New York City. That's the channel that we're streaming on. I'm just one of the many people that stream on this channel. If you wanna support me and support Adafruit, you can do it by going to adafruit.com and purchasing some hardware there. I work on software, in particular CircuitPython, which is a version of Python designed for bare metal applications, meaning there's no operating system.
So that's typically on what are called microcontrollers, which are these little tiny inexpensive computers that are used to distribute work into all of the different small things that you're working on. But today we're gonna be getting out of the microcontroller box a little bit, and I've been working on... Oh, Mike was asking, are you in New York too? No, I'm not. I am in Seattle. I work remotely for Adafruit and I have for the last five years. Hi, Andrew R. So, what was I saying? Today we're gonna step outside the microcontroller box. We're gonna continue working on the Cortex-A series chips, which are application processors, which are not really considered microcontrollers, but I've been working to be able to treat this application class processor as if it were a microcontroller. So my goal is to bring CircuitPython to the Raspberry Pis. In particular, I'm working right now on the Raspberry Pi 4. So that's what we're gonna continue working on today. And if folks have questions, please ask them. Otherwise, I'm totally down to just jump in and keep working on what I was working on. Update from last week: I think I was trying to figure out how to get interrupts working. I don't remember exactly what we did. But I was trying to get TinyUSB working because TinyUSB is usually a prerequisite for CircuitPython, because in CircuitPython we really like to have USB functioning. But I ran into a wall. So I got interrupts working. And the problem that I was running into there was that there are, let's see, how do I explain this? Did we cover exception levels? So one thing that's new in application class processors, so Cortex-A versus Cortex-M, is that Cortex-A has what are essentially privilege levels, except they call them exception levels.
So there's EL3 that has the most privileges, and that is what the CPU starts in, and the GPU runs a little bit of code, and that's supposed to be the most secure state. And then before it's handed off to us, it gets moved down to EL2. And EL2 is kind of known as the hypervisor tier. So this is the thing that kind of operates outside of the operating system. So EL2 is hypervisor, EL1 is operating system, and EL0 is like Inkscape or Firefox or any user applications that have the least amount of trust. And the reason they're called exception levels is because usually what you'll do is if something at a less privileged tier wants to do something, it might cause an exception that something with more privileges needs to handle. So if Firefox needs to get more memory, for example, it will just try to write to someplace and that will cause an exception and then the operating system will have a chance to give it some more memory. So Youssef is admiring the setup behind me. I do have a Hakko iron, but I actually upgraded, oh, this iron here is not a Hakko. I do have the Hakko fan though. And I'm blanking on the brand name of the iron. Yeah, I don't do a whole lot of actual hardware stuff, but I do from time to time. So it's nice to have a rework or a soldering station. So the problem that I was having with interrupts last week is, we're running in EL2 right now because that's where it kind of starts us up, and that's the hypervisor mode. And it turns out there's a bit in the hypervisor configuration register that says whether to actually be interrupted, or to actually have exceptions handled, at that level. Cause typically you'll either want to handle them at, I think, the most privileged level or the OS level. So there was a bit there that I had to set in the HCR_EL2 register that said like, yeah, I want the exceptions. And then once I did that, I was able to get interrupts working.
I forget how much I talked about interrupts last week. Are there any questions to connect the dots? And the bit that I had to set was TGE. Let's go to the desktop and I can find it. Squish this down. Off-by-one-bit error. Yeah, yeah. And this is a 64 bit machine. So it's got a lot of bits that you can flip. JBC, that's right, Bruce. I think that's what it does. So the place to look is the Arm documentation. I have a lot of tabs open. I should probably go through them. So I was looking at this Arm reference manual. This is the architecture reference manual. So Paul asks what channel to go to on Discord. So the channel that is on the stream right now is the live-broadcast-chat channel. Yeah, I'll tell you how I got there. So let me find the HCR_EL2 documentation. So this is the hypervisor configuration register. And as you can see, it's a 64 bit register and it has a lot of bits. But then I found this TGE bit, which if we scroll down, and I'm going backwards. So this is not how I found it. I'll show you how I found it. The problem is that it's kind of hard to find how I found it, because this document's 8,000 pages long. So I basically have to remember what I searched for. Although it is cross-linked pretty well. Let's just look. TGE, Trap General Exceptions. So trap is another term for, I think, an interrupt. So does it link me to, it must not be cached in a TLB. That would be bad. Where was it? I think it was in the programmer's model. So there is this table that looks kind of like this table, but is not this table. Where did I find that? I think there's a section here about interrupts, exceptions. Hi, Johnny. I think it's in this document. This document has too much stuff in it. I mean, I'd rather have too much documentation than too little. I'll tell you that. System level programmers' model, registers for interrupt types, routing and priority. This feels like, let me just, I know it has a reference to HCR, nested virtualization.
There's more than a thousand matches on this. Controlling data that's collected. JBC or JCB. How did I find that last time? Hi, Patrick. Like it's this table that says, is there another file that would have this in it? Synchronous exception prioritization. I guess I could just, this sounds, aha, here it is. This is the table. This is the table I'm thinking of. So this is the section, let's see, D1.13, which is asynchronous exception types, routing, masking and priorities. So they like to bundle it into two types of exceptions. Synchronous and asynchronous. Synchronous meaning something in an instruction caused a problem. Asynchronous is just like an interrupt came in at any one point and now you're gonna handle that. So here we have this asynchronous exception routing. In the tables it's explaining the different bits. So SCR is a config register set by EL3. So it's kind of fixed for me. And NS is secure or non-secure. And so we're in non-secure mode and RW is zero, is one, I think. Oh, this is not quite the same. It's basically the same though. So this is talking about, when you have an exception, based on what level you're running at, what exception level it gets bubbled up to. There's a different chart like this that has like A, B, C in it rather than these. But I think it's the same chart or it's essentially the same chart. And maybe it's from a different doc, that could be it. So we're in non-secure mode, and then I noticed this TGE bit here, and RW was true, and that's just set up for us. So if we look in this state, there's a C here and, does it say what C means? C is: the interrupt is not taken and remains pending regardless of the PSTATE. So the PSTATE is processor state, and that's how the processor can turn interrupts on and off really quickly: there's a bit that's right in the CPU for it.
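To make the table reading concrete, here is a minimal Python model of the routing being described. It is a sketch, not a copy of the Arm table: it assumes non-secure state, AArch64 at every level, and that EL3 has not claimed IRQs for itself, and it ignores PSTATE masking. The function name `irq_target` and the `"pending"` return value are made up for illustration; `IMO` is the other HCR_EL2 bit that routes physical IRQs to EL2 and is not something mentioned on the stream.

```python
# Bit positions in HCR_EL2 (the hypervisor configuration register).
HCR_EL2_IMO = 1 << 4   # route physical IRQs to EL2
HCR_EL2_TGE = 1 << 27  # Trap General Exceptions


def irq_target(current_el, hcr_el2):
    """Return where a physical IRQ is taken, or "pending" if it is not taken.

    Simplified model: non-secure, AArch64 everywhere, EL3 not claiming IRQs,
    and the PSTATE interrupt mask is ignored.
    """
    if current_el not in (0, 1, 2):
        raise ValueError("only EL0, EL1 and EL2 are modeled")
    routed_to_el2 = bool(hcr_el2 & (HCR_EL2_TGE | HCR_EL2_IMO))
    if routed_to_el2:
        return "EL2"
    # Without routing, code running at EL2 never sees the IRQ (the "C" case),
    # while EL0 and EL1 take it at EL1.
    return "pending" if current_el == 2 else "EL1"


if __name__ == "__main__":
    print(irq_target(2, 0))            # running at EL2 with TGE clear
    print(irq_target(2, HCR_EL2_TGE))  # running at EL2 with TGE set
```

The first call is the situation from last week (interrupts never arriving at EL2), and the second is the state after setting TGE.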
So what we're doing is that if RW is one, although I guess it would be one here or it doesn't matter. But by setting TGE to one here, then we're in EL2. So it says Hyp, which I assume is the same thing, like the exception is taken to AArch32 mode. So we're in non-secure, RW is true. Oh, okay. So we're here, and then when we set TGE to one, we're down here. That's probably really tiny for you, sorry. But it then says it goes to EL2 here. Anyway, so that's the magic bit that I had to set in order to actually have interrupts cause an exception. And then what I've done since then is I wrote this in the peripherals thing. So I had this Broadcom peripherals repo that I've been working on a lot. Now I have it in CircuitPython as well, cause the USB stuff is still not working and I got frustrated and didn't have any more ideas. So I took a break. But if we look in Broadcom, gen interrupt handlers. So there's this Jinja C file that's really hard to read, so I'll pull up the generated version. So what it's doing is it's generating a bunch of UART, or like interrupt name underscore IRQ, handlers. I think I talked about this last week, where they're all marked weak and what they all do is they just do a while true loop, which means that if you inadvertently cause an interrupt to happen that you haven't written a handler for, it just goes into the weak handler and just spins waiting for you. And that basically means that if it happens, it stays there. And so if you're using a debugger, you can find out that it happened and fix the reason that it happened. So these three I just added today, but you can see generally there's just a lot of them. This I did today as well. So the AUX handler is one of the special ones. And let's pull up, there's a good diagram for this. And I'll tell you why I was doing that. So in the BCM2711 peripherals datasheet, there's this interrupts chapter.
And they have kind of groups of interrupts. So they have ARMC and VideoCore. And then they have this explanation of these special registers. So what happens here is that there's four interrupts that come into the VC peripherals. So in this list, so AUX here, and then there are also these ORs of things. So one interrupt going into the VC interrupts handles all the SPIs except one and two, which are in AUX. And then all the I2Cs are piped into one interrupt into the larger interrupt controller, and then the same for the rest of the UARTs. So what I've done so far is that this AUX IRQ will actually just kind of like dereference it. So there's a weak handler here that's saying like, oh, if we call this AUX IRQ handler, then while the IRQ is active, check all of the three bits for it and then call the respective handlers. So I just added that. And the reason that I added it is because I've been trying to get CircuitPython running on the Raspberry Pi. So I had to get some things compiling. There's some trickiness around, like, CircuitPython has this model of using a 32 bit pointer and knowing that only some of the bits are actually going to be used by pointers, and the rest of the bits they can use to indicate that the thing in the pointer is not actually a pointer. It's called, what is it called? Pointer boxing, I think. I think that's the term for it. But because we're on a 64 bit platform we have a bigger pointer and there's a different way that some of that boxing happens. So that was some of the stuff that I had to fix. So let me show you where I'm at because I think you'll be excited. So I've got it here. So I think this is the same setup I was showing last week, and let me try to make it a little bit bigger. So on the left is the J-Link, or OpenOCD, and I build in that window too. This bottom right is GDB for debugging, and then the top right here, spoilers, is the USB to serial link to the CM4 board.
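Going back to the AUX interrupt for a moment: the "check the three bits and call the respective handlers" idea can be modeled in a few lines of Python. This is only a sketch of the dispatch pattern, not the generated C from the repo. The bit assignments (mini UART, SPI1, SPI2 in bits 0 to 2 of the AUX IRQ register) follow the BCM2711 peripherals datasheet as best I recall it, and `FakeAux`, `dispatch_aux_irq` and `unhandled` are hypothetical names. Where the real weak handler would spin forever, the stand-in raises so the sketch terminates.

```python
AUX_IRQ_MINI_UART = 1 << 0
AUX_IRQ_SPI1 = 1 << 1
AUX_IRQ_SPI2 = 1 << 2
AUX_SOURCES = (AUX_IRQ_MINI_UART, AUX_IRQ_SPI1, AUX_IRQ_SPI2)


class FakeAux:
    """Stand-in for the hardware: just holds the pending bits."""

    def __init__(self, pending):
        self.pending = pending

    def read_irq(self):
        return self.pending

    def clear(self, mask):
        self.pending &= ~mask


def unhandled(aux):
    # The real weak handlers loop forever so a debugger can catch them.
    raise RuntimeError("interrupt without a handler")


def dispatch_aux_irq(aux, handlers):
    """Call a handler for every pending AUX source until none are pending.

    Returns the masks in the order they were serviced.
    """
    serviced = []
    while True:
        pending = aux.read_irq()
        if not any(pending & mask for mask in AUX_SOURCES):
            return serviced
        for mask in AUX_SOURCES:
            if pending & mask:
                handlers.get(mask, unhandled)(aux)
                serviced.append(mask)


if __name__ == "__main__":
    aux = FakeAux(AUX_IRQ_MINI_UART | AUX_IRQ_SPI2)
    handlers = {
        AUX_IRQ_MINI_UART: lambda a: a.clear(AUX_IRQ_MINI_UART),
        AUX_IRQ_SPI2: lambda a: a.clear(AUX_IRQ_SPI2),
    }
    print(dispatch_aux_irq(aux, handlers))
```

Note that a handler which never clears its pending bit keeps the loop going, which mirrors how a level-triggered interrupt behaves on the real hardware.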
So this is the USB to UART converter. And so spoilers, you can see what's happening, but I'll turn it back on and we'll see it pop up again. It does take a little while for the GPU code to run. So here we're in, and I have these debug prints for the GC and we'll see why in a second. But what I just got working is if I hit enter now I'm in the REPL. So I can do one plus one or one plus two and get a result. I can do one plus 20 and get a result. I can do, I don't know, 400 times a thousand and get a result, and that's all well and good. But the problem is the next thing I did was print hello world, and it hangs, unfortunately. So this is where I'm at. I made it really far. So there's one UART called the mini UART in the AUX peripheral. And I got that working so it can read and write bytes across the UART. This is using the debug UART stuff that CircuitPython has. So I'm like super close to having it all kind of work, minus USB of course, and I want USB to work. I really do. But this is really good progress. Getting the UART working allows me to work on other things that I'm interested in working on instead of just being blocked behind USB. So I think the next thing I'll start, probably next week, I have to fix this bug first. But the next thing I kind of want to do is the display stuff that I've been talking about. The reason that I want to do this is to actually have the Blinka terminal display and displayio all work with HDMI. So that's going to be kind of the next step I think, and this allows me to play with that. And that's really one of the main goals, or the reasons that I want to bring it to the Raspberry Pi, so that the HDMI stuff works. So I'll probably work on that next week. But I ran into this bug and it seems like a huge bug. So I figured we could try to fix that today. And Patrick says, that's awesome. Thank you. Okay, so I was doing a little debugging before the stream.
So let me just get everything back going. So I just started OpenOCD again and so I've got to reconnect to it from GDB. So I'll do that. And now we can see that we're in error hang. So this is the equivalent of a hard fault exception, or a hard fault interrupt, on a Cortex-M. But because there's those tiers of execution, those exception levels, theoretically something at a higher exception level could fix whatever caused the problem and allow the code to actually continue running. But we don't currently try to fix anything. We just do this error hang. And I think I showed this in the last few weeks. I did this Cortex-A, which is now in the post tiny USB Cortex-A.py. So if I source that, which I had already actually sourced, I have this ARMv8 exception decoder thing. And what it's telling me is that it's a translation fault, level one, and it's a data abort, meaning the CPU tried to read some data, but there was a problem reading that data. So there's two kinds of faults you could get. If the address for the instruction was wrong, you'll get an instruction abort, from the same level because we're in the same exception level. But in this case, we're getting a data abort because there's data that we're trying to load that's from an address that we don't have access to. So we just go up. What I noticed is this o_in, so we're in this mp_obj_get_type. That call is where the problem's happening. And if I print the pointer that we're trying to get the type of, the object, we can see that it's actually pretty large. So if you count, here's four hex digits and here's four hex digits and then a one. So we're actually above the 32 bit range, which is probably a problem, because if we look up here, our GC heap, which is where it should live, is only, see, there's four hex digits there and then three.
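For reference, the kind of decoding that a script like this does can be sketched in plain Python. This is not the code from the stream: `decode_esr` and the table below are illustrative names, and only the four abort classes plus the translation fault codes are covered. The field positions (exception class in bits 31:26, fault status code in the low six bits) are from the Armv8-A syndrome register layout.

```python
# Exception class (EC) values for aborts, from the ESR_ELx encoding.
ABORT_CLASSES = {
    0x20: "instruction abort from a lower exception level",
    0x21: "instruction abort from the same exception level",
    0x24: "data abort from a lower exception level",
    0x25: "data abort from the same exception level",
}


def decode_esr(esr):
    """Turn an ESR_ELx value into a short human readable description."""
    ec = (esr >> 26) & 0x3F  # exception class
    if ec not in ABORT_CLASSES:
        return "exception class 0x%02x (not decoded in this sketch)" % ec
    fsc = esr & 0x3F  # fault status code, low six bits of the ISS
    if 0b000100 <= fsc <= 0b000111:
        fault = "translation fault, level %d" % (fsc & 0b11)
    else:
        fault = "fault status code 0x%02x" % fsc
    return "%s: %s" % (ABORT_CLASSES[ec], fault)


if __name__ == "__main__":
    # 0x96000005 is an example value chosen to match the fault described above.
    print(decode_esr(0x96000005))
```

A level one translation fault on a data abort is exactly the combination being read out here: the load address had no entry in the level one table.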
So for some reason the pointer that we're trying to get the object type of is over our 32 bit boundary. And then there's the way that we have the MMU set up right now. So I think we talked about this as well: another feature of these application class processors is that they have a memory management unit. So you map from virtual address space to physical address space. And the way that we have it set up right now is that only the first gig of virtual address space and the range for the peripherals are mapped one to one. And that's what this translation fault level one is telling us: we're using a two level table, with translation levels one and two, and the tables at each level cover given range sizes. And so it's telling us that we tried to look up this virtual address and we couldn't find it in the page tables, which is the name for the tables that do that mapping. That mapping usually happens dynamically, although in our case we're doing that mapping statically. So the problem that we have to figure out, that I haven't figured out yet, is why we're getting a pointer that's too big. Why are we getting a value that's just too large? So what I was thinking I would do next is, because I'm printing out the heap here, I thought I would just print out all of the allocations that we're doing on the heap, because printing seems to be working pretty well. And that'll be able to tell us what the ranges that we're actually allocating are. And we could take a look at that. So that pointer is greater than four gigabits, gigabytes. Yeah. I don't actually know my gigs in terms of hex digits. I gotta learn it. Like I usually work so low that I don't really know how to count that high.
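The static mapping and the fault can be modeled with a small Python sketch. Several things here are assumptions rather than facts from the stream: a 4 KiB translation granule (so each level one entry covers 1 GiB), the exact peripheral window address, and the example pointer value. `translate`, `level1_index` and `TranslationFault` are illustrative names. It also answers the hex question: one gigabyte is `0x4000_0000` and four gigabytes is `0x1_0000_0000`.

```python
GIB = 1 << 30  # 0x4000_0000

# Identity mapped ranges as (start, end), end exclusive.
MAPPED_RANGES = [
    (0x0000_0000, 1 * GIB),        # first gigabyte of address space
    (0xFC00_0000, 0x1_0000_0000),  # peripheral window (assumed address)
]


class TranslationFault(Exception):
    pass


def level1_index(virtual_address):
    # With a 4 KiB granule, a level one entry covers 1 GiB: bits 38:30.
    return (virtual_address >> 30) & 0x1FF


def translate(virtual_address):
    """Return the physical address, or raise if nothing maps it."""
    for start, end in MAPPED_RANGES:
        if start <= virtual_address < end:
            return virtual_address  # one to one mapping
    raise TranslationFault(
        "no mapping for 0x%x (level 1 index %d)"
        % (virtual_address, level1_index(virtual_address))
    )


if __name__ == "__main__":
    print(hex(translate(0x0008_0000)))  # inside the first gigabyte
    try:
        translate(0x1_0008_0000)        # same low bits, but above 4 GiB
    except TranslationFault as fault:
        print(fault)
```

Any pointer with a bit set above bit 31 lands at level one index 4 or higher in this model, which nothing maps, so the lookup fails at the first level of the walk.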
But yeah, that's my theory, and maybe we could actually print it out, but my guess would be that the actual value, ignoring that one at the top, is the correct value. But I could be wrong. Yeah, the pointer is outside of the expected address space. I assume that this object that we're trying to print should really be much lower. It should either be in our static data that's in memory from 0x80000 and up, or in the heap, which is only up to this value here. So let's just start poking at it. So I think I know the root cause. Well, I don't know the root cause. I know what is messing me up, I think. So I was talking about the pointer boxing, and that happens in here. And you can see that there's these different object representations. So we're using object representation D, which we don't use in any other case. We usually use C. So on all the 32 bit stuff we're using this object representation C, but because we have 64 bits we're now in this MICROPY_OBJ_REPR_D case instead. So there's some code here that does sort of wrapping and unwrapping. It could be, yeah, like here's just a cast. Like there was this weird MP_ROM object stuff that seemed very strange to me, and I actually undid it and commented it out. One gig is 4000 0000 in hex, a four and then zeros. Let's just print out the values that we're getting out of the GC. I mean maybe we're getting, so here's the initializing heap print and then GC out possible, and I just changed these. Okay, so let's do our dance. Pointer being overwritten, it could be. What's at that value? I think that's the heap. So I'm trying to do a 32 megabyte heap and I'm just like, that's gonna be so amazing. We've never had a 32 megabyte heap before, which is super tiny for the Raspberry Pi. But I'm amused. We're going up again. That's interesting that these allocations here are pretty far.
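As a refresher on what the boxing is doing, here is a deliberately simplified Python sketch of low-bit tagging. It is not the exact layout of either object representation C or D, which pack more kinds of values and differ from each other. It only shows the core trick: aligned pointers always have zero low bits, so a set low bit can mark "this word is really a small integer". All function names are made up for illustration.

```python
def box_small_int(value):
    """Store an integer directly in the object word, tagged with a low 1 bit."""
    return (value << 1) | 1


def is_small_int(obj):
    return obj & 1 == 1


def unbox_small_int(obj):
    return obj >> 1  # arithmetic shift keeps the sign


def is_pointer(obj):
    # Heap objects are at least 4 byte aligned, so real pointers end in 00.
    return obj & 0b11 == 0


def describe(obj):
    if is_small_int(obj):
        return "small int %d" % unbox_small_int(obj)
    if is_pointer(obj):
        return "pointer to 0x%x" % obj
    return "other tagged value"


if __name__ == "__main__":
    print(describe(box_small_int(21)))
    print(describe(0x0008_0010))
```

The relevance to the bug hunt is that any code which wraps or unwraps these words with the wrong assumptions about width or tag bits can hand back something that looks like a pointer but is not a valid address.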
Oh, maybe that's because... Someone in chat says, I just got a Zero 1.3, can I use this with any LCD? Usually LCDs are driven by I2C devices and that should work. But a Zero can't really drive an LCD directly. Well, that certainly seems like the range. So let me just do the REPL. Yeah, so see, those allocations are all very much in the same range. This could be because there's gotta be space at the start for all of the metadata that the heap needs. So that could be why that is being allocated here. So it looks definitely like it's correct if we just connect to it again and do the same thing we just did. Again, we can see we definitely allocated exactly that. But for some reason we're getting this extra top bit set. Which is weird. So it could be a case of, so print helper, we can see that this number here is being up. But I think the problem is that our stack is not quite right. One thing I was thinking about was actually trying to get the stack correct once we get into the exception handler. So maybe that's what we could do today. So I think the problem is that we're here in this function or another function. And when we get an exception we're not adding the correct frame. If that makes sense. So we're not correctly filling out the stack so that when something wants to unwind it can do it correctly. Which is annoying. But it would be nice if we could figure that out. Because then in theory we could actually unwind completely. Because I don't think our stack's being corrupted here, in the sense that I think there is probably more data further up the stack but it's just for some reason not being read correctly. mp_obj_print_helper, reduce what we're looking at. Like ideally we would be able to go further up the backtrace. Let's just take a look. It's the AAPCS, I think. AAPCS, what is it? Procedure Call Standard for the Arm Architecture. It seems right. Oh, is it just a download?
It says standard variants. Core instruction set. This is 32 bit, core integer registers. The first four are used to pass argument values and return values. A subroutine must preserve the contents of registers 4 through 8, 10, 11 and the stack pointer. Hi, Stefan. Stack, stack must be double word aligned. Are you diving into how to decode the call stack? What I want to do is, so I think the reason that the decode is failing is because in peripherals, in boot.s, there's this code that I snagged from somewhere that has this IRQ entry, IRQ exit. So this is how it's saving all of the registers and things. But I think what it's doing is it's not adding a correct call stack frame. And so GDB is unable to correctly unwind it because this data is wrong. So I'm not trying to decode it by hand. What I'm trying to do is, when an exception happens, correctly save the state so that the unwind still works even if I'm in this error hang state. So yeah, that's what I'm trying to do: figure out what format all this stuff needs to be in, and maybe there's other data that I need. So yeah, I think that's what I'm trying to figure out. A list of stack frames describing the current call hierarchy in the program. Each frame shall link to the frame of its caller by means of a frame record of two 32-bit values on the stack. The frame record for the innermost frame, belonging to the most recent routine invocation, shall be pointed to by the frame pointer register. The lowest addressed word shall point to the previous frame record and the highest addressed word shall contain the value passed in LR on entry to the current function. Your call stack is sometimes wrong or always wrong? I think it's always wrong when I'm in this error hang state. It looks better if I end up in a different, like in a C exception.
Like I have the C interrupt handlers, like if I'm in a C interrupt handler, I think it looks okay, but if I'm in this error hang state, it's not right. Subroutine call can be synthesized by any instruction sequence that has the effect of, I just need to see like the layout. So it's not something as simple as missing a frame pointer. It could be, I don't know what, I don't know enough to know, unfortunately. So like when you take an exception, like the LR register stores it, not the stack, maybe this is it, it's not a GCC thing, right? Like the order that it pushes all the registers, right, position independence, arm C and C plus plus mappings argument passing conventions. Like why get, there's gotta be some diagram of just like, well, this is what a call frame looks like. Like we're not doing any of that. C, eight arch 64, call frame. Ooh, good presentation. What other layouts do the, I mean, it should just be the standard one is registers. This I will probably have to do too. Memory ordering, this may not actually, right? So those are registers. Are different types of exceptions using different call frames too? No, like all of them go through this. Call frame is defined in API. No software, API. Hey, look at that, that might be what I want. So the, like pushing the IRQ entry and the IRQ exit is used across all of them. So the only, so all of these invalid ones, they just add a number to a register, which actually might be the problem now that I'm thinking of it. So they do the IRQ entry and exit, but then they see handle IRQ just pushes it all, but these invalid entry things actually store more stuff in registers. So I wonder if I just, if I didn't do that. So I wonder if I wonder if that would fix it up. I'm reading from Restream, Lunk Will is helping from Twitch. Okay, let's just see what that does. And I know this is inefficient, but I haven't found a better way to, I would also take the time to figure out a better way to load it. 
You can put your headphones on, you just do it. All right, while that gets going, let's look at the ABI thing. Oh, it just took me to a search. How, just putting that. ABI. Base platform ABI for the Arm architecture. AArch64, this looks very similar. Unwind codes. Let's see if we can do this. Did some allocations before it hung. Your platform is Linux, not Windows, I'm guessing the ABI is different between the two. Yeah, so I'm actually compiling with, yeah, it's still, it's still confused. I mean, I'm compiling with aarch64-none-elf. So that none-elf GCC. So I'm not actually sure what that means in ABI terms. Oh, one thing we can do is, I think there's an info frame. Stack level zero, frame at, program counter in error hang. Saved program counter equals 958D0, called by frame at 7FA5. If it's ELF, that's what Linux uses. I thought maybe that they both followed. Is there a way to print the ABI or something? I mean, I think I'm on the right track, right? Like that's why the stack pointer is broken. I mean, I'm surprised that, so see, this is interesting because the saved program counter here is zero, which doesn't seem right. This is the sort of thing Wikipedia might have. GCC stack layout, stack layout and calling conventions. Frame layout, define this macro to be true if pushing a word onto the stack moves the stack pointer. Well, it looks promising. GDB's memory layout. M4, investigating the stack. Like I just don't think it's right. That looks more promising. See, the problem is that I can't just call a C function, because normally what would happen, System V ABI, what about dumping memory and drawing boxes on what the frame looks like? So typically in C with a function call, there's some stuff that the compiler would generate before the function call. So there's a distinction of caller saves versus callee saves. And in this case, because we're in an exception, the caller never had a chance to save anything.
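The caller saves versus callee saves distinction, and why an exception handler cannot rely on it, can be written down as a tiny Python table. The register split follows the 64 bit Arm procedure call standard as I understand it (x0 to x18 are not preserved across calls, x19 to x28 are, x29 is the frame pointer and x30 the link register); treating the platform register x18 as caller saved is a simplification. `registers_to_save` is a made-up helper, not something from the stream.

```python
CALLER_SAVED = ["x%d" % n for n in range(0, 19)]   # x0 to x18
CALLEE_SAVED = ["x%d" % n for n in range(19, 29)]  # x19 to x28
FRAME_POINTER = "x29"
LINK_REGISTER = "x30"


def registers_to_save(entry_kind):
    """List the registers the entered code is responsible for preserving.

    "function": a normal call, where the caller already saved what it needs.
    "exception": an interrupt or fault, where the interrupted code had no
    warning and saved nothing.
    """
    if entry_kind == "function":
        # Upper bound: a real prologue only saves the ones it clobbers.
        return CALLEE_SAVED + [FRAME_POINTER, LINK_REGISTER]
    if entry_kind == "exception":
        return CALLER_SAVED + CALLEE_SAVED + [FRAME_POINTER, LINK_REGISTER]
    raise ValueError("unknown entry kind: %r" % entry_kind)


if __name__ == "__main__":
    print(len(registers_to_save("function")))
    print(len(registers_to_save("exception")))
```

This is why looking at a compiler generated function preamble only gets part of the way: it shows the callee half, while an exception entry has to cover both halves.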
Which is a problem. This is the ABI for 32 bits. Procedure Call Standard for the Arm. Here's ELF, view on GitHub. Oh, that sent me to the same place. But let's take a look. Exception handling ABI for the Arm architecture. What is that? Oh, but that's for 32 bit again, don't do that. ELF, I just opened two PDFs in new tabs. ELF for the Arm, procedure call, data types and alignments. I'm doing 101 lookups. Yeah, so the exception handler is trying to save that unsaved state. Right, so usually a function only needs to save the callee designated stuff. But because it's an exception, I want it to do both. Lunkwell says, another way you might want to replicate a function preamble might be to look at what's produced for a normal function, godbolt.org is great for this. Yeah, but the problem is the function preamble is only gonna do the callee saved stuff, right? So Patrick's talking about the 101. So there's two main places that memory is managed. There's the heap, which is dynamic and where things can last for different time periods, whereas the call stack is keeping track of what function called another function. And that's how you get a backtrace: it's very sequential, it grows downwards in the stack, and all of the stuff saved on the stack has the lifetime of the function that it's in. And so as you finish a function, you can move the stack pointer back up. And there's conventions as to how, because if you take one function and call another function, you have to worry about, are we using the same registers? Are we using the same CPU memory? And for any of that that the called function might use, there's some way of saying which registers it has to worry about preserving and which ones are preserved by the caller. So yeah, a little bit of a better explanation. Top of the stack has locals and then a return address and then parameters for draw line. Construct a linked list of stack frames.
Each frame shall link to the frame of its caller by means of a frame record of two 64 bit values on the stack independent of the data model. The frame record for the innermost frame belonging to the most recent invocation shall be pointed to by the frame pointer register, the FP. The lowest address double word shall point to the previous frame record and the highest address double words. There's gotta be a diagram. Does the stack grow up or down? Grows down. Usually it's defined. So eventually your bare metal code wants to decode the exception before it's trashed. What I'd like to do is I'd like to see the call stack between the print and main. So I think what's happening is that as it's trying to unwind, it's not aligning correctly to the call stack because I think it's preserved. It's just not able to find it. There's language mappings again. Ooh, a diagram. Although this standard does not mandate a particular stack frame organization beyond what is required to meet the stack constraints described in the stack, the following figure illustrates one possible stack layout for a variadic routine which invokes the VA start macro. Stack grows down. I wanna fake it so GDB can trace back. That's a different problem. Yes. So I think if we just got this frame pointer stuff right, we would be okay. Because this SP can change and that's fine. And there's local variables and saved registers that can all go there. So I think it's just the linked list that's wrong. So really what we, let's take a look at this again. So we're putting, we're moving the stack pointer. We're pushing, we're storing everything there, but we're not moving the frame pointer at all, which potentially is fine, because we're just changing the stack pointer. So the frame pointer is in the same place. Which register is the frame pointer? Symbol versioning. Object files, symbol table, relocation. I don't think we want any of that. This is like the closest thing that I have found to what I want. 
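The frame record wording above describes a plain linked list, which is easy to see in a small simulation. This is a minimal sketch, not what GDB actually does (GDB normally prefers debug info over raw frame records). The helper names `read_u64` and `walk_frame_records` are made up, and the addresses in the demo are illustrative values in a fake memory image, not a dump from the real board.

```python
import struct


def read_u64(memory, base, address):
    """Read a little-endian 64-bit value from a fake memory image."""
    return struct.unpack_from("<Q", memory, address - base)[0]


def walk_frame_records(memory, base, fp, max_frames=32):
    """Follow the chain of frame records starting at the frame pointer.

    Each record is two 64-bit values: the caller's frame pointer at the
    lower address and the saved link register (return address) above it.
    A frame pointer of zero ends the chain.
    """
    frames = []
    while fp != 0 and len(frames) < max_frames:
        previous_fp = read_u64(memory, base, fp)
        return_address = read_u64(memory, base, fp + 8)
        frames.append((fp, return_address))
        fp = previous_fp
    return frames


# Fake 4 KiB stack region holding two hand-built frame records.
BASE = 0x7F000
memory = bytearray(0x1000)
struct.pack_into("<QQ", memory, 0x7FA20 - BASE, 0x7FA50, 0x958D0)  # innermost frame
struct.pack_into("<QQ", memory, 0x7FA50 - BASE, 0x0, 0x80000)      # outermost frame

frames = walk_frame_records(memory, BASE, fp=0x7FA20)
for frame_pointer, return_address in frames:
    print(hex(frame_pointer), hex(return_address))
```

If an exception stub moves the stack pointer and stores registers without writing a fresh record and updating x29, a walker like this skips straight from the handler to whatever record x29 still points at, which is one way a backtrace can come out wrong.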
Like this is where we're starting. And then if we change FP, which, you know, general purpose registers, FP is register 29, which we're not changing. So I'm not sure why it can't figure it out. Registers. Register 29 is 7FB30D0. Interesting that the program counter is, pointer is 7FA20, but 29 and 30 are the same. Called by frame is 7FA50. That doesn't seem right. Oops. 48. Like previous frame's stack pointer is 7FA20 is not right. Maybe it would help to think of the exception as being in a different thread. I wonder why we're getting another frame. Like what I could do is I could save two more. Like this is just a linked list. So where's my, like what if I stack overflow time? J-Walk age. I feel like I would have found Stack Overflow if I was gonna find Stack Overflow. But it could be a GDB thing in knowing it's in an exception. Like that definitely could be true. Cause like this is saying stack level zero is frame at 7FA20, which it's coming from the stack pointer. It's not coming from the frame pointer, which is suspicious. Like why is the frame right there? Like why is it looking at that first frame if that's the case? Called by frame at 7FA50, which seems right. Yeah. I mean, the macro is pretty straightforward. It's just move the stack pointer to 72 and then store. These are pairs of registers. So 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, like it just stores everything. It stores 31 registers. And then the stack pointer is register 31. So why would GDB, oh, you know what? It's libbacktrace that I think everybody uses to do this. C library that may be linked in the C program to produce symbolic backtraces. This library relies on the C unwind API. I don't know. Ah, Lunkwell says, I don't have a deep understanding of your particular problem, but could the GCC flag -fomit-frame-pointer be generating incompatible code? Can -fno-omit-frame-pointer be forced for this object file? Let's just, yeah, let's check our build flags. So we are running bare metal. 
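The "move the stack pointer and then store pairs of registers" macro is not shown in the transcript, so the following is only a sketch of what such a sequence could look like. It generates assembly text for one hypothetical layout; the function name `save_all_registers`, the offsets, and the frame size are assumptions chosen for illustration, not the real macro.

```python
def save_all_registers(count=31):
    """Return AArch64 assembly lines storing x0..x(count-1) below sp.

    Hypothetical layout: register xN lands at sp + 8*N after the stack
    pointer has been lowered by a 16-byte-aligned frame size.
    """
    frame_size = (count * 8 + 15) // 16 * 16  # sp has to stay 16-byte aligned
    lines = [f"sub sp, sp, #{frame_size}"]
    for n in range(0, count - 1, 2):
        # stp stores a pair of registers with one instruction.
        lines.append(f"stp x{n}, x{n + 1}, [sp, #{n * 8}]")
    if count % 2:
        # An odd register count leaves one register for a single str.
        lines.append(f"str x{count - 1}, [sp, #{(count - 1) * 8}]")
    return lines


lines = save_all_registers()
print("\n".join(lines))
```

Note that a block like this saves x29 and x30 as ordinary data and never updates x29, so it does not by itself add a new frame record to the linked list an unwinder follows.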
And so we're like all one giant object file. Freestanding no start flags, but we can double check to make sure that we're not omitting that. It looks like it's in some of the expressive stuff. Does it default on or off? Oh, this is enabled by default when compiling with -O1 or higher. How does it figure out where the, right, there's a GCC flag to dump all of its settings since some flags are toggled by others under the covers. Yeah, good night, Dave. Thanks for hanging out. Let's see what the GCC omit frame pointer. If the frame pointer is omitted, how does it unwind? If the function does not need it, instructs the compiler to store the stack frame pointer in a register. Parts that the frame pointer is stored in register X29. Frame pointer limitations for stack unwinding. It is a debug build. objdump or a GDB disassembly should show if a function is using a frame pointer. We can do that. Kind of what Lunkwell was talking about. We could just do a disassemble, disassemble mp_obj_print_helper, which I think might be too many things to tab complete. Well, there you go. So if X29 is the frame pointer, it's moving the stack pointer, or no, it's storing, the first thing it's doing is it's storing the frame pointer, the current frame pointer, and the X30, X30 is link return, right? And then it's storing that to the stack, and then it's moving the current stack pointer into the frame pointer register. I mean, we could just do that. What's interesting is that the stack pointer is negative. That's right. I mean, we could try just doing that. Those first two instructions. So the first thing a function call is doing, what was 30? Doesn't mean it does. This is something I was thinking about doing, so I'm not surprised by it. Got a meeting. Good luck. Thanks, Lunkwell. I appreciate your help. You should join our Discord, adafru.it slash discord. Could use more of your help. Itanium exception handling them. So stack pointer on entry. libunwind. Yes. 
That's what I was thinking of. System Linux AArch64. I think this is what is used. Macadam snoring. Primary goal of this project is to define a portable and efficient C API to determine the call chain of a program. Implement the stack manipulation aspects of exception handling. Debuggers. Huh. Useful for a running thread to determine its call chain. Efficient setjmp. T-shin. Drill would be really nice right now. I know, right? It's just a fast backtrace. Yeah, see, when they say exception, they mean C plus plus. IA64 unwind conventions. Much of it is currently documented in the form of source code only. Does GDB use libunwind? IA64. Deep wizardry, stack unwinding. This should be, oh yeah. Welcome to programming, folks. Yeah, always easy. Have a great weekend, Unexpected Maker. Sorry I'm making no progress. The reason that I was thinking about this too is that the Cortex-M interrupts model the ABI so that it works. And so that's what I'm trying to reproduce. Unwinding frames in Python. Unwinding is the process of finding the previous frame. The sad part is I know the top of the stack. There's no reason to unwind. I could ask it to start from the other side. Standard frames. Ooh, however, mixed language applications, like a JVM, use frame layouts that cannot be handled by GDB unwinders. See, this would be cool if I could actually do a backtrace between running CircuitPython code and C. Like that's another thing I've been thinking about in terms of being able to handle these frames. But now that we use a separate pystack, we could do it separately. Sweeney says, fun to see how you approach development. Like I do, I have the RP2040 chip Wi-Fi from Arduino and want to use _thread to activate the second core. Which I got to work, but then the WS2812 isn't available. Huh. Is that, that's in MicroPython? Cause we're, CircuitPython doesn't use the second core for the, doesn't use the second core for NeoPixels. We use PIO instead. PIO and DMA I think. 
Unwind info. Wonder how we debug pending frame to architecture. Unwinder skeleton. Registering an unwinder. Unwinder precedence. GDB first calls the unwinders from all the object files in no particular order, then the unwinders from the current program space. And finally the unwinders from GDB. Writing a frame filter. I mean, this stuff is really cool. It's just like, USB is blocked. Let's move on to this. It's a good lesson for people new to bigger projects. Yes. Yeah, and before I switched off the USB stuff I emailed a number of people being like, can you help me? Including Thach, and Thach said like, he tried to get it running on his Pi 4 and then it was like, I really, he needs the same setup I do. So he's getting a CM4 and an IO board. But he said he should get it next week. So Thach has a lot, a lot of USB experience especially with the particular thing. So I'll at least get fresh eyes on it next week which will be good. And then of course like, if I had struggled and struggled and struggled on USB I would not be here struggling on this other issue of like, why is this number wrong? Which I guess is the top of the thing. But like, if I can spend the time to figure out how to get the stack trace working that means that then I can really like it'll just make it easier to debug all the things that happen in the future. AArch64, exception. Co-host for next week. Well, Thach is in Vietnam. So I don't even know what time it is there. I think it's the middle of the night for him. So I don't know. Ooh, this looks promising. AArch64, unwind. Extract frame address. 5.30 in the morning. Yeah, he's not awake yet. Do I really need libgcc? Call frame. Oh, is this the same thing I know I haven't downloaded this before. But this looks awfully similar. The stack, the frame pointer. This is what I was reading before. Why don't they just have a freaking diagram? This is just the source for that. What does the hardware standards say the CPU does when it encounters an exception? 
The things that I've seen that the hardware does with an exception is that it disables the exception via the PSTATE register. So that means that once you're in an exception, another exception won't happen on top of it. So what you can do, what that means is that you can do all that saving that I was talking about and then turn it back on and then you can nest in theory. Does it come down to, Andrew says, does it come down to the exception entry code needing to update the frame pointer to be stored on the stack to a GDB correct value rather than storing the previous frame pointer value when exception occurs? Yeah, maybe there's another word for exception so we don't mix the two. I mean, interrupts, but I mean, maybe what we should do is just do exactly what the first instruction is. It's storing, like this is storing those two values, 48 bytes lower in the stack. Like that's what that's doing. And then updating the frame pointer to where the stack pointer is currently. I wish, I wish, I wish, I wish. The address thing is stack variables. That doesn't look like ARM. I don't know. I mean, ooh, standard register usage. R30 is the link register. This is what I was talking about. Text registers. This is exciting, isn't it? AArch64 frame. You have our attention. Well, David, you're paying attention. I'm not sure how many other people are. LP64, AAPCS64. There's gotta be some sort of like intro that is an application binary interface. The main thing to think about with looking at the assembly is that LD is load and ST is store. Like most of that, yeah. So STP, so store, store, store, move. So for a move, the destination is the first register, I think. So, like this is what's cool about computers, right? It's like they're really just data machines. So there's lots of just like store, move, store, store, move, move, move. BL is branch and link. So that's like a function call where link means store the return address in the link register. There's a post increment. Oh, is that a post increment? 
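Those first two instructions of a compiled function, storing x29 and x30 48 bytes lower and then copying the stack pointer into x29, can be stepped through in a tiny simulation. This is a sketch with a made-up `Cpu` class and illustrative starting values, assuming the instructions are `stp x29, x30, [sp, #-48]!` followed by `mov x29, sp`, which matches the description on stream.

```python
import struct


class Cpu:
    """Toy model with just sp, x29 (fp), x30 (lr) and a small stack."""

    def __init__(self, sp, fp, lr, stack_top, stack_size=0x100):
        self.base = stack_top - stack_size
        self.memory = bytearray(stack_size)
        self.sp, self.fp, self.lr = sp, fp, lr

    def store_u64(self, address, value):
        struct.pack_into("<Q", self.memory, address - self.base, value)

    def load_u64(self, address):
        return struct.unpack_from("<Q", self.memory, address - self.base)[0]

    def prologue(self, frame_size=48):
        # stp x29, x30, [sp, #-48]!  lower sp first, then store the pair there
        self.sp -= frame_size
        self.store_u64(self.sp, self.fp)
        self.store_u64(self.sp + 8, self.lr)
        # mov x29, sp  the new frame record becomes the head of the list
        self.fp = self.sp


# Illustrative values only: the caller's frame record sits at 0x7FA50.
cpu = Cpu(sp=0x7FA50, fp=0x7FA50, lr=0x958D0, stack_top=0x7FB00)
cpu.prologue()
print(hex(cpu.sp), hex(cpu.fp), hex(cpu.load_u64(cpu.fp)), hex(cpu.load_u64(cpu.fp + 8)))
```

After the prologue, x29 points at a record whose first word is the caller's frame pointer and whose second word is the return address, which is the shape an exception entry stub would need to imitate for a frame-pointer-based unwinder to step through it.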
So it is mutating the stack pointer at the same time as well. That's a post increment. So it's storing those two values at the stack pointer, but then that's a post increment, I guess. But these two values are, they're eight bytes apiece. So there's like extra stuff between them. Well, the P may not be, so there's different classes of storage, of stores. So like this STR is also a store. So there's like multiple ways of doing stores. I noticed that when you initially backtraced the frame pointer was the same value in both versions of the stack. So I think the unwind is doing, when I do info registers, like info registers changes depending on the frame pointer. Like I think it's trying to be smart like that. See, the C stuff are comparisons, and AND is like a bitwise and, STP is store pair of registers. And then STR is probably just storing a single register. So like ultimately we want to store all these things. We're just storing them in the wrong order. I wonder if there's just more. Are the assembler commands abbreviated to save space? Yes. Yeah. What is the right term for that? But yeah, generally they don't like to write them out because that would make it too much. That would make it too easy to read. Well, pre-index is with the exclamation point. So there's an exclamation point here. So it's pre-index, it matters where the number is. 32 bit, I think that is a pre-index. Our assembler to be fit into columns. Ooh, does this have it? The ARM cheat sheet. Assembly commands is what they're calling it. I wish I knew more about stack unwinding. Ooh, DigiKey has the NRF driver boards available and LED glasses. I was thinking about, I have the glasses but I don't have the driver board nor do I have actually glasses, actual glasses. Hitch cost. There's some people that are so much smarter than I am. That are like in the deep weeds of all this stuff. I could try to find the problem. 
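Since the pre-index versus post-index question comes up here, this small sketch spells out the three AArch64 addressing forms as plain arithmetic. The function names are made up; each returns the address that gets accessed and the value the base register holds afterwards.

```python
def signed_offset(base, imm):
    """[base, #imm]: access base + imm, base register unchanged."""
    return base + imm, base


def pre_index(base, imm):
    """[base, #imm]!: update the base first, then access the new base."""
    return base + imm, base + imm


def post_index(base, imm):
    """[base], #imm: access the old base, then update the base."""
    return base, base + imm


sp = 0x1000
push_address, sp_after_push = pre_index(sp, -16)           # like stp x0, x1, [sp, #-16]!
pop_address, sp_after_pop = post_index(sp_after_push, 16)  # like ldp x0, x1, [sp], #16
print(hex(push_address), hex(sp_after_push), hex(pop_address), hex(sp_after_pop))
```

A pre-index store with a negative immediate followed later by a post-index load with the matching positive immediate behaves like push and pop, which is why a restore sequence written this way has to run in the reverse order of the saves.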
I could try to find the problem without fixing this but we've only got 10 minutes left on the stream and it's gonna be way easier if we got it correct. I wonder if we could find another example of an interrupt handler. I mean, we could try to find, like there's this Circle library that probably has it. Holy cow, where did the time go? Right? There are articles on ARM stack unwinding but I don't know which would be useful. Hmm, is there? Ha ha ha. Maybe there's stack unwind samples. If we wanna understand how to unwind the stack we need to know how the system invokes a function. Bigger. Great stuff. Thanks, Jason. I'm surprised people enjoy watching me not succeed. It's good, it's really good that all the registers are named using x86. Here's the name mapping. Frame pointer is x29 in AArch64. The STM32H725. Huh. Yeah, folks are getting stuff done while we do this. It looks very elegant but the real world is dirty. ARM won't guarantee the frame pointer even if we pass -fno-omit-frame-pointer to the compiler. There are at least two cases for compilers that won't follow this elegant stack frame, see the APCS doc for details. In the case of leaf functions much of the standard entry sequence can be omitted. In very small functions such as those frequently occurring the function call overhead can be tiny in the real world. Anyone know what Scott uses to get that grayed out command options in the terminal? I think that's what you're talking about is coming from fish. I use fish shell. Fish, fish, fish. What a funny word. Yeah, so I think that's what's doing the grayed out command options. It's realistic to see you struggling, not every day is a slam dunk. I mean, I got further, like when I started today I wasn't reading from the UART at all and I had to debug that so that was good. DWARF uses a data structure called the debugging information entry. Deep wizardry. All I wanted to do is be correct in GDB. I mean, that's weird. 
Like why isn't it right? If the function is static, the compiler really doesn't have to adhere to any conventions. Generally all ARM registers are general purpose. This is ARM, not ARM 64. We're doing 64 bit compilation. Well, I don't know if DWARF can't do debug information for the assembly for the exception stuff, I assume. Yeah, this is not, there's no unwinding thing there. ILP32 or LP64, LP64, looking at the assembly pseudocode. How easy is it going to be to convert my bashrc to my fish rc? I think that depends on what's in your bashrc. I'm using it pretty stock standard. I'm using fish pretty stock standard. I do have an extension called Oh My Fish that is doing the like prompt. Like this prompt is coming from something called Oh My Fish which I find very handy to know what I'm in. GCC unwind assembly. I don't remember the noise from assembly output. Oh, you know what, I never did. I was gonna see if I couldn't see what they do. Arms dub, no, that's not what I want, I think. So this is kind of one of the canonical, yeah, Oh My Fish and bobthefish theme. So one thing, we're not turning the cache on yet either so this is kind of interesting. Invalidate data cache, clean the whole D-cache. Hmm, oh, they're using push. But V7, that may be ARMv7. Delay loop, exception stub, macro stub name. There's a function bass that will wrap around bash scripts under fish. Can't you just do bash and then the script name? Like just call out to it. Well, this certainly looks a lot like, oh, look at this, IRQ stub is even store 29 and 30 onto the stack. Save, you know, one on save the floating point registers on IRQ. I think this is probably what, store return address for profiling. This looks actually like very much what we want. FIQ stub, oh, what does this do? So this exception handler is not doing everything. It's interesting that they're doing the stack pointer moving as it does it. Let's just try it. Oh, we're over time. 
But like this IRQ stub looks very much like what we want. So like what we're doing here is all of this saving of all of this stuff. But there's just like this little bit of other stuff that's happening. And we can skip this enable FIQ. So let me just snag this. Let's put it, let's just try it. I just wanna see. So in IRQ entry, we're gonna do this first and then these ones, we're in EL2. So we're gonna do EL2. And then it's interesting that, oh, they're storing it a second time. Preserve, make a new stack frame, maybe. Make a new frame. And then this is save, save all registers. Do it, try it. What's the worst that could happen? It's true. And it's in git. So I can always just git checkout if it turns out. So we're not gonna do this. And they're saving it. So they're just saving X0 through X28. And they're doing this immediate stack pointer thing, which I kinda like. It's not, like this is kind of susceptible to like ordering changes. I've forgotten how to do the, and exclamation point, right? Okay, so we're not gonna do 29 and 30 because we already did 29 and 30. Show some of your verbal, like, ah, why are you doing it by hand? And then it's just a regular store. It's interesting here, IRQ return address. It's a regular store. It's interesting that this is minus 16 now. And look, they put, they store 29 into zero. And then they call it. I wonder if this would be enough because we don't need to leave this. We just need to, like, because we're gonna hang. It doesn't matter if, well, I guess getting this right, because we're gonna take other exceptions before we hang, because this is shared. So it was right. And it's the reverse out. And this is the thing that needs to go in the end. So we need to do the same, same thing here. And then we restore those two things. Oh, they offset it. Oh, interesting. These are not reversed. We don't wanna do this because that's changing our stack pointer. And we're doing that every time now. 
So we wanna actually do, I'm gonna have to swap all these. What's the format here? Code also does x31 in Circle. x31 is the stack pointer, right? Okay, so in here, in this case, the 16 is outside the brackets. So that's kind of what we're going for. Just wanna see if this works. We're gonna do it backwards, but I don't think that matters. I think the main thing is that we want it to be the reverse of this order. Otherwise, it won't be correct. We'll restore it in the correct order. Q31, yeah. It's amazing. I'm amazed anybody pays attention to this. But hopefully we'll get a payoff here shortly of being able to look at the full stack trace. And then we'll call it quits, and then I'll probably, I may hack on it some more. We shall see you in the reverse. Okay. And then we restore EL2's registers, and then we restore X29 and 30. So that's like, I'm literally like checking for the reverse of these, which it looks right. Let's try it. Dexter slept through it. No worries. If I can put some people to sleep, because they need sleep, then, you know, that's totally fine. If I eject from the Dolphin program, it crashes stuff. So hopefully we'll see stuff here. Okay. Okay. Okay. One plus two works. This should still crash, because we haven't fixed the original bug. So we're hung. Have a good one, Minnesota Mentat. OpenOCD. Backtrace. It's worse. Well, darn. Not enough registers or memory available to unwind further. Well, let's call it, we're over time. Saved PC is that number? I might have to, yeah, exactly. DCD says, time to dump the memory. Yeah. Yeah, I kind of think like, like where's that saved PC number coming in? Like it's definitely not correct. Info registers. SPSR. So that is like, that's what it's saying is that saved program counter, and that's definitely not right. Anyway, I'll keep poking at this after I take a break. Let's switch the view. So thank you everybody for joining me for yet another deep dive. We definitely dove deep today. 
Unfortunately, didn't actually fix this problem. But I think we've demonstrated at least that we're in the like, vicinity of it, right? Like we just got a different result for the backtrace. So clearly we're messing with something that is impacting what we're looking at. So this has been a deep dive. They happen every week on Friday at 2 p.m. Pacific. I live in Seattle, that's why it's Pacific time. I work for Adafruit. If you wanna support me, support Adafruit by going to adafru. Wait, that's Discord. You wanna join Discord, adafru.it slash discord. If you'd like to purchase some hardware from Adafruit, they pay me to do this stuff, go to adafruit.com and purchase something there. Sounds like there's some stuff on Discord or not Discord, DigiKey. DigiKey has some Adafruit stuff too. So if they have stuff that adafruit.com has out of stock, we encourage you to buy from resellers as well because that does, Adafruit sells to the resellers and so they do get a portion of that as well. Next week is on Friday, as always, or as almost always. Yeah, I think that's it. I'm gonna keep plugging away at this. I thank you all for hanging out and I'll take off my mic and give the cat a pet. Valiant try, good luck. Thank you, Jason. Yeah, I think I covered everything. Thanks again to Patrick for putting all the notes in the deep dive repo and thanks to DCD David for taking notes in the first place. And with that, I'll wake the cat up and get some kitty purrs going. Have a great weekend, everybody. If you wanna check this out, CircuitPython is at circuitpython.org, github.com slash adafruit slash circuitpython is the main repo. If you wanna follow along with what I'm doing, I tend to push to GitHub to back stuff up. So you can check out github.com slash tannewt slash circuitpython. Hit the branches tab and you'll see the latest branches that I've updated. 
This one is an RPI and then all of the Broadcom peripherals go to github.com slash adafruit slash Broadcom dash peripherals as well. So check those out if you wanna follow along or if you even wanna fix it for me, ideally. Anyway, I will call it and have a great weekend. Of course, I'll pet the cat. Right, Spook? All right, have a great...