The story so far: I am porting Fuzix to the ESP8266. So far I have the kernel up and booting, I have the userland building, I've got a file system on the internal flash which is successfully being mounted, and I have written a binary loader, because this platform is a bit weird and can't use the standard one. So today we are actually going to start running code.
Now, last time I got to the point where it was actually jumping into the loaded binary, the init binary, and the system was then hanging. So we're going to have to try and debug this. However, there is one flaw that I'm going to have to fix first. That's in lowlevel, which is that I haven't actually flushed the instruction cache. When you use data instructions to write code, on most platforms these days you have to ensure that there's no stale cached instruction data, as writing via data instructions will not update the instruction cache. So the isync instruction does that. So let's just build and burn it and see what happens. Probably exactly the same, but it's worth a try. So we scan the flash and, yep, it's a hang.
Okay, so we're going to have to do some debugging. Unfortunately, I do not have a debugger. I do have a JTAG device attached. Unfortunately, it's not a very good one. It's a Bus Pirate, and the ESP8266's JTAG interface is also not very good, and I've been unable to make it work. I've got a real JTAG adapter on order, but it's not going to turn up soon enough. So we're going to have to do things the clever way, and I hate being clever.
Right, so we are going to have to painstakingly go instruction by instruction. We need to find a way to get feedback that what we are doing is working, and then we're going to have to make very small changes so that we can verify it's doing the right thing. Now, doexec here is a routine which calls into this little bit of machine code.
And all this does is set up things like the stack pointer for the process and jump to the process's entry point. It is being called as an ordinary subroutine, but it's not one. It will never return. So let's just put this in, and let's go to lowlevel and put a ret in.
So this should... why doesn't it like that? "Array subscript 4 is outside array bounds." Okay, what's the type of codebase? It's a uaddr_t, which should be a uint32. I am not actually sure what that's trying to tell me. It's one of the new GCC warnings about dubious code, but let's just do that to be on the safe side. Okay, so that is parsing the way I thought.
Right, I know what's going on. And it is spurious, unfortunately. These symbols refer to things in memory. They are declared as uint32s, but they're not. They're actually arrays of data. So it's taking the address of one of these, and the compiler is noticing that I'm taking the address of a four-byte object, which is effectively a one-element array of uint32s, and I'm adding a value onto that. So it thinks that's bad. So let's do this instead. I think that will work. Yeah. So now I've told the compiler that each of these things is pointing at an unbounded array of bytes.
Right, okay. Where was I? Right, I had just put a ret in lowlevel. Yeah. So it's built. Let's run it, and we should get a message. Okay. So it's called into doexec and is then returning. So what we're going to do next is create a binary where the only thing it does is return, and then try that and see whether it works. And we get the same result.
However, there is actually one thing I should do, which is just to make sure that the address being jumped to is correct. Because, you know, it might not be. And we wait. Right. 1003C. And we're loading init. And we've got the ELF version here. 1003C is wrong. We should be going to start. Okay. Well, that explains one thing. 3C, where's the entry point? I do not see a 3C in there.
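The array-bounds warning described above can be illustrated with a minimal sketch. The names here (`codebase`, `entry_address`) are assumptions for illustration, and `codebase` is defined locally as a stand-in so the example builds on a host; in the kernel it would be a linker-provided symbol declared as `extern uint8_t codebase[];` with no definition in C.

```c
#include <stdint.h>

/* Host stand-in for a region the linker script would normally provide.
 * Declaring the symbol as an array (rather than a single uint32_t) tells
 * the compiler it marks the start of a block of bytes, so adding an
 * offset to it is no longer indexing past a 4-byte object. */
uint8_t codebase[64];

/* Compute the address to jump to: load address plus the entry offset
 * taken from the executable header (hypothetical helper). */
uintptr_t entry_address(uint8_t entry_offset)
{
    return (uintptr_t)(codebase + entry_offset);
}
```

With the symbol typed as a byte array, `codebase + offset` is ordinary pointer arithmetic and the compiler has no single-object bound to complain about.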
I know the entry point's down here somewhere. But I'm looking at the ELF file. That's why it looks weird. Okay, that's better. 3C here is the address of start. So let's take a look at the library. So we're actually looking for this. Yeah. Because of the way we arrange things, this will appear immediately after the header. So that will be this. At 0x14. No, 0x10. Sorry, yeah. The signal handler entry point is first, which is a four-byte address. Then there is the jump instruction at 0x14. And I can verify that by disassembling the binary. And we see that start is here at 0x14: 06 89 00. And that then jumps to 0x23C, which is where the entry point is.
So the fact that we're getting that value means that something horrifyingly broken has gone wrong, which is good, because horrifyingly broken things are really easy to fix. So that is codebase plus header.a_entry. Looks right, actually. Let's take a look at that linker script. So entry minus ORIGIN(code), which is this. And entry is defined in the header block. So that seems wrong. Well, we have the hex of the header here in convenient reverse order. And there is our 3C, which is there. So we've got magic number, four bytes, text size, data size, BSS size, the entry point offset in bytes, three zeros. Okay. Let's look at the exec format. 16-bit magic number, CPU, CPU feature, base address page, hints, then text size, data size and BSS size, all 16-bit values.
Right. So this means that something here is going on. So if I change this, now I put text start up the top. If I do this, does anything change? And of course, now I need to build the binaries. And I still see a 3C. Let's try this, because then I can look at the symbol table. And that should have exported. Okay. That is very much not what I was expecting. Hmm. See, that's an address.
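The header fields read out above can be sketched as a 16-byte structure. This is a rough illustration only: the field names and the parsing helper are hypothetical, the layout simply follows the order given in the narration, and little-endian byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define EXEC_HEADER_BYTES 16

/* Hypothetical field names; order follows the description above:
 * 16-bit magic, four single bytes, three 16-bit sizes, a one-byte
 * entry offset, then three trailing bytes. */
struct exec_header {
    uint16_t magic;
    uint8_t  cpu, cpu_feature, base_page, hints;
    uint16_t text_size, data_size, bss_size;
    uint8_t  entry_offset;
    uint8_t  reserved[3];
};

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a raw header; returns 0 on success, -1 if the buffer is short. */
int parse_exec_header(const uint8_t *raw, size_t len, struct exec_header *out)
{
    if (len < EXEC_HEADER_BYTES)
        return -1;
    out->magic        = read_le16(raw + 0);
    out->cpu          = raw[2];
    out->cpu_feature  = raw[3];
    out->base_page    = raw[4];
    out->hints        = raw[5];
    out->text_size    = read_le16(raw + 6);
    out->data_size    = read_le16(raw + 8);
    out->bss_size     = read_le16(raw + 10);
    out->entry_offset = raw[12];
    out->reserved[0]  = raw[13];
    out->reserved[1]  = raw[14];
    out->reserved[2]  = raw[15];
    return 0;
}

/* The address to jump to is the load address plus the entry offset. */
uint32_t exec_entry_address(uint32_t load_base, const struct exec_header *h)
{
    return load_base + h->entry_offset;
}
```

With a 16-byte header followed by a four-byte signal handler pointer, an entry offset of 0x14 lands on the first instruction after both, which matches where the jump instruction was found in the disassembly.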
And what I was expecting is a small integer, which suggests that this subtraction hasn't done what I wanted; it appears not to have subtracted anything. Let's try that. This compiled last time. It doesn't like this. Apparently a single O is not a valid symbol name. Okay. What's origin set to? All right. That looks right. Can we do this? Hmm. What I think is happening is that it's running through this in multiple passes, and on the first pass this value, and therefore this value, is zero. And that is causing the symbol to be set. That doesn't really make a whole lot of sense.
Let me look for some documentation. Unfortunately, I can't look at the man page because there isn't one. I have to go find the actual documentation. "Evaluates expressions lazily." This looks like it should work. If it's not going to work, then we should get an error. Less. Okay. Right. What's happening is, yeah, this is being set to the relative address within the section, which of course is zero because it's at the beginning. So it's subtracting zero. At some point it gets converted to be the right value. But I think what I need is this. No, it's still 3C.
Wait a minute. This is complete rubbish. This was right the first time. It is doing everything correctly. That is the wrong address. That should be 0x40100014. Yes. Well, let's just move on, shall we?
Okay. So now we need to do something else I was planning on doing. We've just rebuilt the binary, and now that is, you know, the right address. So we need to update the file system. However, the one megabyte file system, well, kind of smaller really, the one megabyte of flash we have to burn is pretty big and takes forever. So I am actually going to do some automation. So let's just make a script. I'm not sure how big we can make a file system. Standalone, mkfs, file system image, 512. Okay. I can make it 64K. Can I make it 32K? I can. Okay. Let's make it 32K.
So the first thing is quit on error. Make a fresh file system, which makes it empty. Then we are going to copy our init into it. Then we are going to turn it into a... I'll do the FTL conversion. Okay. So we've got a very small flash file system. It's bigger than I said it was supposed to be. I think mkfs might be padding things. Okay. So I think we need to set the size here to be smaller.
Now I'm just trying to remember what the parameters are. isize and fsize. It looks like the smallest we can make them is 3 and 2. So let's try that. Apparently not. 3 and 3. Yep. 2 and 3. Okay. It's too big. They're too small. I'm just going to keep making this bigger until something works. Okay. Let's do this a different way. I don't actually understand what these numbers mean. Okay. So isize 16 works. isize 8 works. isize 4 works. How big is our image? Still quite big. 2 does not work. 3 works. 5, 6 works. 1, 2, 8 works. Right. Now we have our 32K file.
Now this is 32K of logical blocks, which means that our FTL file is bigger than that. We also need to update this. What's this in kilobytes? 44? So is this a round number? Yes it is. I should have been able to figure that out myself. All right. So we now have a single script that will update our file system. So we add this here. And we're going to update the burn command in the Makefile to say put it here: filesystem.ftl. So now when we run this, it fails to... Okay. So now it is writing... Yep. It wrote the kernel and then the file system. And it's failed to find init. So it hasn't been made executable. So we need to modify our script, like so.
Okay. Now we can iterate quickly, you see. There's a single command and 20 seconds of waiting. And scanning the flash is much quicker than it was because it's now so much smaller. Right. I wonder whether it's worth... Yes, it is worth taking the ret out and seeing what happens. In fact, that's in the wrong place. Yeah, it would be nice to automate this as well, to be honest.
We didn't need to clean the libc. But we do need to clean the applications. Refresh. And it hangs. All right. Now one thing that... Yep. Okay. So let's leave this as it was, and we go into our CRT. You see that it's running at the right address now. And let's put a ret there. But we also need to comment this out. This updates the stack pointer, and because we're returning directly back to the kernel, we need to keep the kernel stack. So this is going to be make kernel TARGET=esp8266. And did I save both? Yes. So... And it does not return. That's interesting. So that should have gone from here to here, and then exited back to the kernel. Let's try that. I'll be very surprised if that made a difference.
Very interesting. So this jx a2 should be jumping to the process, which I believe we've verified actually contains this code. But it's always worth doing that again. Let's take a quick look at the kernel disassembly to make sure it's doing what I think it is. Here's our isync, load a3, jump to a2. Here is where it's calling doexec. So this is loading header.a_entry, loading codebase, adding the two together into register a2, and then it calls doexec. So yeah. And jump to a2. That looks sensible.
So let's double check to make sure that the code actually contains what we thought it did. So that will show the address and the word at that address. Which is that. And we're expecting the first instruction to be a ret. So what does a ret look like? Not that. Okay, this suggests that we've loaded the code at the wrong place. So 8906. What is an 8906? That's a jump. I did rebuild the library. And I rebuilt the binaries. So 0x14 is 06 89. That suggests that it didn't relink the applications. Applications, utils, init, right. There is no init.bin. Okay, that is extremely interesting. And it suggests that something is stale. Like here I can see the linker line. And here it is linking against the CRT. So let's just take a look in that and see what it says.
That's got a ret in it. I'm not sure where the jump to entry went. It's assembler. It shouldn't optimize it out. It's using the wrong version. No, it's not. No, that was rubbish. So yeah, that was the version that it just assembled. What's going on? That's got j entry at the top. What on Earth? And I'm doing the same thing again. I am looking here. I'm looking at entry, which is this one, rather than start, which appears in a different section, which is here. Right, let's put our ret in. Right, good grief.
So I'm expecting to see 0D F0 at the top of our binary, but I'm not. I can see that vile was the thing that just got built. So let's take a look at it. And this has our ret at the top of it. So why does vile have it and init does not? Because init is linked against the other version of the libc's runtime. It's the no-stdio version. Yeah, you always want to put your source code in one place and not two, or this happens. So what's actually different between the two? Because we didn't do a few bug fixes. Well, we've got our ret in both places. We've got the stdio initialisation, and then we've got this where we load the environment pointer.
Okay, we're now looking at the right files. Let's put our ret in there. Let's do our big complicated build. Let's look at our init, which has got our 0D F0 there followed by the jump. Did that jump look like the wrong place? No, that is the right place. That does look a bit weird. So what does init look like? Yeah, apparently that is right. So burn and see what happens. Right, we have managed to successfully run some code from our binary.
So let's move our ret up to here. I'm going to do this in baby steps. Okay, can we wipe the BSS? So the next stage is to jump to the main subroutine. Okay, well, let's go for broke and see what happens. So this ought to run code until it hits a system call and throws an exception, which hopefully the ROM will dump, but it's not.
Because of the hack we did, this is all happening on the kernel stack, which is fine. The kernel stack is quite big. So how are we going to verify what's happening next? Well, one thing I can do is to try and turn an LED on and off on the top of the module. I can see that, although it's a bit of a pain, but it won't show up on camera. So let's do something else. Let's not do that. Let's do this.
So this is the code that is produced for every system call. So what we're going to do is, rather than make the system call, because that is going to load the system call number into a2, we are going to... What does add look like? Searching big files in Chrome takes forever. It's not actually doing anything because there are so many matches. Oh, that's useful. We actually want an addi. So we're going to add 65 for capital letters. We are then going to call ets_putc to write it. That one. We're going to use call0. Can I do that? No, I can't do that. I have to do this. And then here... Oh, hang on. We need callx0 a3 to actually call the routine, and then spin. That'll just stop execution dead. So that's rebuilt all the system calls. Update everything. So what we should see, if it works, is a single character being emitted to indicate the system call is happening. And there we have it. Good. This means that our binary is working all the way up to the system call handler.
So now we need to do the system call handler, which involves figuring out how the ROM works. So the boot ROM manages all the interrupt vectors via the VECBASE register, which it actually looks like we can change. Yeah, so the ROM vectors are at that address, which is right at the very beginning. So we've got the debug exception vector, NMI, kernel, user. Is user the system call exception? So this is allocating quite a lot of stack. So addmi, I believe, is add immediate shifted by 8. Yes, I think that the disassembler is decoding this for us. So that is... Yeah, it is.
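The ADDMI behaviour mentioned just above (an 8-bit immediate, sign-extended and shifted left by 8 before being added) can be modelled in a few lines. This is only an illustrative model of the arithmetic, with a hypothetical function name, not anything from the kernel source.

```c
#include <stdint.h>

/* Model of the Xtensa ADDMI arithmetic: the encoded 8-bit immediate is
 * sign-extended and multiplied by 256 before being added to the source
 * register value. */
uint32_t addmi(uint32_t as, int8_t imm8)
{
    int32_t offset = (int32_t)imm8 * 256;
    return as + (uint32_t)offset;
}
```

An encoded immediate of -1 therefore moves the stack pointer down by 256 bytes, which is why the disassembler can show the already-decoded value rather than the raw field.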
That is subtracting 256 bytes from the stack pointer, presumably to give an exception frame. This is loading whatever's at this address, which will be the handler. So this is saving a2 and a3 on the exception frame. Reading the exception cause, adding it on to... Ah, right. This is for a whole bunch of different types of exceptions. This is... Let's go look that up, actually. This is the actual type of exception. There should be a simple table somewhere. Let's try starting from here. Okay, well, this is kind of annoying. Let's take another look at the source code.
The way exceptions work is that certain registers are saved in special registers, which you get out using the rsr instruction. I think a2 gets the old program counter. Let's actually find the... This is the calling instruction. Yes, this goes through the general... Yeah, user exception vector. EXCCAUSE is the type of exception, so this is going to be one. So what this is clearly doing is... This is the table of user exceptions. It reads the type of exception, fetches an entry from the table, stores a four into the table on the stack, and then jumps to it. So let's find this. Come on. Well, there is a table here somewhere. I'm not looking for debug. I'm looking for syscall. All right. Honestly, this is not looking terribly promising. I think we might need to set our own exception vector rather than try to use the built-in ones.
Is this defined anywhere? Yeah, this is where the exception table's initialized. I actually remember seeing a reference to this somewhere. This disassembly has been annotated somewhat. I think we are going to have to change the... Yeah, this is going to be easy enough to replace. There was a... Ah, here we go. So I don't know what a vector number is, but we may be able to call this routine and reuse the ROM's vector table. So if a2 is the vector number... Okay. Fail if it's not in range. Set up lots of pointers. Multiply by 4.
So a5 is now multiplied by 4. a11 is a table in RAM, in the ROM's workspace. So this is modifiable. So where is this being referenced from? Apparently only here. C100. What was that other... Well, this is not the same table. Okay, I think the simplest thing to do is going to be to write our own vector table. All right. So we don't want that in boot.S. We want that in tricks.S. So each one is... You get 16 bytes of code, and in fact, a lot of this is junk. So the debug exception vector is not used. The NMI exception vector is not used. The kernel exception vector is not used. It just fails to the debugger. And here is the user exception vector, which goes to a vector table.
Now, I am wondering whether we can put this in flash or whether it wants to be in actual RAM. So VECBASE. We have a relocatable vector option. We obviously have this because it's being used. Static group and dynamic group. Okay. What this is saying is that some vectors are hard-coded into the processor and won't move. These will be stuff like the startup vector, while others can be changed. So where are the... So some of these will still go, presumably, to the ROM. It would be nice if it told us what they were. Here we go, 465. The window overflow stuff is not being used. Ah, here we go. So we don't care about the static vectors. We just want to put a vector table somewhere. We're just going to go here. What I'm looking for is alignment. Freely writable. The offsets from the base. Order 466. Which registers are involved. And this is useful to know. This documentation could be kind of better. It would be nice if everything was in one place.
All right. So this is never going to be used. We could save 16 bytes of RAM by simply not having this one. But that makes life a little bit more complicated, so let's not. So ROM start vector. Does nothing. We want .fill... .word... no, .space. Fantastic.
Yeah, I'm trying to figure out what the assembler syntax is for something that advances the... That adds blank bytes. Be that. Okay, it's that. So that adds 16 bytes. The next one is the debug exception vector. waiti, then jump to the debug exception vector. Because of course we've just added some stuff to it. You know what? This is much easier. NMI. Which is an rfi 3, whatever that does. That'll be return from interrupt. Kernel exception vector is break 1, 0, jump to kernel exception vector. User exception vector. That's the one we want. Which for now is going to be movi a2, 65; call0 ets_putc; jump to here. Double exception vector. Reset vector is a... Yeah, that's one of the static ones. We don't have to do that one. Okay, so we have our vectors.
So as part of our startup we are going to write to the special register VECBASE. I'm trying to remember how this works. Or we can put it in here. We got rid of boot.S, because we're now doing this in C. Right, it is going to have to be inline assembly. Okay, and I am going to do this here. No I'm not. Yes I am. That's just to go after we turn on the flash and clear the BSS. We're going to put that here. Set up the vector table. So write to VECBASE. %0 is a general-purpose register, the table. Think that's right. Output constraint.
Okay, this is GCC's inline assembly thing. You give it a constant string containing your inline assembly, and then you give it output constraints and input constraints, and it will attempt to marshal values in and out of registers. Weirdly, the first block, the ones that go here, are the output constraints. So these are input constraints. So let's see what this does. boot.c. Too many arguments. That one's new to me. We did use inline assembly before, here. So you can see that here is an output register. It seems to want "a" for an address register. And here is input. So let's try that. Okay, that worked. Has it produced the right code?
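The GCC inline assembly shape described above (a template string, then output constraints, then input constraints) can be sketched as follows. This is a hedged illustration rather than the code in the video: `set_vecbase`, `install_vectors`, `fake_vecbase` and the array standing in for the vector table are all hypothetical names, and the non-Xtensa branch exists only so the sketch builds and can be exercised on a host machine.

```c
#include <stdint.h>

/* Host-side stand-in for the VECBASE special register (hypothetical). */
uintptr_t fake_vecbase;

/* Stand-in for the assembler-defined vector table (hypothetical). */
uint32_t platform_vectortable[32];

static void set_vecbase(const void *table)
{
#if defined(__XTENSA__)
    /* Template string first, then outputs (none here, so the first list
     * is empty), then inputs. "a" asks for a general-purpose register,
     * which GCC loads with `table` and substitutes for %0. */
    __asm__ volatile ("wsr.vecbase %0" : : "a"(table));
#else
    fake_vecbase = (uintptr_t)table;
#endif
}

/* Pass the address of the table, not the value stored in its first word. */
uintptr_t install_vectors(void)
{
    set_vecbase(platform_vectortable);
    return fake_vecbase;
}
```

On a real target the table would need to satisfy whatever alignment VECBASE requires, which this sketch does not attempt to enforce.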
So here's our main. Clear BSS. Set up the vector table. Get the address of the vector table. Ha! I forgot the ampersand. So what that's done is it's read the first word of the vector table. Which is wrong. Okay, that's what we want. We load the address of the vector table into a2, and we do wsr.vecbase a2. Okay. Now let's see what this does. I don't think we updated the system call handler, actually. I did. But I don't think I rebuilt anything up here. So what I'm hoping to see is a single character, A, I think it was. But we don't.
So there's a couple of things which could be causing this. One is that this is not set up correctly at all. One is that it's actually got here but nothing is happening. One could be that the vector table is not doing anything. It could also be calling one of these other exceptions. I'm going to do 7. And I should also look at the disassembly of this to make sure it is actually... Oh, that's nice. We've got something. D, C. So that's A, B, C, D: double exception. And then it produced a debug exception because it hit break. Interesting. So this happened after we called exec. It did not hit the user exception vector.
Let's go back to our system call stuff and press undo and get all this lot back again. Change that to a lowercase letter. And I'll take this out. So now it will print a character and then make the system call. So this will tell us whether it's got as far as actually trying to do the system call or whether it's failing before then. We've got something. Which is good. I mean, it looks like it's hit the system call. Okay, so that's a lowercase d, which is 100 in decimal. So system call 35, which should be... I'm not quite sure what that's going to be. Zero, 35 down. Signal. It's trying to set up signal handlers. Well, we know that it is calling the system call. This is good. But then it appears to be hitting the double exception vector. This is bad. It should be going through user.
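The arithmetic behind reading a lowercase d as system call 35 can be written out as a tiny sketch. It assumes the debugging stub still adds 65 (ASCII 'A') to the system call number before printing it, which is what the numbers above imply; the function names are hypothetical.

```c
/* Offset the debugging stub adds to the system call number before
 * printing it (assumed to be 65, ASCII 'A'). */
#define MARKER_BASE 65

/* Recover the system call number from the character seen on the console. */
int marker_to_syscall(char marker)
{
    return (unsigned char)marker - MARKER_BASE;
}

/* The character the stub would print for a given system call number. */
char syscall_to_marker(int number)
{
    return (char)(number + MARKER_BASE);
}
```

Since 'd' is 100 and 100 minus 65 is 35, a printed d corresponds to system call 35 under that assumption.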
However, I did see a thing in the documentation. Maybe it needs to be set up. We can see system calls on page 597. I mean, we could avoid this entirely by just jumping into the kernel with a jump instruction. It is guaranteed that no general-purpose registers other than a2 will be modified, but we are the only people using system calls. So I'm expecting more in there. Redirecting execution to an exception vector. I wonder if this is set. So PS.EXCM is... It'll be a bit that says it's in the exception context. If an exception happens when handling an exception, you typically don't want to jump to an exception handler. So that's what this is for. Exception mode. Well, why would we be in exception mode? Unless the ROM has left us there. I don't think there's an instruction. So it looks like the ROM does set exception mode. So let's look for... This is setting the processor status register, which has got all these useful bits in it, to 32. So let's look for other uses. Restore int level. That's the same as our IRQ restore. I wonder if I should put an rsync in. Now, I copied that code from the Arduino stuff, so I'm going to assume it's right. Interrupt mask stuff that we don't care about. Return from exception. Well, it's setting only that bit.
So unless wsr does something weird, here is what PS looks like. So interrupt level. This is for if you want multiple types of interrupt, some of which can interrupt others. You prioritize them using this. Exception mask. Exception mode. UM. This is for running operating systems. Yeah, this is for operating systems with a memory manager, where if an exception occurs in user space, then you're calling into the kernel, but the kernel may live in a different address space, so you need to change the stack. Don't want that. Don't want that. Don't want that. Okay, so let's just write zero to the processor status word, so we're in non-exception mode, and let's see what we get. Hopefully something different. DBC. Right. So D.
The lowercase d is coming from the system call handler. The B is 66. It's the kernel exception vector. Yes, because I haven't set UM, therefore it's zero, and it's going through the kernel exception vector. And then C is 67, debug exception, because it returns from the putc and hits that breakpoint. Good. We are actually getting somewhere. So I am going to set user mode simply because it's more appropriate for what we're doing. We could use either for this.
Interesting. Wait a minute. Wait a minute. Exception mode is 16, not 32. User mode is 32. So the ROM was leaving... I think that comment is wrong, unless I misread it. So 0, 1, 2, 3. This is the bottom 16 bits. So the values are 1, 2, 4, 8. This one is EXCM, 16. This one, 32, is user mode. So the ROM was setting it not to exception mode, but to user mode, which is just what we wanted.
So I bet there's too much code here. Yeah. I knew I should have looked at the disassembly. So platform vector table is at 0. 0x10 is the debug exception vector. 0x20 the NMI. 0x30 the kernel. 0x40 the user. No, it fits. Okay. The kernel exception vector block is actually 32 bytes long. So that's where user exception is. Okay. I was not expecting that. Yeah. Okay. So this actually wants... Oh dear. I'm going to have to go back to .fill. So we want to fill bytes up to kernel exception vector plus 32, minus here. ".space, .nops or .fill specifies non-absolute value." Can I say that's 32? That looked like it worked. I don't think it worked. Yeah, that actually created a symbol called $. So let's try this instead. I'm disassembling the object file because it gives us easier-to-understand numbers. So NMI, kernel, 8, 9, A. Yeah, this works. I mean, it's terrible, but now we don't need to align this to 4, and it saves a bit of space. Okay, this table... I was wanting the one with the rfe in it. That was a dynamic vector. This was the table. Okay, I was hoping to see an offset from WSR, but there isn't one. That's a pain.
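The PS bit values worked out above (interrupt level in the low four bits, EXCM at 16, UM at 32) can be captured in a small decoder. The macro and function names are hypothetical; only the bit positions come from the discussion.

```c
#include <stdint.h>

#define PS_INTLEVEL_MASK 0x0Fu  /* bits 0-3: current interrupt level */
#define PS_EXCM          0x10u  /* bit 4 (16): exception mode */
#define PS_UM            0x20u  /* bit 5 (32): user vector mode */

struct ps_fields {
    unsigned intlevel;
    int excm;
    int um;
};

/* Split a PS value into the three fields discussed above. */
struct ps_fields ps_decode(uint32_t ps)
{
    struct ps_fields f;
    f.intlevel = ps & PS_INTLEVEL_MASK;
    f.excm = (ps & PS_EXCM) != 0;
    f.um = (ps & PS_UM) != 0;
    return f;
}
```

Decoding 32 this way gives UM set and EXCM clear, which is the reading that the ROM was selecting user mode rather than exception mode.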
Okay, so advance... debug exception vector. NMI exception vector is 16 bytes wide. Kernel exception vector is 32 bytes wide. User exception vector is 32 bytes wide. Double exception vector doesn't need anything after it. Okay, so now let's see if we get anything else. We have set this to 32, so that should enable user mode. That jumped to entirely the wrong place. I bet that platform switchin is here. So why did it get there? We are actually looking at the vector table. Okay, vector table, 0x3F0. This is the reset vector, which is unused. 0x400. 0x410. The NMI is 0x10, so this is 16 wide. Kernel exception vector is 0x20. Just 0x20 wide. User exception vector is here, which is 0x40 wide. Double exception vector is here. So I'm curious. Okay, it looks like... So A is correct. It's now finally hit the user exception vector. It looks like it does need to be 16-byte aligned.
And because I'm annoyed that the ROM start vector is never used, we are going to do this, which will save some space. Only 16 bytes of space, but it's not working. So platform vector table ought to be at 0x3F0. 0x3F8. That has actually subtracted 8 rather than 16. I don't know why it would do that, but apparently it's not going to let me do that, so let's just leave it like this and make sure it works again. Okay, so we are finally hitting our user exception. I save that, rebuild the CRT, and we're actually going to implement this.
So back to the documentation. On entry, the old program counter is saved in the special register EPC. Another... This is a special register which is provided for use by the exception handler, because it's kind of hard to do anything with no free registers. So this allows you to stash something in there to allow you to do stuff like changing stacks. We do know that this is going to have to exit with an rfe. So that's not a lot of information, as always. I keep coming back to that. So you go, right, EPC contains the address of the system call instruction.
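The vector slot widths listed earlier (16 bytes each for the unused first slot, debug and NMI, then 32 bytes each for kernel and user) determine where each vector sits relative to the table base. A rough sketch of that bookkeeping, with hypothetical names and a nominal 16-byte width for the final double-exception slot:

```c
/* Slots in table order; names are illustrative only. */
enum vector_slot {
    VEC_UNUSED,   /* first 16 bytes, never entered */
    VEC_DEBUG,
    VEC_NMI,
    VEC_KERNEL,
    VEC_USER,
    VEC_DOUBLE,
    VEC_COUNT
};

/* Width of each slot in bytes, as worked out from the disassembly. */
static const unsigned slot_width[VEC_COUNT] = { 16, 16, 16, 32, 32, 16 };

/* Offset of a slot from the table base: the sum of all earlier widths. */
unsigned vector_offset(enum vector_slot slot)
{
    unsigned offset = 0;
    for (int i = 0; i < (int)slot; i++)
        offset += slot_width[i];
    return offset;
}
```

Under these widths the user exception vector lands at offset 0x50 rather than 0x40, which is the off-by-one-slot mistake that sent user exceptions to the wrong code.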
The system call handler should add 3 to EPC before returning from the exception. Okay, what we're going to do here is this is going to be other exceptions, including our system call exception, and we want to do the bulk of the work in C. So what we're actually going to do is save the old stack pointer to our temporary special register. Do I want to disable interrupts while I'm fiddling with the stack? No, I don't have to.
Okay, we need to talk about stacks. In Fuzix, each process has two stacks. There's the user stack and the kernel stack. The user stack lives in process memory. Yesterday, when I was doing all the stuff for copying arguments, we were working on the user stack. The kernel stack lives at the top of the udata block, and its job is to give the kernel some space to work in that isn't in user memory. Now, we could just have the kernel running on the user stack. That's perfectly fine, except that it makes certain things quite complicated. For example, here in our exec routine, we are writing to user memory and updating the user stack. So if we are running from this location, then this is going to go horribly, horribly wrong. So what's going to happen is that on entry to a system call handler, we're going to switch to the process's kernel stack, which lives at the top of the udata block, which is UBLOCK_SIZE bytes long, and do the work there.
On entry to an exception, we're just going to discard the old kernel stack, because we should only be getting these from... that's not true. I was going to say we should only be getting these from user mode, from the process. Right, this is what the exception mode, sorry, the user mode bit in the processor status word is for. When we're in the kernel, we will set that bit to make sure that we don't switch stacks. We always do stuff on the current stack. And when in user mode, we set the bit to make sure that we do change stacks. So what we're going to have to do is save the...
This is a bit complicated, because we have to be very careful changing the stack pointer, because the stack pointer will be used by interrupts. So if we just put junk in there and an interrupt happens while junk is there, very bad things happen. Okay, so we're going to have to turn interrupts off for this. This is not what the code here is doing, because this code is not changing stacks. It's always doing everything on the kernel stack, even though it's coming from the user exception vector. So how do we turn interrupts off again? rsil 15. We are now safe to fiddle with the stack pointer. We wish to switch to... Save the user stack pointer. I'm hoping this will fit in 32 bytes. We're using SP as a scratch register. Right, special register. Yes, PS. Set kernel mode. This means that any further exceptions, including interrupt exceptions, will go through this code rather than this one.
And at the same time, we need to go down here to lowlevel, because when we enter a process we need to set user mode, but we can't do this with interrupts on, because once we set user mode, if an interrupt arrives, it will try to switch stack to the kernel and bad things happen. So now we switch to the user stack, switch interrupts back on again, so this all happens atomically, and then jump to the entry point. And given that the Arduino code was doing this, let's just put the isync there to keep it happy. Okay, this is one of the two places where execution enters user mode. The other place is switchin.
All right, back to here. So interrupts off, switch to kernel mode, set the kernel stack pointer, interrupts back on again. We now need to save registers. We want to save all of them, I believe. So there are 16 registers plus some bits. We also need to save the program counter and the old stack pointer. So that is 18 4-byte slots, which is 72 bytes. I think we have enough space. So we are actually going to allocate 80 bytes for our stack frame, and then we save...
This is going to be bigger than 32 bytes, so let's just do a user exception handler. So save A0 to this slot. We now want to save the user stack pointer, which we put into EXCSAVE, but that is now... Now we've saved A0. We don't need it anymore, so we need to read it into A0 and save that. And then we just go through the others. Three, four, five, six... Actually, thinking about it, I don't think we need that many. So I think we only need to save up to A7, because A8 and above will be automatically saved by the C code. Let's just look at some C code. Raw flash read is complicated. Yeah, so here it is allocating a stack frame, 32 bytes, and here it's saving some registers. The ABI allows you to use A0 to A7 without saving them, but A8 and above... Here we go. A8 and above are... No, A12 and above are callee-saved. Okay, we're going to have to save all the way up to A11. Who's using A11? This routine is using A11. Is it saving it? This is the MBR parser, which is quite big, and it is not. Okay, we need to save all the way up to A11. So three, four, five... Keep doing that. Six, six, seven... Oh, I've got caps lock on, so Vim's working very oddly. Seven, eight, nine, seven, eight, nine, ten, eleven. Okay. Now we want to save the old program counter. So that is in EPC. So we actually want this many 32-bit slots. Don't think we need to save anything else. Okay, so now we load the exception cause into A2. We put the stack pointer, which is pointing at the 16-byte blocks. So, yeah. We load the stack pointer, which is pointing at the base of our exception frame, into A3, and we call the interrupt handler. This is written in C. It will go away and do stuff. So on exit, we need to put everything back the way it was. So let me check the way RFE works. RFE returns from the user exception vector, sets... Oh, it does this for us automatically. Excellent. Okay, that simplifies things. I don't need this. No, I do need that. It changes EXCM automatically.
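The conclusion reached here, that the assembly stub has to save A0 through A11 because only A12 and above are preserved by compiled C code, can be written down as a small table-free check. This is a sketch with invented names (`callee_saved`, `stub_must_save`, `stub_saved_count`); it encodes only the register split described above and is not code from the port.

```c
#include <assert.h>

#define NUM_AREGS 16u

/* Under the non-windowed calling convention discussed above, a12..a15 are
 * preserved by any C function that uses them. */
int callee_saved(unsigned reg)
{
    assert(reg < NUM_AREGS);
    return reg >= 12u;
}

/* Everything a called C function is free to clobber has to be saved by the
 * exception stub before it calls into C: a0..a11. (a1 is the stack pointer,
 * which the stub has to handle itself anyway when it switches stacks.) */
int stub_must_save(unsigned reg)
{
    return !callee_saved(reg);
}

/* Number of address registers the stub saves under this rule. */
unsigned stub_saved_count(void)
{
    unsigned n = 0;
    for (unsigned r = 0; r < NUM_AREGS; r++)
        if (stub_must_save(r))
            n++;
    return n;
}
```

Under this rule the stub saves 12 address registers, plus one extra slot for the program counter read from EPC.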
It jumps to the address in EPC, but it doesn't put anything else back. Okay, so to exit from this, we need to load the old program counter and put it into EPC so RFE can return from it. And then we are just going to do all this in reverse. Okay, so the old user A1 goes into the EXCSAVE scratch register. So we now need to do the reverse of this block. So interrupts are off. We don't need to save the kernel stack pointer, but we do need to... Hang on. Have I got that right? Save user stack pointer. Load user A0. Yep, we actually do it like that. So we're using A0 as a temporary register. This sets the EPC special register to the program counter. This sets the EXCSAVE special register to the old stack pointer. That's A1. Then we load all the other registers. So interrupts off, set user mode, reset to the user stack pointer, interrupts on and return. Okay, so there's quite a lot of this, but a lot of it's common, and in fact our kernel exception vector is going to reuse a lot of this code. But let's try this for now. We do need to define the interrupt handler, which takes the exception cause and an array of registers. So hopefully... let's just put this in for now and let's build that and see what happens. Okay, what's wrong with this? Bad register name. Blast. Er, right, rsil wants to set the interrupt state and return the old value. So we're going to have to do this using SP as a scratch value. So we have actually done this without the... We've done this with interrupts on. rsil reads the PS special register. Oh, it is in the PS special register. Oh, okay, that's going to change things a bit. Because when we write to PS, then that sets the interrupt level. Yes, I'd forgotten that. Okay, this is going to change things a bit. Now in order to put a value into PS, we do have to put it into a register first.
Yeah, the thing I'm worried about here is that if an interrupt happens between the wsr to EXCSAVE and the rsil, then EXCSAVE will be corrupted. Oof. Are we going to have to actually store something on the user stack? Okay, so these appear to be the only interrupt instructions. This also means that when we're writing to PS... we took that out there. Okay, good. But in lowlevel here. So, okay, so switch to user stack. Yeah, I think it's going to have to be like that. So this turns interrupts off but does nothing else. We update the stack pointer, which, because it's two instructions, we have to do atomically. Which makes me wonder if there are any special features for atomic operations. Oh, this is windowed registers. All right, this is windowed registers, which we don't have. Okay, yeah. All right, let's go back to our exception handler. So we have to turn interrupts off before we touch EXCSAVE. I am actually slightly wondering if interrupts do go through here. So where is RFI used? High priority interrupts. There's a reference to XSR. Where are the interrupt vectors? Okay, I'll admit this is somewhat confusing. So, list of vectors. Here are our vectors. So these apply to exceptions that get routed through the two exception vectors. These all look like actual exceptions. The difference between exceptions and interrupts is that typically inside an exception you don't want to turn interrupts off. That would be nice. Yeah, I think interrupts go somewhere else. So we don't need to worry about them for now. This does actually simplify a few things. We do still need to worry about interrupts when switching... No, we do not need to worry about interrupts when switching stacks, because interrupts always happen on the current stack. They never change stacks unless you're using preemption, but that's kind of a different matter, which we're not doing here. So this gets simplified to this. Right, back to here.
We know that as long as the interrupt handlers do not use EXCSAVE, we don't need to worry about race conditions here. So I'm actually wondering whether there's a better way to do this. Let's try something else. Let's put a0 into EXCSAVE, which allows us to immediately load the address where we're saving our registers into a0. This then immediately allows us to save all the registers. This then frees up loads of registers, so we can read the saved a0 into a2, save a2 into the frame, and because a0 is pointing at where we want the stack pointer to be, we can just switch to the kernel stack like this. So we haven't saved the PC. We can do that here. So now we're using a2 for all those things. Right, so now we call the C stuff. We get the exception cause into a2. We get the address of the saved registers into a3 and we call this, the interrupt handler. So in reverse, we are going to load the user program counter into EPC, load the user a0 into EXCSAVE. a0 becomes the pointer to all our registers. We now load everything, including the user stack pointer. Okay, so the only thing we've got left to deal with at this point is a0, which is in EXCSAVE. Okay, that actually looks better. Now does it... unknown opcode or format name? Right, this is because I think we need to use this syntax. Yeah, this way around. Unknown register a0 for wsr instruction. Now wait a minute, we were using this in other places. So why doesn't it like that? Unknown opcode or format name. The actual register looks okay to me, and I mean, we're doing it here. Have I got the right... yeah, we need some special registers, blah blah blah blah. It's an assembler macro which gives the name or the number. So 94, it doesn't like this. So can we do wsr... Invalid register EPC. I think I know what's going on, which is that the assembler does not know about these particular special registers. So here is a big table. EPC1. Okay, it does like that. And EXCSAVE1. Can I do this? Yeah, okay, it likes that. So one, one, not that one. Okay, good.
So in our main.c, that should be that. And let's see what it does. Hopefully print a Q. If we're very lucky... we're not lucky. It does not print a Q. That's okay. This can be debugged. So a0p, where did I put the helper routines? Oh, I linked to them directly from here. Okay, good. Because that means I can do, well, I can do this. So this should print a P. We had this bit working. So no P. And it's not printing anything else either. Okay, this suggests that I broke something in lowlevel. It's never actually calling the user code. So let's just try commenting bits out and seeing what happens. One. Okay, we've got this far. Can we get this far? Okay, so I'm going to hazard a guess that something has gone wrong here. No. We commented this out. And I will actually just speed things up a bit. We don't need to reflash the file system every time while doing this testing. Interesting. This bit stopped working. Okay, so we did have that working. I think it was this version. No, it wasn't. It was... I don't think I ever tried to switch stacks. Right. I never tried to switch stacks. So that's not working. So potential reasons are that it's entirely the wrong place. So let's print it. And you see that did actually hit our handler. It looks right to me, to be honest. That is now an exception handler. I don't want that anymore. Let's see what this does. Right. It prints a Q. That's good, because it's got all the way through to the C code here. So we should be able to now use things like kprintf to say the exception cause was this. Program counter was 13. I believe it was 12. Hmm. Methinks that is not the right number. Also, that zero looks somewhat suspect. That should be one for a. Oh, yeah, let's try that. One should be the system call handler, I believe. Interesting. Different. Not right, but interesting. Okay. Anyway, our exception handler is at least getting exceptions. So why is this not working? Okay.
The obvious reason why it's not working is because this does not correspond to the stack pointer address. And in fact, I can make lots of things better. We own doexec and we own the system call handler. So we can actually just do that. We precompute what the stack pointer is. We don't have to fiddle with interrupts. And in our, sorry, our exec thing, not the system call handler, we can just do this. This is all going to need cleaning up afterwards. doexec. See, this way, we don't have to rely on that kernel header being correct. Yeah. Okay. That worked. This means that the kernel header is all wrong. So let's just go through that. I mean, I said there's a better way to do this, but where is udata defined? Here it is. So this is, as we say in the trade, bollocks. This should be six. We are going to need this later. No, actually, I think we can do this in C. Okay, let me ignore this by just getting rid of all this stuff. And we'll deal with that when the time comes. Okay. Cause one, that's correct. That is the system call handler. Our program counter should be at 12 times four. We can actually dump them. So that wasn't what I expected. What else did I change? Just that. Very suspicious, and it makes me wonder whether we are hitting some kind of timer issue, running out of stack, or hitting a double exception because I got this bit wrong. I would expect to see, you know, a D being printed. Okay. No, not okay. Yeah. It's definitely not a fan of that for loop. Okay. As I think we might be running out of stack, let's just get rid of that. That means that the exception handler will run on the user stack. So if we see a change in behavior... we don't see a change in behavior. So why would this work? Why would it work without that, but not with? Is it because it's taking more time? And with this version, it's still locking up, but it's doing it after it's printed this. I mean, kprintf doesn't do any buffering.
So kprintf is defined in devio. It is a pretty complex function. Oh, he's gone cold. Okay. What can we do about this? Well, I kind of need more information about what's going wrong. We have our traces in all these places. It could be running out of stack, but I doubt it. If it was running out of stack because of kprintf, I would expect to see it always print one thing (I keep forgetting that that doesn't work) but not the next thing. But it appears it only likes to have a single kprintf, and anything else doesn't work at all. Okay, so that is a kernel stack pointer. Okay, looking at our memory map. E8000 to F8000 belongs to the user process. F8000 to FC000 belongs to the kernel. FC000 up to 4 followed by a lot of zeros belongs to the ROM. And the boot stack is at the top of that. So this is a user stack pointer, which is... No, that's actually all correct. This is a user stack pointer in user process memory. This is in kernel memory. It's the udata block, the top of the udata block. But that's at AD10. So our udata is at AB70. So at AD10, there are 416 bytes free in the udata block. And we're not using anything like that. So I do not believe that we're running out of stack. And besides, changing the exception handler to use the user stack didn't make a difference. So let's take a look at the disassembly. So here is the user exception handler. Here it's calling into the C code. Here is the C code. I see it... Now, this one worked. So I see it allocating a stack frame, doing stuff, calling kprintf, halt. All right. Let's do that. Let's see if it fails. It's failed. Disassemble. Allocate a stack frame, save some stuff. 32-byte stack frame, call kprintf here. Do some more stuff. Call kprintf again, halt. This is not noticeably different other than the size of the stack frame. This is saving a0 onto the stack frame. Why is it allocating a 32-byte stack frame for this tiny little program? Tiny little function. It's storing two values, zero. Yeah, that's very odd. Okay, let's back up.
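The sanity check on the stack pointers above is easy to redo mechanically. The sketch below classifies an address against the memory map as described and recomputes the free space between AB70 and AD10. The full 32-bit addresses are an assumption: the narration only gives the low digits, and the `0x3FF` prefix is filled in from the ESP8266 data RAM range. All names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Region boundaries as described in the memory map; the 0x3FF prefix is an
 * assumption, since only the low digits are read out. */
#define USER_BASE     0x3FFE8000u
#define KERNEL_BASE   0x3FFF8000u
#define ROM_DATA_BASE 0x3FFFC000u
#define DATA_END      0x40000000u

enum region { REGION_USER, REGION_KERNEL, REGION_ROM, REGION_UNKNOWN };

/* Report which part of the data address space an address falls in. */
enum region classify_addr(uint32_t addr)
{
    if (addr >= USER_BASE && addr < KERNEL_BASE)
        return REGION_USER;
    if (addr >= KERNEL_BASE && addr < ROM_DATA_BASE)
        return REGION_KERNEL;
    if (addr >= ROM_DATA_BASE && addr < DATA_END)
        return REGION_ROM;
    return REGION_UNKNOWN;
}

/* Bytes available between the start of udata and the kernel stack pointer. */
uint32_t bytes_between(uint32_t low, uint32_t high)
{
    assert(high >= low);
    return high - low;
}
```

For the values read out above, `bytes_between(0x3FFFAB70, 0x3FFFAD10)` gives 416, and both addresses classify as kernel memory, which agrees with the conclusion that the kernel stack pointer itself looks sane.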
See, I know this one failed, so... And now it's still using a 32-byte stack frame. No, I do not understand this at all. See, I was expecting this to be done. Let's put some other code in and see what happens. So we are eventually going to want this. The system call handler is normally in the signal handler, interrupt handler. The system call handler needs to take values out of the registers, stick them in the udata block, and then call through to the kernel's dispatcher. So where is it? It will be in tricks.s. unix_syscall_entry. So this is going to be... This was in A2, 3, 5, 6. It will be used. Yeah, I've been at this for a while and I'm now getting quite tired. Okay. Enable interrupts, so they're already on. Do the... This is some MSP430 stuff for doing overlays, which we're not using. Otherwise, call the system call, u_insys equals 0, and it will now return. It is... Why are we storing the stack pointer in the in-syscall sp? Let's ignore the signal stuff for the time being, and we wish to... The return value goes into R2, and the error value goes into R3. And we won't let it return, we're just going to let it halt. Okay, what code has this produced? So check the cause. Let me copy stuff around. So let's see what it does. Probably just hang, to be honest. Yeah. So now let's put a... Put that in. Mm-hmm. Mm-hmm. Well, it handled the system call. It hasn't returned, but it handled the system call. Let's just do this and see what happens. Honestly, I'm beginning to think... okay, that was not a system call. So in real life, what this code would do would be to translate certain exceptions into signals. This would allow the system to continue running if a user process did something wrong. Though as this system has no memory protection, I would kind of expect that not to work very well. The most useful thing is null pointer exceptions, and I don't know if zero is... No, zero is not mapped on this system, because I saw that before. Okay. That is very, very interesting.
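The job of the system call handler as described here (copy values out of the saved registers into the udata block, call the dispatcher, then put the return value in register 2 and the error in register 3) can be sketched in C as follows. Everything in it is a stand-in: `struct udata_sketch`, `handle_syscall` and `fake_dispatch` are invented for this example, and the choice of which saved registers carry the call number and arguments is an assumption, since the narration does not pin it down.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the per-process udata block; the field names are invented
 * for this sketch and do not claim to match the kernel's. */
struct udata_sketch {
    uint32_t callno;   /* which system call was requested */
    uint32_t argn[4];  /* up to four arguments */
    uint32_t retval;   /* result produced by the dispatcher */
    uint32_t error;    /* error code, 0 for success */
    int insys;         /* nonzero while a system call is in progress */
};

typedef void (*dispatch_fn)(struct udata_sketch *u);

/* regs[] is the saved register array built by the exception stub, indexed
 * by register number. Assumed convention: a2 holds the call number and
 * a3..a6 hold the arguments. */
void handle_syscall(uint32_t *regs, struct udata_sketch *u, dispatch_fn dispatch)
{
    assert(regs && u && dispatch);
    u->callno = regs[2];
    for (int i = 0; i < 4; i++)
        u->argn[i] = regs[3 + i];

    u->insys = 1;
    dispatch(u);
    u->insys = 0;

    regs[2] = u->retval; /* return value goes back in register 2 */
    regs[3] = u->error;  /* error value goes back in register 3 */
}

/* Toy dispatcher used only to exercise the sketch: "adds" two arguments. */
void fake_dispatch(struct udata_sketch *u)
{
    u->retval = u->argn[0] + u->argn[1];
    u->error = 0;
}
```

Because the handler writes its results back into the saved register array, the restore path in the assembly stub delivers them to the user process without any extra work.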
It really doesn't like those kprintfs. So these are no longer even in the code path for the first system call. That looks like complete gibberish to me, to be honest. They do not look like sensible register values. So I wonder if... I thought those two parameters should be the same, but let's just be boring. No difference. I'll see if I can catch it at the beginning of the dump. Too slow. Okay. Scroll up and see what we can see. So it did the system call? Okay, all these values are the same. Have we... Idiot me. See, at this point the stack pointer is now the kernel stack. No, that's right. The kernel stack, which is in A0. And A0 is pointing at our structure. So here's the exception handler. It's moved A2 to A4. Here it's calling putc. Here we are reading from the regs parameter, which should be in A3. Did it pass it in in A3? Is that the wrong register? Oh no. Copy it to A13. Here we are reading values from A13. And here is our exception handler code. We put the cause in A2 and the pointer to the registers in A3. We did verify that the stack pointer is more or less correct, but honestly this junk does... It looks like code. Let's leave us on the user stack and see if that makes a difference. Okay, that's good. What that's done is it's handled one system call, has tried to return and has locked up. And I know why. Potentially. For one thing, I remember that the system call stuff said that when handling a system call, you have to advance the program counter. But I would expect that to keep hitting the same system call over and over again, to be honest. So what's it going to do? Still hang. So I'm going to guess that something in this code is wrong. So load the program counter. Actually, in this exception handler, you don't need this. This exception handler will only ever return from system calls. So we can just do... Write it to the program counter. Save A0. A0 becomes pointer to register file. Oh! Okay, that's not going to work. Let's try that and see what happens.
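The point about advancing the program counter is worth spelling out: the saved EPC points at the `syscall` instruction itself, so returning to it unchanged would execute the same system call again. A minimal sketch of the adjustment, using invented names and assuming the cause value 1 for system calls as seen in the trace earlier:

```c
#include <assert.h>
#include <stdint.h>

/* Exception cause reported for a system call in the traces above. */
#define CAUSE_SYSCALL 1u

/* The syscall instruction is three bytes long, hence "add 3 to EPC". */
#define SYSCALL_INSN_BYTES 3u

/* Address to resume at after handling an exception. For a system call we
 * step over the syscall instruction; for anything else this sketch leaves
 * the address alone so the faulting instruction would be retried. */
uint32_t resume_pc(uint32_t cause, uint32_t epc)
{
    if (cause == CAUSE_SYSCALL)
        return epc + SYSCALL_INSN_BYTES;
    return epc;
}
```

In the stub this would be applied to the saved program counter slot before it is written back to EPC for the return from exception.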
Okay, well, no luck. So anyway, SP becomes pointer to the register file. Load A1, 2, 3, 4, 5, blah, up to 11. Restore A0. So we've restored all registers from 0 to 11. Return from the exception. Now with this... Since we're staying on the user stack, this is actually going to back the stack up, which means that if an interrupt occurs during this piece of code, then it'll overwrite the currently running... Well, it'll overwrite all the registers being restored. Let's just try that. These look more like registers, to be honest. Put the halt back. So we've done one system call, but these registers are all completely bogus. I... Well, that's wrong. I have a feeling that we're using the wrong program counter registers, resulting in us trying to return to the wrong place in that RFE instruction. Anyway, user exception vector, fetching EXCCAUSE. So I think it was this block here that contained our vectors. So dc44... Among... Incremental search means that first it tried to search for all the fours. Yeah, there we go. Unhandled exception. Least extos user... No, I think it is this one. So this interrupt handler is using EPC1 and is turning interrupts off. So what this is doing is loading a constant here, which is 33, which is interrupt level 1 and user mode. Possibly exception mode. And then this is swapping the contents of PS with that. And then it's doing stuff. On exit for ec... Yeah, that's a loop. This is odd: is it loading and saving the user exception handler? Has it actually assembled into the right thing? Well, I can't actually read these, but it all looks... That's not right. No, no, no, no. That's wrong. Very wrong. Very, very wrong. And that was the program counter, which would mean that... that's better. That's much better. Right. We are handling system calls. Here's one. Here's the next. Here it actually tried to do something. It's tried to read data. I don't know what that was, but let's just...
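Whether the constant 33 means "interrupt level 1 and user mode" or also includes exception mode depends on whether it is read as decimal or hex, which the narration leaves open. The sketch below decodes both readings using the low PS fields as laid out in the Xtensa architecture (interrupt level in bits 0 to 3, EXCM in bit 4, UM in bit 5); the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Low fields of the Xtensa PS special register. */
#define PS_INTLEVEL_MASK 0x0Fu        /* bits 0..3: interrupt level */
#define PS_EXCM          (1u << 4)    /* bit 4: exception mode */
#define PS_UM            (1u << 5)    /* bit 5: user vector mode */

struct ps_fields {
    unsigned intlevel;
    int excm;
    int um;
};

/* Split a PS value into the three fields discussed in the video. */
struct ps_fields decode_ps(uint32_t ps)
{
    struct ps_fields f;
    f.intlevel = ps & PS_INTLEVEL_MASK;
    f.excm = (ps & PS_EXCM) != 0;
    f.um = (ps & PS_UM) != 0;
    assert(f.intlevel <= 15u);
    return f;
}
```

Read as decimal, 33 is `0x21`: interrupt level 1 with user mode set and exception mode clear. Read as hex, `0x33` would be interrupt level 3 with both exception mode and user mode set, so the two readings are easy to tell apart once the actual bytes are visible in the disassembly.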
Actually, if I go look for unix_syscall, we can actually turn on the built-in debugging, and let's see what happens. I mean, if it doesn't like these kprintfs, it may not like these. You know, it doesn't. Yeah, it's something about kprintf. Yeah, very peculiar. It seems to be able to get away with small ones, so let's just do udata.u_callno and try that. No. But we are going to want some tracing, so yeah, this is as horrible as you can possibly get, but it is at least cheap. kprintf udata.u_callno, and nothing. Keep that. This was the version that sort of worked. Lost the... I lost my print routine. So it is successfully returning from at least one system call, but if I put more tracing in, it stops working. Oh, that's interesting. That behaved differently when I removed that line of tracing. Something's being corrupted somewhere, or it's using up the... Or it's running out of stack. I mean, if the C compiler is generating a 32-byte stack frame for something as trivial as that original kprintf, then it's going to use a lot of stack. Unfortunately, the way this platform works, the stack pointer has to be 16-byte aligned, and this means that every function call will allocate at least 16 bytes of stack. But I don't think that's the problem. I wonder if it's corrupting registers? So I did... I did notice that registers 12 to 15 are not being saved, but this routine is... Let's just put that in as 12, 13, 14, 15. 15. So this now needs to be 16. And this needs to be 20. Okay. And this now needs to be... No, that's fine. That needs to be increased. It's different. Maybe this is right. Is this going to behave? No. Great. Okay. I really don't understand why this is so unhappy. The way it seems to be doing things more or less at random as I move the code does smell like code corruption. I do not think the exception handler needs to live in instruction RAM. At least I really hope it doesn't have to live in instruction RAM.
I wouldn't expect it to work at all if it needed to live in instruction RAM, because the first time we actually call it, it hasn't been loaded into the cache. So I would expect that to just outright fail. I don't have a good handle on when the flash execute-in-place stuff works and when it doesn't. I mean, we know that the vector table is fine. It might be that this routine is too big. I mean, that's another running out of RAM thing. That's peculiar. Very, very peculiar. I think... Let's just comment that out one more time. And just see what it does. Okay, that has not... That has hung. So it hasn't... Whatever's gone wrong, it's not related to which stack we're on. This routine is setting interrupts to mostly on. User or kernel mode, and it's on. No. You see, I didn't think this would... I didn't think it would need to fiddle with the interrupt settings. So our real exception handler... Our C exception handler is here. Oh, 48 bytes. What is it doing with all that stack? Okay, some ABIs require a certain amount of empty stack for things like exception handlers to use. But I don't think that's the case, because we've got this sort of thing. So this is storing data at 12. So that leaves eight words of stack unused. I wonder if the C compiler's... This is all windowing stuff. I wonder if I need to tell the C compiler about our ABI. But all the code's been working so far. I mean, we've got quite a lot of code that seems to be running fine. Windowing... I'm just trying to find the ABI documentation again. There's not much for the ABI. Here we've got call0 register usage. Stack pointer, callee-saved. Function arguments. Static chain, not using. Callee-saved. Frame pointer, optional. Stack frame layout is the same as for the windowed register ABI. Yeah, I wonder if... I still think that we might be running out of stack. So as it's late, it's like 10 to midnight here, and this has become a really long video, I am going to call this a day.
And that's going to give me a chance to do some reading up and get back to it at another time. I was hoping to get the system call handler actually working. But we have seen it handle system calls. It'd just be nice if it did it a bit more reliably. So it's mostly there. We still need to do a bit of fiddling with stuff like setting the mode bits in the program status register, which actually can happen now, so I don't forget, like so. There's going to be no difference here. So yeah, I am going to call this a day and see what comes up. Anyway, I hope you all enjoyed this video. Please let me know what you think in the comments.