The story so far: I am porting Fuzix to the ESP8266. After a couple of false starts, I have the kernel booting and a working file system on the internal flash using the Dhara FTL library. The file system will mount, I can load and execute binaries from it, and init is running up to the point where it's forking and trying to run a subprocess. I pressed the wrong button and managed to kill my PDF reader. Just put that back again. This is the Xtensa LX106 documentation. Maybe do Ctrl-Z and not Ctrl-backslash. So the forking process is working up to the point where it swaps out the current process so that the new child process can replace it. It's successfully, well, I think it's successfully swapping the process out, but then it crashes immediately afterwards. So this is what it gets with the crashes. I will hit the reset button, which is there, and that's it swapping out the process, rather slowly. And there it fails. So let's investigate this and see if we can figure out what's going on and how to fix it. And in the process, let's try and speed this up a bit. I'm not sure if we can. So where is the ESP8266 swapper? So what that's doing, from the numbers in the debug tracing, is swapping out the current process. It allocates a new block in the swap partition. It writes out the udata block, which describes the process. It writes out all the code, which is 64k. Then it writes out all the data, which is 31.5k. Now, we could swap out only part of the process. We know how much of the process's data space is actually being used by the process. The way traditional Unixes work, of which this is one, is that in user memory, a process gets to use everything from the bottom up to the break point, which is defined by this variable. Now the break point can be modified by calling the brk system call, or the sbrk system call, which adjusts this up or down.
If you adjust it up, it allocates more memory from the kernel for the process, and if you adjust it down, it releases it. The way our platform works here, we have a fixed-size 64k block, which is used by the current process. So what this does is just define a point in the memory space, and that's about it. But we do know that the process should only be using data below that. Now, it would be easy enough to just write out everything from the bottom to the break address. The problem is the stack. The stack extends downwards from the top of the data space. So brk works up and the stack works down, and that needs to be stored as well. We know where the stack pointer is going to be because it's defined here. So we could just write sp to top, which is the top of data memory, except that it is possible for the process to move its stack, at which point the assumption that the stack extends from here to here is invalid. So I think we could get away with it, at least until something odd happens. Actually, we can be a bit cleverer about this. If sp is above break, then we know that we must be using a traditional stack. If it's below break, then the process has obviously moved its stack somewhere else, which means that we have to write everything. We have to write everything because we don't know how much of the old stack the process is still using. Okay, this is doable. So if udata.u_sp is less than udata.u_break, then we're using a custom stack architecture and we have to write out the entire space. So copy the entire data block into swap. Otherwise, we write out the bits that we think the system is actually using. So we are going to say the stack offset from the bottom of the data space is u_sp minus DATABASE. So we want to copy u_break minus DATABASE. So that copies the main data, and then we want to copy the stack.
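The decision described above can be sketched in C roughly as follows. This is a minimal model, not the port's code: `DATA_BASE`, `DATA_TOP`, `struct span` and `plan_swapout` are hypothetical names invented for illustration, and the ranges are in bytes with no block alignment applied yet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants; the real port defines its own. */
#define DATA_BASE 0x1000u                 /* bottom of the process data space */
#define DATA_TOP  (DATA_BASE + 0x8000u)   /* top of data; the stack grows down from here */

struct span { uint32_t start, len; };     /* one byte range to write to swap */

/* Fills out[] with the ranges that need saving; returns how many (1 or 2). */
static int plan_swapout(uint32_t brk, uint32_t sp, struct span out[2])
{
    assert(brk >= DATA_BASE && brk <= DATA_TOP);

    /* sp below the break means the process moved its stack, so nothing
     * can be assumed: save the whole data space. The sp > DATA_TOP check
     * is an extra guard added for this sketch, not something from the talk. */
    if (sp < brk || sp > DATA_TOP) {
        out[0] = (struct span){ DATA_BASE, DATA_TOP - DATA_BASE };
        return 1;
    }

    /* Traditional layout: data from the base up to the break,
     * then the stack from sp up to the top. */
    out[0] = (struct span){ DATA_BASE, brk - DATA_BASE };
    out[1] = (struct span){ sp, DATA_TOP - sp };
    return 2;
}
```

With a break at 0x2000 and a stack pointer at 0x8800 this yields two ranges, and everything between the break and the stack pointer is skipped.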
The thing is, the swapwrite routine writes a certain number of bytes at a particular block, and the stack pointer is very unlikely to be block aligned, so we're actually going to have to align that. There's an align macro. Yes, there is. So we have to align it to... and the number of blocks. We're then going to write the stack from... wait a minute, I've missed a parameter. So this is the length, u_break minus DATABASE, where u_break is the top byte we want to write and DATABASE is the base. The length of the stack is going to be udata.u_top minus... That is in bytes, isn't it? Yes, it is. That just calls the device driver to read or write an arbitrary number of bytes to or from the device. So it should be like this. Let's just stick some more tracing in and then see it work. It's not working. Too few arguments to swapwrite. I've missed the comma one at the end, which does something mysterious I don't understand. Okay, this writes the code out. The code will also only occupy a fraction of the [inaudible]. Okay, this needs to be an aligned value. So we're going to use ALIGNUP. That's going to be aligned. Yes, unfortunately the kernel doesn't store how much text space there is, so that's going to take a while. That's what it's doing there, and then it fails. So we would either have to add something somewhere, probably in the udata block, to keep track of that, or something else. Let's just have a quick look around at what is in here in terms of useful things. The top of memory is, of course, data memory. I'd rather not add something. This is the binary loader. This is the code that actually loads the binary off disk. This is the thing that knows how big the text area is. It does copy some stuff into the udata block. This is just doing I/O. It doesn't seem to copy the actual exec header, which would be useful. This just tracks the stack pointer. This sets up the break address, which is just, you know, the top of data usage. Yeah, let's add something.
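The alignment step mentioned above can be sketched like this. It assumes 512-byte swap blocks; `align_down`, `align_up`, `blocks_for` and `stack_blocks` are hypothetical helpers written for this sketch, not the kernel's own align macros.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 512u   /* assumed swap block size; must be a power of two */

/* Round an address or length down or up to a block boundary. */
static uint32_t align_down(uint32_t v) { return v & ~(BLOCK_SIZE - 1u); }
static uint32_t align_up(uint32_t v)   { return (v + BLOCK_SIZE - 1u) & ~(BLOCK_SIZE - 1u); }

/* Whole blocks needed to hold len bytes that start on a block boundary. */
static uint32_t blocks_for(uint32_t len) { return align_up(len) / BLOCK_SIZE; }

/* Whole blocks needed to save the stack, which runs from sp up to top.
 * The start is rounded down so the partial block holding sp is included. */
static uint32_t stack_blocks(uint32_t sp, uint32_t top)
{
    assert(sp <= top);
    return (align_up(top) - align_down(sp)) / BLOCK_SIZE;
}
```

Rounding the stack start down writes a few bytes below the stack pointer that the process is not using, which is harmless; rounding it up would lose live stack contents.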
This will make debugging easier, because actually loading stuff will be so much faster. Doing test cycles where I do a build and run it, it will start up more quickly, and therefore it will be easier to iterate. So we're going to do text top. And we then want to add this to our configuration. Let's call it that, actually, systems. And we want to edit this: udata.u_texttop equals CODEBASE plus header.a_text. I believe that's right. Okay, and we want to clean, because the dependency system isn't quite smart enough to know that changing that file requires rebuilding everything. And so code len then becomes ALIGNUP of udata.u_texttop minus CODEBASE, by the block shift. And let's see what this does. So it has to rebuild everything, and fails. Let's see. Okay, so no change here, which is kind of what we expected. Now, the issue with what we've done is we won't actually know whether it works or not. It's possible that we've actually made things worse, in that there are now more possible failure modes. We could be writing this stuff to the wrong place. Therefore, we won't know what we have to read out again. But you saw how much more quickly it did the fork. This will make a big difference to development and also to flash lifetime, because this system is going to swap a lot. Okay, it printed swap out done and then crashed. Swap out in this context is happening in dofork, which is here. So this happens there. So the possible things that could go wrong are everything from here down. So the next place that we're calling C is makeproc. So let's edit makeproc and add some tracing. It's in process.c. I don't remember seeing that bit of tracing. Let's just try that again, shall we? I hit the reset button. Get ready to pause it. Interesting. It's stopped. That took a long time. I wonder if that was Dhara doing a garbage collection. 130 to 133 was the piece of code that wrote the code segment. That could well have been a garbage collection.
We'll see what it's like next time. Okay, so I see this line. Where does that come from? I do not see... This is characteristic of a double exception. The first exception... Oh, I also see no fatal exception message up here. So the exception handler appears to be crashing, which is why we're getting a different vaddr. However, we're getting the same code. I want to know where that comes from. 0x... sb%... That was an 0x there. So not... there. Okay, well, let's do... We also want to make sure that we're returning from the fork correctly. So where is dofork called from? syscall_proc, here. So let's turn global debugging on for this file and give it another go and see what happens when we start init. Swap out. Yeah, that was quicker. We must have been seeing Dhara do a garbage collect. Okay, we've returned from makeproc and then things went haywire. Now, we called makeproc here, and at the end of this code... I know what's happened. I've forgotten to reload a0. Hey, look, I've forgotten to save a0. a0 is the link register. It contains the address to return to once the subroutine exits. So we want to do s32i... a0... sp... Without that, this ret will actually return to the address that the last call returned to, which is here. So it'll loop around this over and over again, and each time it goes through, it adjusts the stack pointer. So the stack pointer will increase rapidly until it leaves the area of readable memory, and then there'll be a crash. Right, let's try that and see what it does. Init, swap out, crash. Well, that did not help, but it's less wrong now. I'm not using swap seven anywhere. Okay, well, so the first thing that happens after returning from dofork is that we print some stuff. Therefore, the crash must be happening somewhere in this code. The error was a... it was an exception 28, which is an address error. So where could that be going wrong?
The only dereferences we're doing are here, where we're writing a value to runticks, or one of these loads, which are stack relative. And then we restore the stack pointer, reload a0 and exit. So let's try turning off all of this and see what that does. That gives us just enough code that we can return from the subroutine. If this fails, I'll just try adding trace messages. We start init, we swap it out, and... right. So dofork has returned and execution is continuing in the normal way. Well, I can see one bug, which is that this should be a write, because we're trying to restore the SAR register. Where is runticks? It's an int16 in the kernel data. So that actually wants to be an s16i. If that was unaligned, that could be causing the crash. I would have thought we'd get a different error, but let's try it. Yep, that was it. Okay, so dofork is returning. So last time it actually called switchin before failing, and now it's not, so, interesting. Let's take a look at this dofork. So we've got this message. We've got all the way down to here. dofork returns to... interesting. I think this is returning. This will then return to the system call handler and try to continue on with user code. Let's verify that by just doing that and trying this. Whoa. Okay, it's spinning on something. Right, it has returned somewhere and everything is going horribly wrong. This gives us somewhere to look. Syscall 55. All the names. So this is zero, so let's go 55 down. waitpid. I wonder if this is the parent that's running, and the parent is now waiting for the child to exit by calling waitpid, but we haven't implemented something that it calls, which means that waitpid is not working. It's not blocking. Therefore, the parent process will be spinning, waiting for the child's termination. So we're waiting for a pid and searching for an exited child. So this will then loop through the process table looking for a child that's exited.
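The runticks fix earlier in this passage comes down to access width: a 16-bit variable only promises 2-byte alignment, so a 32-bit store to it may land on an address that is not a multiple of four. A small C sketch of that check follows; `is_aligned` and `struct kvars` are hypothetical, and nothing here models what the hardware actually does on a misaligned store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of width; width must be a power of two. */
static bool is_aligned(uintptr_t addr, unsigned width)
{
    assert(width != 0 && (width & (width - 1u)) == 0);
    return (addr & (uintptr_t)(width - 1u)) == 0;
}

/* Two hypothetical adjacent 16-bit kernel variables. If the struct sits
 * on a 4-byte boundary, runticks lands at offset 2: fine for a 16-bit
 * store, misaligned for a 32-bit one. */
struct kvars {
    uint16_t other;
    uint16_t runticks;
};

/* Clearing the counter with the correct width leaves its neighbour alone. */
static void clear_runticks(struct kvars *k) { k->runticks = 0; }
```

Even where a wide store happens to be aligned, it would also overwrite whatever sits next to the 16-bit variable, so matching the store width to the variable is the safer rule either way.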
If it doesn't find anything, it'll call psleep, and psleep will put the current process, which is the parent, to sleep, and then the child will get swapped in. So the stuff I said yesterday about dofork, that the parent is swapped out, would appear to be wrong. What seems to be happening is we're swapping out the child and the parent continues to run. That's a little bit inefficient. See, what's going to happen is init will fork. This will cause the child to be swapped out. Then init will go to sleep. So the parent will be swapped out. The child will be swapped in again and then replaced with the new binary. If fork were to continue running in the child, then we wouldn't have to... we would save a complete swap in-out cycle. Of course, if you've got a memory management unit and you can have multiple processes in memory at once, this is not a problem. So let's scatter logging through there. Save and try that again. So where did we get to? We end at 350. Not found. Okay. It cannot find... So it's asked to wait for PID 2 but it can't find a process with that ID. So this is pinning the blame on dofork somewhere. I think that we need to save the PID of the child process somewhere, or we're returning the wrong one, or something like that. So let's take another look at the Atari ST version and see what this does. Now I'm trying to remember which way round the parameters are. Okay, it's source, destination, clearly. So this is writing the stack pointer into u_sp. We haven't done that. So we save our registers. A2 is pointing at the process structure. I'm looking at the wrong piece of code. Okay, dofork. Right, A0 is the... This is copying the first parameter into A0, the child process table entry. We save A2. Load the child udata into A1. Why? What are we doing with A1? Oh, also, the other thing I need to do is go through and check to make sure these are 32-bit accesses. A0 is the child p_tab. So for us that's in a2. So this is loading the PID of the child and saving it onto the stack.
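The waitpid search described earlier in this passage (scan the process table for an exited child, sleep if there is a matching child that is still alive, and give up if there is no such child at all) can be modelled in a few lines of C. This is a simplified sketch, not the Fuzix implementation: the struct, the state names and the convention that a wanted PID of 0 means any child are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pstate { SLOT_EMPTY, SLOT_ALIVE, SLOT_EXITED };   /* hypothetical states */

struct proc { enum pstate state; uint16_t pid; uint16_t ppid; };

enum wait_result { WAIT_FOUND, WAIT_SLEEP, WAIT_NO_CHILD };

/* One pass over the table on behalf of `parent`. want_pid == 0 means
 * any child (a convention chosen for this sketch). */
static enum wait_result scan_children(const struct proc *tab, size_t n,
                                      uint16_t parent, uint16_t want_pid,
                                      uint16_t *found)
{
    int seen = 0;
    assert(tab != NULL && found != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct proc *p = &tab[i];
        if (p->state == SLOT_EMPTY || p->ppid != parent)
            continue;
        if (want_pid != 0 && p->pid != want_pid)
            continue;
        seen = 1;
        if (p->state == SLOT_EXITED) {
            *found = p->pid;          /* an exited child: report it */
            return WAIT_FOUND;
        }
    }
    /* A live matching child means the caller should block and retry;
     * no matching child at all is an error, as in the trace above. */
    return seen ? WAIT_SLEEP : WAIT_NO_CHILD;
}
```

In this model, the "not found" result in the trace corresponds to `WAIT_NO_CHILD`: the table has no entry with the PID that fork handed back, which points at the fork path rather than at waitpid.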
This is the PID we are returning here. So we're saving it into slot 5 and loading it here. Okay, let's check those... Okay, well, for a start this is a 16-bit value. u_sp is a 32-bit value. u_ptab is also going to be a 32-bit value because it's a pointer. Okay, so this means we're actually returning the right value for the PID. However, we could tell by the tracing that something in the C code was obviously truncating the wrong value we were working with to 16 bits. So I don't think that was what was causing our problem. So let's look for anything to do with the PID. This... oh yeah, 16-bit. So this is the child udata, which we are getting from here... no. Okay, this is something else we're doing wrong. This is... yeah, I think last time I forgot to register that this is getting the child udata pointer. udata, of course, is pointing at the current process's udata. So that's going to be l32i a3, a2, a2 being the child's p_tab, p_udata offset. And why are we writing that to the child's stack pointer? That's just wrong. What on earth was I doing? So we have pushed the child's udata onto the stack, and that is in fact what's happening here. So let's save that into slot 6, times 4. No, actually, this is... this is getting the current process's p_tab pointer. And we're swapping out the current process. Is that correct? Right, this piece of code is the wrong piece of code. I don't want to look at that because that's the... that's for a flat memory map model. That's why it doesn't look familiar. That's a flat memory map model with no swap in it. So let's take a look at the... Is there an 8080? No. MSX? Okay, yeah, I think this was the one I was looking at. Switch out, to dofork. Right. So the u-area is live as the parent u-area. This is... HL is the new process p_tab, which we're saving to a global variable there. To prepare the return value in the parent process, we're getting the PID from the parent. This is happening here. This can be 16. But this is actually the parent PID, isn't it?
No, it's not. It's the child PID, because we're actually fetching this relative to the p_tab that we are passed in. Okay. So that's correct. Save the stack pointer and the critical registers. Yes. We haven't saved the stack pointer. That will be in the child... That will be in the udata block. Right. We do need to write the current stack into the current... It comes back in... I think I am doing this wrong. See, this is saying that it always returns as the child, and what we were seeing was code that always returned as the parent. So we want to copy the current stack pointer into the parent's udata. So we need to get a pointer to udata. So a3, udata. s32i, store, udata, u_sp offset. Save parent's udata. Okay. Then swap out the current process. This is not a swap system, so it's not doing swap out. How about the MSP430? That did actually work, I remember. And I wrote it, so I should at least remember how it works. So save registers. Save the child PID. Yeah, R12 is the input parameter. So that's loading the PID out of the child p_tab pointer. Save registers. Save stuff. Put the stack pointer into the current udata block. That's this. I don't even know if we need to save this. We're not using it anywhere else. And besides, we're about to reuse that stack block. Okay, so save the parent process to disk. We want to save the child p_tab onto the stack so we can swap out the... Yeah, this a3 is pointing at the parent's udata. This swaps out the parent. Okay, now we're the child. Yes, we are the child. makeproc here: it's going to set up the current udata, which, if you remember, is a copy of the parent. Well, it is the parent. We've just saved the parent to disk, and that's saved the parent's udata to disk. So we're going to turn the current udata into the child. a2 is already the child's p_tab pointer because we saved this here. So this is... That comment's wrong. So this will set up the current udata to point at the child's p_tab that we passed in.
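The fork flow worked out above (store the child's PID where the parent will find it, swap the parent's image out, then turn the live context into the child) can be captured in a toy C model. Everything here is hypothetical: one in-memory context, one saved image standing in for the swap slot, and a `fork_ret` field standing in for the stack slot that holds the return value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a process context. */
struct ctx { uint16_t pid; uint16_t fork_ret; };

static struct ctx live;      /* the single context resident in memory */
static struct ctx swapped;   /* the parent's image on "disk" */

/* Model of the fork path: returns what the code that keeps running sees. */
static uint16_t model_dofork(uint16_t child_pid)
{
    live.fork_ret = child_pid;   /* what the parent will see when it resumes */
    swapped = live;              /* swap the parent out, saved state included */

    live.pid = child_pid;        /* the live context now becomes the child */
    live.fork_ret = 0;
    return live.fork_ret;        /* execution continues as the child: 0 */
}

/* Model of swapping the parent back in later. */
static uint16_t model_resume_parent(void)
{
    live = swapped;
    return live.fork_ret;        /* the parent finally sees the child's PID */
}
```

The ordering is the point: the parent's return value has to be in place before its image is written out, because after that the live context belongs to the child.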
This is the child. Therefore, we are not returning the child p_tab here. We're returning zero. Okay. That seems much more plausible. l16i. l16. I'm sure there was an l16i instruction. l16ui, unsigned. That just means it's zero extended rather than sign extended. So what's this going to do? Swap out. Right. What's happened? Fork has returned. It has said /etc/rc: unknown error. Excellent. It's tried to load the rc file and has failed. This is correct. Okay. Let's get rid of this tracing because it's junk. Let's keep the process tracing. I'm going to need to copy a file onto the file system. Probably several. And it's panicking on platform switchin because we haven't implemented that. What's happened is the child is terminating and then it's tried to switch back to the parent. /etc/rc. That's the rc shell configuration file. This is a shell script. Has it just tried to run? Has it just forked into the shell? It can't have, because I don't... Unless this is the ssh shell, then I haven't actually implemented that bit yet. Okay. Well, let's put rc in. And now is probably the time to add some of these. So let's put in... Let's put in these, actually. Let's date. Not there. Remount. Not there. Correct root is there. Okay. fdisk as well. I'm being absurdly optimistic here. Put root. Yes. I should probably at some point try and build the shell. I wonder how many of these things I actually have space for. That's just showing the number of free sectors on the logical... on the... in the FTL partition. Okay. We've run out of space. So let's... Actually, we've run out of space. Uh... We can make the partition bigger. We can probably size it up by 50%. So we have to unmount that first. Okay, that's done. fdisk... image. So let's... Actually, it has told us exactly how many we can have. 1465K. 46... Just to be on the safe side. filesystem.image... .image, 2, u, primary, 2, full size. So we're now... We're now a megabyte. Write that. How does this feel?
I wonder if I actually need to increase the size... increase this. I don't actually know what these numbers mean, which is a bit annoying. I kind of have no idea what I'm doing. I don't think we've got a megabyte's worth of binaries, now I think about it. They're all very small. Well, they have to be. They need to be under 64K. All right. Some of these applications have moved since that script was kept up to date. So let's just... clip those out, because we haven't built them. patchcpm... ps. I thought ps would be in utils. Yeah, it's right there. Oh, right. Now it's running out of space. Do we want to double this size again? At some point it will actually be too big. OK, that actually seems to be behaving. So let's just put that up to... OK. This will leave five... OK, that will leave five sectors. Actually, I'm going to crank that down again. I want to leave a certain number of sectors free to give the FTL some working space. So... where does that go? Right, I think the file system is just full at this point. Can I double this again? No. Right, I don't think that number does fit. So let's put that down to there. Nope. Well... It fit. We know it fit. Yeah, OK, let's go with that. I'm also slightly wondering whether I want to increase the number of swap slots. There's only four, but I need to uncomment these. That will fail because some of these don't exist anymore: ln, mv, patchcpm, sh and levee. OK. So let's write that. OK, and that's going to take a while. We can make the file system bigger. I think this is only two megabytes' worth of flash. Yes, it is. We've actually got three to work with. Well, actually, we've got four megabytes of flash, but I'm using the first megabyte for code. We don't have anything like that much code. Let's just see how much code we've got. 47K of code. Yeah, 47... 47K. So... We have loads of space. We can actually use very nearly all of the four megabyte flash for our file system, which is quite nice.
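The flash budget above is simple arithmetic, sketched here in C using the figures quoted (4 MB of flash, 1 MB currently reserved for code, 47K of code actually present). The 4096-byte sector granularity is an assumption of this sketch, and the calculation ignores the swap partition and anything else that has to live in flash.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_BYTES   (4u * 1024u * 1024u)   /* total flash */
#define CODE_RESERVED (1u * 1024u * 1024u)   /* first megabyte set aside for code */
#define CODE_USED     (47u * 1024u)          /* code size reported by the build */
#define SECTOR_BYTES  4096u                  /* assumed allocation granularity */

/* Round v up to a multiple of unit. */
static uint32_t round_up(uint32_t v, uint32_t unit)
{
    assert(unit != 0);
    return (v + unit - 1u) / unit * unit;
}

/* Space left for the file system with the full 1 MB code reservation. */
static uint32_t fs_bytes_with_reserve(void) { return FLASH_BYTES - CODE_RESERVED; }

/* Space left if only the sectors the code really occupies are reserved. */
static uint32_t fs_bytes_tight(void) { return FLASH_BYTES - round_up(CODE_USED, SECTOR_BYTES); }
```

Under these assumptions the tight layout frees a little under a megabyte compared with the fixed reservation.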
That will give loads of space for a system like Fuzix. We could probably even get a compiler in there. So, we've got to the point where it was trying to switch in the new process. So, while this is flashing, quite slowly, let's take a look at this. switchin takes... It takes the process pointer, the p_tab of the process to switch in. So, this should be relatively straightforward. Fork is the trickiest. What's this doing? Is... Ah, okay. Okay, what I seem to have done... This is a bit odd. Why did I do that? I'll tell you what. I think that the MSP430 code in my source base is contaminated. So, let's take a look at the upstream version. Kernel, MSP430, tricks.S. Where is... Is there a switchout? Now, apparently it's calling platform_switchout. This could be part of the overlay stuff that I did. The MSP430 was a bit special and I was a bit over-enthusiastic about doing weird optimizations, which means it's more complicated than it should be. Okay. That has written 2 megabytes. Let's just look for platform_switchout. See where that appears. process.c calls it. Ah, okay. Okay. Right, the MSP430 code is right. So, platform_switchout is the code that swaps out the current process and swaps in another process. And it looks like what I did here was to delegate nearly all the logic into switchin. So, what switchin does is it looks to see if the current process needs to be swapped out and, if so, it does it. And then it switches in the new process. So, all platform_switchout does is call getproc and then fall through into switchin in order to do the work. So, let's just nuke that and duplicate some of this code. platform_switchout, switchin. So, we've got platform_switchout and switchin. Now, platform_switchout is going to set the return code which will be returned by fork to zero. Now, we are going to want to create a stack frame with the same structure as this. So, we're actually going to copy nearly all this code. So, let's do that the way dofork does it.
Now, this is only going to matter, this return code, if... hang on, hang on. The return code from switchout will only ever be non-zero if there is no return code. So, yeah, there are no returns. What switchin will return is either the return code which is set by switchout, which is going to be irrelevant because switchin doesn't return anything... yeah. The other situation is when we've called fork: when we do the switchin, we're actually going to return into the context that just called dofork, and we are the parent, and we want to return the child process's PID. But in that situation the parent process will not have gone through switchout. So, we don't need to save a5, that's just going to be garbage. We don't need to save the return code because it's just going to be ignored. So, we want to save the stack pointer. Find the next process to run: that's getproc. Does getproc still exist? call0 getproc. Fall through into switchin. Okay, a2 is the p_tab pointer of the process to run. Okay, now we need to duplicate this logic. Compare the swap page. Is this "are we the current process"? I think this is what this is saying: if we have not been swapped, then just start running, because everything is set up correctly.
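The switch-in decision being pieced together here (if the target process is still resident, just run it; otherwise evict the current process if needed and bring the target in) can be sketched as a small C model. The names are hypothetical, swapping is reduced to counters, and a page value of 0 is used to mean "swapped out" as a convention for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* page == 0 means the process is swapped out (model convention). */
struct proc { uint16_t pid; uint16_t page; };

static int swapouts, swapins;     /* how many transfers the model performed */
static struct proc *current;      /* process currently occupying memory */

static void model_switchin(struct proc *next)
{
    assert(next != NULL);

    if (next->page == 0) {
        /* The target is on disk. There is only one memory slot, so the
         * resident process has to be written out first. */
        if (current != NULL && current != next && current->page != 0) {
            current->page = 0;
            swapouts++;
        }
        next->page = 1;
        swapins++;
    }
    /* Otherwise the target was never swapped: everything is already set
     * up and it can simply start running. */
    current = next;
}
```

In this model, switching to a resident process costs no flash traffic at all, which is the case the "just start running" shortcut is for.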
So, is the new process actually swapped out? What is p_page? p_page is a uint16. a2 is the process pointer, so we want p_tab, p_page offset. beq and z... now I can't remember how this thing does branches. beq, branch equal; immediate; branch equal to 0... bnez. Right, if a3 is not 0, jump to not swapped. Now, this says that we're using simple.c and the swapper is going to swap in a process; it's where it's swapped out manually. Do we need to swap out the current process? Yes, of course. What it says here about the stack is correct. We are currently running from the process's kernel stack, and swapping stuff in is going to overwrite it, so we can't do that. We're going to have to create a new stack, which we're going to need to switch to now. Because we've got our own swapper, it might be possible to simply modify the swapper instead, so that swap in swaps out the current process. That would save us from having to do stuff in assembly. This is not a lot of logic, so we can simply do: does the current process need swapping out? If udata.u_ptab's p_page is not 0, swap out u_ptab. Then we swap in the new code. It's panic swapin. But we do need to switch to the new swap stack: switch to the swapper stack and swap in the new process. Move sp to swap stack plus end. We can probably make it smaller than 512 bytes. a2 is still pointing at the p_tab pointer of the new process, so we just want to call swapin. This will then copy the new process and the new process's udata into memory. What is that called? swapper, called from switchin. We discover we want to run a swapped process. We let pagemap_alloc cause any needed swap out of idle processes. pagemap_alloc may cause a swap out. We want to call swapper, and then this will do the work of swapping something in. I have a feeling this code has changed since I wrote the MSP430. pdata... Are we going to make it running in this piece of code? No, apparently we're not. Make the new process runnable, and we will actually need to save the, um, we will need
to save a2 onto the new stack. We can't use this stack frame because we've just shifted to a new stack. So it'd be nice to do this from C, but we are having to tinker with sp. So make the new process runnable. s... that's a byte, s8i. New process p_tab. Storing a3, new process p_tab, p_status offset. So the process's udata is now in the new data block, so all we're going to do is simply load the original stack pointer. Now, this will have been saved either here or here. This means that this block is on the stack, so we can restore it with basically this load. So we set runticks to 0. This counts how long a process has been running. We reload these parameters. The parent process needs to return the PID of the child, which is in slot 5, so l32i a2, 4... and exit. Okay, I think that is all we need. Really? Can I do eq? Apparently I can't do eq. It might be possible to allocate a buffer from the kernel for this, but let's try this for the time being. 41, invalid. Right, we haven't defined this. This needs to go in... there's actually going to be several of these. This one, 56, and this one, and that's an offset. So these are going to have to go here. We do have that one. We don't have this one. It's not equal to p_status. Okay, let's just have a look and see if we can guess what that should be. Oh yeah, and we also need to set it to P_RUNNING, 1. p_status is the first, so that should be right. Okay, what happens when we build it? Nothing. 41, p_page offset. Okay, there's another one. So if pid is 4, uid is 6, 8, 12, 16, 20, 24. I think we're saving these onto the stack, and we don't actually have to track back from the stack pointer, because we're about to discard the stack pointer completely when we reload it here. s... unknown opcode. call0. Okay, and I think we don't need to do this; swapper.c is now doing it, I think. All right, this is probably going to fail immediately on start up, as I've probably not done my mental arithmetic correctly, and... come on. Right. Okay, panic swapin. Right, it got to here. It tried to swap something in. It's still
saying unknown error 13 when it comes to rc. But let's write our swap in routine. Actually, do a bit of cleanup. Actually, just use ints; they'll be faster. Are we using page anywhere? Yes, we are. So this is going to be almost the same logic. Swap in: if no page... okay. Right, we now need to do our reads, and we're going to read the entire space. It should be swapread. Yup, that should do. We're reading the entire data because I believe that reads are much faster than writes, and that's just, you know, try it first and see what happens. I still don't like that unknown error 13, but let's try it. And it fails. What's u? It's the... right, it's where we're swapping to, which is going to, of course, be udata. I am a little surprised that this isn't using swapread or swapwrite, to be honest. I don't think there's any reason why it's not. I just cut and pasted this from somewhere else. Anyway, let's see what this does. Swapping out, swapping in... oh, swapping done. Okay, well, I'm not quite sure what it did, but it did a thing. Let's clone this to make the reads faster. So where does it go from swap in? Well, we're in swapper.c. Well, rather, we're in swap.c, we're in the swapper function. So let's turn on some tracing there. This should then return to whatever process it was. That'll be waitpid. I think we still have tracing on here. We do. Let's give that a go. Actually, before we do that, let's take a look at those warnings and see if there's anything useful. swapper.c, in byte 20... that's a warning that needs fixing. uaddr_t, 103, uaddr_t. The kernel is a bit inconsistent about whether it uses a uaddr_t, which is an integer, or a void star. In file included... expected usize_t argument... that's this, 154. Honestly, this code could be commoned up really easily. We just need to pass in a pointer to either swapread or swapwrite. Let's try that. Okay, that builds cleanly. So let's see what it does. I mean, it'll fail in the same way, I hope, just with different messages. All right, so that's the swapper telling us what it wrote, and then it hangs. Now, it
didn't hang. That's intriguing. So it wrote 2560 bytes at this address, then it wrote 44544... that's not right. Oh, that's stupid. Uh, yeah, let's comment this out. We want the block, the udata and one of these. Yeah, let's change this to a call to our all-purpose swapping function, swap in-out block: dev, block, data, len. It's obviously udata; blocks; the address is u... Oh, there we are. So here we just replace all this with swap in-out block, u, swapwrite, and this becomes swap in-out block, uint8_t star, udata, swapread. Okay, let's see what this does. Um, passing argument 4 of read/write makes integer from pointer. 1, 2, 3, 4. Yep, that wants to be a uaddr_t. Okay, right, so it's no longer writing 48k of junk. So let's see... panics. At least this does explain why the swapping in and out was so slow: there's far too much data. It doesn't explain why this is quite slow. Okay, so you can see here it has written the udata area, which is at that address, then data, stack. On read, if we read too much at this address, then we'll overwrite the ROM's workspace and all kinds of terrible things will happen. This does seem to not be doing a lot, so I think it might have failed. Yeah. Okay, this number seems wrong, because there's only 32,000 blocks in swap... hang on, this is writing the code. Each block is 512 bytes, so that's actually 128k from the beginning of the swap partition, which is kind of way too much. Um, that's this calculation here. So: start of the swap area, number of blocks in the udata, which is 1, length of the data segment, which is 64k, in blocks. That's the start block. This seems to be okay. 1024 bytes is 400 in hex, so that will go up to the end of the data block, the data area. 64... wow, it did a thing. 64k right shifted by 9 is divided by 512, which is 128. Uh, yes, I had multiplied rather than divided in my head. That is the right number. So I'm not quite sure what this was doing for all that time. It seems to be perfectly happy to do it. So let's just put our syscall tracing back. The data... now, the pid is udata, ptab, ppid... no, it's not, it's u_ptab. This will show the pid of the process that made the
system call, which should give us information about what's going on. OK: system calls, swap out, long pause with no system calls. I wonder if this is Dhara frantically garbage collecting. OK, so here's PID 1; this is init. We've done the fork. We're now in PID 2, which is whatever it is init ran. That's correct. Whatever it is has just failed and has swapped back in the thing that was swapped at 288, so it swapped init back in. So we have successfully forked, run a process, and it swapped in the parent. I don't believe it has returned from the parent. I have no idea what init is doing, but I don't see a call to... I don't see a syscall end. Yes I do, it's there; that's exit. So I would expect to see from here a syscall ret, which is the other half of the fork that's happened here. So we haven't seen PID 1 return from this fork; that should be happening here. So this makes me think that, given that the swap in worked, this code is failing in some fashion. We haven't exercised this yet. So what could be going wrong? We are the new process; the swap in has succeeded. We reload a2, which is the ptab of the new process, which in this case is the parent. We make it runnable. We restore the stack pointer, so if this is wrong then everything will go belly up further down. This is the parent, which means that what we're loading here is the thing that got saved here. So it's an 8-slot stack frame with this stuff on it. So, 8 slots, stuff. We haven't seen an exception, so what kind of tracing can we put in? We can write out bytes to the... let's do a movi a2, 65, then a call via a3, with ets_putc in a3. This will print an A and spin. And while we are at it, let's put some stuff in here: a kprintf in the Dhara read. Let me also put one in erase. So this is actually going to display physical blocks, whereas the other tracing is going to show logical sectors. If it's spending all its time erasing then we may be able to help that by switching to 64k erase blocks. Well, that's it reading init. Swapping in and out... OK, well, that was a lot
faster than it was last time. Let me have a quick scan. Here's read; this is reading the binary: read, read, read. So here it's swapping out. We erase physical block 21, 22, 23, 24. Honestly, I think that's fine. All right, so we've got our A, which means we've reached the end of this, so we should have returned to wherever dofork was called from. So here. There. So we should be seeing this tracing, but we're not, which means that the ret has failed somehow, which makes me think that this stuff is going wrong: we have incorrectly restored the process and the udata or stack data is all wrong. Now, swap write... oh, I've lost the swap-in tracing. Let's do that again then, shall we? Read stuff. Ooh, lots of copies; this is it garbage collecting. OK, so this is what I wanted to see. We have read... this is our udata, somewhere up here you can see, yeah, 80, 512 bytes. So we read that, we read our data, we read the user stack, we read the text. OK, so I want to verify that the udata here is valid, but I don't want to put tracing here because that would be hard, so let's put it here in swap in. So we can print the PID, which should be udata.u_ptab->p_pid, and udata.u_sp. What else would be sensible? I think that's probably all we need. So if these two values are correct, it means that the udata block has been restored correctly. More garbage collection. The reason why Dhara has got this copy operation is that some NAND chips can do fast copies internally, but I don't think this is one of them. I think I need to shrink the file system, because there aren't enough free blocks for Dhara to do a good job. It certainly seems to be spending a lot of time doing garbage collection. Yeah, I've run out of things to say; just waiting for this to finish whatever it's doing. It seems to be shuffling nearly all the flash around, which is odd. There's only 512 erase blocks; you can see we've already done like 300. Well, if the numbers that we should eventually get
look right, then the problem will be something in this code, in that the values that we are restoring are all wrong, and therefore as soon as we do the ret it jumps into nowhere. I am surprised. Weird. So, PID 1, stack pointer... that's the wrong stack pointer. Hang on, that's the user stack pointer. That's a very round number. 3f7d: that is indeed a user address, but for the swapped-out process it should be in the kernel stack. Now, we can tell where the udata block is because the address is printed here. So the udata block is 512 bytes from this address, and the kernel stack pointer, the stack pointer when system calls are executed, is in that block. The old stack pointer is stored in the exception frame, which is on the user stack. So when our dofork happens, the stack pointer that we are saving here into the udata structure is a kernel stack pointer and should be pointing above this address. This is below... no wait, this is below that address. The PID is right but this is not. OK, this gives us some data to work with. For a start, we could simply go here and do that. This will show us the stack pointer that gets written to disk, so that we can verify it's the same when it comes out again. So we know that things are returned correctly when dofork exits, and we're not changing the stack pointer here at all. I mean, we're not changing stacks; we're obviously moving the stack pointer up and down. Which suggests that the saved value here is somehow wrong. But we are actually saving the current stack pointer into the udata block and then we're calling the udata swap out, so the stack will be restored to this state, not to the state it was in inside swap out. So let's go and just see what it does. Hopefully we shouldn't need to wait that long. So it's init. OK, it's now swapping out. We have to wait for that to finish. Another garbage collection. I think I'm going to give 64k erase blocks a try; you may even get more space in the file system that way. And I'm also going to try shrinking the
file system. This is clearly not right. All right, what's this done? Same stack pointer there. Swap out done; stack pointer is 3fff7000. That is the bad stack. All right, so if dofork is somehow inside swap out, then the value saved here is wrong somehow. How can that be wrong? We've switched to the kernel stack; we did that in the unix_syscall stub. Wait a minute. What on earth? This can't be right. We allocate a stack frame, we save the stack pointer onto this... wait, a2 is the exception frame. We store the stack in the exception frame, we change stacks, we call unix_syscall. a2 does not contain the exception frame anymore. That's complete nonsense. I'm not sure how we ever actually managed to make a system call. It's still not right, and the numbers are all wrong, but this is still clearly the wrong address. So if I go up here to n32... this is our fork. Kernel stack pointer here, user stack pointer here. Kernel stack pointer is in the exception frame, so that is here... no it's not, that's the user stack pointer. Kernel stack pointer is approximately here. It's difficult to get exactly, but this will return the address of a local variable, which is on the current stack frame. So the user stack pointer is going to be there. So let's try that. And honestly, if I pause that and scroll up: kernel stack pointer, yes? No? Maybe? That does not look right. The user stack pointer is kind of garbage. That value has changed. It is the right value, I mean it's the one that we saved. So, OK: 12, 0, 4, 8, 12. Yeah, stack pointer is here, and then we call unix_syscall with this in a2. So unix_syscall gets the stack frame in, well, a2, which is the first parameter. Wait a minute, wait a minute. This is calling handler cb. That stub is never being called. This is always being executed on the user stack, which means that whenever we do a swap in or an exec we're going to be erasing the stack we're currently executing from, which is very bad. So: this call handler trampoline, handler cb. This doesn't want to be static anymore. Put that there.
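The check being done by eye in this part of the session, whether a saved stack pointer falls inside the kernel stack block or somewhere in user memory, can be sketched in C. This is a minimal sketch and not the Fuzix code: the struct, the function name and the addresses in the note below are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the block holding udata and the kernel stack. */
struct kstack_region {
    uint32_t base;   /* lowest address of the block */
    uint32_t size;   /* total size of the block in bytes */
};

/* Returns true if sp lies inside [base, base + size].
 * The upper bound is inclusive because an empty, downward-growing stack
 * has its stack pointer exactly at the top of the block. */
static bool sp_in_region(const struct kstack_region *r, uint32_t sp)
{
    assert(r != NULL);
    return sp >= r->base && sp - r->base <= r->size;
}
```

With an assumed block at 0x3fffb200 that is 0x600 bytes long, a value such as 0x3fffb610 passes the check, while 0x3fff7000 (the bad value printed by the trace) does not, which is the kind of mismatch the trace output is being used to spot.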
So I need to call the stub. So: extern fn_c_exception_handler_t syscall_handler_trampoline, like so. syscall_handler_trampoline does a call0 to syscall_handler_cb, which is no longer a callback, so it's just syscall_handler. Right, let's see what this does. This should now switch to the kernel stack. The difference between the... that's not so good. And of course putting this in a15 is wrong, because we do actually need to save a15. So let's just save a2... store the stack pointer into the exception frame. No, that doesn't help. We have to do that here, so that here we can load a2 back out of the frame and then load the stack pointer from it, which allows us to ret. But we're doing a call, so we also have to save a0 here, and we also have to load it from the stack frame. Oof. Hopefully it'll maybe work now. OK, can I get away without this? I don't want to be on the user stack when I do exec, because we are going to be writing all over that bit of memory. So we do have to switch stacks, and I believe that this is the best place to do it. Of course it's not working. This saves the exception frame, switches to the kernel stack, allocates a structure on the kernel stack, which is 16 bytes, saves the exception frame onto the kernel stack, saves a0 onto the kernel stack, calls the thing, reloads both things, changes stacks back to the user stack, and returns. Well, let's see if we get here. Let's see: a3, j.
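The order of operations in that trampoline (allocate a record on the kernel stack, save the frame pointer and return address, call the handler, reload both, release the record) can be modelled in plain C. This is only a model under stated assumptions: the real code is Xtensa assembly, and the simulated stack array, `frame_model`, `trampoline_model` and `demo_handler` are hypothetical names that do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SLOTS 64

/* Simulated kernel stack. ksp indexes the lowest slot in use and the
 * stack grows downwards, as it does on the real hardware. */
static uintptr_t kstack[KSTACK_SLOTS];
static unsigned ksp = KSTACK_SLOTS;

/* Hypothetical stand-in for the exception frame. */
struct frame_model {
    uintptr_t user_sp;   /* stack pointer at the time of the system call */
    uintptr_t result;    /* filled in by the handler */
};

typedef void (*handler_fn)(struct frame_model *ef);

/* Models the ordering only: save, call, reload, release.
 * Returns the reloaded return address so a caller can check it survived. */
static uintptr_t trampoline_model(struct frame_model *ef, uintptr_t a0,
                                  handler_fn handler)
{
    assert(ksp >= 4);                 /* room for one four-slot record */
    ksp -= 4;                         /* four slots: 16 bytes on the 32-bit target */
    kstack[ksp + 0] = (uintptr_t)ef;  /* save the exception frame pointer */
    kstack[ksp + 1] = a0;             /* save the return address */

    handler(ef);                      /* in the real code this runs on the kernel stack */

    struct frame_model *saved_ef = (struct frame_model *)kstack[ksp + 0];
    uintptr_t saved_a0 = kstack[ksp + 1];
    ksp += 4;                         /* release the record */
    (void)saved_ef->user_sp;          /* the real code reloads sp from the frame here */
    return saved_a0;
}

/* Example handler for the model: records a result in the frame. */
static void demo_handler(struct frame_model *ef)
{
    ef->result = 42;
}
```

The point of the model is that nothing the handler does to user memory can affect the saved frame pointer or return address, because both live in the record on the kernel stack until the handler returns.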
OK, we haven't got there. That's a good thing, because that means it's more likely that this code works. So this code is actually exec, which is this. There is no debug tracing; we're just going to have to add some. During the execution of this we are on the boot stack, which is way up at the top of memory. At some point I would like to reclaim that, because we only use it on startup and it's never touched again, and there's enough space there to put a number of file system buffers, which would make a lot of things better. But now, the platform doexec. Is it doexec that we have problems with? It could well be, so let's just do that. This is in lowlevel, doexec. Yeah, right: we've just called doexec, we bypass the whole system call return stuff and just jump straight into the new code. There's not enough here to be wrong, honestly. So that suggests that what's happening is that it's calling this code and then this is crashing. So let's put this here and see what this does. So if we're calling user code, and the user code is trying to make a system call, and then this is going wrong somehow, hopefully this will show what's happening. Aha, something new. Well, that's not a valid program counter. This looks fine. callx is the indirect call instruction, so we load the address of the putc ROM function and we call it. We're not doing anything stupid like loading this. I'd love to be able to load from memory in one instruction from a constant address. But this does look like code. Has it dereferenced this? I think it might have. OK, so let's disassemble our kernel, and you want to look for syscall_handler_trampoline, and the address is, let's go right to left, for c 1 2 3 1 d8. It's dereferenced this: it's treated this as a pointer and dereferenced it. Ah, friends don't let friends define function types that are pointers. Now, the C language will actually convert from a reference to a function to a function pointer, and it will do this well enough that most of
the time you don't notice whether you're using one or the other, except the times when you do, and this was one of the ones where you do. "Declared as function returning a function": that's because it will return the old function pointer, which you don't care about. So let's just do that. Put this as one of those. It makes no difference. So what's this going to do? It's called the right thing, so let's take this out and run it again and see what happens. So: read, and fall over. doexec. We are in the binary, in the user code. Let's just get rid of that line of tracing that we don't need anymore. Let's look at our stacks. This is a kernel stack pointer, this is a user stack pointer. Thumbs up. There, we are now doing a thing. What is syscall 6? unlink. Not sure if the syscall name table is in the kernel. We are at this address. We are in memcpy; we are in the 8-bit memcpy. What is our virtual address? 9. OK, I thought that was going to be code or data again. OK, syscall name is not defined. So why would this be crashing? OK, anyway, nothing has actually changed; we just now have better error messages. So we have gone into unlink, we are doing a couple of sector reads, and then it's dying. Now, the big difference between what we are doing now and what we were doing before is that we are on the kernel stack. So let's do this and see what happens. Now we are not changing stacks. If this behaves differently, right, if it doesn't like us being on the kernel stack, chances are that we have run out of stack space. That's what that usually means. If you switch back and everything is fine, then... well, let's see what this does. It's not going to work properly; we will be scribbling over the stack when we swap the thing in. But once that finishes garbage collecting... so if we need to make the kernel stack bigger, we are going to have to double this, double this. OK, yep, that's failed. Swap size is now 32 and a half k. It's 31.5 for the... because this has increased to a kilobyte. So where is swap size used? I don't know whether I can use
a 0.5 there. I don't think I can. No, I can't. So, well, this is how much user code there is. If we increase the size of the udata block again to 1.5k, it's now 3 blocks long; this is now 1.5 and this is a round number again. We have lost a swap slot, because I configured the file system for 4 slots, so now there's only room for 3. I'll have to reconfigure it, but I'm going to have to do that again at some point anyway. So let's put these back and switch back to the kernel stack. I wonder how many places I've encoded a 512. Oh, that needs to be different: new block size. This needs to be different. How many places in swapper did I assume the size of the udata? Here, that is length. OK, I think I'm good. So clean build and write... and it all goes horribly wrong, because of that stupid 0.5. So this is going to have to be 33... 64 plus 31.5 plus 1.5. And, as it is late, let's see how this goes. OK, that is better. We've given it loads of stack space. I'm actually a little bit surprised it hasn't run out of memory, but I think we have a good amount of kernel memory. Wait for that stupid garbage collection. I don't know how much we have left, actually. This is not going to help; I need to use nm. So, OK, here is our user code. Our kernel memory starts here, so here are all our kernel variables. Somewhere there will be udata. OK, we've got up until c000, so actually we've got 2k left. Oh, here is udata, and there is our swap stack. So I don't know where our buffers are or how big they are. These are the two 512-byte buffers used by Dhara. bufpool, here we go; there is a decent size. So we've got some space to increase it a bit, but we probably shouldn't, because we are going to need to add some more functionality to the kernel. We can get rid of that system call name table, which will save some space. OK, right, here we go. Swapped in. dofork. Excellent, we have successfully restored the state of a process. All right, so we have returned from here, and we should have seen a syscall ret, but we
haven't, so something else is wrong. That is not good. Why is that there? That is a very not good number. That is a number that I do not wish to see. OK, b600 is the stack pointer. All right, what's happened is that u_sp is not the user stack pointer anymore, it's the kernel stack pointer, so our optimization here is invalid. So, as always, premature optimization, root of all evil, etc. That was definitely my bad. So let's try this and see what happens. And a garbage collect. Let's scroll up and see if our numbers look right. udata block, 1536. We are erasing lots of blocks. Luckily we can keep this one, which saves us a bit, but having to write out all that 64k is not so great. Yes, I do not know how to get the real user stack pointer. To be honest, I don't think there are any system calls that require it, so other than this we should be quite happy for it to live in this stack frame somewhere on the user stack. I don't know how user processes are supposed to allocate stack. Let's take a look at one of these others while that's garbage collecting. So there is a field in the header, which is the stack hint, which is apparently not used by anybody. All right, let's take a look at the 32-bit version. Stack size: what's this used by? It allocates... right, when using this format, I believe it puts the stack pointer right next to the BSS, and it's a fixed size. We could do that. That would allow us to save the user space in one chunk. It would mean that we wouldn't be able to increase the stack if a user needed it, but I don't think there is a way to do that anyway. Yeah, this is still putting everything right at the top of user memory. Is this doing anything? OK, that's it swapped. Now it's reading the process back in. And let's halt that; that went wrong. OK: user area, 64k of data, which takes a while, 8k of code. We are returning from a system call and then we crash here. We got this: 402126fc. Interesting. So that's trying to write registers to the exception frame, and it's crashing because a2 here
is null. 14 in hex is 20. It's got a2 from the stack. This suggests that we have somehow corrupted the kernel stack on exit and as a result it's failed to reload the exception frame properly. The exception frame lives in user space, but this stack frame it's been loading it from is in kernel space, in the kernel stack area. Let's take a look at this. We're restoring the stack, b610; we've saved the stack, b610; so it is at least the right stack. b610 is within the bounds of the kernel stack block, so we should have successfully loaded that. Possibilities for not loading it are that we are on the wrong stack, which means it somehow got corrupted. switchin: swap stack plus swap stack size, save onto the stack... we don't need to do that. Oh, I'm an idiot. Let's switch stacks, then let's save the stack frame onto the stack. Right, a15 is saved up here and is going to be restored further down, so we don't need to explicitly save it. This is going to be sp, a15, restore old... actually, let's take a look at that old code. Yeah, we are putting the stack back where it was. We want to stay on the swap stack, in fact. Yes, because we can't put the stack where it was, because after we've reloaded the process the stack pointer has moved. So we stay on the swap stack until we get to here. Now, in the case when it's not swapped, the stack pointer won't change because we're still in the same process. I don't actually know when that happens. OK, so that ought to be better. However, let's take another look at the swapper. I think what we're going to have to say to make this feasible at all is, let's say, 3k. They can have 3k, nothing else. So this is going to be: write out the data area, data len minus user stack, into blocks. The address is actually going to be data base plus user stack, and the length is going to be user stack. OK, hang on: "suggest parentheses around", you know. Yeah. So, read init. And what's this complaining about? This is a ROM code exception
because that's just rubbish. Data base plus user stack... blocks... no, user stack. So for this we want to write out, yes, it's just data len minus user stack, data base plus user stack... no, the length goes first. The length goes first. Come on, I want this to work. Read. Write, slowly. Read, and crash. We swapped in the process. The stack pointer in the udata block is b610. It's correct; we've got the right stack pointer. And we die at this address, which is here: we failed to make the new process runnable. Is this because a2 is corrupted, because we didn't save it across the call to swapper? So we can use our trick of saving a2 in a saved register, so this is now a15. So, swap out, at great length. Yeah, I'm going to fix this offline, because it's going to require reflashing, probably multiple times, until I get it right, and, as you know, fiddling with long and slow tests to see what makes a difference. So I'm just going to live with it until the end of this session, which I'm hoping is going to be really soon now. Come on. I really don't know what it's doing. I might file a bug report with Dhara, because I cannot think of any sensible reason why it needs to copy stuff like this. I mean, all it's going to do is use up erases. It does claim that it uses... that crashes. That's our usual double exception. I was a bit slow. Here we go. Dhara does claim that it has perfect wear leveling, so that no two erase blocks will have a wear count differing by more than one, but if it's artificially trying to wear it, that's not really how it's supposed to work. OK, we've returned from the system call, we've reached this syscall ret, which is not the same thing, and we crash here. Oh look, it's there. Hmm. So this is different now. This address is the virtual address that caused the error, and this is pointing at the ROM. So I wonder if ef is now pointing at the ROM and it's trying to do the write and then it's failing. I wonder if... is it trying to cache the... yeah, it's in a register, so any corruption of the kernel stack will cause that to be read
back incorrectly. The actual place where that is going to be stored will be somewhere obscure in the chain of stack frames, because that register is actually going to be saved probably by the unix_syscall function, which this is calling here. So the registers we're saving here are not the same ones that could be getting corrupted. But if it was getting corrupted then I doubt that it would have managed to make its way down the stack chain to get to here. Yes, well, I just want to see what these say. Unfortunately, adding those trace messages will have changed the code; that always happens. In particular, the values will now be saved across the call to kprintf. Actually, there was a call there before anyway, so that shouldn't make a difference. So here we call unix_syscall; this is where the call to kprintf happens. Hmm. It would be nice if Fuzix was smart enough to know that it can just reload the text from the binary's file, and keep the inode open for that. I don't actually have enough knowledge of Fuzix internals to know whether that would be particularly expensive. Then we wouldn't have to save the text at all. Weirdly, on 8-bit systems, reading stuff from disk is so fast, because disks these days are SD cards that run way faster than the 8-bit micro, that on an actual Z80 system it's just as cheap to load from disk as it is to do a memset to clear to zero. So there's actually no advantage to saving only part of the process. I should be calling memset to clear the middle section of the process, the bit that's between the stack and the breakpoint, because a process could see data left by another process that just got swapped out. OK, come on. I mean, it's not as if this platform is even going to be slightly secure. OK, and catch it. So what's this done? Oh, crash somewhere else now. 270f, that's here. Oh, of course it's going to crash there: I'm trying to dereference ef. OK, well, honestly, I'm getting quite tired. I am going to call it a day here. I'll try
and fiddle with the file system a bit to see if I can make this easier, even if it's just shrinking the logical file system. It's very nearly working, which is why it's so frustrating that this is not. It does look like ef is now pointing at garbage, or rather it now contains garbage, and if ef is in a register variable then that means that the stack frame is corrupted, which is annoying because that's just really hard to find. I suppose I can always dump part of the stack frame after each system call, because we're not actually dereferencing ef doing that; that would work fine. But I think that's something to look into next time. So I hope you enjoyed this video. Please let me know what you think in the comments.