The story so far: I am attempting to port Fuzix to the ESP8266. I have nearly all of it working, except that putting the file system on the internal NAND flash is achingly slow, so I'm trying to make the SD card interface work, and I'm having endless trouble. It looks like I'm correctly sending bytes to the SD card, but the SD card never responds. Now, since last time I had a thought, which is that I am using the SD card reader from the MSP430. This is a device I ported Fuzix to several years ago, and that worked. So why not simply hook up the MSP430, which I still have, plug it into the SD card reader, run it, and then do a logic analyzer trace? Assuming Fuzix is still programmed into the device's flash, that will show me a successful SD card initialization, which I can capture in the logic analyzer. Then I can compare that against what the ESP8266 produces.
So I dug out the MSP430 board and, completely to my surprise, I actually managed to find the SD card for it. So in fact we have a complete booting system. If I hit the reset button, here it is booting. It looks just like it does on the ESP8266. The version number there is somewhat advanced over this one; this is years old and rather bit-rotted. But you can see that it's detected the SD card. There are three partitions. It's mounted the root file system, it's started init, and we have a shell. And in fact it is a completely working system. We can run programs and everything. So if you want to play Hunt the Wumpus, it works fine. We've got all the standard Fuzix utilities, and they all work. There's the Forth interpreter, which works. And you can see that the speed is not instant, but it's perfectly usable.
The MSP430 has 64k of RAM, more or less. The device I'm using is a bit weird: it has a few kilobytes of genuine RAM, and the rest of it is FRAM, which is a strange form of non-volatile memory that I've never really seen used anywhere else.
It's got the performance of SRAM, yet it retains its state when the power goes off. So in fact the device has no flash; it just happens to have the kernel loaded into FRAM. But yeah, that works fine.
So here is the logic analyzer. Let's set this to something reasonable. Set this to something very big. Hit the run button, then hit the reset button. Run. Reset. And I believe something came out. Yep. Stop. Did we get anything? Hmm. It is plugged into the logic analyzer. Yes. I'm looking at the wiring. Let's try that again, and hit the reset button. There we go. Oh, I just wasn't quick enough. So let's set this up to 200 and do that again. Okay. That's it booting.
So let's find the boot sequence. That's a command 62. Hang on, this all looks very strange. Very strange. This is the reset command. This is the 0x40 stuff, which is the actual command byte. Then we have four zeros, and then we have the 0x95 checksum byte. So I think it samples on the leading edge. So 0, down, up, 1, down, up, 0, down, up, 0. Yes. So those should be the MOSI bits going out here: 0 1 0 0 1 0 1... 0x95. So why is this peculiar? This looks wrong.
I can tell what's happening by looking at the different lines. This is chip select, so it asserts the chip by going low. Then there is the command. Once the command has finished, the chip responds on MISO here, which is a bunch of zero bytes. Then we de-assert chip select and prepare for the next command. Why isn't this decoding it? I mean, the card's responding. Is this correct? Active low. Yeah, it is active low. Is the clock phase wrong? It's interesting that the actual clock isn't grouped into bytes. This is reading bytes, I believe, one at a time. So I think the... Well, that's inverted the clock polarity. It hasn't helped. Phase? No. Both? No. I am more or less just changing stuff randomly to see if anything pops out. I am wondering whether some of these lines are actually inverted, which is why they are being decoded incorrectly. Word size has to be 8.
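The reset command read off the trace above (a 0x40 command byte, four zero bytes, then a 0x95 checksum byte) can be reproduced with a short calculation. This is a minimal Python sketch, not code from the Fuzix tree: the function names are made up for illustration, and it assumes the standard SD command framing of a CRC-7 (polynomial x^7 + x^3 + 1) shifted left by one with a trailing 1 stop bit.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with polynomial x^7 + x^3 + 1, fed most significant bit first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            in_bit = (byte >> i) & 1
            top = (crc >> 6) & 1          # bit about to be shifted out
            crc = (crc << 1) & 0x7F
            if in_bit ^ top:
                crc ^= 0x09               # x^3 + 1 terms of the polynomial
    return crc


def sd_command(index: int, argument: int) -> bytes:
    """Build a 6-byte SD command frame (hypothetical helper name)."""
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])   # CRC-7 plus stop bit


print(sd_command(0, 0).hex())   # 400000000095
```

So the 0x95 at the end of the captured frame is exactly what a correct CMD0 should carry, which suggests the bytes on the wire are fine even when the decoder's labels are not.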
So chip select is asserted. 1 2 3 4 5 6 7 8. 1 2 3 4 5 6 7 8. So it's reading bytes counting from when chip select is asserted. But the clock is ticking continuously. So why is the SD card responding to that command? Okay, I was expecting to see something that made sense here, and I'm really not. I don't understand why this is working, and the decoder down here doesn't understand why it's working. I've got lots of samples, so there are no aliasing problems causing a bad trace. I don't think the SPI stuff is configurable, but let's just chop off the SPI decoder and actually add our own SPI. Clock, MISO, MOSI, yes. Right, this is in fact the same as we were seeing before. So nothing wrong there. So this zero should be the first bit of the command packet.
So this is the MSP430 code. Set slow... actually, set a constant clock speed. Disable interrupts, MSB first. I'm not sure how to configure that in this, actually. I can configure it in the SD card layer. MSB first. Here is the clock configuration to set it to fast or slow mode. Interestingly, SPI mode zero here is, let me see, CKPH. I don't actually know what that is. Sync mode, clock phase, clock polarity. So on the ESP8266 we are not setting either of the mode bits. Of course, the mode bits are unlikely to have the same meaning on the MSP430 as they do on the ESP8266.
And I am still baffled as to why this is not decoding correctly. It seems everything is shifted by one bit. You can see this should be 0x40, because the zero that's on the end of FE should be part of this byte. Well, if I ignore the decode, this is at least a logic trace that seems to be correct. So the next thing to do is to create a new session. Let's rename this. Apparently we can't rename it. Oh well. So: create a new session, swap out the board so we have the ESP8266 back in again, do another trace, and then compare the two. I'll assume this is right even if it's not decoding correctly. Interesting. Okay, so I will do that offline.
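The one-bit shift described above is what you would expect if a decoder groups bits into bytes from the moment chip select drops, while the command actually starts a non-multiple-of-8 number of clocks later. The Python sketch below illustrates that idea only; the 15 idle bits and all function names are assumptions for the example, and it does not model what the logic analyzer software really does internally.

```python
def bits_msb_first(data: bytes) -> list[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(b >> i) & 1 for b in data for i in range(7, -1, -1)]


def pack_bytes(bits: list[int]) -> bytes:
    """Group bits into bytes, MSB first, dropping any incomplete final byte."""
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def realign(bits: list[int]) -> list[int]:
    """Skip idle 1 bits so decoding starts at the command's 0 start bit."""
    return bits[bits.index(0):]


CMD0 = bytes.fromhex("400000000095")
stream = [1] * 15 + bits_msb_first(CMD0)   # 15 idle bits: an assumed figure

naive = pack_bytes(stream)            # byte boundaries counted from CS low
fixed = pack_bytes(realign(stream))   # byte boundaries counted from start bit

print(naive.hex())        # fffe8000000001
print(naive[1] & 0x3F)    # 62
print(fixed.hex())        # 400000000095
```

In this model the second naive byte is FE, and masking its low six bits gives 62, which would be one way to end up with a bogus "command 62" label on a frame the card itself parses correctly.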
Okay, here we've got that trace. So this is the ESP8266. This is the MSP430. Now, one obvious distinction is that on this one MOSI is left floating high when it's not being driven. But if we look at this, this is our command. So CS goes down. Let's actually add the decode, shall we? SD card, SPI mode. So clock, MISO, MOSI, chip select. Yeah, okay. So this is decoding correctly. Here is the command: groups of 8 from the clock. Not sure why there's a pause between each one. That's worth looking at. So we start here with 01, and we start here with 01.
Now let's look at the details of the clock phase. You see this gets asserted high as the clock goes low, which is exactly what's happening here. And then it drops low again at the end of the next clock. So that is equivalent. And again, at the end here is our 95 byte. 0 1 0 0... sorry, I'm counting this incorrectly. 1 0 0 1 0 1 0 1. Yeah, it's decoded here, you can see. That's on the MSP430, and here on the ESP8266 we've got the same. I did, as of yesterday, figure out how to get MOSI left floating high when receiving bytes, so that works. This is where the chip responds, and no response is seen here.
Okay, so what are we doing differently? Well, if I put a marker there, can we get a length between them? I'm not sure. Ah, here we go. I don't need this marker. See, I'd rather like to be able to count how many clock bits that was, to make sure it's the right number. But I do know for a start that this is on the lower edge of the bit that's here. Now, the clock speeds are very different. This is taking about 150 milliseconds to do. Let's actually put that to... can we put it to there? We want this to run from the leading edge of the first clock to the trailing edge of the last clock. So here is our first clock; we want it to be there and there. So on the MSP430 this is 115 microseconds. On this it's 156 milliseconds. This is because we've set this to the incredibly slow mode. The MSP430 should by default be running at 400 kilohertz.
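As a sanity check on those two cursor measurements, the time a 6-byte command should take at a given clock rate is simple arithmetic. This is a rough Python sketch with invented helper names; it assumes the six bytes go out back to back, which the trace suggests is only approximately true.

```python
def command_duration_us(clock_hz: float, n_bytes: int = 6) -> float:
    """Time to clock out n_bytes back to back, in microseconds."""
    return n_bytes * 8 / clock_hz * 1e6


def effective_clock_hz(duration_s: float, n_bytes: int = 6) -> float:
    """Average clock rate implied by a measured command duration."""
    return n_bytes * 8 / duration_s


print(round(command_duration_us(400_000)))   # 120
print(round(effective_clock_hz(156e-3)))     # 308
```

The 115 microseconds measured on the MSP430 is in the same ballpark as 48 clocks at 400 kHz, while 156 milliseconds for the same 48 clocks works out to an average of only a few hundred hertz.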
Okay, well, let's try and do something about that. Let's set this. Oh yeah, I did actually figure this out before: 200 should be about the right number. So let's burn this. I've tried this; it doesn't work. Here we go. So let's hit the button. Stop. Okay, now this looks very strange, because we're getting bursts of clocks at very high frequency with long periods in between them. I can add a timing. We want this to be attached to SCLK rising edges. If we go over here and do the same thing. Not that one, that one. Delete some of the things I added by mistake. Connect this to SCLK rising. So the time for a single cycle is two and a half microseconds, or 400 kilohertz. So the MSP430 is doing it right on. Here it's faster.
Okay, I thought I'd got this right, but apparently I haven't. So 160 should give us 500 kilohertz. Let's just see if that produces the right number. Here's one of our clock words. That's hopelessly wrong. This 40... I wonder if that's related to the clock speed, which is defined in here. Let's put that back to 80, shall we? That should restore the peripheral speed to 80. Or not. That's completely garbled. I need to clean. Okay, well, I've broken the UART stuff, but I do at least now have a trace, and I do in fact now see a command zero, go idle. Oh yeah, we were seeing that before; it was the MSP430 that wasn't decoding properly. But this speed's wrong.
Okay, now, I do not know how this works. These speeds do not appear to apply anymore. So let's just pick something by trial and error. That should be quite slow. So that's 800 kilohertz. According to the SD card spec we need about 400, preferably a bit less. 700. I suspect that my tinkering with the clock speeds has broken everything. So let's just take that out, shall we? That's still way too fast. Okay, let's just chop this stuff out. I'm going to globals.h. We want this to be 56, 56. Hopefully the UART should work now. Okay, we've still got four megahertz for the clock.
I did not touch this code. Still four megahertz. I am not convinced that the clock is being set correctly. Is this code at all right? Well, I can set it to very slow mode, so let's just see if that makes a difference. Very slow mode is indeed very, very slow. So here are my clocks, full of glitchiness again. This is what I was seeing the first time I tried to do this, but then it went away. So the decode all fails. This is mysterious. I have, by the way, got a second ESP8266 board coming, in case it turns out that the board was the problem. Interesting. Very interesting. Let me just try power cycling the board and doing that again, shall we? That's no different.
I was wondering whether I might have set up the clock incorrectly. So this is the code that was setting up the various PLLs. For the peripheral frequency of 80 we set it to 88, and we need to set the second one to 91. So for 80 we just want to use the set clock frequency function. Don't bother setting the overclock bit. Change these. What have we got? Sane looking, and I see that this is decoding correctly. They're still a long way apart. I wonder if a lot of the trouble I'm having with performance is just that running code out of flash is really slow. 0 1 0 0 0 0 0 0, yes. But 500 is still a bit fast. What's this set to? 160, which is in fact the right number. So let's try 200, for 400, which is approximately what we're getting here. Okay. So 200 is indeed 400. So we are now more or less correctly setting the clock speed.
I don't like these very long delays. The SD card should not care, frankly. You see, that's nearly a millisecond between bytes. What is it? It could be, in fact it probably is, my debug tracing. If my debug tracing was causing the problems I'll be very upset. Okay, well, it failed to find the card. So, okay, here are our bytes closer together. That's much more what I expect. However, the card is still not responding. Well, it's now operating at 400 kilohertz.
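The numbers quoted above (160 giving 500 kilohertz, 200 giving 400) are consistent with the setting being a plain divider of the 80 MHz peripheral clock. The Python sketch below models only that relationship; it is an assumption drawn from those figures, not a description of the actual ESP8266 clock register layout, and the names are invented.

```python
PERIPHERAL_HZ = 80_000_000   # peripheral clock, as stated in the text


def spi_clock_hz(divider: int) -> float:
    """SPI clock for a given divider, assuming a simple integer divide."""
    return PERIPHERAL_HZ / divider


def divider_for(max_hz: int) -> int:
    """Smallest divider whose resulting clock does not exceed max_hz."""
    return -(-PERIPHERAL_HZ // max_hz)   # ceiling division


for d in (160, 200, 250):
    print(d, spi_clock_hz(d))
# 160 500000.0
# 200 400000.0
# 250 320000.0
print(divider_for(400_000))   # 200
```

If the real hardware splits the divide across several fields, the product of those fields would have to land on the same totals for the measured frequencies to match.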
So the timings we see here are fairly similar to the timings we see here. So if I set my cursor to the last clock down there, first clock up there, more or less. That's in the middle of a bit; I wanted it to be, well, there actually. So 175 microseconds, 114 microseconds. This is a little bit faster, but not much. This is 400 exactly. This is 400 exactly. Oh right, there are more gaps here.
All right, so what else could we be doing wrong? Well, chip select is asserted, and then we get two bytes' worth of zeros... FFs. You can see them here. Well, 15 bits' worth, actually. Whereas here we're not. So that's easily testable. When chip select is lowered we want to transmit two sets of FFs. This isn't in the spec anywhere that I could find, but let's try it. So chip select goes down. We output two FFs. Then there's the command. Now what's different here is there are 15 FF bits. That's 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and then the command starts. Here there are 16: two groups of 8. Do I want to try and write out 15 bits? I mean, I might as well. The code to do so is straightforward. So this is the number of bits minus 1. Output buffer. Yeah. So chip select goes down. 15 bits go out. Then we get our command bytes, and you can see that the decode here is now showing the incorrect command 62 we were getting over here. I mean, to the bit. The card is still not responding.
So what else could be going on? I did find some references to needing pull-up resistors, which I have enabled, but maybe they need disabling again. The biggest difference I can see is the way that MOSI gets de-asserted between bytes. But the card should only be sampling on the leading clock edge. So when outputting data, the microcontroller will set the bit on the trailing clock edge, and then it will leave it until the next trailing edge, where it will set the next bit. The card waits for a leading edge on the clock to sample the bit.
So this means that on alternate half clocks we set the bit, sample the bit, set the bit, sample the bit, and so on, and everyone's happy. This is what the clock phase stuff is all about. However, here I can see that this is exactly the same behavior: the trailing edge is when MOSI changes, and the card will be sampling on the leading edge as usual.
Okay, I did see that reference to the SPI mode bit, so let's see if I can figure out what that meant. So these are the two bits that control the clock polarity stuff. CKPH, zero: data is changed on the first UCLK edge and captured on the following edge. That's not the behavior we want. We want this one: data is captured on the first UCLK edge and changed on the following edge. So this is what this is setting up here. This one I'm intrigued about: three-pin SPI. I think we are on three-pin SPI. These three pins: clock, MOSI and MISO. I'm not sure what four-pin means. Let's have a look. Yeah, this allows multiple devices on a bus to originate clocks. Normally on the SPI bus the master emits all the clocks, and the slave returns data on the clock pulses emitted by the master.
So I can tell by the traces that we are actually emitting the clock bits in the right place. So this is one, two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 bytes. So that's 160 clocks. That should be sufficient to initialize the chip. And in fact, if we go over here, this stuff along here is emitting exactly the same 160 clock bits, because this is coming from the same code.
The card itself is being powered off the 3.3 volt line, which is correct; it's a 3.3 volt device. If I peer at the MSP430 board, the only power line is just labeled VCC, which I believe is 3.3 volts. There is a 5 volt line next to it, but that's not being used. So I think we are powering it off the right thing. It would be hideously embarrassing to discover that the card was not actually powered up, and that's why it's not working.
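The set-on-the-trailing-edge, sample-on-the-leading-edge scheme described above can be modelled in a few lines. This is a deliberately simplified Python sketch with invented names: the clock idles low, the master puts each bit on MOSI while the clock is low, and a receiver captures on either the rising or the falling edge. Real hardware has setup and hold times that this ignores.

```python
def clock_out_byte(byte: int) -> list[tuple[int, int]]:
    """Return (clock, mosi) samples for one byte, MSB first, clock idle low."""
    samples = []
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        samples.append((0, bit))   # clock low: master sets the next bit
        samples.append((1, bit))   # clock rises: leading edge, bit is stable
    samples.append((0, byte & 1))  # final trailing edge, MOSI unchanged
    return samples


def capture(samples: list[tuple[int, int]], edge: str) -> int:
    """Shift in MOSI on every 'rising' or 'falling' clock edge."""
    value, prev_clk = 0, 0
    for clk, mosi in samples:
        rising = prev_clk == 0 and clk == 1
        falling = prev_clk == 1 and clk == 0
        if (edge == "rising" and rising) or (edge == "falling" and falling):
            value = ((value << 1) | mosi) & 0xFF
        prev_clk = clk
    return value


wire = clock_out_byte(0x95)
print(hex(capture(wire, "rising")))    # 0x95
print(hex(capture(wire, "falling")))   # 0x2b
```

A receiver on the wrong edge sees a shifted byte in this model, which is why getting the phase bit right matters, and why matching edges on both traces is reassuring.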
Well, it's worth checking with the voltmeter, so I'll just go and do that. Well, that was worth checking. It was in fact connected up to the 5 volt line, but I don't think that makes a difference. Let's hit the button. No, it doesn't make a difference. I mean, SD cards are safe to use at 5 volts. So here's our capture again, set to 5 volts, and it is not at all different. Initialization, command goes out, nothing comes back on MISO. So what could be different? We're getting our bytes in bursts, but that shouldn't matter; it's a synchronous protocol.
Well, I do not like this piece of code. I know it was appearing on the MSP430, but I can't actually find a reference to it in the spec. And if I look at the actual initialization code in the Fuzix source, we can see here: we raise CS, we switch to slow mode, we send 160 clocks, we then send command 0. And if you look at what sending command 0 does: we raise CS, where CS was already raised, which is happening in this gap here; we send 8 clocks, which is this one here; we lower CS, which happens there; and then we get this. If it's not a command 0, then wait for an FF byte. That, I think, is what's happening here on the MSP430, but it's no longer happening here.
So let's just see if there's... this is something new, change 7116. How this works: support SD adapters without pull-ups. Let's see. Right, so this change only does the wait if it's not command 0. And we indeed don't have any pull-ups, because I just turned them off up here. One thing we could try is just changing this back, but I think that's basically what we just did by modifying lower CS. I do need to be careful changing this code, as this will apply to all users, not just us. Well, that still did not work, but anyway, you can see chip select goes down, there's a pause while we think, then we clock out 8 FFs to try and read in a response, then we write the packet. Let's put that back the way it was.
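The order of operations read out of the source above can be written down as a small model that just records what would happen on the bus. This is a Python sketch of the sequence as described in the walkthrough, not a copy of the Fuzix code; the class and function names are hypothetical.

```python
class RecordingBus:
    """Stand-in for the SPI layer that only logs the calls made to it."""

    def __init__(self) -> None:
        self.log = []

    def raise_cs(self) -> None:
        self.log.append("cs_high")

    def lower_cs(self) -> None:
        self.log.append("cs_low")

    def set_slow(self) -> None:
        self.log.append("slow")

    def send(self, data: bytes) -> None:
        self.log.append(("tx", bytes(data)))

    def wait_ready(self) -> None:
        self.log.append("wait_ff")


def send_command(bus: RecordingBus, index: int, frame: bytes) -> None:
    bus.raise_cs()
    bus.send(b"\xff")        # 8 clocks with CS high
    bus.lower_cs()
    if index != 0:
        bus.wait_ready()     # the wait for an FF byte is skipped for command 0
    bus.send(frame)


def init_card(bus: RecordingBus) -> None:
    bus.raise_cs()
    bus.set_slow()
    bus.send(b"\xff" * 20)   # 20 bytes = 160 clocks
    send_command(bus, 0, bytes.fromhex("400000000095"))


bus = RecordingBus()
init_card(bus)
for entry in bus.log:
    print(entry)
```

Comparing a log like this against the edges on the logic analyzer, step by step, is one way to spot an operation that the code performs but the trace does not show.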
I will admit that I'm rather surprised at these gaps. This seems to be code being executed in between bytes, and this device should be a lot faster than that. So I'm wondering if there's something wrong with the NAND cache code. I mean, between here and here (let's turn these back on again) is 64 microseconds, and we know that we are running at 80 megahertz. So that's about 5000 clocks. It's not doing that much work, so I'm wondering if these things are wrong. Well, we can always try. I know the second parameter here is speed, so let's just try randomly changing it. So this gap here was 64 microseconds, so let's try this from here. Interesting. 28 microseconds. 17 microseconds. That's faster. I can't honestly tell whether the gaps are noticeably smaller, and I cannot seem to duplicate a session, unless there's a way to do it here somewhere. Apparently not. I miss programs that had menu bars and a user interface you could understand.
So what's this one? We're at 17 microseconds; still 17. Yeah, I think this is a mode enumeration rather than any kind of speed thing. See, SPI flash config seems to be a frequency value. So I don't know why this would be four, because the SPI interface is usually either zero to mean the main SPI, or one to mean HSPI. Well, SPI flash config is 4568. Of course, now we have to wait for it to find all the fours. That's better. So this is comparing a2 against zero. One, two, three, four. So five would fall through. Yeah, I have no idea what that does. Now, the other thing it's worth doing, of course, is just to turn this code off completely and then see what the timing is, because maybe the default value is the one I want. 32 microseconds. So setting it is actually faster, and we might as well leave it as that.
The other thing that can be done is this: in this code, we're waiting for the interface to stop being busy before and after every operation. This means that we're never doing any transfers in the background.
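To put the gap measurements above into CPU terms, each microsecond at 80 MHz is 80 clock cycles. A tiny Python sketch of that conversion (the helper name is made up for illustration):

```python
CPU_HZ = 80_000_000   # CPU clock, as stated in the text


def cycles_in_gap(gap_us: int) -> int:
    """CPU cycles that elapse during a gap measured in whole microseconds."""
    return gap_us * (CPU_HZ // 1_000_000)


for gap in (64, 32, 28, 17):
    print(gap, cycles_in_gap(gap))
# 64 5120
# 32 2560
# 28 2240
# 17 1360
```

Even the best case here leaves over a thousand cycles between bytes, which supports the suspicion that something other than the SPI transfer itself is eating the time.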
Now, when transmitting, we don't need to wait. We can just go ahead. When we change clocks, we do want to wait, to make sure that anything pending got sent. Likewise when we raise or lower CS. When receiving, we need to wait until the transaction has finished before we can read out the value, not that we're getting anything to read out. This code here can be dramatically optimized by using the bulk transfers. So that doesn't look any different. Yeah, still 17. So that's not really helping, but it's at least something.
Well, once again, I am stuck. I cannot think of anything that could possibly be wrong here. My great idea of comparing the MSP430 trace against the ESP8266 trace hasn't really come up with anything useful. I'm just looking for any config things. It's just got CONFIG_SD. Yeah, this, by the way, is a really good set of documentation on how to initialize SPI devices, which is a horrible mess. So let's just work through this to see if we're doing it more or less correctly.
So, software. Here we go: power on, set SPI clock rate between 100 kilohertz and 400 kilohertz. We're at 400. Set DI, that's data in, which is what the card calls MOSI. DI and CS high, and apply 74 or more clock pulses. So MOSI high, CS high. Now, MOSI is dropping between bytes, but that should be fine. The card will enter its native operating mode and get ready to accept native commands. SD cards support two different command sets: there are the native ones and the SPI ones. We want to use the SPI ones. The native ones allow parallel transfers and are faster. So: send command zero with CS low to reset the card. Don't know what this sentence means; I think that should be a 'when'. If the CS signal is low, the card enters SPI mode and responds. Well, we're never setting CS high, so if the command gets there, it should be sampled. I do wonder, do we need to try and make sure that MOSI stays high throughout all of this? If so, I wonder how. But yeah, this is the point at which we are stalling out.
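The two checks implied by that walkthrough (enough clocks at power-up, and then a response to command zero) can be sketched as follows. This is illustrative Python with invented names. It relies on two facts about SD cards in SPI mode: the R1 response byte has its top bit clear, and after a successful command zero it reads 0x01 (in idle state). The limit of eight polled bytes is an assumption chosen for the example.

```python
import math

MIN_POWER_UP_CLOCKS = 74   # "74 or more clock pulses", from the text
R1_IDLE = 0x01             # R1 response with only the idle bit set


def power_up_bytes() -> bytes:
    """FF bytes to send with CS high so that at least 74 clocks go out."""
    return b"\xff" * math.ceil(MIN_POWER_UP_CLOCKS / 8)


def find_r1(miso: bytes, max_wait: int = 8):
    """Return the first byte with its top bit clear, or None if none arrives."""
    for byte in miso[:max_wait]:
        if byte & 0x80 == 0:
            return byte
    return None


print(len(power_up_bytes()) * 8)                 # 80
print(find_r1(bytes([0xFF, 0xFF, 0x01])))        # 1
print(find_r1(b"\xff" * 8))                      # None
```

On the failing trace the second case is what matters: MISO never produces a byte with the top bit clear, so the driver gives up without ever seeing an R1.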
So I don't know. Wait for one millisecond. Well, the card is permanently powered on, so that shouldn't be an issue. Do we need to make the clock slower? I don't think so, but it's worth a try. So if I set this to, like, 250, it should be even slower. I mean, in very slow mode it still doesn't work. So that is 320 kilohertz, which is between 100 and 400. This indicates the speed that high speed mode should be at, which wants to be 20, 25. So we don't want to run it at 80, because that's too fast, but 10 should be fine. We could probably use four to be even faster, but let's wait until that bit actually works.
So I changed receive byte to work in duplex mode. What this does is, rather than just toggling the clock and waiting for data to come back, it writes out the content of the buffer at the same time as it reads in. What this ends up doing is here, where we're waiting (here's a better example), we're waiting for a response, and MOSI goes high. If we did this in simplex mode, then MOSI would be low, which we can duplicate by doing that, like so. So here you can see that both the init sequence and waiting for the command response have MOSI low. I thought this might be upsetting the card, but it appears to make no difference.
Now, with regard to keeping MOSI high between command bytes, I am not sure if there's a way of doing that. Possibly this MOSI delay thing might help. This appears to be slave related. MOSI signals are delayed by this number of cycles. I don't think that's what we want. So this is the clock configuration. If the top bit is not set, then it says it's derived from the 80 megahertz clock, which must be referring to the peripheral clock. So we have multiple complicated dividers. Okay, user. We want the read data and write data phases. SIO is when they share the same pin, which is not the case. These are for complicated parallel signaling systems. Byte order. We always want to send the most significant bit first.
User one contains the length of the various phases, minus one. So for us it's eight and eight, which means seven. And because we're using the MOSI phase in full duplex mode to receive, we set MOSI in both cases. We can actually put this up here. User two contains the command value, which we don't care about. Write status. This allows you to select which of the three devices is used for the chip select. This is all for slave configuration. Likewise, likewise, likewise, 64 bytes of data. Let me get down to the UART stuff.
Yeah, I am once again stumped. I do wonder whether the problem has been caused by dropping MOSI between initialization bytes. Can we do something about that? Yes, we can, actually. So all we need to do is our transmit, actually, and I'm going to put these back the way they were, because I want to copy this code and stick it in here. So sd_spi_raise_cs, put CS up. We want to write out 160 bits, so 159. This is going to be 32, 64, 96, 128, 160. So that should be 160 bits of ones sent out in a single chunk, and we wait for it to finish. And over in devsd.c, up in devsd_discard, which I never got around to loading. Here, we just want to disable this lot. Don't think it will make any difference, but let's run that. It didn't run. Is this hung? It's hung. Fantastic.
16 megahertz, 200 million, run, reset. Well, here's our init, which did not do what I wanted. So that's just wrong. I told it I wanted 160 bits and filled the buffer with ones. We did get 160 bits' worth of delay, but nothing showed up on the output. I never configured the clock. So here are 160 output bits, and then nothing happens. So that made no difference. I don't think the grouping is, in fact, important. So let's put this back the way it was. Yeah, I've got no idea. So I'm going to have to go away and try and do some more research to figure out what's going on. I'll be very disappointed if I have to call it a halt here.
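The "length minus one" convention and the count of 32-bit buffer words mentioned above are easy to get off by one, so here is the arithmetic as a small Python sketch. The function names are invented, and the model covers only the two numbers discussed in the text (the length field and how many 32-bit words must be filled), not the register layout itself.

```python
def length_field(bits: int) -> int:
    """Value to program when the hardware expects the bit count minus one."""
    return bits - 1


def words_needed(bits: int) -> int:
    """Number of 32-bit buffer words required to hold the given bit count."""
    return -(-bits // 32)   # ceiling division


def ones_buffer(bits: int) -> list[int]:
    """Buffer words filled with all ones, enough to cover `bits` bits."""
    return [0xFFFFFFFF] * words_needed(bits)


for bits in (8, 15, 160):
    print(bits, length_field(bits), words_needed(bits))
# 8 7 1
# 15 14 1
# 160 159 5
```

So a single 160-bit burst of ones means a length field of 159 and five buffer words, matching the 32, 64, 96, 128, 160 count.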
There is some work that can be done in configuring the TTY. I can avoid needing to wait for the flash, but it's still not right. It really isn't. And I still think it's running awfully slowly, so maybe some work needs doing here. We have 64k of code RAM. We want to use half of it for user programs. The other half of it is for the kernel, which is 48k, so it's too big to wedge into the remaining 32k of code space. I might be able to put all the discard stuff into the flash. Then we can run all that, and then the rest of the kernel runs in RAM. That's worth a look. But I don't think there's that much of it. I don't have a useful way of getting the size of functions. But we can, for example, put all of start.c, which can all be discarded, into discardable memory.
I wonder if I copy this over, and here do start.o, start.o. I don't want this anymore. I cannot find start.o. It wants to be just start, which is there. Maybe it does need to be like this. I don't think so. Okay, so let's do... okay. So now if I look at the map and look for discard start, this is appearing first in the ROM, and discard end is here. So all this stuff can be discarded. It's 0x64e bytes, which is about one and a half k. Okay, so devsd_discard.o, so that's up to 0x93b. See, I very much doubt whether we'll get 16k's worth of stuff here. So we could move some of the platform code into a discardable resource, but frankly it's going to be about 3k. That's not going to be enough to make everything fit within RAM.
One thing we are doing is that all our read-only data is in code. So we could put that into RAM. However, this means that we then run out of space, because we run out of RAM. I don't know how big the read-only data is, but let's just do... The problem is that our RAM is pretty precious; I'd rather not use it for read-only data. Rodata start: 0xa90 minus 0x4a5, one and a half k. So 3k of code plus one and a half k is four and a half k.
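The space budget above can be tallied in a few lines. This Python sketch uses only the round figures from the discussion; reading the two map values as the hexadecimal numbers 0xa90 and 0x4a5 is an assumption about what was said, and the variable names are made up.

```python
KERNEL_K = 48.0          # kernel size, from the text
AVAILABLE_K = 32.0       # code RAM left after reserving half for user programs

DISCARD_CODE_K = 3.0     # estimated discardable code, from the text
RODATA_BYTES = 0xA90 - 0x4A5   # assumed hex reading of the map addresses
RODATA_K = 1.5           # "one and a half k", from the text

shortfall_k = KERNEL_K - AVAILABLE_K
savings_k = DISCARD_CODE_K + RODATA_K

print(RODATA_BYTES)               # 1515
print(shortfall_k)                # 16.0
print(savings_k)                  # 4.5
print(shortfall_k - savings_k)    # 11.5
```

On these numbers the two ideas together recover roughly a quarter of what is needed, which is why neither looks sufficient on its own.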
We can save 3% on the code size by putting all the read-only stuff in RAM, because then we don't need to use the 32-bit aligned accesses for everything. But I think it's still not enough. The kernel is just too big to fit into 32k, even with the fairly dense code this thing produces. So I do know that the Dara library is, I think, 7k, so it's not small. No, no, it's smaller than that. But we would still need it. And we will need some extra code to make the TTY work. Yeah, I'm going to have to go away and do more research.
Well, this makes two episodes in a row where I haven't actually done anything other than fiddle with the SD card stuff. I think next time I will try and make the TTY work. I can avoid horrible delays by simply configuring it to use the shell as init. In fact, I can do that. So I just need to change this to /bin/sh. And then when we run it (let's try using the right comment character) it will eventually fail to mount the root, because hda is the flash. Dev init is here. This should be registering a block device. Interesting. Well, if it wants hda to be the SD card and hdb to be the flash, let it. No, that's not right. Why isn't that working? Well, let's put that back the way it was. I haven't corrupted the NAND flash again, have I? I think I have. That's not brilliant. Well, I'm going to go fix that. That will involve reflashing. It's not a big deal; it just takes a while. But once that's done, I should be able to do some work on the TTY. And once the TTY works, you can actually interact with the system, and it should be, finger quotes, done. There's some tidying that needs doing, because there's a fair bit of hackery in here, and cleanup. The biggest missing thing is the SD card; the system running out of flash is just not usable. Yeah, well, I hope you enjoyed this video, and please let me know what you think in the comments, I suppose.