Nice new Brother WP-1 word processor. I could just connect the Raspberry Pi up to the UART, but I am actually going to take the time to write a whole new terminal emulator. Well, the cursor keys don't seem to want to work in vi, but this is loads of progress. Certainly less snow; I'm going to have to do something about that. Things remaining to do are: fix the snow, fix the state machine issues, add character attributes, and see if we can bump the baud rate up.
After the abject failure of the video capture quality in the last couple of videos, I put some time into getting direct video capture actually working. The video output from the board here is now being sent to the PSoC board, where it's being re-formed into something a bit closer to CGA, so that my OSSC and CGA-to-RGB converter are actually capable of capturing it. It's not quite right; you can see that the text quality isn't brilliant. But this is so much better than just filming the screen directly, and the frame rate actually keeps up with my typing, which is nice.
So, where were we? Well, there are some bugs. You can see that deleting text with Ctrl-U causes the cursor to go to the wrong place.
That should be a fairly minor state machine thing; I believe I know what's going on there. Those arrows are beeps. But the biggest problem is the snow, which you can see at the top of the screen. I have also bumped the baud rate up to 19.2 kilobaud, which makes the system so much more responsive, as you can see. Yeah, you can see the dropped characters causing the columns to misalign there.
The good news is that I think the same problem is causing both the snow and the dropped characters: the software on the Brother is spending too long doing things. It isn't responding to incoming bytes quickly enough, and it isn't responding to vsync pulses quickly enough. So I think I may be able to fix both at the same time, but this is going to require some experimentation. I'm going to go over to the PC and take a look at the code.
Here is our program's main loop. This is what drives the entire terminal. It loops forever. It attempts to update the screen if we're in the vertical blanking period. It then attempts to read and print a byte from the interface, if there is one present. It tries to read a byte from the keyboard, and if the interface is writable, it tries to push one out. It's possible for the keyboard push routine here to push multiple bytes into the ring buffer, and this then pulls a single byte out of the ring buffer each iteration and writes it to the interface.
I expect those to be pretty quick, as they're not actually doing much. Most of the time is going to be spent in this routine, because it may have to do things like shuffling memory to scroll the screen, and in this routine, which flushes the display: it has to copy all the data from the back buffer onto the front buffer. I believe the reason we are dropping characters is that these routines, specifically this one, display flush, are taking so long that by the time we get back here to pull a byte out of the interface, the
interface's own FIFO has filled up and dropped bytes. In addition, there is another problem. Let me just sketch out a bit of a diagram. Imagine a graph with time going down this way. A frame extends from here to here. The vertical blanking period is here, and all of this stuff down here is the active display. We can only update the screen during the vertical blanking period, which is here, and what we expect to happen is that the update starts, and then the update returns.
However, when the update returns, we finish this routine, go through to the end of the loop, jump back to the top, and, hey look, we're still in the vertical blanking period, so it runs the update again. Because the update takes so long, this second run overlaps into the actual display period and generates snow. There is another possibility, which is that the update still runs only once but takes so long that it doesn't fit entirely in the vertical blanking period.
So let's do a bit of work. Here is our update routine, and here is the chunk of code that actually checks whether we're in the vertical blanking period. What I'm going to do is add a variable in which we store the previous state of the vertical blanking signal. Let me think how we're going to do this. We fetch it into A: we fetch the address into HL and load it like so. Now, in the comparison code, we have access to both the previous state and the current state. We only want to run dpy flush if the previous state is false and the current state is true.
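To make the difference between the two checks concrete, here is a small Python model (not the Z80 code itself) of a main loop polling a vblank flag. It assumes the loop samples the flag several times within each blanking period; the sample pattern is made up for illustration. A level-triggered check flushes on every sample that sees vblank, while the edge-triggered check flushes only on the false-to-true transition.

```python
def count_flushes(vblank_samples, edge_triggered):
    """Count how many times the loop would call the flush routine.

    vblank_samples: sequence of booleans, one per main-loop iteration,
    True while the hardware reports vertical blanking (illustrative data).
    """
    flushes = 0
    previous = False  # state of the vblank flag on the previous iteration
    for current in vblank_samples:
        if edge_triggered:
            # Only flush on a rising edge: previously False, now True.
            if current and not previous:
                flushes += 1
        else:
            # Naive version: flush whenever we happen to be in vblank.
            if current:
                flushes += 1
        previous = current
    return flushes


# Two frames; the loop polls the flag three times inside each blanking period.
samples = [False, False, True, True, True, False, False, True, True, True, False]
print(count_flushes(samples, edge_triggered=False))  # 6
print(count_flushes(samples, edge_triggered=True))   # 2
```

The edge-triggered branch corresponds to the two early returns in the Z80 version: return if we are not in vblank now, then return if we were already in vblank last time round.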
This means that it will run once, the first time we enter the vertical blanking period. It then won't run again until we exit the vertical blanking period and enter it again. So what we want to do is a RET Z: if we're not in the vertical blanking period at all, we don't need to do the rest of the logic. We then want to XOR the current state with... no, we don't; that's overthinking it. At this point, we know that we are in the vertical blanking period. All we need to do is check whether the old state is false. Wait, true. A here is zero if we're not in the blanking period, or 2 if we are. So: return if zero, because then we're not in the blanking period. Now that we know we are in the blanking period, we want to return if we were also in it the previous time round, in which case C must be non-zero. Okay, so that might help.
The other thing we're going to do is speed up the actual refresh. There are several things we can do. We could use the DMA engine to copy the data, which is much faster than just doing an LDIR. It's more code, but it should be so much faster that it pays for itself. But the simplest thing we're going to do is remove this. What these two lines do is advance DE to point to the next row of video memory using arithmetic; HL is advanced for us by LDIR. In fact, this is a waste of time, and we don't need to do it: since the loop is unrolled, we can just load the correct value here, which gets calculated at assembly time. So we just duplicate that bit of code, like this. This LD DE line is three bytes; our previous code, where we called add_a_hl, was four bytes... no, five bytes: two to load A with the value to increment and three to call the subroutine. It also cost a considerable number of cycles, because we have to do the call and the return, which are stack operations, and stack operations are slow. So I think this will help, assuming I haven't made any stupid mistakes.
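Replacing run-time pointer arithmetic with constants works because the loop is unrolled, so each row's destination address is known at assembly time. As a sketch, here is a small Python generator that emits such an unrolled Z80 sequence. The row count, the `backbuf` label and the video row stride are hypothetical placeholders, not the WP-1's real layout; the E000 video base and the 91-byte row width are the figures mentioned later for the DMA version. It also records the per-row byte cost of the two ways of advancing DE, using the sizes quoted above.

```python
ROW_BYTES = 91          # bytes copied per row (figure quoted for the DMA version)
VIDEO_BASE = 0xE000     # logical address of video memory
VIDEO_STRIDE = 0x80     # HYPOTHETICAL distance between video rows
ROWS = 4                # HYPOTHETICAL; kept small for the example


def unrolled_flush(rows=ROWS):
    """Emit Z80 source for an unrolled back-buffer-to-video copy.

    HL walks the (contiguous) back buffer and is advanced by LDIR itself;
    DE is reloaded with an assembly-time constant for every row.
    """
    lines = ["    ld hl, backbuf"]  # 'backbuf' is a hypothetical label
    for row in range(rows):
        dest = VIDEO_BASE + row * VIDEO_STRIDE
        lines.append(f"    ld de, 0{dest:04X}h")   # 3 bytes
        lines.append(f"    ld bc, {ROW_BYTES}")    # 3 bytes; LDIR leaves BC = 0
        lines.append("    ldir")                   # 2 bytes
    return lines


# Bytes spent advancing DE per row: old LD A,n + CALL vs. new LD DE,nn.
OLD_ADVANCE_BYTES = 2 + 3
NEW_ADVANCE_BYTES = 3

print("\n".join(unrolled_flush()))
print(OLD_ADVANCE_BYTES - NEW_ADVANCE_BYTES, "bytes saved per row")  # 2 bytes saved per row
```

The byte saving is small; the bigger win is dropping the CALL/RET pair and its stack traffic on every row.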
This should be a big improvement. So I am just going to try it and see what happens. Okay, well, we're still getting snow. Just wait for it... there we go. What happens if we do this? We're still dropping characters. Okay, so this suggests... hmm. We've got three lines of snow, and we're using less code. I cannot remember how much snow we had previously, so was this an improvement or not? I'm going to have to go and check the video footage. If it's less, then it has actually helped, so we can do more. I think this is going to need the DMA engine. Yep, okay: I'm going to go and check the footage to see how much that has improved things, if at all, and then let's try the DMA engine and see what that does.
Ten frames per second? Why is it recording at 10 frames per second? I told it to record at 30. This is bizarre. Anyway, the resulting video does actually work, even if it is only at 10 frames a second, and it's showing three lines of snow. So this hasn't made any difference at all, which is interesting. Anyway, let's go and find some Brother WP-1 Z80... So this is the code needed to run a DMA. If you compare it to what we've got here (three bytes, three bytes, two bytes), this is many bytes, but it should be so much faster that, even though we're doing more work, it will do the copy faster. So let's give this a try and see what happens.
Let's actually just copy all of this over here. The DMA engine works with physical addresses, so the source is going to be the back buffer address, which is at this offset, I believe. We are loaded at... yep, that's correct. The target physical address is here: that's where the video memory lives. In fact, it's going to be here. The number of bytes we want to copy is 91. We need to calculate the back buffer address, and we have a line to do that here; then we output it to configure the DMA source. We then want to do the same thing for the destination address. This is the top byte of the address. In this situation, it's a 2.
In this one, it's a 1. The video memory is mapped so that E000 goes to physical address 1E000, so we don't want to add anything on there. 91 bytes. Actually, we're going to use the width. Oh yeah, I also changed the syntax here slightly: I found a new copy of zmac that does proper Z180 instructions, so we don't need the old macros any more. And rather than outputting everything through A, it's cheaper to load a 16-bit value into HL and emit it like this.
Okay, so we start the DMA, but we then need to wait for the DMA to finish before we can reprogram the DMA engine for the next line. So we need a loop. I think we want to read DSTAT into A, AND it with one of these flags, and then branch back to .1 if the flag is set. I'll just go and look that up. Actually, from looking at this code, when the DMA engine is in burst mode, which is selected by setting the MMOD bit in the mode register to 1, the CPU will halt until the DMA transfer is complete. So I don't believe we actually need any logic down here.
However, we probably also want to control the wait states in DCNTL, because I don't believe we want any; by default you get some, and I believe they will slow things down a lot. We only need to do this once, so it goes up here. We load A with... you see, these are the wait state bits: the number of wait states introduced into CPU or DMAC memory cycles. Those are the wait insertion bits, so I think we want them to be zero. The DMA sense bits: we're not using external DMA, so they're irrelevant. The mode bits are for memory-to-I/O transfer modes, and we're not using those either. So that's just zero in A, and we write it to DCNTL. And I think I've got all of these backwards. Yeah, I always get the order of parameters mixed up. Yes, that's better. Oh, I've got to change this one.
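For reference, the register arithmetic described above can be written out as a small Python helper. It assumes the Z180 DMAC layout as I understand it from the Zilog documentation: a 20-bit physical address split across three registers (low byte, high byte, and a bank byte holding the top 4 bits, as in SAR0L/SAR0H/SAR0B), DCNTL holding memory wait insertion in bits 7-6, I/O wait insertion in bits 5-4, DREQ sense in bits 3-2 and the DMA1 mode in bits 1-0, and MMOD at bit 1 of DMODE. The function names are made up, and the E000 to 1E000 mapping is the one described in the video; check all of this against the datasheet before relying on it.

```python
def split_physical(addr):
    """Split a 20-bit physical address into (low, high, bank) register bytes,
    e.g. for SAR0L/SAR0H/SAR0B or DAR0L/DAR0H/DAR0B."""
    if not 0 <= addr < 1 << 20:
        raise ValueError("Z180 physical addresses are 20 bits")
    return addr & 0xFF, (addr >> 8) & 0xFF, (addr >> 16) & 0x0F


def dcntl_value(mem_waits=0, io_waits=0, dreq_sense=0, dma1_mode=0):
    """Pack the four 2-bit DCNTL fields (MWI, IWI, DMS, DIM) into one byte."""
    for field in (mem_waits, io_waits, dreq_sense, dma1_mode):
        if not 0 <= field <= 3:
            raise ValueError("each DCNTL field is 2 bits wide")
    return (mem_waits << 6) | (io_waits << 4) | (dreq_sense << 2) | dma1_mode


MMOD = 1 << 1  # DMODE bit: 1 = burst mode (CPU held until the transfer ends)

# Destination: logical E000 mapped to physical 1E000, so the bank byte is 1.
print([hex(b) for b in split_physical(0x1E000)])  # ['0x0', '0xe0', '0x1']
print(hex(dcntl_value(mem_waits=1)))              # 0x40
```

Writing the values out this way makes it easy to see that only the bank byte differs between a source in one 64 KiB region and a destination in another.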
Oh, it's not DCNTL. Okay, so let's see if this is any faster, or indeed if it works at all. It turns out the wait state stuff was wrong, and I do need some wait states; otherwise nothing happens at all. But it is doing something, just nothing very useful. I think it's copying the wrong data onto the screen, but it is at least doing something, and the terminal still works: you can see the cursor move. However, there's still some snow, which is annoying. We only have one line of snow, though, so this is faster. I need to try and speed up the DMA stuff a little; I know how to shave a few cycles off. I think I'm going to have to do some experimentation with the wait states to see what we can get away with.
Nope, that hasn't worked. We now have only one line of snow using one wait state, but if we go down to zero wait states, the system crashes, and I still haven't figured out why it's writing garbage to the screen. So, given that this basically isn't working, we just don't seem to have enough time in the vblank interval to update the entire screen. We're going to have to be a bit cleverer and update the screen over multiple vblank cycles. And if we're going to do that, there's not really much point using the DMA stuff, which I can't make work anyway.
So let's just go back to using the LDIR stuff. Also, the way the screen flickers every now and again is a capture issue: there's something not quite right about my OSSC settings, and the OSSC doesn't really like this very strange screen mode. Okay, so we back out all of our code here by holding down the undo key. All right, let's just go forwards a bit so that we have our new only-run-the-routine-once code, and let's grab the update code. Okay, and this is the point where I started working on the DMA stuff. Right.
What we're going to do is track the dirty state of each scan line. The number of routines we have that actually draw on the screen is quite small, so that's fairly straightforward. Then, in our flush code, we are going to test whether each individual scan line is dirty, only update it if it is, and also give up if we see that we're no longer in vblank. Right, this does mean that the last update is going to overlap with the end of the vblank. No, we're going to have to be slightly cleverer than that. What we're going to do is keep track of the number of rows that we're actually going to update, so that each time round we only update so many rows, keeping the redraw time below the limit so that it all fits in the vblank. Let's go for eight, because this routine is actually going to be slower.
So, our redraw loop. I'm just thinking about what we're going to set up. The Z80 doesn't have any decent indexing operations or any 16-bit comparisons. I want to work through the dirty buffer until we reach the end, but I want to avoid having to do a 16-bit comparison, and I don't think we can. This is annoying.
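Here is a Python model of that plan: one dirty flag per row, and a flush that copies at most a fixed number of dirty rows per vertical blank (eight, as chosen above), clearing each flag and then copying the row. The class name, the screen size and the list-based "video memory" are illustrative assumptions; the real thing is unrolled Z80 code working on a dirty buffer in RAM.

```python
class DirtyScreen:
    """Back buffer plus per-row dirty flags, flushed a few rows per vblank."""

    def __init__(self, rows, cols):
        self.back = [[" "] * cols for _ in range(rows)]   # what we draw into
        self.front = [[" "] * cols for _ in range(rows)]  # stands in for video RAM
        self.dirty = [False] * rows

    def put(self, row, col, ch):
        self.back[row][col] = ch
        self.dirty[row] = True  # every drawing routine must mark its row

    def flush(self, budget=8):
        """Copy up to `budget` dirty rows; return how many were copied."""
        copied = 0
        for row, is_dirty in enumerate(self.dirty):
            if copied == budget:
                break  # out of time for this vblank; resume next frame
            if is_dirty:
                self.dirty[row] = False  # clear the flag, then copy the row
                self.front[row] = list(self.back[row])
                copied += 1
        return copied


screen = DirtyScreen(rows=16, cols=4)  # hypothetical size
for r in range(10):
    screen.put(r, 0, "x")
print(screen.flush())  # 8: the budget runs out
print(screen.flush())  # 2: the remaining rows go out on the next vblank
print(screen.flush())  # 0: nothing left to do
```

Because the scan always restarts from the top, a burst of writes that keeps re-dirtying the upper rows can starve the lower ones; that is the trade-off that comes with a fixed per-frame budget.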
Okay, I'm going to do this thing unrolled. We're not updating in order any more, so we can't rely on HL being set automatically; we have to set it here, for every row. For each row, we get the address of the relevant slot in the dirty buffer and load it. If it is zero, meaning the row is not dirty, we jump forward. Otherwise (we can do better than that) we clear the dirty flag, do the copy, decrement the number of rows left to update, and give up if that reaches zero.
This needs to be a local label, and I can't remember how to do that, local to the file. Reading about macro labels: labels starting with a dot are temporary and are reset whenever a non-temporary label is used; a macro call expands the text of the macro's body, with each local symbol... question mark? Can I do that? Oh, or I can do something simpler. Let's try this. Nope, that's not working. REPT is obviously not really a macro, so we're going to have to make one by putting all this code in here, like this. Okay, and what does our listing say? It's more than I wanted... no, that was what I wanted. There's the definition of the macro and the invocation of the macro: load the address of the dirty buffer plus the line number, load it into A, skip to... yep, .1, which is here. Cool. We don't have to unroll this; we could use a real loop, but that would be slower, and it would just be generally more annoying.
All right, now we need to set the dirty buffer. So, print a character... actually, let's not do that. Okay, so when printing a character, we are going to dirty the row. This touches no registers other than HL and A, and the input character is in C, so that is safe. Moving the cursor is going to want to do the same thing; we haven't done that bit yet. Delete a line: we can just do that. This is copying memory; this is moving multiple rows.
We're going to have to dirty multiple lines. So this is going to set the flag for line Y and leave the appropriate address in HL. Okay, I need to figure out how many bytes we want to update. To delete a line, it's from here to the end of the screen. In fact, the same code is going to be needed by this one, so let's write dirty to end of screen. We put the count in B, because that then allows us to do... right, so this will set all the dirty bytes from the current row to the end of the screen to 1. And again, we want this here: dirty to end of screen. Clear a line: again, this is just going to be dirty row. This one is used to clear everything. So that should be everything we need.
Let's try it and see what happens. Well, it's a nice steady screen, no snow. That was... there's a capture dropout. So let's try a scroll. Kind of working, to be honest. Yeah, okay. Right, there is snow when it does that update, so maybe I need to reduce the number; we are doing quite a lot of extra work between each scan line. There are fewer dropped bytes as well, because we're doing less work. Let's just try clear. Right, that didn't work at all. Interesting that we get about a second's worth of snow after each call, so I think it's trying to do something. Right, well, I think I've broken clear line. No, do not save. vi? Yeah, that's upset too. Okay, I think there are some bugs, but on the whole it's mostly working.
So let's go and take another look at the code. Yeah, clear line failed because I called dirty row, which modifies HL. So what this is doing here is writing lots of spaces to the dirty buffer. That's never going to work, so we actually want to put that up there. I think that's safe. I don't want to decrease update rows too much, to be honest, because the lower it is, the more frames it takes to update the screen. We have 62.5 frames per second.
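Two small pieces of that, modelled in Python: marking every row from a given row to the bottom of the screen as dirty (what the delete-line path needs), and the arithmetic behind the update-rows trade-off. At 62.5 frames per second, flushing R dirty rows with a budget of N rows per frame takes ceil(R / N) frames. The 24-row screen is a hypothetical figure, not the WP-1's actual row count, and the function names are made up.

```python
import math

FRAME_RATE_HZ = 62.5  # frame rate quoted in the video


def dirty_to_end_of_screen(dirty, first_row):
    """Mark rows first_row..last as dirty (e.g. after deleting a line)."""
    for row in range(first_row, len(dirty)):
        dirty[row] = 1


def full_redraw_time(rows, rows_per_frame):
    """Frames and seconds needed to flush `rows` dirty rows at a fixed budget."""
    frames = math.ceil(rows / rows_per_frame)
    return frames, frames / FRAME_RATE_HZ


dirty = [0] * 24                  # HYPOTHETICAL 24-row screen
dirty_to_end_of_screen(dirty, 20)
print(dirty[18:])                 # [0, 0, 1, 1, 1, 1]

for budget in (1, 4, 8):
    frames, seconds = full_redraw_time(24, budget)
    print(f"budget {budget}: {frames} frames, {seconds:.3f} s")
```

Under these assumptions a budget of one row means a full-screen scroll takes 24 frames (about 0.38 s), while eight rows takes three frames; the upper limit on the budget is set by how much copying fits inside one vblank before snow appears.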
That's actually pretty quick. Let's set this to one and see how bad it can be. Oh yeah, the other thing I wanted to do was to set the dirty buffer on the first run. I thought that happened in tty init. Actually, yeah: do ED all clears the whole screen, and it calls clear L for everything, but of course clear L was broken. So, okay, this ought to be working now.
Okay, how's this behaving? Let's try clear. Yeah, you see: updating one row at a time, scrolling is slow and kind of weird. You can see how strange things happen as it updates part of the screen at a time. So we definitely need to be faster than that. On the other hand, it does not appear to be dropping bytes. nano works, more or less. Okay, well, whatever's going on there, it seems to be not scrolling that part of the screen. Now let's try vi. Let's try vi with a decent-sized file. There we go. Yeah, that's very unhappy. Oh, I know what's going on. Or do I? Did I remember to set the tty size? I think I actually have. You have to do this every run: stty rows, stty cols. I cannot type. Okay, I cannot remember the command to get it to tell me the screen size. Also, I cannot reliably type on this thing while staring down at that little phone screen down here, which is what I'm actually seeing the output on, with a certain amount of delay.
Okay, so it's better. We're going to have to up the update count, but it's definitely plausible, with a reasonably steady image and no snow even when it's busy. That's a good thing. Here it is with update rows set to eight, as it was before, and... oh, there's snow. Yeah, eight is a bit too optimistic. But it's definitely working. Excellent. We do need to fix some of the escape codes: you can see that the man prompt at the bottom is truncated at the beginning.
Oh, I see missing characters. Cursors. Well, actually, it's not the end of the world; there's a simple fix we can do for that. We've got this instat routine here that tests whether the interface is writable. What we're going to do... I rather want to inline this to avoid the CALL/RET overhead. Right, let's move these into constants. We then need to go to the interface code and use the constants. Yep. And what we're going to do is go here to the display code, and on every iteration we are going to check whether the interface is readable and give up if it is. What this does is: if it ever sees that the interface is readable, it immediately stops and returns to the main loop, where we read the byte from the interface, and then we continue with the redraw. It will glitch the redraw slightly on the next frame, but I don't think anyone will notice.
So, yeah, we're still in the man page. Ctrl-L to redraw. That's not great. What this is doing is updating a few rows; then a character comes in, so it stops redrawing. The character is processed through the state machine, which causes the screen to scroll, which dirties everything. Therefore, when it does the redraw, it starts again at the top of the screen. So that's not brilliant. I mean, it's producing correct results; it's just not a very nice user experience. So, in fact, we want to prioritize redrawing the screen over running the TTY state machine, and we can do that by adding another ring buffer.
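The video ends just as the ring-buffer idea is introduced, so what follows is only one guess at its shape, modelled in Python. Instead of abandoning the redraw when a byte arrives, the redraw loop moves any received bytes into a fixed-size ring buffer between rows, and the main loop later feeds the buffered bytes to the TTY state machine. The class and function names, the 16-byte capacity and the `rx_ready`/`read_byte` callables are all hypothetical.

```python
class RingBuffer:
    """Fixed-capacity FIFO; one slot is left empty to tell full from empty."""

    def __init__(self, size):
        self.buf = [0] * size
        self.head = 0  # next slot to write
        self.tail = 0  # next slot to read

    def empty(self):
        return self.head == self.tail

    def full(self):
        return (self.head + 1) % len(self.buf) == self.tail

    def push(self, byte):
        if self.full():
            return False  # caller decides what to do on overflow
        self.buf[self.head] = byte
        self.head = (self.head + 1) % len(self.buf)
        return True

    def pop(self):
        if self.empty():
            raise IndexError("pop from empty ring buffer")
        byte = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return byte


def flush_rows(rows_to_copy, rx_ready, read_byte, rx_buffer):
    """Redraw loop that drains the UART into rx_buffer between rows
    instead of giving up; returns the number of rows copied."""
    copied = 0
    for _ in range(rows_to_copy):
        # If rx_buffer fills, we stop draining and bytes wait in the UART FIFO.
        while rx_ready() and not rx_buffer.full():
            rx_buffer.push(read_byte())
        copied += 1  # stands in for copying one row to video memory
    return copied


incoming = list(b"ls\r")
rx = RingBuffer(16)
rows = flush_rows(8, lambda: bool(incoming), lambda: incoming.pop(0), rx)
received = []
while not rx.empty():
    received.append(rx.pop())  # the main loop would feed these to the state machine
print(rows, bytes(received))   # 8 b'ls\r'
```

The design choice here is that the redraw always completes its budget for the frame, and the only work done per received byte during the redraw is a cheap buffer push; the expensive state-machine processing (which may scroll and re-dirty the screen) is deferred until afterwards.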