Okay. So good afternoon, or good evening; it's night already. My name is Julien Martins. I'm going to present this port of an arcade game to the ZX Spectrum. But this is applicable to any port that you want to do to a micro machine, and you can eventually apply it to other systems. Okay. First question. Who played on a ZX Spectrum? Raise your hands. Okay. Who coded in ZX Basic? Okay. Who coded in assembly? Okay. Fine. It's going to be painful for those of you who didn't raise your hands. Okay. And by the way, who has programmed a game? Okay. Cool. So we are going through these topics. I will go one by one. There are several slides for each and we'll go through all of them.
Okay. So how do you choose a port? The first thing you need to decide is your target platform. In this case, I chose the ZX Spectrum 48K. It could be anything else, but this is the one I chose. Then you need to select a game, whatever game you fancy. Probably something you like to play. Then you need to select an origin, because that particular game may have been implemented on several other platforms. For some of you, this might feel the other way around, so it should be 3-2-1 and not 1-2-3. That's the logic of someone who is going to do a port for profit of something that already exists. For the hobbyist, the order is usually 1-2-3, because your hobby is about a specific microcomputer. Okay. After you have selected this, you have to make sure it can be done, or you can try and you can fail.
Okay. Regarding the target platform, you really need to know your platform. This is not just about software. You need to know the architecture of the system, of the machine, because that plays a really important part in the conversion. So basically, you can throw away everything that you already know about developing a game.
Like for example, you might say, okay, I'm going to clear the screen or something like that, and your computer probably doesn't allow you to do that. You have to think differently from what you currently do on a PC. So you have to determine all the hardware limits experimentally. What do you need to do? You need to develop at least one game on the system, so that you get a feel for how much it really hurts to do a game for that system. Then you know all the problems you get. By the way, you should develop another game. Why? Because you learn valuable lessons with the first game. So you should develop another game before you actually try to do a conversion.
And what's really difficult is that you really need to know your game. What do I mean by this? There are probably hidden features in the game. You need to play the game to actually find them. You need to get the feel for it. I mean the game, OK? Because it's by actually playing that you know how a game feels. Is it a fast game? Is it a slow game? Is it a puzzle game? Is it an action game? Whatever. And it's also very important to watch others play, because you have your own habits while playing a game. And sometimes you play with a friend or you see a friend playing, and you see things that you never saw in the game before. So you should watch others play. And you should also play with others. Why? Think about a fighting game, for example. Playing against a different opponent, a human opponent, will give you new insight into how to play, OK? Or a puzzle game in battle mode, for example.
So what is game essence? What is important in a game? How well do we know a game? I bet all of you know this game, right? What's the name? Obviously, right? So do you actually know this game? Do you know the names of the ghosts? Do you know what each ghost actually does in terms of going after you, or which pattern it follows to chase you?
So you probably don't know everything about this game. So you need to study a little bit more. This is, obviously, Tetris, right? It's written there with the reversed R. So is this a game that you can say you completely know? Is it completely random? Is that long bar always there when you want it? Probably not, right? So as you can see here, there is a different distribution of pieces. Is that by mistake? Is that by chance? Or is it a specifically designed feature to make the game simpler or harder? So you need to look into that. Does anyone know this game? Puzzle Bobble. Yeah. Or Bust-a-Move, that's another commercial name. So what do you know about this game? It's a match-three, right? It has some speed increase eventually. But what are the patterns of the bubbles that you get? Do they have a specific pattern? Is it the same when you play with two players? What's different about it? So you have to start looking at these games and find out what's different about them. Who knows this game? R-Type. What's that? It's the boss of the first level, right? There is a way to beat it. And that way is slightly different depending on the system that R-Type was ported to, because they didn't always use the same source code. Okay?
So the game that I proposed to port was Magical Drop II. Who knows Magical Drop II? No one? Okay, great. One point for me. So I'm going to show you three clips of Magical Drop. It's a puzzle game like Puzzle Bobble, a match-three game. You join three items of the same color and you will see them pop, okay? Now, pay attention: on the left side is a human player, on the right side is a CPU player. So let's watch that. Have you noticed anything in particular? Yeah, there is an attack pattern when you play fast enough. And did you notice anything else? Let's see the second movie. Do you have a jack? Yes, you can plug it in here on this side. Does it reach? Try it here. No, here, here. No, fine. Here?
Yeah, let's do it. Yes, I get 20 more seconds, right? Okay, let me see if I can continue. Okay, so let's see the second video, and try to pay attention to what is happening in the sound, now that we have better sound, on the left player. This one? Got it? I think sound is not... Oh, you could just use your mic on the speaker. It's actually heading out. Did you notice, or were you looking at me? Do I need to repeat it? No, right? Yes? You should pay attention. Okay, so I think I skipped one too many. Okay, what did you hear? A guy actually bashing a lot of keys, right? Trying to be really speedy, to beat the other character, which was exactly the same. Right, so last item: take a look at the computer now. That's not fair, right? He's getting a lot better patterns than me. That's specific to that character. So it's important for us to know our game. So what's the essence of this game? Can you phrase it in only one word? Speed, okay? But we are porting this to a ZX Spectrum. So we are kind of... maybe, okay?
So what are we up against? What is our challenge? This is the Neo Geo, the arcade version that I chose. It has all these big chips in there, right? And beside that, this is the mighty Spectrum. Doesn't look so mighty when compared to that one, right? And even worse, that thing works with cartridges, and each cartridge adds a few more chips. So you're going into a huge battle, right? So what are the actual specs of the arcade versus the Spectrum? The first thing that comes to mind is that we are one CPU short, and they are actually using a faster Z80 just for the sound. It's not going to go very well, right? And the video has problems, right? We have a lot less resolution and a lot fewer colors. But we must not dismay. And so the first thing is that we are going to look at the... someone is ringing me. Okay, this is the architecture, a very quick overview of the architecture of the ZX Spectrum. And you have here the PAL video encoder.
So to generate the video signal, you have the crystal for the PAL color carrier. Then we have a crystal that is used by the ULA to generate part of the video, and it's divided by four to give the clock to the CPU. So here we are at 3.5 megahertz. So we take a big loss in clock cycles, right? The Z80 could go to four megahertz, but in the Spectrum it runs at 3.5. So the big problem is this: when the CPU wants to access high RAM, those 32K, it can do that exclusively, without any interference. But when we are generating the video, which the ULA repeats every 20 milliseconds because it's 50 times a second, the ULA has to share the address bus and the data bus with the CPU. And since the ULA is greedy, what it actually does is stop the CPU in its tracks if it sees an access to the same memory. So it's actually causing contention for our CPU, and we are losing clock cycles that could be very useful. So it will eventually look like that while the ULA is putting out the video.
Okay, so if we are going to do a game, can anyone tell me what the steps of a game loop are, for example? Usually you do what? Clear screen, draw screen, and process game logic, right? So the question is, okay, clear screen. Can we draw a full screen buffer per frame, 50 times a second? Some guys are already hurting. So we have this resolution, right? And you have this pixel buffer size: the horizontal resolution divided by eight, times 192, that's the number of bytes you have to push every frame to fill it up. And we have the colors, which are a separate block on the side, which has 768 bytes. So in total, that's a big number for us to do every single frame, okay? Now the question is, as I was asking, is this possible? So we have PAL, a 50 hertz refresh rate. So that means we have 20 milliseconds per frame, and the ULA steals some of it from us, okay?
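
The screen-size arithmetic from this slide can be written out as a small C sketch. The function names are made up for illustration and are not from the talk; the figures are the standard 256 by 192 pixel display with one attribute byte per 8 by 8 cell.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of pixel data for a 1-bit-per-pixel display: 8 pixels per byte. */
static uint32_t pixel_bytes(uint32_t width_px, uint32_t height_px) {
    assert(width_px % 8 == 0);
    return (width_px / 8) * height_px;
}

/* Bytes of colour attributes: one byte per 8x8 pixel cell. */
static uint32_t attr_bytes(uint32_t width_px, uint32_t height_px) {
    assert(width_px % 8 == 0 && height_px % 8 == 0);
    return (width_px / 8) * (height_px / 8);
}

/* Everything that would have to be rewritten for a full redraw. */
static uint32_t screen_bytes(uint32_t width_px, uint32_t height_px) {
    return pixel_bytes(width_px, height_px) + attr_bytes(width_px, height_px);
}

/* Milliseconds available per frame at a given refresh rate. */
static uint32_t ms_per_frame(uint32_t frames_per_second) {
    assert(frames_per_second > 0);
    return 1000 / frames_per_second;
}
```

For the Spectrum this gives 6144 pixel bytes plus 768 attribute bytes, 6912 bytes in total, to be moved inside a 20 millisecond frame.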
And then we have 70,000 ticks per frame. That looks like a lot, but on the Z80 the shortest instructions take four ticks, or four clocks, per instruction. The fastest copy instruction is LDI, which takes 16 clock cycles per byte. So in fact we come up short, even when we use the trick of the pop and push, not poop, sorry. That requires some more instructions, but it copies two bytes at a time. That would actually be very close to that number, because it's two bytes, so 3,333 instructions would get me very near that value, but then I wouldn't be able to do anything else, right? Okay, so you don't have enough time to do a full frame.
Okay, so what are our screen update options? What can we do? Ideas? We can do it like project management: we start cutting, right? Reduce the active screen area, okay. So if we just draw part of the screen, you are faking it, right? You use less CPU. You can do partial updates, just update a small section of it, okay? So less CPU. And if you look at this, you'll see that the field is filled in. And there is another thing we can do, which I call color for draw effect. What does that mean? As I think everyone knows, on the ZX Spectrum we can only have two colors for each character cell, right? And if you put colors on it, you'll notice there is one cell that looks really weird, because now it's gone. It's all black, right? Because the ink and the paper, the two colors you can have, are identical. Well, you can use that for the color for draw effect, right? So if I paint everything the way I want it beforehand, I have a game where I don't need to change the pixels, only the colors: a lot less code. But this looks like a 1982 game, so you might make a little change, make it look better, some colors, and then some lighting, some more lighting on there, and then we can do a mix of everything, which is what I just did, and we get a game similar to this, right? This was the goal. Okay, graphics challenge. And I'm going to have to speed up a little.
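
As a rough check of the budget discussed above, here is a C sketch that compares the cost of a full redraw with the cost of touching only the attributes, and shows the ink-equals-paper trick on an attribute byte. The helper names are hypothetical; the instruction timings (LDI 16, POP 10, PUSH 11 T-states) are the documented Z80 values, and the 70,000 T-state frame is the rounded figure used in the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Z80 instruction timings in T-states. */
enum { T_LDI = 16, T_POP = 10, T_PUSH = 11 };

/* Rounded frame budget: 3.5 MHz CPU clock at 50 frames per second. */
enum { T_PER_FRAME = 3500000 / 50 };

/* Cost of copying n bytes with one LDI per byte. */
static uint32_t cost_ldi(uint32_t n) {
    return n * T_LDI;
}

/* Cost of copying n bytes with POP/PUSH pairs, two bytes per pair.
   The stack-pointer juggling between pairs is ignored, so this is
   a lower bound, not an exact figure. */
static uint32_t cost_pop_push(uint32_t n) {
    assert(n % 2 == 0);
    return (n / 2) * (T_POP + T_PUSH);
}

/* Attribute byte layout: FLASH (bit 7), BRIGHT (bit 6),
   PAPER (bits 5..3), INK (bits 2..0). FLASH is left off here. */
static uint8_t zx_attr(uint8_t ink, uint8_t paper, int bright) {
    assert(ink < 8 && paper < 8);
    return (uint8_t)((bright ? 0x40 : 0x00) | (paper << 3) | ink);
}

/* Hide whatever pixels are in a cell by making ink equal to paper. */
static uint8_t zx_attr_hide(uint8_t attr) {
    uint8_t paper = (uint8_t)((attr >> 3) & 7);
    return (uint8_t)((attr & 0xF8) | paper);
}
```

Under these assumptions a full 6912-byte redraw costs 110,592 T-states with LDI and at least 72,576 with POP/PUSH pairs, both above the 70,000 available, while rewriting only the 768 attribute bytes costs about 8,064.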
So the arcade has that resolution, which looks smaller than this one, but this one is actually fake, because we only have this resolution; the rest is border, right? So we only get that. So that means we have to do some magic. When we need a screen that has a lot of pixel data, we might have to glue our pixels to the border and use the border as a fake, so that it looks bigger, okay? So these are just experiments to see if we could do the main graphics for the game, and they actually look similar enough to make a nice game. So this is the one-player game, single player, sorry, and this is the two-player game, okay? These are some nice graphics that needed to be converted for the characters of the game, but they will have an impact further on, okay?
So a port implies you have access to the source code, the original source code. When you don't have that, it's not a port anymore, because you need to reverse engineer everything. Everything is hard to come by. You don't have a clear idea of what the code is actually doing, because you don't have the code, right? So reverse engineering is what you do when you don't have access to the original code, and it's a pain to get even the simplest things. You have to try and try and try again, okay? I'm going to play another small video. Rules for playing the game. So what do they tell us? That it's a match-three game, and they only give us the basic mechanics. Why? Because this was an arcade game. They want your coins. So you need to spend some coins to learn more about the game: about combos, about attack patterns, about scores. You can make different plays where the same number of bubbles popped will actually give you fewer points or fewer attacks on the enemy. So these are all the characters. I've been able to work out what these things are.
These are the power-ups, but I haven't quite figured out whether this is just a way to say this is a better character than the other one, or if it's something more that changes the gameplay. I haven't found that out yet. But what I had to do was actually play this game a lot, so that I could find the attack patterns for each of the characters. And I found out that it's an eight by eight, sorry, seven by eight pattern that repeats for each one of them. This was hard to do. This is not good.
So what compromises did we have to make? We don't have the background, because we can't handle that. We don't have distinct patterns. For example, these are the exact same players, but when switched they change colors. Maybe we could do that if we have enough memory. And we don't have characters as backgrounds of the game. We cannot do that because we are faking it with color instead of pixels, right? So we end up with a completely distinct thing, but at least the gameplay, we hope, will be the same.
So going back to the color for draw effect, what boost does this actually give us? This is what we used to have in terms of bytes, right? But if I only process the colors, I have a boost of eight times. So I have free time for computing game logic. And I have less contention with the ULA, so some more cycles that I could eventually use. And for a 50-hertz PAL frame rate, I can actually do 50 hertz if I'm careful with the code and optimize it.
Okay, tips and tricks. This is the hard part, okay? Which data structure should I use for that? Yeah, it's an array or a vector, right? A grid, but that's what I mean. So to represent this, we need to decide: should the array or the grid be laid out in memory horizontal first or vertical first? Does that influence our code? Does that make our code faster? It probably does, okay? So as you noticed in the videos, the animations are pretty much vertical.
So vertical is a good choice, because with a simple increment we go where we want, instead of doing adds to get there. And another important thing is that we need to do a color match search on that array, okay? How do we do a color match search? We start in one place and then we have to check every adjoining cell, right? And then we see there is another one of the same color there, and we keep going, keep going, and we'll pop everything that we find, and then we roll back and continue on the other side. This is how we do a color match, and notice that I marked it with white. That means you need to mark it so that when you come back, you know, okay, I've been here before, okay?
Okay, so some simple C code for now. I'm not going into assembly right now, okay? So it's a mark and sweep kind of thing. If at that position there is something of the color I'm currently checking, then I will go and check the left, sorry, first the top, the left, and then the right and the bottom. But what happens if you look at those spots in the corners? If I go to an array and try to visit something outside the array, what do I get? A crash. Not garbage, a real crash. Garbage is what you get on a PC, because it's protected, right? Here you're going to get the crash. So we need to somehow check that we are not doing that. But if we do that, we need to do something like this, right? If that position is still valid, then we do the same, right? And the smart guys here would say, what? You're going there, you're right. So we could do a refactor and just put the validity check on top, right? The problem is that this implies that I need to do an extra check. Can we do better? And that guy is on the spot, okay? So the idea is to find a way to avoid the edge cases. What can we do? And you cannot speak now. Kind of. So that one was also right. We can create what is called a fence: an area outside that is safe because it's still part of the array, but you only handle the inside, okay?
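
The slides with the C code are not part of this transcript, so the following is only a sketch of the idea described above: a column-first grid surrounded by a fence, and a recursive color match that marks visited cells and needs no bounds checks. The playfield size, the constants and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    COLS   = 7,                   /* hypothetical playfield width  */
    ROWS   = 12,                  /* hypothetical playfield height */
    STRIDE = ROWS + 2,            /* one column plus its two fence cells */
    CELLS  = (COLS + 2) * STRIDE, /* playfield plus fence on every side  */
    EMPTY  = 0,
    MARK   = 0xFE,                /* "I've been here before" */
    FENCE  = 0xFF                 /* never equal to a real colour */
};

static uint8_t grid[CELLS];

/* Column-first index: the cell below is always index + 1. */
static int cell(int col, int row) {
    assert(col >= 0 && col < COLS && row >= 0 && row < ROWS);
    return (col + 1) * STRIDE + (row + 1);
}

/* Fill the whole array with fence, then clear the playable interior. */
static void grid_init(void) {
    memset(grid, FENCE, sizeof grid);
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            grid[cell(c, r)] = EMPTY;
}

/* Mark the connected region of `colour` that contains index i and
   return its size. Neighbours are visited without any range check:
   a step off the playfield lands on a FENCE cell, which never matches. */
static int match(int i, uint8_t colour) {
    if (grid[i] != colour)
        return 0;
    grid[i] = MARK;
    return 1 + match(i - 1, colour)        /* top    */
             + match(i - STRIDE, colour)   /* left   */
             + match(i + STRIDE, colour)   /* right  */
             + match(i + 1, colour);       /* bottom */
}
```

A caller would then pop the marked cells if the returned size is three or more, or restore them otherwise; the trade-off is a few extra bytes of RAM for the fence in exchange for one comparison per visited cell.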
So even if you go there, no problem, it's still in the array. This works for any pattern, and the best way to explain it is that this is kind of an island. If you have water all around, you cannot go anywhere, right? So it's safe. And that's basically what it is. You put water all around an island and you'll never get off the island. Sorry? Okay. Okay.
I'm going to do a quick recap of common optimizations, and I'll do a comparison between C, which most people know, and assembly, okay? So tail call optimization: if the last thing you do is call another function, it means that when you call the other function, the other function returns and then you return again, right? So we have something like this. We call, and then it returns, which takes all these clock cycles, and then we return again, right? So instead we could do something like this. We comment out the return statement and just jump, because that function will do the return for us, right? Okay. And we saved 17 clock cycles with just that optimization, okay? Next.
Okay. Move invariants out of the loop. What we have is that delta equals base plus offset, and that doesn't change anywhere in the loop, right? So it goes out of there, outside. These are typical optimizations that are common knowledge, right? We can also unroll loops. So we can replace that for loop and just dump the statements one after another. And then we can optimize this a little bit more, because we know the results of all these instructions, right? And we can do that. Something else we can do: unroll in multiples of n. Why? Because if the limit on top is 3,000 or some huge number, we cannot unroll everything, right? That's crazy. We wouldn't be able to do it. So we might unroll in blocks of n so that we can still get the optimization. And what do we gain with this? We run less code controlling the loop, right? This is the standard stuff. And we can always do inline functions.
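
Two of the optimizations recapped above, moving an invariant out of the loop and unrolling in blocks of n, can be sketched in C as follows. The function names, the block size of four and the byte-buffer example are illustrative assumptions, not the code from the slides; only the `delta = base + offset` invariant comes from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: base + offset is recomputed on every pass. */
static void add_delta_simple(uint8_t *buf, size_t n,
                             uint8_t base, uint8_t offset) {
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(buf[i] + base + offset);
}

/* Same result with the invariant hoisted out of the loop and the
   body unrolled in blocks of four. */
static void add_delta_unrolled(uint8_t *buf, size_t n,
                               uint8_t base, uint8_t offset) {
    assert(buf != NULL || n == 0);
    const uint8_t delta = (uint8_t)(base + offset);  /* loop invariant */
    size_t i = 0;

    /* One counter update and one test per four elements. */
    for (; i + 4 <= n; i += 4) {
        buf[i]     = (uint8_t)(buf[i]     + delta);
        buf[i + 1] = (uint8_t)(buf[i + 1] + delta);
        buf[i + 2] = (uint8_t)(buf[i + 2] + delta);
        buf[i + 3] = (uint8_t)(buf[i + 3] + delta);
    }

    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; i++)
        buf[i] = (uint8_t)(buf[i] + delta);
}
```

A modern C compiler will often do both transformations on its own; in hand-written Z80 assembly they have to be done manually, which is the situation the talk is describing.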
Is there anyone who does not know what an inline function is? Everyone knows what an inline function is? Yes. All right. So what is a partial inline? An inline function basically just replaces the function call with the contents of the function. This is actually a representation of the function that calculates the Y vector for the screen. And if we know that one in eight times, sorry, seven in eight times, we'll hit that first if, I can avoid actually doing the call. How? By putting the if directly in the code. And then I only call this one once every eight times, right? Another optimization.
And this is the big mother, right? Self-modifying code. We need to demystify this, because you can do very complex stuff with it, but the main idea is very simple. So what this means is: start the code at 6,000. And then we define a variable that holds that value and sits at 6,000, because that's what we said. And here we have an instruction that loads a register with the contents of that variable, with the contents of an address, right? That takes 20 cycles. But if we load from an address into the HL register, it's a special case, it only takes 16 cycles. But if we are going to do this in a loop, and the value that we want is not actually changing, or we don't have enough registers to keep it, we can instead put in an instruction like that with a constant. And that constant is changeable when we start the function, but not within the loop, okay? So what we can do is define an embedded variable which actually points into the middle of the instruction. The load HL has this representation, and when I use this dollar, what it means is the current position in memory, plus one. So my variable will actually share these bytes with the instruction, okay? This is the simplest self-modifying code you can have, but look at the difference, sorry, look at the difference here: 20 cycles and 10 cycles, okay?
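
The partial inline described above can be illustrated in C with the standard Spectrum screen layout, where moving down one pixel row is a single increment of the address high byte in seven cases out of eight. This is a sketch under that assumption, with hypothetical function names; it is not the speaker's routine, which is written in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the byte holding pixel (x, y) in the standard layout:
   high byte 010 y7 y6 y2 y1 y0, low byte y5 y4 y3 x7..x3. */
static uint16_t zx_pixel_addr(unsigned x, unsigned y) {
    assert(x < 256 && y < 192);
    return (uint16_t)(0x4000u | ((y & 0xC0u) << 5) | ((y & 0x07u) << 8)
                              | ((y & 0x38u) << 2) | (x >> 3));
}

static unsigned slow_calls;  /* how often the slow path actually ran */

/* Slow path: the row crossed a character-cell boundary. The caller
   has already incremented the high byte. */
static uint16_t zx_next_row_slow(uint16_t addr) {
    uint8_t h = (uint8_t)(addr >> 8);
    uint8_t l = (uint8_t)addr;
    slow_calls++;
    l = (uint8_t)(l + 32);       /* step to the next character row */
    if (l >= 32)                 /* no carry: still in the same screen third */
        h = (uint8_t)(h - 8);    /* undo the overflow into the third bits */
    return (uint16_t)((h << 8) | l);
}

/* Partially inlined: the common case is one add and one test,
   and the call happens only once every eight rows. */
static inline uint16_t zx_next_row(uint16_t addr) {
    addr = (uint16_t)(addr + 0x0100);   /* same effect as INC H */
    if (addr & 0x0700)                  /* 7 of 8 rows finish here */
        return addr;
    return zx_next_row_slow(addr);
}
```

Walking down all 192 rows of a column with `zx_next_row` reaches the slow path only 23 times out of 191 steps, which is where the saving in call and return overhead comes from.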
Okay, loop update optimization. This is something that I came up with, but I don't think it's new. It's something that everyone who actually runs into the problem will find out, I think. A for loop consists of three parts, well, four, right? Init, condition and update, and then you do the block of code, right? So what you usually have is something like this, right? You do the initialization, you test the condition and you update the control variable, right? And then you do something inside. But that's the simplest case. The update can do something different, like calculating something heavier, right? So can we optimize this? What's the problem with this? Does anyone know what the problem with this is? You will not need the last Y, okay? The last Y you calculate, you will not need it. Why? Because you calculate the next Y and then you check the control condition; you needed I, the counter, but you didn't need the Y, so by skipping it you just saved a few instructions. It might not look like much, but I'll give you an example, okay?
So imagine this is a sprite. It's made of these lines, each byte is a line, and you want to mirror this onto the screen in reverse, okay? So what do you have to do to make this happen without duplicating the data? You need to access it in a different order, right? But you can do it in two different ways. If you look at the bytes in memory, they are in order. So you can do what? One pointer runs one way, the other pointer runs the other way, right? So you can change the order: you can write to the screen in reverse order, or you can go in reverse order through the sprite. It's up to you. But in the case of the Spectrum, it's better to reverse the order on the sprite data, okay? Okay, this is the last example, I think. It's a huge one and I will explain it in assembly, okay?
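
Before the assembly walk-through, the two ideas above can be sketched together in C: the sprite is read in reverse order, and the loop is arranged as body, condition, update so that the expensive update is never computed after the last row. The names, the linear 32-byte row stride and the one-byte-wide sprite are simplifying assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STRIDE = 32 };          /* hypothetical bytes per screen row */

static unsigned row_updates;   /* counts the "heavy" update step */

/* Stand-in for the costly calculation of the next screen row. */
static size_t next_row(size_t off) {
    row_updates++;
    return off + STRIDE;
}

/* Draw a one-byte-wide sprite upside down: the screen is written top
   to bottom while the sprite data is read from its last row to its first. */
static void blit_flipped(uint8_t *screen, size_t off,
                         const uint8_t *sprite, unsigned height) {
    assert(height > 0);
    const uint8_t *src = sprite + height;  /* one past the last sprite row */
    for (;;) {
        screen[off] = *--src;              /* body: copy one row */
        if (--height == 0)                 /* condition: any rows left? */
            break;
        off = next_row(off);               /* update: skipped after the last row */
    }
}
```

A plain `for` loop with the update in its third clause would call `next_row` once per row; this arrangement calls it `height - 1` times, which matters when the update is the most expensive part of each pass.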
So RP Blit is a function that does pixel blitting in reverse, exactly what I mentioned before, right? And this is a data structure that I'm using, not very important except for the last part. And I'm going to show you that the HL register points to that position, and the destination register actually points to the screen pixel address when we get to that code on top, okay? So what do we do? We get the pixel address by actually reading from there. And then we save the pointer for later. We move the pixel address to the HL register. One more. So now we decrement it, because for the next time we need to go to the next pixel address. And then we calculate the pointer. We increment the pointer again because we want to point now to this one; the HL register actually moved, right? We need to calculate that. And why do we need this? Remember, the ZX80, sorry, the Z80 doesn't have multiplication. So to optimize this, we can have the sprite data already stored with width and height multiplied, so you don't have to do that math, right? So we need to get that value, so that we can add it to go to the bottom of the sprite data, okay? And now notice this, RP Blit W, that's a variable. But that's a variable that's going to be self-modifying code. Take note of that. And then we do some more math to calculate the width, and swap that data again, and then decrement again to go a little bit further, okay? So the destination register now has this data: screen pixel address plus size minus one, okay? And then we load the blit height from another variable. And now, this part is the actual init of this for cycle, remember that? So now what do we do? This is off-screen, hmm. Okay, I can tell you what's in there. It's a jump to an instruction that is down here. It's not very problematic, okay? So it's a jump to this address. So what is this? This is the body of the for cycle, okay?
And then we decrement the variable that controls the cycle, and we test it. And if we still have stuff to do, I'm going to jump back to the top, right? And then we are going to do what? Everything that we need to calculate the next Y, right? I'm going to speed it up a little bit. And here you can see the example, the partial inline of the function increment screen Y, okay? And if it's the special case here, we will not do the jump, and we will save some instructions, basically everything that is in the call. And besides that, there is a trick here, because it's cheaper to actually decrement and then increment when it goes through this route: each one takes four cycles, instead of a jump that would take 10, okay? Four plus four is eight, so that's faster. And then we do some more processing; I'm trying not to put you guys to sleep. And here is the instruction where it's being used again as a variable, right? Dynamic code. There is that little guy there, showing there is a possible case of memory dysfunction. And then we do some calculation to get back to the right position in the sprite data. And now you can see that all this is what we would repeat on the last cycle, even though we don't need the Y variable on the last cycle. Because on the last cycle, we will exit here and we will not do all this. And the first time, we just jumped from here to here to avoid it, because we don't need it the first time, okay?
And now, I can show you a demo if I'm still allowed. If not, questions? Hurry up. Hurry up, okay. The building is closed. Oh, that's fine, we'll stay. So, let me, I need to, okay. I tried to load this a while ago to see if it would work. This is the game; the video link is on the page, okay? This is me playing the game in the Spectrum version, okay? You can see the balls going up and down, although the video codec is actually messing that up a little bit. But you can see that the application is actually running at 50 hertz.
And with some time to spare at the end, okay? This works in single player, where you have more columns, and you also have the two-player mode, and it works like this. And that's it. Thank you very much. Any questions? It's a huge source code and it's not worth it, although I used a lot of the screen graphics as a base to work out the graphics on the Spectrum. But the code is huge, there is no documentation. Well, there is, but it's going to take some time to actually do that. It's possible, but it's not easy, it's hard. Thank you.