Hey guys, welcome back. The topic for today's episode is printing stuff. But more importantly, do you guys ever use this utility called cat? Basically, it lets you print stuff to the screen. It works just fine. The problem is, if you use it you're officially a boomer, because there's a better version that I made a couple of days ago called nyan cat. It's the same thing in every way, only it's like 30% less legible because everything is in rainbow colors. We'll talk about how that works in this video, as well as everything that goes into it. The cool thing is that the Linux version of cat is like 140,000 bytes, and this one is like 800 or less, and we didn't even code golf. That's just what it comes out to be. If that interests you, check it out at the end of the video.

So let's go through the very short set of slides I have. The first thing, and pretty much the only thing, that we're going to cover is this write syscall. If you can't see this on the bottom right, that's the 105. Basically the way this works is: on Linux and BSD there is a write syscall where you pass in a file descriptor. That could be either the screen, the console, which is where we're going to send our output, or it can be an actual file that you've already opened, so you can write bytes to that file. Then there's this character array buffer in RSI, which is the address of the memory that you want to print, and then the number of bytes in that array. It also returns the number of bytes that it wrote, if you care about that. I don't really care about that value.

So if you wanted to print out a nine-byte string at address, I think, 42, you'd pass the file descriptor for the output in RDI, the address in RSI, the number of bytes in RDX, and the syscall ID in RAX, and then just execute the syscall.
That's how the process works. Very simple, and that's what we're going to do in this video, but we're going to make a function to do it for us. Thanks, plank.

But I want to speak about syscall a little more closely, because I have to make an apology video. I hate to make this kind of video so soon, but we've got to apologize. And I apologize that the FreeBSD devs are almost as dumb as I am. That's my apology.

Basically, I never came across this because I always used the System V ABI, in which case this is not a problem. But when you assume that all registers are preserved across every function call, like we do in this series with our own ABI, this matters. The problem is that on Linux, when you execute a syscall, it clobbers RCX and R11. Why that is, I mean, it makes sense: it has to preserve certain things, and it has to use those registers for certain things. But on FreeBSD they also clobber R8, R9 and R10, at least as far as I can tell. That's what they clobber in addition. It could be more. I have no idea.
I have to check that, but either way I put a listing in the syscall ASM for both FreeBSD and Linux that has an OS-dependent pair of macros, basically called sys push syscall clobbered registers and sys pop syscall clobbered registers. On FreeBSD, when you execute this macro, it will push RCX, R8, R9, R10 and R11 to the stack and then pop them off in reverse order. On Linux it will only push the outer two, RCX and R11, not the ones in between. So I changed the code to reflect that. If I figure out that there are actually more registers that get clobbered on Linux or BSD, I will adjust these macros accordingly.

Here's how that process works for one of the programs that we're going to be using today and that we used in the previous video, file open. You remember I had sets of pushes and pops above and below the main code body, where I was just saving all the registers that were going to be clobbered. In this case I'm just executing those macros, sys push and sys pop. So very simple. At this point we're pretty much done with the slideshow. I'm going to get right into the code.

I want to first show you the motivation for this. I have four examples here. Let's start with example B; that's what I was just talking about. Let me open this up. All we're including here is exit and the syscall listing, which gives us the variables like sys stdout and sys write that we need to execute this code. The commented-out parts are for a little bit later. Basically, we're doing what I just said. We're moving that file descriptor into RDI. We're moving the text address (here's the text that we're going to print out) into RSI. Here you'd put the number of bytes in RDX. If you're too lazy, like I am, to count these 22 bytes, you can just compute that with NASM.
What you can do is subtract the text address from the address right after the text, and that gives you the number of bytes between the two. So you can use this to count strings for you if you're too lazy to count them yourself. Then we move the sys write ID into RAX and execute syscall, and if you run this, it prints the string to the screen. So we're done with our video. Thanks for watching.

Just kidding. If we go back and uncomment those lines, I want to show you something. As you would expect, this is a loop. We start with R15 at a million. Every time we go around the loop we decrement and jump back to the top of the loop until we get to zero, in which case we exit. So let's run that, and you can see it's printing a bunch of random nonsense to the screen. That's pretty quick, but that's not the issue. On Linux you have a command called strace; on BSD it's called truss. You can count the number of syscalls executed by a program with -c. Now it's even slower. The time it takes is always different. I don't know why; I think it's just because the console is a bunch of voodoo magic. Sometimes this is actually faster. Anyway, you can see here that there are a million calls to the write syscall. I'm not sure why it's not telling us about the exit, but oh well. A million calls to write, which is what we expect: we did that loop a million times.

But here's the thing. If I go back into this other example, example A, and look at what's in here, this is the C version of the same thing. This is just printf a million times. So you would expect this to also do a million syscalls, right? We're looping a million times printing the same exact string. So if we run this... I should have trussed it. Why did I not do that? Oh, I know why. You can see here, obviously, the boomers got involved with a bunch of this crap as well, like ioctl and fstat and mmap or whatever. Who cares?
But look at the write row: only 5,372 calls to the write syscall. We had a million. How did they get away with only five thousand? Well, my friends, let's see. Hold on, let me quit this out. So we did a million prints, each of those strings was 22 bytes, and somehow it managed to do that in only 5,372 write calls. Very curious. Twenty-two million bytes divided by 5,372 calls is about 4,095.3 bytes per call. Does that number make any impression on you? Is that not basically two to the twelfth? I think what's going on is they're buffering it. I didn't look this up, but I'm guessing that they're buffering the output. In other words, they're filling up a buffer with their strings until it gets to 4K, and when it's at 4K they flush the buffer.

Let me show you how I think that's working in the last slide here. I could be wrong about this. What we just did in our assembly code was write, not one letter at a time, but one string at a time. We were writing that 22-byte string out a million times, and that was a lot of syscalls, and that was slow. What I think they're doing in C is putting those strings in a buffer and then auto-flushing that buffer whenever it's full. At the end of the program they're also flushing the buffer, and maybe on newlines they're flushing the buffer as well. So it's fewer syscalls, and it executes more quickly. So why don't we do that? We will.

Let me go to the third example, buffered writes. Let's take a look at the code, and also at the functions it uses. I have two functions here. One is print chars, one is print string. Let's look at all three. Oh, also, let's look at the print buffer flush.
I should have prepared this beforehand. In this code it's very much the same thing, except we're not going to do the syscall in a loop. We're going to use this print chars function instead, which we're including here. By the way, this align directive aligns that code to a number of bytes. So this will align it to a 16-byte boundary, which is a little bit faster when you're calling functions many, many times. That's why I have aligned this print chars function, so whenever we jump to it, it goes a little bit faster. You can also see I have aligned the loop. There are pros and cons to doing that, but I aligned the loop itself, so every time we jump to the top of the loop it's also a bit faster. Just a pro tip there.

Anyway, let's take a look at the other functions we're using here. There are actually three: print chars, print string, and then print buffer flush, which I'm not including directly, but it's being included by print chars, so that's fine. Print chars is basically a wrapper function to write things out to the buffer and then to standard out. Actually, I think I'll first start in the print buffer flush. Actually, no, let's start in the actual code. You can see I'm not planning this video at all.

I have at the top a third directive here, which is print buffer size, 4096, just like it was in C on this machine. What we're doing in our headers: before, we had these two quad words. One was code size, indicating the size of our binary on the hard drive, and the other was the size of the binary in memory once the program is loaded. I've added this value.
So let's say the code was 200 bytes long on the hard drive. When it loads, it will be 4,296 bytes in memory, and only the first 200 bytes will be the bytes from the file. Then down here, where it says print buffer at the end of the software, there will be an extra 4,096 bytes that will all be zeros, hopefully, at runtime. That's the idea of using a print buffer. It's sort of like having dynamically allocated memory; it's not a variable that we're defining a priori.

The print buffer flush routine manages that 4K array in memory. It obviously does some pushing and popping to save registers, but it pretty much just manages that buffer. It has its own variable, a global variable, mind you. Remember, if it had a dot in front, that would be a local variable or a local address. This is actually a global variable that we can access, and it just encodes the number of bytes that we have written to our buffer array. So let's say one of those strings from before was 22 bytes. Whenever we print it, it would increment this value by 22, and once we hit 4096, this writes.

You can see this is the most fundamental of things. This is like that write syscall from the beginning. It moves the buffer address into RSI, moves the number of bytes, which is read from that global variable location, into RDX, moves sys write into RAX, executes the syscall, and then resets the value at this location. So it's very straightforward how this works.
I'm going to close that because it's boring. Now, I've got two other functions here, print chars and print string. All they do is what I just showed you before. I'm not going to go through the code in detail; you can take a look at how it works. But basically print chars loads inputs from the input array into the buffer array. It loads RDX of those input bytes into the buffer array, then checks whether we've reached the maximum size of the array, and whenever we have, it flushes. So this automatically flushes the array every time the buffer is full. Very simple. Then it pops the registers back and leaves.

The last thing I wanted to show is this print string. This is the same exact command, except it has one fewer argument, only two. Again it takes the file descriptor and an address in memory, but this works on null-terminated character arrays. I mentioned those in the previous video. This is for when you have an array that you don't necessarily know the length of, but it ends with a null byte, a zero byte. This way you can write out a string you don't know beforehand; as long as it ends with a zero, you can tell how long it is. So this requires one less input, and all it does, you can kind of see, is count the number of bytes into RDX and then execute print chars itself. So it's just a wrapper for the previous function that counts the bytes in the string before calling that function. It does some work for you.

I would never use this unless I didn't know the string beforehand. I don't like string formatting like printf and such, because you're making the computer do work at runtime that you could have done beforehand. There's no reason to do that. So this is just for strings that you don't know. Let's say it's user input, or it's from a file, or whatever.
That's when I would use this particular function. So that's that. Now let's look at the code that calls it. This is the same code as the previous example. Again, we're moving standard out into RDI, the text address into RSI, and the length of the string into RDX, and we're looping a million times. In this loop we're calling print chars and decrementing our counter, R15, by one each time. When that hits zero, we fall out of the loop; otherwise we keep looping. Once that loop's done, I'm showing the other example, in which we use that second function, print string, and this is null-terminated text. Down here you can see I have two pieces of text. I have the old sample text, and I have this one. This one is a newline character, the word done, another newline character, and then a zero byte. You could also use \0 to encode the null byte in the string.

One tip I have for you: when you're using these \n escapes, make sure you use backticks and not regular quotes; sometimes they break. If you always use the backticks (they're at the top left of my keyboard), you can handle fancy inputs like \n and \0 and \r more easily.
Anyway, this line just moves that address into RSI and calls the null-terminated print function that we just made. Lastly, the thing we have to do that we didn't have to do in C is that whenever we end our program, we have to make sure we flush the buffer, because there's no other way for that to happen. Let's say the buffer is only half full when you're done with the program. It will not have flushed, so you'll lose 2,048 bytes of valuable information that you didn't print. So whenever you end your program, you have to make sure to flush the print buffer before you leave. That's what this is doing: it flushes the buffer manually and then exits the program.

Okay, so let's run that one. You can see it prints pretty much the same stuff and then prints done. More important than that, let's look at the number of syscalls. Here you can see there are 5,372 syscalls, just like there were in C. So we've implemented this the same way that C did, same size buffer and everything. But if you want to change this buffer, you can. Let's say you want to make it twice as big. That works, and let's truss it. You can see we have half the syscalls. So you can change that value very quickly and recompile and re-execute your code.

All right, with that out of the way, we'll get into our final example, which is nyan cat. This one's a little more complicated and might take a while to go through. Maybe I'll just skip the hard parts. I think I will skip the hard parts. In this case there's another value here. I'm using a print buffer size of 128 bytes and also a read buffer size of 128. Again, you could change these numbers to whatever you want.
I just kept them the same for the example here. This means we have more memory to define; we have another buffer to work with. Why are we doing this? Because, remember, cat takes bytes from a file and prints them to the screen. So we're going to buffer our input as well as the output. If you look here, remember it used to be just code size plus print buffer size. In this case we're adding another buffer, which is read buffer size. So we'll have the size of our code in RAM, plus 128 bytes of zeros, plus another 128 bytes of zeros when this is loaded into memory. At the very bottom you can see I have the print buffer at the end of the program, and then the read buffer, which is just offset from the print buffer by the size of the print buffer. So if you grow the print buffer, the read buffer shifts down in memory when this thing is loaded. Not in real time; it's known before you compile. But this automatically puts itself in the right location per the buffer sizes that you give.

Okay, let's talk about how this works. There are a lot of includes in this program, and again, includes are literally just copied and pasted from other files. We're copying and pasting the syscall listing; depending on whether you're on Linux or BSD, you're putting different values in here. They're all NASM macros and other syntactical things. We have a function called file open, which I think I covered in the previous video, as well as print chars. We also have print ANSI formatting. This is how you do colors. Why don't I open that really quick as well? That's important, actually. In this case it's just a bunch of defines. I'm defining these different things: ANSI clear screen, ANSI reset, ANSI bold, all the different colors. There are actually more colors.
I just picked the cool ones. Basically, you input the file descriptor and the number of the ANSI color or format that you want. Let's say you want your prints to be bright cyan. You pass in 25, or you can just pass the ANSI bright cyan define into RSI and execute this code. It will print the proper escape code to color your text. If you care to look at how that works, at the very bottom you can see how the escape codes work, at least how I'm implementing them. You have this \e[, and these two bytes mean it's an escape code. Then you have either two or three bytes' worth of values for formatting or colors, and this prints those out. If you're curious you can check the code in the soyhub repository.

So back to the main program here. There's print chars, that's the printing function, and print ANSI formatting, which prints out the colors. Read chars is basically just a wrapper for the sys read command that reads values from a file descriptor into a buffer, in this case the read buffer. And then we have our flush buffer function and our exit function.
So you should understand everything as far as the includes go. Now the instructions. We're using what we did in the previous video to get the command line argument. Remember, we're printing a file to the screen, so we have to take the file path out of the command line arguments. In this case we're checking argc, from the previous video. It has to be two. If it's not, just leave the program; call it a day. You've done enough work. If you can't figure out how to use the software, don't bother. If you did pass in two, though, you won't jump to fail, and now we access that second location in the arguments, which is the path to the file, and open it with file open. The only thing that's important here is that we're opening the file and saving that file descriptor in R8. R8 would normally not be a good place to save something, because it's not a callee-saved register; it would lose its value if you were running under the System V ABI. But because we're using our own ABI, we can use any register we want, because all registers are saved across function calls. At least they should be.

Then I'm using the low three bits of RBX to refer to the color of the rainbow. We're using seven rainbow colors here. At the bottom of this file I have the rainbow defined with the stock colors: ANSI red, ANSI yellow (yellow is actually orange on my computer; maybe it's more yellowy on yours, but on mine it's orange), and bright yellow, which is more like a yellow color, followed by green, cyan, blue, magenta, and then like pink. That's the rainbow I'm going for here. Basically we're using the low three bits of RBX as the index. So zero would be here, then one, two, three, four, five, six, and seven. Once you tick over to eight, you get rid of the high bits and go back to zero. That's how I'm indexing the rainbow here. Okay, so that's that. Now we have two loops.
We have an outer loop, which just fills the buffer by reading the file. Basically we keep pulling 128 bytes from that input file over and over again in this loop. It moves the file descriptor value into RDI, moves the buffer location that we have at the bottom of our code into RSI, so it knows where to write the bytes to, puts the read buffer size, 128, into RDX, and then reads into the buffer. It keeps reading 128 bytes over and over again. In this case we're actually saving the return value, which comes back in RAX, into R15, and what we're saving is the number of bytes read. That should be 128 every time you read, except for two cases. When you get near the end of the file and there are only, like, seven bytes left, it will return seven, because it only read seven new bytes, the last seven bytes before the file ended. And then when you're at the end of the file, it will obviously return zero. So this loop is valid for a number of bytes read greater than zero. Once we get zero bytes returned, we jump to done. We're done reading bytes; we're done with the whole program.
That's how the program terminates: we jump to done at the bottom, and done, as you can see here, flushes the print buffer and then returns zero to the console. Okay, that's the outer loop.

The inner loop is where all the work gets done. At this point, remember, we have a full buffer, ideally 128 bytes of new things to print, and the first thing we do is print the color. We always start with red for each new line, and you can see at the top here we set RBX to zero, which refers to red. So what this does is put standard out into RDI as the file descriptor, then it puts the ANSI color value into RSI, zero-extended. That means all the high bits above the low byte are zeros. Then we print out the formatting. This makes everything after it red, but we're only going to print one byte at a time, so only the next byte is red. Then the loop goes around and the next one is orange, then the next is yellow, et cetera. So after it sets the next color, it prints the next character.

In this case it's just taking the next byte from the read buffer and printing it out, and you can see we use R9 to track that location. Every time this inner loop starts, we set R9 to zero, which is the start of the read buffer. So we go to the first byte of that buffer and print it to standard out using the print chars function. One byte at a time gets printed, or at least it gets put in the print buffer, not printed to the screen. The print buffer eventually gets flushed, either automatically when it fills or manually at the end of the program. So there are two things to keep track of: RBX, which is the offset into the rainbow, and R9, which is the number of bytes we are into our read buffer.

I'll skip this part for a second and show you this part. Every time we write out one letter, we increment RBX, in this case just the low byte of RBX, by one. So we go from zero to one, one to two, et cetera. Then we throw away whatever is not in the low three bits by using and bl, 7. Seven in binary is 111, so this keeps just the last three bits. Then it increments R9 as well. So this keeps pushing us one step forward in the rainbow and one byte forward in the read buffer.

This last bit of code, before I get to the fancy bits, is that whenever we get to a new line we reset to red. I want all the reds to line up, all the oranges to line up, all the yellows to line up as we go left to right. So this checks whether we have a newline character. The newline ASCII value is 10, so if the character is 10, we reset RBX to zero using this logic here.

Now, down here, this checks whether we're done with our buffer. There are two bits of logic.
It's a little bit confusing. The first one is to check whether or not we actually filled up the buffer in the first place. Let's say we're trying to get 128 bytes, but at the end of the file there are only, you know, seven bytes left. You have to be able to check for that. I'm not going to go into detail about how this comparison works; you can take a look at the code line by line. But this just checks whether the buffer is full or whether we're out of bytes to print, and if so, it jumps to done and leaves the program; otherwise it jumps to the top.

The last bit of cool stuff that I had to do here was these five lines. Actually, I guess six lines, but one of them is just an address. Let me open this again. Because of the way this works, where you're printing out escape codes, sometimes these escape codes won't be aligned properly. What happens if you're 126 bytes into your print buffer, at the end of your print buffer, and then you have to print an escape code? It's going to put \e[ in that print buffer. That's two bytes: \e is one byte and the bracket is one byte. It puts those in your print buffer and then flushes it. So now you've printed 126 bytes of important information and then \e[, which on its own means nothing. Then it's going to flush out 2J and then the rest of the important values. So it's not going to count as an escape code, because it's divided at the end of the buffer. What I had to do was put some logic in here to check if we're almost full, and if we're almost full, flush, because I can't let it get full; we might accidentally have our escape code on that boundary, which would not actually work. It would look stupid. It would print a bunch of crap to the screen. It would be embarrassing.
It would be embarrassed So what this does is this five lines here is basically checks if we are within five bytes of filling up our print buffer If not don't flush yet. Otherwise, manually flush. So this is kind of a cheating We're basically using a smaller buffer just to make sure that we don't accidentally Cut off our escape codes on the end of our buffer boundaries. So Just show that in in working order here Let me compile that and so now we have a binary. Let me just show you how big that binary is It is only seven hundred sixty five bytes. So way less than it is on Linux links. It was like For 140,000 bytes, which is absurd on on BSD. It was better. It's like only like 14,000 But still and this is way better and we didn't even try to code golf yet You could make this way smaller if you wanted to work. We're still using the full 64 bit Registers and stuff. You could probably get away with smaller ones. You could inline everything It would be way better Anyway, you can show this works by running that binder that we just created on the code itself and This will print out immediately to the screen if you want to and that's fine But normally you want to be able to read what's in there and so you pipe it into less The problem with less is that it you can see it doesn't like escape codes You can see all our random colors and stuff in here like all this random junk And so instead of that what you can do is you can pipe it into less but do dash R that preserves color So now you can kind of go through read your code Like you normally would The only bug with this and you call it a bug you call it a feature is that you'll notice that Everything lines up, you know rainbow wise in each column the problem is look at tabs a Tab only counts as one byte So the oranges are like a tab minus one space off You know so because I tab indent everything that's not an address and so the oranges don't line up really We didn't account for that if you could go check for a you know a tab 
Bite and then count it as Whatever was it one two three four spaces if you want and do that logic on your own time be my guest But you otherwise, you know, you're gonna have kind of Zigzaggy rainbows, which is fine. It's still impossible to read. It's still distracting. It's still objectively worse in every way than the You know Unix utility cat, but I like it. That's why I've added it to our List of bins so I go to our bin directory We have a couple files that we have here a couple of you know Tools we can use we have big executable which we use all the time We have nyan cat recycle and spawn these are all utilities that we've made in kind of our bare bones library lists link your list assembly and Yeah, they're pretty pretty cool pieces of Programming in my opinion That's it for the video. I hope you guys liked this if you did let me know in the in the comments one last thing I will plug the Fed honey pot. I mean sorry the discord server link in the description It's only the true viewers because I'm only plugging it at the end of the video not in the beginning So only the true MLG viewers know about this. Anyway, thanks for watching. Have a nice day