Hey guys, welcome back: skits on series, episode 9. The topic today is timing things. You can't hear it, but this watch is ticking at five ticks a second. This is my, I think it's a Vostok Komandirskie. I probably butchered that pronunciation pretty bad. I can't even read these hieroglyphs, but yeah, we're gonna talk about how to time things.

Let's say you wanted to benchmark your code, or profile part of your code. How would you go about doing that in x86-64 assembly? Let's say you wanted a timestamp value for some reason. Let's say you wanted to have a file be saved by a function, and it just happens to save the file with the timestamp as the file name, right? That's not an uncommon use case. So how would you go about doing that? How would you count the number of cycles that have elapsed, and also the seconds that have elapsed on your processor, since something happened? And also, how would you time a delay? Let's say you wanted to do something every five seconds. How would you implement that kind of delay in assembly? So we'll talk about those things today.

Now, there are a lot of ways to time things. Some are more or less portable, some are better and worse. These are probably somewhere in the middle of the road; I think they're two of the more common ways to track time. I'm gonna do two different ways. One is a syscall called gettimeofday, which is available on both Linux and FreeBSD. Even though it's not technically a return value, it gives you back a timestamp that counts the number of seconds and microseconds since the epoch, which is a fixed reference point in time. I think it's like 1970 or something like that.
It's like a boomer date. And then there's the RDTSC instruction, which would be available even on a bare-metal implementation, and that returns the number of cycles since the CPU was reset. They seem different, but I think one of the examples to come will show that they're not really all that different. They're just giving you different outputs.

Okay, so going into that instruction, there are three main things you should know from the documentation. The first is that the processor monotonically increments this timestamp counter every clock cycle and resets it to zero when the processor is reset. So yeah, that's what it says. Basically, when the CPU turns on it's supposed to be zero, and when you query it later it tells you the number of clock cycles that have taken place since it turned on.

The opcode is 0F 31 and the instruction is just RDTSC, as I've shown before. What it does is return the timestamp counter value, a 64-bit quantity, in two registers: the high 32 bits go into EDX and the low 32 bits go into EAX. You can shift them around if you want to put them together and make one big number. Or if you only care about the low 32 bits, say it's a very short thing that you're timing, you can look at EAX only. The problem with that is: what if EAX overflows while you're timing? Then you won't know, and you'll get a negative elapsed quantity when you subtract. So I would always take the full 64-bit value. There's no harm in doing so; nowadays we have 64-bit processors in the first place.

Then lastly, CPUs don't really do things sequentially, in my understanding.
It's not exactly one after the other; some things are happening while other things are taking place. And so RDTSC is not what I think is called a serializing instruction. If you really care about getting an accurate measurement of the time that's elapsed, or of the current timestamp value, you have to call these other instructions, LFENCE and MFENCE. LFENCE ensures that all previous loads have taken place, and MFENCE ensures that all previous loads and stores have. Okay, that's pretty much it. Very simple stuff.

And so how you use that, or how I'm going to use it at least: I'm going to call LFENCE to make sure all the previous instructions are finished. I don't really care about stores, but I want to get some level of accuracy here. Then I'm just going to call the RDTSC instruction, which puts that timestamp quantity into EDX and EAX. Here you can see I'm shifting the RDX value left by 32, so these 32 bits move up into these 32 bits, and then we combine it into RAX with this OR instruction. At this point we have the full 64-bit value in RAX that's been read from the timestamp counter on the processor. So yeah, pretty cool, pretty easy.

Now, Patrick had a good question. He's wondering, how do I know if my CPU supports instructions like this? And actually Patrick's not the only one to ask, because L1Q in the comments brought this up as well, and it was a very good point. In the previous video I was using two different random-integer-generating instructions, RDSEED and RDRAND, and those are not, you know, pervasive. I guess older processors from ten-plus years ago wouldn't have had those instructions. So how do you check if a processor even supports them? Is there a way in software to check? The answer is yes. Or you can just follow this flow chart. So basically you ask yourself: are you a boomer?
Have you walked this earth for hundreds of years now? If the answer is no, then you probably have a new PC, and so it probably supports this instruction. But if you are a boomer, ask yourself: are you in destitute poverty? If no, you probably have a new computer, and so again your CPU probably supports this instruction. It's been around since the Pentium, obviously. But if you are poor and a boomer, then you have to run the CPUID instruction to check for support.

Now, what is this CPUID instruction? Well, I don't really know, but from what I can tell it's basically a way to let software discover details about the processor on which it's running. You pass what's called a CPUID leaf number into EAX, and that just means: I'm looking for a certain category of things about the processor; give me all that you know about yourself. So we're gonna pass in EAX equals 1 to get feature information about the processor back in two registers, EDX and ECX. In this case you can see we can check for a bunch of different stuff. We can check for FPU support, we can check for SSE3 floating-point instruction support, all this different stuff. We're gonna be checking for TSC support, that is, the timestamp counter and the associated RDTSC instruction.

To do so is very simple. As you can see, we run CPUID with EAX of 1 to get these bits set up, and many more bits. I cut it off here.
Obviously just for space. Those bits land in EDX and ECX, and then you can see in these bottom two lines I'm just testing whether that fifth bit of EDX (bit 4, counting from zero) is set high. That would indicate that the TSC is supported. If it's not supported, if that bit is low, I'm just gonna jump somewhere else in my code and say, hey, by the way, that instruction won't work if we try it. So yeah, this is how you would check if your CPU supports a given feature. Thanks for bringing it up, Patrick and L1Q. It's a good point.

Now, the syscall, I would say, is easier to use in a way and also harder to use in a way. This syscall is available, again, on both Linux and FreeBSD with these syscall numbers. Don't worry about the numbers, because they're already implemented in the OS-specific file that gets included whenever you assemble this code. What it does is take pointers to two structures and return an error value, which will be zero if it succeeded. And actually, of these two parameters, the second one is bogus because you don't use it anymore. You can see here that the timezone structure is obsolete, so we're just going to pass in zero, or null, for that. But the timeval address, that's important. That's the address of 16 bytes, or 128 bits, of space in memory where the syscall will drop the number of whole seconds and the number of microseconds that have elapsed since that 1970 epoch. So basically, if you run the syscall with these quantities in these registers in this way, this dot timeval memory location will contain the number of seconds and microseconds since the epoch. Very straightforward, I think.

With that out of the way, let's hop into the code. I'll move this up so you can see it. For now we have, was that six examples?
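
As a rough sketch of the CPUID check and the fenced counter read described above, here is a C version that uses GCC/Clang inline assembly in place of the video's assembly source, which isn't reproduced in this transcript. It assumes an x86-64 target and the compiler-provided `<cpuid.h>` header; the names `tsc_supported` and `read_tsc` are made up for illustration and are not the video's.

```c
#include <cpuid.h>   /* GCC/Clang wrapper header for the CPUID instruction */
#include <stdint.h>

/* Hypothetical helper: returns 1 if CPUID leaf 1 reports the TSC
   feature flag (EDX bit 4, the "fifth bit"), otherwise 0. */
static int tsc_supported(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is unavailable. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 4) & 1u);
}

/* Hypothetical helper: LFENCE, then RDTSC, then stitch the two
   32-bit halves (EDX = high, EAX = low) into one 64-bit count. */
static uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ volatile("lfence\n\t"
                     "rdtsc"
                     : "=a"(lo), "=d"(hi)
                     :
                     : "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

A caller would check `tsc_supported()` once up front and only then call `read_tsc()`; the shift-and-OR on the last line is the same step the walkthrough performs on RDX and RAX.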
I'm gonna cover the basic usage of the RDTSC instruction and the gettimeofday syscall in examples A and C. I'm also gonna implement tick and tock. If you have experience with MATLAB or Octave, tic and toc, which are spelled T-I-C and T-O-C there, are ways you can keep track of the elapsed time for blocks of code. They're used for very cursory levels of benchmarking in MATLAB. So we're gonna implement those two functions in assembly, and we're gonna do so both in terms of cycles and in terms of microseconds elapsed. So that's cool. Then we're gonna use those two sets of tick and tock to estimate the CPU frequency of our processor, which is a pretty cool topic with a pretty cool result. And lastly, I'm gonna talk about how to do a delay, a wait, or a sleep function in assembly for your own purposes.

So with that out of the way, let's hop into the code. First I'm gonna hop into the actual functions, because I think that's good to cover first. I have a function called tick cycles, a function called tock cycles, a function called tick time, and a function called tock time. What these do is count the number of cycles or microseconds that have elapsed between them. So I call tick, I run some code, I call tock, and it computes and returns in RAX the number of cycles or the number of microseconds that have elapsed in the block of code between them.

First off is tick cycles. This is a function that takes no parameters, but it saves the current timestamp counter value both in memory, at this location down here, and in RAX. You can see here I'm calling LFENCE to make sure all the previous instructions have finished. I'm calling RDTSC to get the timestamp counter value into those two registers, EDX and EAX. I'm shifting them, I'm OR-ing them. At this point
I have in RAX the total 64-bit value of the timestamp counter. Then I'm also moving that into this memory address down here, which we can access outside this function, by the way, even though it's a quote-unquote local variable with dot tick. As long as I refer to this memory address as tick cycles dot tick, and as long as I've included this file, I'll be able to access that memory anywhere in my code. So not only is that 64-bit timestamp counter value in this address, it's also in RAX, so if you want it immediately, you have it in RAX for your own purposes.

Now, tock cycles is the associated function for tick. It includes tick cycles as a dependency, obviously, and we have to have called tick before tock because we're going to be subtracting off that saved value from the value at this point in time. Here you can see it's very simple again. We're calling that same RDTSC in exactly the same way, at this point in the code where I'm showing my cursor.
We now have the current timestamp counter value in RAX, and all we do is subtract off the previous value. In doing so, we now have in RAX the number of cycles that have elapsed on the processor since we called tick cycles. Cool.

Now, tick time is a very similar process, but instead of using RDTSC it uses that syscall. You can see we're pushing the registers that the syscall clobbers, we're calling the syscall, and we're saving that quantity, in terms of seconds and microseconds, in this tick time dot tick memory address, 128 bits of space. We're also putting that value in RAX, just in case you care about getting the current timestamp immediately. And then of course in tock time, again, we include as a dependency the associated tick time function that you have to call first. All this does is get the current time, seconds and microseconds since the epoch, subtract off the previously saved time down here, and return the difference in RAX. So at this point you'll have the number of microseconds that have elapsed from tick time to tock time.

Lastly, I have the sleep function. This is a very simple function. You pass in the number of microseconds for which you want this function to just sit there and wait, and we wait by running a loop in which we keep checking the current timestamp. So we're gonna use that gettimeofday syscall again to keep checking: hey, has it been 5 million microseconds yet? Has it been 5 million yet? Over and over again until the answer is yes, and once the answer is yes, we drop out. So, very straightforward. You can peruse the code at your own leisure if you'd like; it's all online on the soy hub suppository.

So with that out of the way, let's hop into the examples. I have two examples. Let's look at the first one first.
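
To make the time-based tick and tock and the polling sleep concrete, here is a minimal C sketch of the same idea. It is not the video's assembly: the function names mirror the spoken ones but the spelling is a guess, `now_us` and `sleep_us` are illustrative names, and the static variable stands in for the saved "dot tick" memory slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Stand-in for the saved tick slot: start time in microseconds. */
static int64_t saved_tick_us;

/* Hypothetical helper: seconds and microseconds since the epoch,
   flattened into a single microsecond count. */
static int64_t now_us(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);   /* timezone argument is obsolete: NULL */
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Save the current time and also hand it back to the caller. */
static int64_t tick_time(void)
{
    saved_tick_us = now_us();
    return saved_tick_us;
}

/* Microseconds elapsed since the most recent tick_time() call. */
static int64_t tock_time(void)
{
    return now_us() - saved_tick_us;
}

/* Busy-wait: keep polling the clock until delay_us has passed. */
static void sleep_us(int64_t delay_us)
{
    int64_t start = now_us();

    while (now_us() - start < delay_us) {
        /* spin: "has it been long enough yet?" */
    }
}
```

Calling `tick_time()`, then `sleep_us(2000000)`, then `tock_time()` should report roughly two million microseconds. Polling like this keeps a core busy for the whole delay, which is the trade-off of this approach compared with asking the OS to put the process to sleep.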
That's just the RDTSC instruction. If I run this, you'll see all it does is print out, oops, oh my god, I'm so dumb, the number of timestamp clock cycles that have taken place since my processor was reset. So it's been however many cycles since my processor was turned on, and you can see it's counting up, a higher and higher number each time, right? Cool.

How does that work? Pretty straightforward. The first thing I'm doing is actually checking if my CPU supports that instruction, like Patrick suggested. You can see I'm just checking that fifth bit. If it's high, I'm good to go. If not, there's an error message for us. Then all I'm doing is querying the timestamp counter and printing it out. So if I run that again, we see that quantity increasing each time. And what if I go back in and invert that conditional jump? At that point I'm kind of simulating the CPU not having support for this instruction. If I run it now, you can see it says that RDTSC is unsupported. Okay, let me change that back before I forget. So yeah, you can successfully check whether or not your CPU supports a given instruction like that. Cool.

Next, I'm gonna talk about example C, which is that gettimeofday syscall. If I run that example, you'll see that it prints out the number of seconds and microseconds that have elapsed since that Unix, whatever, epoch back in the 70s. You can see it's going up each time I run the code. How does this work? Basically as you would expect. All it does is call that gettimeofday syscall, then print out the number of seconds, print a decimal point, and then print out the number of microseconds. So that's pretty simple again.
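
For comparison, the flow of example C (call gettimeofday, then write the seconds, a separator, and the microseconds) might look like this in C. This is a sketch rather than the video's code: `format_timestamp` is a made-up name, and zero-padding the microseconds to six digits is my own choice so the output reads as a decimal number.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical helper: write "seconds.microseconds" since the epoch
   into buf. Returns 0 on success, -1 if the call fails or buf is
   too small to hold the text. */
static int format_timestamp(char *buf, size_t len)
{
    struct timeval tv;
    int written;

    /* Second argument is the obsolete timezone pointer: pass NULL. */
    if (gettimeofday(&tv, NULL) != 0)
        return -1;

    written = snprintf(buf, len, "%lld.%06ld",
                       (long long)tv.tv_sec, (long)tv.tv_usec);
    return (written > 0 && (size_t)written < len) ? 0 : -1;
}
```

Printing the result is then just `char buf[32]; if (format_timestamp(buf, sizeof buf) == 0) puts(buf);`, and running it repeatedly should show the value climbing the same way the example does.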
You can see it just prints out that timestamp value. Okay, cool.

Example B. Now, this is that tick and tock functionality that you would have in MATLAB or Octave, but in assembly, and in this case it's counting not seconds but cycles that have elapsed. If I run this code, you'll see what happens. It did a billion loops in about 1.2 billion cycles. In this case 1.24, 1.30. Is that billion? Yeah. So you can see it's running at more or less one cycle per loop iteration, almost.

How does this code work? It's pretty straightforward. We have the number of iterations, 1 billion, being saved in a register. We're printing out some stuff in the beginning, and you can see here's our tick and tock around our loop. Here we're calling tick cycles, which saves that initial timestamp counter value, the cycle count, into memory. Then we have our loop for 1 billion iterations right here. We're just decreasing that register by one every iteration, and until it's zero we keep jumping to the top of the loop. Once the loop is done, we call tock cycles, which subtracts off the saved starting value to get the number of cycles that have taken place, and it prints that out here. So again, if I run that code, you can see we've taken 1.2 billion reference cycles to do 1 billion iterations of that loop. Okay, cool.

Example D now. Exactly the same thing, but in this case it's printing out the result in microseconds, not in cycles. You can see it's taken around 400,000 or so, maybe 300,000 or so, microseconds to do 1 billion iterations. So you could actually compute the frequency of my processor if you wanted, and we will do that in the next example as well. So how does this work?
Well, it's exactly the same code, pretty much cut and paste, except that we're using a different tick and tock: tick time instead of tick cycles, and tock time instead of tock cycles. So this computes the amount of time that's elapsed, in seconds and microseconds, since tick was called and prints it out to the screen. You can see here it prints out that quantity as we would expect.

Example E. This is how we're going to estimate the CPU frequency. How do we do that? Let me show you. You can see here is that same exact code again, 1 billion iterations of that loop, and at the top, before the loop takes place, we're calling both tick time and tick cycles. We have the loop, and then we call tock time and tock cycles. Then you can see we're basically dividing the two, cycles by microseconds, and scaling by a million to get our units in hertz, as opposed to megahertz or something. If I run this code, you can see it's done a billion loop iterations at 3.593 billion hertz. I can run this as many times as I want; it's always giving the same quantity.

Why is it giving the same quantity? Well, first off, what is my CPU frequency? If I check, you can see my CPU frequency is 3.593 gigahertz, so it's giving the exact CPU frequency. How is it so accurate?
Even though the time is different each time I run it, how am I getting the exact same quantity every time? If I had to guess, just between you and me, I would guess that the gettimeofday syscall is using RDTSC and just multiplying it by some scalar constant to compute the time that's elapsed. It's really measuring cycles that have elapsed, but it's convincing us that it's measuring time. So in my opinion, if I had to guess, these two methods, the syscall approach and the RDTSC approach, are actually the same thing. Anyway, pretty cool stuff.

Lastly, the last example I have is that delay function. In the code, I think I'm delaying for two million microseconds, or two seconds. So all I'm doing is calling sleep with that as the parameter, and then I'm printing out that we've waited, so we know when the program's done. If I run this code, let me clear the screen, it takes two seconds to print out that we waited for two seconds, as you would expect. One one-thousand, two one-thousand. Perfect.

So yeah, with that out of the way, we've covered pretty much everything I wanted to cover, I think. Now you guys know how to do some pretty basic benchmarking, you know what goes into that syscall and that instruction and kind of how they work behind the scenes, and you can also implement delays and whatever else you want on your own time, no pun intended. Without further ado, I'll do one last thing, and that's to plug our top secret Discord. No normies allowed. It's in the description, in the last link. Check it out if you want to hang out with some of us. Yeah, I'll see you guys in the next video.
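
For reference, the frequency estimate from example E (elapsed cycles divided by elapsed microseconds, scaled by a million to land in hertz) can be sketched in C as below. This assumes GCC or Clang on x86-64; `read_tsc`, `now_us`, `hz_from_counts`, and `estimate_tsc_hz` are illustrative names rather than anything from the video, and the empty countdown loop only stands in for the billion-iteration loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* LFENCE + RDTSC, combined into one 64-bit cycle count. */
static uint64_t read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Microseconds since the epoch via gettimeofday. */
static int64_t now_us(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* cycles / microseconds is megahertz; times one million is hertz. */
static double hz_from_counts(uint64_t cycles, int64_t micros)
{
    if (micros <= 0)
        return 0.0;            /* avoid dividing by zero */
    return (double)cycles / (double)micros * 1e6;
}

/* Time the same countdown loop both ways and combine the results. */
static double estimate_tsc_hz(uint64_t iterations)
{
    int64_t  t0 = now_us();
    uint64_t c0 = read_tsc();

    /* volatile keeps the compiler from deleting the empty loop */
    for (volatile uint64_t i = iterations; i != 0; i--) {
    }

    uint64_t cycles = read_tsc() - c0;
    int64_t  micros = now_us() - t0;

    return hz_from_counts(cycles, micros);
}
```

With the figures quoted in the walkthrough, `hz_from_counts(3593000000, 1000000)` works out to 3.593 billion hertz. Note that this measures how fast the timestamp counter ticks, which on many modern processors is a fixed rate rather than the momentary core clock, so a very stable result is not surprising.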