Hey guys, welcome back to the SCHIZONE series, episode 19. Today's topic is, I'll be honest, a filler topic: least squares regression, basically polynomial fits for data. Our goal today is to make plots like this one and like this one, where we fit a polynomial to data. As usual, we'll first talk through the theory pretty quickly and then we'll talk about the implementation in assembly. And again, there are no libraries of any kind in the SCHIZONE series.

So the dog asks, "what is least squares?" and the cat responds very verbosely. Basically, least squares is an approach... well, I'll summarize here. You have data points, in this case six of them, that you want to fit with a line. A line is only guaranteed to pass through two data points, so how do you pick the line that best approximates the other four, or I should say all six? How can you do that? And I'm sure there are other applications of this methodology that I couldn't care less about.

So what's the approach? How do you fit a line to data? Here's the math you would do by hand. Let's say you have these data points x and y, which, by the way, are the data points from the plots I just showed you. A line has the equation y = mx + b, where m is the slope and b is the y-intercept. If you just plug the values for x and y into that equation, then unless the points already lie on a line, you won't be able to pick a line, a combination of m and b values, that would hit all those points. So you're going to have an error term on most if not all of these equations. These error terms are sometimes called residuals. The idea is, if you solve each equation for the error term, take the square of those terms, and sum them up, you basically get a quantification of how bad your line was. Right? If I pick values for m and b to create a line, I get error terms for every single point.
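As an editor's aside, the residual bookkeeping just described can be sketched in a few lines of Python (made-up data points, not the video's actual data):

```python
# Editor's sketch with made-up data points (not the video's data):
# the sum of squared residuals quantifies how bad a candidate line is.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.9, 5.1, 6.8, 9.2, 11.0]

def sse(m, b, xs, ys):
    """Sum of squared residuals e_i = y_i - (m*x_i + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

good = sse(2.0, 1.0, xs, ys)   # a line close to the data
bad = sse(-1.0, 10.0, xs, ys)  # a clearly worse line
print(good < bad)  # True: least squares picks the m, b minimizing this sum
```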
I was off by this much at every single point, and I can quantify how bad my approximation was by squaring all those terms and adding them up. The idea is that the best line would be the one where, if you take all of these squared error terms and add them up, that error is the least, the lowest error. That's why it's called least squares: you're taking the error terms, squaring them, adding them up, and picking the values for b and m that minimize that sum.

So if you take these equations, solve for e, square them, and add them up, you'll get this, which, if you spend five minutes, you can reduce down to this expression here. The question is still: what values for slope and intercept would minimize that expression? Remember, that expression quantifies how bad my line was.

So how do you minimize an expression, or an equation, or whatever? Let's say you have a single-variable function like this one here. The minimum would be where the slope is zero; obviously you can't get any better, so your slope has to be zero at the minimum, and also at the maximum. But in our case we don't care about maximums, because I can always pick a worse line. I mean, this line is good; I can pick a million worse lines than this, and I can always pick a worse one. So maximums are of no consequence for this question today. Wherever the slope of the sum of squared errors is zero, that's the minimum.

So if you call that expression s, take the partial derivatives of s with respect to those two coefficients b and m, intercept and slope, and set them equal to zero, you find these equations here. The derivative with respect to b gives zero equals 12b plus 44m minus 200. Similarly, you can take ds/dm and get 44b plus 200m minus 1000, and you can solve this linear system in whatever way you want.
I like matrices, but you can do whatever you'd like. If you solve for b and m in this system, you'll find that b is negative 2 and m is 6, so you can plug those values back into your line and say y = 6x − 2, and that is in fact this line you see here.

So that was very arithmetically expensive and took a lot of time to write out, but of course there is always an easier way, and in this case the easier way involves matrices. The idea is, you take your data; your y data, for example, you can keep the way it is as a vector y. For your x matrix, what you do is construct this here: an n-by-2 matrix, where n is the number of data points you have. The first column of that is just ones, and the second column is the x values. Now if you take those two matrices, you can equate them in this fashion with what I'm calling q here; you can call it whatever you want, it's just the coefficients we're trying to solve for. This is essentially your equation for a line, y = mx + b, in matrix form.

So how does least squares work here? To square something, what you do is take the transpose times the matrix itself. To solve for q in this equation, we can pre-multiply both sides by x transpose, then invert x transpose x and multiply both sides by that inverse. In that case you solve for q = (x transpose x) inverse times x transpose y. A lot of manipulations here, but you've taken all that math, all that squaring and all those derivatives, and encompassed the same exact math in these matrix equations. The idea is, if you can just generate these matrices from the input data, you can evaluate this expression to find your coefficients of interest.

And I have two notes. The first tip: for lines that you're trying to fit data with, you'd need a 2-by-2 matrix inverse, and for parabolas it would be a 3-by-3.
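As a sanity check on that matrix route, here is a pure-Python sketch of q = (XᵀX)⁻¹Xᵀy, no libraries, in the spirit of the series. The data here is invented, but chosen to lie exactly on y = 6x − 2, the line from the hand-worked example:

```python
# Editor's sketch of the matrix route q = (X^T X)^-1 X^T y for a line
# fit. Invented data lying exactly on y = 6x - 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [4.0, 10.0, 16.0, 22.0]

X = [[1.0, x] for x in xs]                     # n-by-2: ones, then x values

# X^T X (2x2) and X^T y (length 2), computed explicitly.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(2)]

# Invert the 2x2: swap a and d, negate b and c, scale by 1/det.
(a, b), (c, d) = xtx
det = a * d - b * c
inv = [[d / det, -b / det], [-c / det, a / det]]

# q = inv(X^T X) times X^T y  ->  [intercept, slope]
q = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]
print(q)  # ~[-2.0, 6.0], i.e. y = 6x - 2
```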
We don't usually like to take inverses of matrices, but for small ones like 2-by-2 and 3-by-3 we can get away with it pretty easily. The next tip is that you actually don't need to generate these intermediate matrices x and x transpose, because you can compute x transpose x on the fly, as well as x transpose y on the fly, completely from the input data alone. Why is that? Because x is such a simple matrix: it's just ones and x values, and y is just the y values. I don't have to waste memory space on x transpose or on x; I can develop these terms directly.

So our approach to solve this in assembly is basically going to be: construct x transpose x in memory directly from the x values, then construct x transpose y in memory directly from the values of x and y, then invert x transpose x, multiply it by x transpose y, and that is our answer.

Now, that's for lines. How does it extend to higher-order polynomial fits? Well, the number of rows is obviously the number of data points you have; we have six points, so we have six rows. The number of columns is the number of unknowns in the polynomial. Lines have two unknowns, slope and intercept, so we had two columns, but if you had a parabola you'd have to add another column. And how do the columns look? Basically, the zeroth column is just the x values to the power of zero, which is just ones. The first column is the x values to the power of one, and then you'd have powers of two, three, four, etc. So if you had a parabola fit, your x would actually be of the form: ones, x values, x values squared. So yeah, that's how that would work.

Lastly, how do you invert these small matrices, 2-by-2 and 3-by-3? For a 2-by-2 matrix it's very simple.
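The "compute on the fly" trick generalizes: for a degree-d fit, XᵀX is just a table of power sums and Xᵀy a vector of moments, so neither X nor Xᵀ ever needs to exist in memory. An editor's sketch with invented example data:

```python
# Editor's sketch (invented data) of the shortcut: for a degree-d
# polynomial fit, X^T X is a table of power sums S_k = sum(x^k), and
# X^T y holds the moments sum(y * x^k), so both can be accumulated
# straight from the data with no X or X^T in memory.
def normal_matrices(xs, ys, degree):
    n = degree + 1                                    # number of unknowns
    S = [sum(x ** k for x in xs) for k in range(2 * degree + 1)]
    xtx = [[S[i + j] for j in range(n)] for i in range(n)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return xtx, xty

xtx, xty = normal_matrices([1.0, 2.0, 3.0], [2.0, 3.0, 5.0], degree=1)
print(xtx)  # [[3.0, 6.0], [6.0, 14.0]] = [[n, sum x], [sum x, sum x^2]]
print(xty)  # [10.0, 23.0] = [sum y, sum x*y]
```

Passing `degree=2` gives the 3-by-3 system for the parabola fit described above.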
What you do is you basically swap some entries around, negate some, and pre-multiply by one over the determinant. What is the determinant of a 2-by-2 matrix? It's just ad minus bc, so this diagonal minus this diagonal. That's a 2-by-2. How about a 3-by-3? Well, the determinant of a 3-by-3 matrix can be computed a couple of different ways; in this case what we're doing is taking the first row and multiplying each entry by the determinant of a smaller submatrix. So you have a times the determinant of this submatrix, minus b times this submatrix's determinant, plus c, etc. Just follow the equation; no big deal. Then what you do is take one over that value and multiply it by this matrix, which is just made of determinants of 2-by-2 submatrices pulled out of the big one. So follow along at home and solve this yourself; it's not a hard thing to implement at all.

All the code for today is in the lib math linear algebra directory of the GitHub repository; take a look if you're curious. Here are the four functions we're going to use most in this video. The first two take the inverse of a matrix. What you do is pass in register rdi... or, sorry, you pass in register rsi the address of the matrix of interest that you want to invert, and you pass in register rdi the destination address for the inverted matrix. Pretty straightforward. Then you have these two functions here, which work the same way in terms of how you call them, but they do different things: the linear least squares function computes the line of best fit, and the quadratic least squares function computes the parabola of best fit. What you do is pass in registers rsi and rdx the addresses of your input data for x and for y, and in register rcx you pass the number of data points you have. Lastly, you put in register rdi the address of your answer: where do you want the answer to be placed?
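For reference, here is an editor's Python transcription of that cofactor recipe for the 3-by-3 case (a sketch of the math just described, not the repository's code):

```python
# Editor's sketch: expand the 3x3 determinant along the first row,
# then fill the inverse with signed 2x2 sub-determinants scaled by
# one over the determinant.
def det2(a, b, c, d):
    """Determinant of [[a, b], [c, d]]: this diagonal minus that one."""
    return a * d - b * c

def inv3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * det2(e, f, h, i) - b * det2(d, f, g, i) + c * det2(d, e, g, h)
    adj = [  # adjugate: signed 2x2 determinants out of the big matrix
        [det2(e, f, h, i), -det2(b, c, h, i), det2(b, c, e, f)],
        [-det2(d, f, g, i), det2(a, c, g, i), -det2(a, c, d, f)],
        [det2(d, e, g, h), -det2(a, b, g, h), det2(a, b, d, e)],
    ]
    return [[adj[r][k] / det for k in range(3)] for r in range(3)]

m = [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0], [5.0, 6.0, 0.0]]
print(inv3(m))  # [[-24.0, 18.0, 5.0], [20.0, -15.0, -4.0], [-5.0, 4.0, 1.0]]
```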
The b and m values, the coefficient values, will be placed at the address in register rdi. So with that out of the way, let's look at the code.

Here is the examples 19 directory, and there are again two examples in here, which I showed you at the beginning of the video. First let's take a look at the actual functions we're going to be calling. Here are three of those four functions. The first one I want to talk about is the inverse 2-by-2 matrix function. So how does this work? Well, we just covered how this works: all we're doing here is computing the determinant of a 2-by-2 matrix, which is the first element times the fourth element minus the second element times the third element, and one over that is what we're going to pre-multiply our matrix by, essentially.

So how did that work in assembly? Well, first you can see I saved my registers, because again, in the SCHIZONE series none of our registers are ever affected by function calls unless they are return values. So we save those registers and restore them at the very end; you can see that down here. Now, how does the rest of the code work? The first thing was to compute that determinant. So how did this work? Well, you can see what we're doing is moving... so rsi, remember, points in memory to the matrix of interest, and rsi plus zero, that offset of zero, is the first element. As you can see here, I've pulled the first element, what was called element a in this expression, into xmm0. Then I multiplied that by element four; element four is three doubles, 24 bytes, offset from element one. You can see I've multiplied this by the previous value, so now I have basically a times d in xmm0. Then into xmm1, a different register, I have pulled the value b, and I multiply that by the value c. So now xmm0 contains ad and xmm1 contains bc, so we have this term and this term, this term and this term. Now we subtract the two. How does that work?
Well, I subtract xmm1 from xmm0. Then you can see I've put the value one, which I've defined here in memory, into xmm1, and divided xmm1 by xmm0. At this point xmm1 contains one over the determinant. So I basically have this whole thing now in xmm1. At this point, all I have to do is basically grab elements out of the original matrix, multiply them by this, negate some of them, and I can then construct the resulting inverted matrix.

So let's see how that looked for element (1,1). Basically, you can see here I'm grabbing the fourth element of the original matrix, which is 24 bytes offset from the first element at rsi, and that's now in xmm0. So this is basically element d; element d is now in xmm0. Then I'm multiplying element d by the one-over-the-determinant value and putting that value in the destination's element (1,1). So is that correct? Yes: we take element d, multiply it by this one-over value, and put it in the first element's location.

I'll give one more example, element (1,2); this is the element to the right of that one. Here you can see I've set xmm0 equal to zero and subtracted off element b from the original matrix, so now I have negative b in xmm0. Then I'm multiplying negative b times one over the determinant and dropping that into the second slot in the matrix, which is here. Is that correct, negative b times one over that number? Yes, that's where it should be. And similarly, you could do the last row as well.
That's how we've inverted a 2-by-2 matrix in assembly. Similarly, a 3-by-3 is much the same but with a little more arithmetic. I won't get into it, but I'll just summarize how it works. We started off computing the determinant of that 3-by-3 matrix like this; that's this value here. We then took one over that value, as this value here. Then for each element we pull values out of the input matrix, make the determinant calculation for each 2-by-2 submatrix, multiply that by the one-over-determinant value, and drop it in the destination of interest. So again, yeah, we could compute determinants of any matrix like this, realistically speaking, but it gets more and more arithmetically expensive to do so for higher and higher dimension matrices.

Okay, that's this one. Now, how does the linear least squares function leverage this? I mentioned before that our procedure was basically to take the input data directly and compute X transpose X and X transpose Y without any intermediates of any kind, then invert X transpose X, multiply by X transpose Y, and that's our answer. So those steps are our operating procedure here. Is that what I did? First thing, I saved my registers, and at the end I obviously restore them.

So here, let's talk about how X transpose X and X transpose Y work. First, you can see I have space at the bottom of my function here to drop in X transpose X and X transpose Y. How big are these spaces? Well, think about it: the matrix X is an n-by-2 matrix, where n is the number of data points we have. X transpose is 2-by-n, so X transpose times X is a 2-by-2 matrix. We only need space for four double-precision numbers, eight bytes each, for X transpose X. Do we have that? Yes, 32 bytes, all set to zero to begin. What about X transpose Y? Well, that is just a 2-by-n matrix multiplied by an n-by-1 matrix, so it's a 2-by-1 matrix.
So yeah, we have two doubles' worth of space here, 16 bytes, all set to zero to begin.

Now let's talk about how X transpose X even looks. Right, this is X; what is X transpose X? Well, if you think about it, the way it will look is basically you'll have n here. The first element of X transpose X is basically this column dotted with itself, and what is a column of ones dotted with itself? Well, it's just the number of things you have, so we'll have an n in the first element. What about these off-diagonal elements? Well, that's basically this column dotted with this column. What is that? That's just the sum of the x values, the sum of the x values. And then what is the last one, the second column times the second column? Well, that is just the sum of the squares of the x values. And of course you could argue that the first one is not n, it's the sum of the x values to the zero, right, and these off-diagonal ones are to the one. That's how X transpose X looks; it's a very simple matrix to construct without having to go through any intermediates like this matrix here. I can directly compute the sum of the inputs and put it in this slot, I can just put the number of elements in this slot, and I can put the sum of the squares of the inputs in this slot; and of course this one and this one are the same value.

What about X transpose Y? How does that look? Well, the first element of X transpose Y is just, think about it, this column dotted with this column, which is the sum of the y values. So this is how X transpose Y would look: the sum of the y values, and then the sum of the x times y values. And of course this first one is really the y values times x to the zero, this is times x to the one, and if you had more you'd have x squared times y, etc. That's the form of X transpose Y; it's a very simple thing to construct.

So how do we do that? Here you can see, for the first element, that was just the sum of x to the zero, which is just the number of elements again.
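Here's a quick editor's check, with made-up data, that the closed forms just read off actually match the explicit matrix products:

```python
# Editor's check (made-up data) that the closed forms
# X^T X = [[n, sum(x)], [sum(x), sum(x^2)]] and
# X^T y = [sum(y), sum(x*y)] match the explicit matrix products.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 5.0, 9.0, 15.0]
n = len(xs)

X = [[1.0, x] for x in xs]   # the n-by-2 intermediate we never need to store
xtx_explicit = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(2)]
                for i in range(2)]
xty_explicit = [sum(X[r][i] * ys[r] for r in range(n)) for i in range(2)]

xtx_direct = [[n, sum(xs)], [sum(xs), sum(x * x for x in xs)]]
xty_direct = [sum(ys), sum(x * y for x, y in zip(xs, ys))]

print(xtx_explicit == xtx_direct, xty_explicit == xty_direct)  # True True
```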
We passed that as an input in rcx; that's the number of data points we have, so we can just convert rcx into a float and drop it into the first slot in X transpose X. Then you can see here I've set some summing variables equal to zero, so I'm reserving some registers for the running components of X transpose X and X transpose Y; I have set aside four floating-point registers to do that.

So how did this work? Well, obviously I loop through the input data: I decrement rcx by one in this loop every time, and every iteration I step through the input data, whose addresses we passed in rsi and rdx. What I do is basically pull the current x value into xmm4 and the current y value into xmm5, and then I add those to the running sums in xmm0 and xmm2; again, those are the sum of the x values and the y values respectively. That's these two lines here. Then what I'm doing is multiplying both of those x and y values by the x value, so now I have x squared in xmm4 and x times y in xmm5. Then I add those values to the running sums in xmm1 and xmm3, and those are component (2,2) of X transpose X and component 2 of X transpose Y. Basically, I'm just populating those two matrices as we go: I'm computing all these terms, sum of y times x to the zero, sum of y times x to the one, as well as the ones for X transpose X that I discussed before.

Once that loop is completed, we've now dropped the correct terms into our memory locations for X transpose Y and X transpose X. Once we've computed all of the elements in xmm0 through xmm3, we drop them in; you can see down here these five lines are populating those matrices with the running sums we just computed. And then I have another workspace here where I'm basically inverting that X transpose X by calling inverse 2-by-2.
So if I look down here, we have a workspace of 32 bytes, that's four 8-byte floating-point numbers' worth of space. So at this point our .workspace memory location contains the inverse of X transpose X, which was the next step, right? That was this here. We now have this in that .workspace memory address, and we have this in the X transpose Y memory address. What's the next step? Well, obviously it's to multiply. So you can see here I multiplied this workspace, the inverse of X transpose X, by X transpose Y: I passed in the dimensions and called multiply.

And then, of course, what is this here? What's this mov rdi, [rsp + 120]? That is where we saved the original rdi value. We saved our rdi on the stack, and it happens to be 120 bytes up the stack right now, so I can grab it and put it back in its proper location for this function call; rdi, again, contains the address of our answer, where we should put the answer. I was going to say I could comment this out and it would still work all the same... actually, no, I can't, because look right here: we destroyed rdi in the previous function call. That's why I've done this, pulling rdi back off the stack, because I destroyed it earlier. So that's true, I did mess it up; sorry.

At the end, I just restore everything off the stack back into its registers and return. So that's the entire process of linear least squares implemented in assembly, as you've seen. So does this work? Well, let me look at the code that we have here. We have our two examples; example a was a linear fit. If we look at the code, here's the code.
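To tie the whole routine together, here is an editor's pure-Python mirror of the assembly routine's flow, with invented data (the xmm/rcx names in the comments refer to the video's implementation): one pass accumulates the running sums, then a 2-by-2 inverse and a 2-by-2 times 2-by-1 multiply produce the coefficients.

```python
# Editor's mirror of the assembly flow: accumulate sums in one pass,
# invert the 2x2 normal matrix, multiply to get [intercept, slope].
def linear_least_squares(xs, ys):
    sx = sxx = sy = sxy = 0.0            # the four xmm running sums
    for x, y in zip(xs, ys):             # the rcx-counted loop
        sx += x
        sy += y
        sxx += x * x                     # feeds element (2,2) of X^T X
        sxy += x * y                     # feeds element 2 of X^T y
    n = float(len(xs))
    # X^T X = [[n, sx], [sx, sxx]]; invert it via one over the determinant.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # q = inv(X^T X) times [sy, sxy]
    b = inv[0][0] * sy + inv[0][1] * sxy
    m = inv[1][0] * sy + inv[1][1] * sxy
    return b, m

b, m = linear_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b, m)  # close to 1.0 2.0: the data lies exactly on y = 2x + 1
```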
We have a lot of stuff to talk about here. The first thing I want to talk about is the includes. Because we're plotting stuff, we have to have the includes required for plotting, so we have to be able to open and close files, make a scatter plot, etc. In addition, because I plot a line, I also have the stuff required for evaluating expressions and plotting a curve: I have this evaluate, parameters, linear space stuff. Again, for lines you don't have to worry about this, but for more advanced curves, like a quadratic, we want to be able to plot the entire curve rather than just having two points.

So how does this work? Well, the first thing I've done is define a line fit function. What this does is basically compute the line if you give it the coefficients, the b and the m for y = mx + b: given an x value, it computes the corresponding y value. And you can see here it has 16 bytes' worth of space for those two coefficients; the slope and the intercept will be in this memory location.

So how are we doing this? Well, you can see, first thing, very simple: these five lines here, this is the entire video, this is what I just talked about for half an hour. This is the linear least squares function call. What we're doing is passing into rsi and rdx our data point values; I have those in memory at locations labeled dot x values and dot y values, which are probably down here somewhere. You can see here the x values and y values that we're calling. Then I put into rcx the number of data points, six, and lastly I put in the location where I want the answer to go: you can see I moved into rdi line fit function dot regression coefficients. What that does is basically say: compute least squares on the x values and y values.
I'm going to put the answers up here. This basically means the sub-label regression coefficients under the label line fit function: put the answer in these 16 bytes here.

So at this point, when the program gets to this line... because, you know, it's skipped over all this; it doesn't start until the start label. Why is that? Well, because if you look up here in the ELF header, we said to start at the entry point "start". So it jumps all the way down to start, immediately computes the linear least squares of those data points, puts the answer right here, and then I print it out. We covered that in a previous video; I'm printing out those answers, those coefficients. And yeah, then what I'm doing is basically plotting everything in a way you'll be familiar with from previous videos, so I won't cover all that here.

But if I run this, you'll see that, well, first off, it printed out those answers. This is the y-intercept and this is the slope from the linear least squares, so everything worked as expected, and in fact these values happen to match these values exactly. It also created for us this line fit dot svg, the scatter plot I already showed you; that's this one here. I can refresh and it's all the same. So yeah, that's it, we did it: we plotted the line of best fit for this data. Great.

And it's much the same process for the quadratic one; let's show that really quick. Example b, I can run that; very much the same process. In this case I have three coefficients: basically an x-to-the-zero term, an x-to-the-one term, and an x-to-the-two term, here, here, and here. And again, it also created this SVG, which I can open up in Firefox; it's this one here, and you can see this is the parabola defined by those coefficients, and it pretty closely approximates those points.

With that, the video is over.
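For completeness, here is an editor's parallel to the parabola fit, with invented points lying on y = 2x² + 1 (the video's actual data is only on screen). Note this sketch solves the 3-by-3 normal system by Gaussian elimination rather than the explicit cofactor inverse the assembly uses; same answer, less typing.

```python
# Editor's parabola-fit sketch: build the 3x3 normal system from
# power sums and moments, then solve it by Gaussian elimination.
def quad_fit(xs, ys):
    S = [sum(x ** k for x in xs) for k in range(5)]          # power sums
    A = [[S[i + j] for j in range(3)] for i in range(3)]     # X^T X
    v = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]  # X^T y
    for col in range(3):                 # forward elimination, partial pivot
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for k in range(col, 3):
                A[r][k] -= f * A[col][k]
            v[r] -= f * v[col]
    q = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                  # back-substitution
        q[r] = (v[r] - sum(A[r][k] * q[k] for k in range(r + 1, 3))) / A[r][r]
    return q                             # [c0, c1, c2] for c0 + c1*x + c2*x^2

coeffs = quad_fit([-2.0, -1.0, 0.0, 1.0, 2.0], [9.0, 3.0, 1.0, 3.0, 9.0])
print(coeffs)  # close to [1.0, 0.0, 2.0], i.e. y = 2x^2 + 1
```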
We have effectively implemented in assembly a least squares regression algorithm for lines and parabolas, and if you wanted to, you could extend this to higher-order systems. I didn't want to, because I don't see any value in higher-order polynomials, but maybe you do. If you want to make a cubic best-fit program, that's fine; the only difference is you have to make a larger x matrix, so your x transpose x will be a 4-by-4 matrix and your x transpose y will be a 4-by-1 matrix, and you'll also have to be able to compute the inverse of a 4-by-4 matrix, which you can do.

With that out of the way, I'll end the video. Thanks for watching. We do have a Discord server if you want to hang out; the link is in the description. Thanks for watching, see you.