So, today we will spend some time discussing the midterm paper, and then we will continue with more fundamental matrix problems, some of which you might have seen before: Gaussian elimination, solving linear systems of equations, and finding determinants of matrices. In discussing the midterm paper, the first question was supposed to be pretty much a giveaway if you were tracking the course; it is basic index games, so that you get a pattern of crosses in ASCII art. The very first pattern is fairly obvious: every alternate position is being filled in, and therefore it is not difficult to see that row plus column is the deciding quantity. You check its parity; if it is odd you do one thing, if it is even you do something else. Now, your code does not need to be exactly equal to this code. For example, if instead of checking whether row plus column is even you check whether it is odd and do the corresponding thing, that is perfectly fine. And instead of dividing by 2, if you mask with the bit pattern of all zeros and a one and use that, that is equivalent. So do not worry about those things. Generally speaking, in exams we will not insist on completely correct syntax from start to finish; there is no need to waste time and space on what files to include and all that, or even on defining main as such, unless argc and argv are specifically required. For all the problems in the midterm, you did not need to give anything except the code logic. We avoided problems involving characters and strings and got right down to the basics. So, the first pattern was fairly simple; the second pattern was a little more non-trivial, and that is why we gave template code. In the second pattern you see something like a diamond, with four corners that are grayed out with x's, and then there is a center.
Now, the important thing about the coordinate system here is that rows are printed in increasing order going downward, and columns increase as you go to the right; that is partly opposite to what you see in a Cartesian plane. I do not want you to miss the pattern itself, so I am going to use the board. The coordinate system is: here is your column, here is your row; rows increase downward, columns increase to the right. In this coordinate system, the upper-left corner corresponds to some kind of half-space; clearly its equation must be row plus column less than or equal to something, or less than something. Similarly, the opposite corner's half-space has to correspond to row plus column greater than or equal to something, since it is removed from the origin. The other two corners have to use the difference between row and column, and that corresponds to the four lines in the template: two involving row plus column, and two involving row minus column or column minus row. At this point, all that is left is to figure out the polarity of each comparison and its right-hand side. The bool on the left-hand side clearly tells you that some comparison operator has to happen there, and the pictures themselves make the direction clear; all that remains is to figure out the right-hand sides. Now, we have already defined n and half for your convenience, so it is easy to see that the first equation has to be row plus column less than or equal to half, or thereabouts, and you can similarly figure out all the others: if that line is at about half, then the opposite one, at about double that, which is nn, will hit the axes at something like nn plus half.
After this, it is just a matter of making sure that you do not have off-by-one errors, that after dividing an odd integer you are doing the right thing. You do that by considering one of these examples, and you figure out that the southwest and southeast corners actually need nn plus half minus one; you have to adjust that, and that is it. All the others are exact comparisons with half. Once you work that out, these two problems together should not have taken you more than, say, 15 minutes at most; for people who are keeping up well with the course, it should take like 8 minutes at most. So the first question was supposed to be a reasonable giveaway. The second question is slightly more interesting. I initially set it to use vectors, but since some people are not comfortable with them, I changed the spec so that it uses native arrays, so that even people who had not studied vectors in much detail could easily solve number 2. So, what is question number 2? You are given a polynomial where many of the coefficients could be 0. It is a sparse polynomial, and therefore the way to store it is to use two parallel arrays. The integer tn stores the common logical length of the two arrays; powers[0] through powers[tn - 1] store the various powers of x, and coefs[i] stores the corresponding coefficient, which could be positive or negative. The coefficients are floating-point numbers; the powers are non-negative integers. As you can see, the very first power may be 0, and that is the only subtlety about the question. We gave a couple of examples, where p equal to 2 minus x cubed plus something times x to the power 5 is represented with three terms: the powers are 0, 3, and 5, and the coefficients are the corresponding numbers.
After reading up to this point you should be very clear about what the representation is. The first question was how to differentiate these kinds of polynomials, and again the only subtlety is whether a constant term exists or not. If there is no constant term, then the logical size of the arrays remains the same: every coefficient is multiplied by the corresponding power, and each power steps down by one; that is well understood. The problem case is where you have a constant term, because the constant term disappears on differentiation, and to represent this correctly you have to eliminate it and shift everything else down by one position. Again, there are multiple ways of writing this. You could detect the presence of a constant term and explicitly do a block move in the two arrays, which would be fine. A slightly smaller and simpler piece of code is given here, where you create a write cursor and a read cursor, as we have often done in examples in class. The read cursor marches along from left to right; the write cursor always lags, either at the read cursor or one behind it, the latter in case a leading constant term was discarded. So you start with wx and rx both at 0, and then they march down the indices. rx is incremented in every iteration of the loop, but you write out to wx, and advance wx, only if the power was strictly positive. After the for loop ends, wx is either the original tn, if there was no constant term, or tn minus 1 if there was one. In either case you write wx into tn, and that fixes the logical length of the arrays you should look at. So again a fairly simple problem; I cannot imagine it taking more than 15 to 20 minutes at most. 2B was more non-trivial.
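The read/write-cursor derivative can be sketched as follows. This is a reconstruction from the description, not the exam's code: it uses vectors instead of the exam's native arrays with a separate tn, and the names are mine.

```cpp
#include <vector>

// Differentiate a sparse polynomial in place. powers[i] holds the strictly
// increasing non-negative exponents, coefs[i] the matching coefficients.
void differentiate(std::vector<int>& powers, std::vector<double>& coefs) {
    std::size_t wx = 0;  // write cursor: lags the read cursor by at most one
    for (std::size_t rx = 0; rx < powers.size(); ++rx) {
        if (powers[rx] > 0) {               // a constant term (power 0) is dropped
            coefs[wx] = coefs[rx] * powers[rx];
            powers[wx] = powers[rx] - 1;
            ++wx;
        }
    }
    powers.resize(wx);  // wx plays the role of the new logical length tn
    coefs.resize(wx);
}
```

For example, p = 2 - x^3 + 4x^5, stored as powers {0, 3, 5} and coefs {2, -1, 4}, differentiates to -3x^2 + 20x^4, stored as powers {2, 4} and coefs {-3, 20}.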
2B basically said: you have a symbolic polynomial represented in this fashion, and there could be large gaps in the power sequence. You could have something like 1 minus x plus 3x cubed, which just drops the x squared term, but you could also have x to the power 15 minus 2x to the power 29. If you have such large gaps in the powers, how would you efficiently evaluate the polynomial at a given floating-point value of x? p(x) is given symbolically through those two arrays; now I instantiate x to a specific value, and you evaluate p(x) for me and return a float or a double. The very simple way to do it is to build up the powers you need as you go along. You build x to the power 15 starting from x by multiplying by x 15 times, and if the next power is 30, you multiply in another x to the power 15. So you keep multiplying in arithmetic-progression order, one x at a time, and that of course takes time linear in the powers. That takes a long time, and this is the very simple code for it: you track the maximum power you have seen so far, and if the next power you have to compute is larger, you keep multiplying by x until you reach the next required power, and then you take care of the coefficient. Now, we have already seen in class that this is a slow way to do exponentiation; you can do fast exponentiation by squaring, and that is what this does. Any confusion? Am I going too fast, or is my voice not clear? So, the faster way is to bridge the gap between adjacent powers by squaring. If I just saw x to the power 15 and the next power I have to construct is x to the power 30, that means I have to build up another x to the power 15, and I do that with the same doubling trick that we saw in class. We pretty much use that as a subroutine to build up the gap power.
The gap power is the next power I need minus the previous power I have; we use the doubling trick to bridge the gap power, multiply by the gap value, and then update the maximum power to the next power. It is the exact same code, except that bridging from one power to the next is done by doubling, that is, by squaring. Yes? Yes, that is already stated: the powers are given in strictly increasing order. If you use the pow library function you will lose a little credit, but if it is correct we will give most of the credit. In grading all the exams, the general policy is that correctness is paramount and efficiency is secondary. In some questions we will highlight that you need the most efficient solution, but generally, if you use library functions like pow, you are invoking log and exp inside, which is pretty slow; roughly speaking, we might deduct something like 10 to 20 percent at most in such cases. Here, the very first solution, which builds up powers linearly or uses the pow function, will get some baseline marks, say 80 percent, and people who use the squaring trick will get full marks. However, this application of squaring is not the best possible; there is actually a slightly more elegant way to evaluate the polynomial based on squaring, but in a slightly different way. That solution is not here; if you would like to work it out and submit it as extra-credit homework, feel free to do so. Going from linear time to squaring of course buys you a lot, but going from this form of squaring to the slightly cleaner form may not buy you that much, at most a factor of two speedup, so we did not assign marks for it. What we are thinking of doing instead is this: if certain questions are answered particularly elegantly by students, they will be flagged for a coding competition at the end of the semester.
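The gap-bridging evaluation can be sketched like this, with exponentiation by squaring as the subroutine. A reconstruction under my own names, not the exam's code; vectors stand in for the exam's native arrays.

```cpp
#include <vector>

// Fast exponentiation by repeated squaring: x^e in O(log e) multiplications.
double ipow(double x, int e) {
    double result = 1;
    while (e > 0) {
        if (e & 1) result *= x;  // fold in the current square when the bit is set
        x *= x;
        e >>= 1;
    }
    return result;
}

// Evaluate the sparse polynomial at x. Powers are strictly increasing, so
// each term only needs x^(gap) on top of the previous power, and the gap
// is bridged by squaring rather than one multiplication at a time.
double evaluate(const std::vector<int>& powers,
                const std::vector<double>& coefs, double x) {
    double sum = 0, xpow = 1;  // xpow = x^maxPower, the last power built
    int maxPower = 0;
    for (std::size_t i = 0; i < powers.size(); ++i) {
        int gap = powers[i] - maxPower;
        xpow *= ipow(x, gap);   // bridge the gap in O(log gap) steps
        maxPower = powers[i];
        sum += coefs[i] * xpow;
    }
    return sum;
}
```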
Based on that coding competition we will decide the A pluses, and of course the other grades in the class, but anyone who answers this question in the third way, which is not stated here, will be flagged for a possible A plus. There has also been some discussion about the missing project and what we are going to do about it. Some people want to do projects, but the point is that the backgrounds, and what people want to get out of the course, are so different that it is pretty hard to standardize projects. It is already hard enough to standardize five labs in a week; being fair while evaluating projects is basically impossible. So what I was thinking is: why not first try a quiz two, or a quiz 1.5, say, which is actually a lab quiz with a much longer time budget. We will take some weekend and have a three-hour lab exam without much time pressure, because in a time-limited exam you have to write correct code in 10 or 20 minutes, and I would not get it correct most of the time myself. We will try that out, long before the official quiz two deadline, so that if we see it is messed up and not working, and people are complaining about various unfairnesses, then we might drop back to the written quiz two, or you might even have both. That way we will drastically reduce the total weight on the finals and have a bigger part of the evaluation based on lab coding without time pressure. The other section has also expressed interest in this, and we are seriously thinking about it; either through the three-hour lab exam or through a sort of extra-credit coding project, we want to have more evaluated coding work in the class than we regularly have. Solving puzzles can only get you so far.
The third question is about needles and haystacks again, but the difference here is that neither needle nor haystack is sorted, and they can have lots of duplicate entries. The goal is to remove every occurrence of anything in needle from haystack, but keep the other elements of haystack unchanged; in particular, any duplicates among the surviving elements of haystack should remain, in the same order. For example, if haystack had these numbers in it and needle was basically the set {2, 3}, you would remove all occurrences of 2 and 3, but you would leave the 5, 1, 4, 1 unchanged, in that order. Generally speaking, you should realize that sorting haystack is kind of useless, because you have to report the surviving elements in the original order anyway, with repetitions, whereas sorting and deduplicating needle is a good idea. I might have given you a needle with a thousand elements but only three distinct values; if you first sort it and then squeeze out the duplicates, your needle becomes a three-element array, and searching it is much faster than searching a thousand-element array. That is one observation. Other than that, I provided all the primitives here, and I alerted you to revise all this before the exams. The simplest code will just scan haystack left to right, check for each element whether it is in needle, and squeeze out the matching values one by one. But the problem is that if you squeeze out gaps in haystack one at a time, you might be copying the same items multiple times: you shift a block, then you shift a block some more, and so on.
A better technique is again to have a read cursor and a write cursor. Since you can only lose elements, the read cursor marches ahead of the write cursor. The read cursor reads a particular element of haystack; if it is not to be eliminated, you write it at the write cursor and advance both. If the element just read appears in needle, you advance the read cursor without writing anything. So it is fairly simple: for the read cursor rx from 0 to the size, if a binary search for haystack[rx] in needle, from beginning to end, does not find it, then haystack[wx++] = haystack[rx]. Here the write cursor advances, and at the end you resize to the final value of the write cursor to discard the junk elements at the end. That resize is important; otherwise you are left with junk elements at the end. Now, I sorted needle, but you can also deduplicate needle in pretty much the same way. In fact, deduplicating needle after sorting it is even easier, because as you read along, you only have to write values that are larger than what you last wrote; otherwise there is no reason to write. That is the comparison here: if either wx is 0 or needle[wx - 1] is not equal to needle[rx], then append the next value to needle itself. This is a trailing write cursor. Again, if you write the first algorithm you will get about 75 to 80 percent of the credit; if you give the second algorithm you will get full credit. In this case there is nothing particularly smarter you can do. The last problem is mathematically the most fun, and there are two pieces to this puzzle, both of which were covered in either a class or a lab. We have of course discussed permutations, and the cycles in them, at great length in class, including showing how you can circulate the values in a cycle by one step. All that should give you sufficient background on what it means to find cycles in a permutation.
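The two cursor passes above, deduplicating needle and then compacting haystack, can be sketched together like this. A reconstruction under assumed names, not the exam's code.

```cpp
#include <algorithm>
#include <vector>

// Remove every occurrence of any needle value from haystack, preserving the
// order (and duplicates) of the survivors. Needle is taken by value, sorted,
// and deduplicated, so each membership test is a binary search over the few
// distinct values.
void removeNeedles(std::vector<int>& haystack, std::vector<int> needle) {
    std::sort(needle.begin(), needle.end());
    // Deduplicate with a trailing write cursor: only write values that
    // differ from the last value written.
    std::size_t wx = 0;
    for (std::size_t rx = 0; rx < needle.size(); ++rx)
        if (wx == 0 || needle[wx - 1] != needle[rx]) needle[wx++] = needle[rx];
    needle.resize(wx);
    // Compact haystack with the same read/write-cursor pattern.
    wx = 0;
    for (std::size_t rx = 0; rx < haystack.size(); ++rx)
        if (!std::binary_search(needle.begin(), needle.end(), haystack[rx]))
            haystack[wx++] = haystack[rx];
    haystack.resize(wx);  // discard the junk elements at the end
}
```

With haystack {2, 5, 3, 1, 2, 4, 3, 1} (numbers assumed for illustration) and needle holding 2s and 3s, the survivors are 5, 1, 4, 1 in that order.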
The second piece of the puzzle was the lab exercise, for one of the sections, where we looked at planets rotating around a star, each planet with its own periodicity, its year. If all planets start aligned in one line with the star, when will that next happen? It is the LCM of their years. Between the two of these, you should have caught on that if the permutation is made up of multiple cycles, and one cycle has periodicity three, another has periodicity two, and so on, then after you apply the permutation LCM-many times, all cycles fold back to their original positions. So it is a combination of finding the cycles of a permutation and finding their LCM. But even if you did not get that last piece, there are enough other parts to the question that you should have gotten. We start off with finding the inverse of a permutation, and that is pretty easy: if a particular item i goes to pi[i], then all you have to do is take pi[i] back to i, and that is what is being done here. So inverse[pi[i]] is set to i, and this all works out fine because the permutation is bijective: every cell is filled exactly once, and nothing goes wrong. The second part is about repeated application of a permutation, and there is no coding to do there; it is just to get familiar with the idea of applying a permutation again and again.
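The inversion step can be sketched in a couple of lines. The function name is mine; the one-line body is exactly the inverse[pi[i]] = i assignment described above.

```cpp
#include <vector>

// Invert a permutation: if pi takes i to pi[i], the inverse takes pi[i]
// back to i. Because pi is a bijection, every cell of inv is written
// exactly once.
std::vector<int> invert(const std::vector<int>& pi) {
    std::vector<int> inv(pi.size());
    for (int i = 0; i < static_cast<int>(pi.size()); ++i)
        inv[pi[i]] = i;
    return inv;
}
```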
If you take the sequence s, which is just a, b, c, d, e, and you apply pi once, you get this sequence; apply pi once more to the second line and you get the third line. The question was to just keep applying it for, say, seven steps, and the intention was that after applying it six times you notice you have come back to the original order a through e, and that should tip you off about what is really going on. We have talked about cycles so much that it is pretty easy to pull out the cycles manually, and then you observe that there are exactly two cycles, one of size three and the other of size two, and everything folds back after six iterations. That should tip you off as to what is going on; that was the intention. After that, 4C basically asked: if you do not have s to start with, but you only have to compose pi with itself k times, how do you do that? That is not difficult; it is very similar to walking a cycle, except that you do not care whether it is a cycle or not: you just walk along k times, and that gives you pi to the power k. 4D is the last one: how many applications do you need before you fold back to the identity permutation? Here again, if you do an exhaustive search, by actually applying pi and checking, you will get some marks; we will decide exactly how much. But if you do it the smart way, by extracting the cycles and then figuring out when all the cycles line up, at their LCM, then you get full credit. Again, there is nothing particularly smarter you can do in this case, but I wanted to make one last comment about cycle finding before moving on to today's actual content, about when we discussed pulling cycles out of a permutation. Suppose I have n elements; the union of all the cycles has n elements, and there is no reason you cannot list the cycles one by one in total time proportional to n.
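The walking idea for 4C and the cycles-plus-LCM idea for 4D can be sketched as follows. Function names are my assumptions, not the exam's; std::gcd from <numeric> needs C++17.

```cpp
#include <numeric>  // std::gcd (C++17)
#include <vector>

// pi applied k times to position i: just walk along k steps,
// without caring whether the path closes into a cycle.
int walk(const std::vector<int>& pi, int i, int k) {
    for (int step = 0; step < k; ++step) i = pi[i];
    return i;
}

// Order of the permutation, i.e. how many applications return to the
// identity: the LCM of all cycle lengths, found by tracing each cycle once.
long long order(const std::vector<int>& pi) {
    int n = pi.size();
    std::vector<bool> visited(n, false);
    long long lcm = 1;
    for (int i = 0; i < n; ++i) {
        if (visited[i]) continue;
        long long len = 0;  // length of the cycle through i
        for (int j = i; !visited[j]; j = pi[j]) { visited[j] = true; ++len; }
        lcm = lcm / std::gcd(lcm, len) * len;
    }
    return lcm;
}
```

For instance, the permutation {1, 2, 0, 4, 3} on five elements has one 3-cycle and one 2-cycle, so its order is lcm(3, 2) = 6, matching the fold-back after six iterations seen above.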
However, there is one small technicality there, and in class I wrote bad code; in this solution the code has been fixed. So let me explain why the code was bad and what the fix was. Suppose I have a permutation of n elements in which the first cycle involves the first n/2 elements: 0 goes to 1, 1 to 2, and so on up to n/2, which goes back to 0. That is the first cycle, and then there are some small cycles involving pairs of elements, like this. Now, what did my earlier sloppy code from class do? We had a bit array marking which elements had already been covered by cycles already reported. As I traversed the first cycle, I marked that bit as visited for the first n/2 elements, and then, remember, we looked for the first unvisited element to pull out the second cycle. In this case, how much time would that take? I would skip over n/2 visited elements and then find the first unvisited one at position n/2 + 1. So I take about n/2 time, and then I cover the next 2 elements, so n/2 + 2 elements have been covered. Then I would again start from the beginning; that is the mistake. I would again scan, this time up to n/2 + 2; for the next 2-cycle I would take time n/2 + 4 to get there, and it keeps increasing this way. So this is roughly n/2 terms, each at least n/2, and the total time taken would be about n squared over some constant. That is silly, because why should I always start from the left end to find the first unvisited element? That is the wrong thing to do. What I should do is maintain a permanent marker at the last first-unvisited element I found, and when I need another unvisited element, I should advance that marker, remembering where I left off. I should never start from the left again; otherwise I will take n squared time doing a job that evidently takes only n time.
Now, one important thing: if you look at how it is written, it might appear that in the worst case we still take n squared time, but this marker only ever advances to the right, and therefore the total amount of work spent advancing the marker is never more than n. This happens often: in non-trivial code you might find nested for loops, and unlike all the analyses we have done, where we just multiply the number of iterations of the outer loop by that of the inner loop, here you cannot do that, because the inner loop depends on the position of a cursor that never goes back. When you look at other nested loops, like going over the rows and columns of a matrix, the column can snap back to 0 as the row advances. If that cannot happen, then you cannot just multiply the iteration counts of the loops to get the overall time. So let's look at the code in light of that information. I keep a num_visited for my cycle tracing, but I also keep a next. This next points to the next unvisited element, and it only ever increases, never decreases. Forget about the other stuff, like the LCM and so on; we know what is going on there. visited is the bit vector, all false initially, no one visited, and "while num_visited is less than the size" is the key loop. Then I say: while visited[next], increment next. So if next is pointing to anything already visited, next is incremented until it sits at the first unvisited element. There is no danger of running off the end here, because I do this only when num_visited is strictly less than the size, so there is always at least one unvisited element, and next will lock into that place. So you do not need to range-check next when you do other things; we can use next without fear. Then you begin a cycle at next, and you keep going.
Looking at the code very superficially, it might seem that num_visited goes up one at a time in certain cases, and inside I also have an increment of next, so this looks like a nested loop, each part of which might go through n values. But it does not take n squared time, because next never decreases; it can only increase. Sometimes you need to argue about loops this way, and your actual complexity may be less than what the for loops superficially suggest. So, that was the midterm. We will look at the distribution of marks. I am basically expecting the median this time to be more than 50 percent of the total. It is out of 16, and I do not want to scale this down. Quiz one I scaled down because people performed somewhat below my expectations, but for this one I am thinking that some people will miss question four, and most people will get some parts of one, two, and three right. Question one I expect everyone to get; if you did not get it, you should worry a bit. Two and three will be a mixed story, and some people will get some parts of four. Let's see how it turns out. I will publish the distributions of quiz one and the midsem after the regrading sessions are over, so we get a good idea of where you stand on the curve. Any questions about the midterm? Yes? Okay. So, we have fairly important material on matrices today. We will be talking about Gaussian elimination, determinants, and a few other algorithms. So, where did we leave off? We talked about eigenvalues and eigenvectors and the math of why iteration converges to the dominant eigenvalue, and we wrote some Boost code for finding the dominant eigenvector. The last thing we started was the least-squares fit, so I will just recap that quickly and then we will move on to the next problem with matrices.
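The amortized-linear cycle listing, with the advancing next marker, can be sketched like this. Variable names follow the description (num_visited, next, visited) but the exact code shown in the lecture may differ.

```cpp
#include <vector>

// List all cycles of a permutation in total time O(n). The marker `next`
// only ever advances, so the inner while over visited elements does at
// most n increments across the whole run, not per cycle.
std::vector<std::vector<int>> cycles(const std::vector<int>& pi) {
    int n = pi.size();
    std::vector<bool> visited(n, false);
    std::vector<std::vector<int>> result;
    int numVisited = 0, next = 0;
    while (numVisited < n) {
        while (visited[next]) ++next;  // never rescans from the left end;
                                       // safe because numVisited < n
        std::vector<int> cycle;
        for (int j = next; !visited[j]; j = pi[j]) {
            visited[j] = true;
            cycle.push_back(j);
            ++numVisited;
        }
        result.push_back(cycle);
    }
    return result;
}
```

Even though the while over visited[next] sits inside the outer loop, next is never reset, so the two loops together cost O(n), which is exactly the argument made above.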
Remember that we started motivating it by saying: suppose you want to fit a model for the execution time of a program you are writing, as a function of the input problem size. For example, you want to load n elements from a file into your program's memory and then sort them. Our domain knowledge about computers by now tells us that a program takes roughly constant time to get loaded into RAM and start up. Then it reads the file of n elements from disk, which takes time proportional to n, with some constant we do not know. Then you sort, and depending on the sorting algorithm you might take n squared time or n log n time, but those terms have constants in front of them too. If you are sorting, say, integers, that constant may be smaller than if you are sorting strings, and in the case of strings it depends on the average length of the strings and many other factors. The idea here is that the observed quantities of the artifact you are studying, like the size of the array, the speed of your CPU, and things like that, go into a matrix called X. Those are the observed variables. Or suppose you are trying to predict whether someone is at risk of a heart attack. For that there are various predictors: whether the person's parents have hypertension, what the person's food habits are, whether they eat enough vegetables or mostly high-fat, high-calorie food, whether they exercise regularly. Every such factor becomes what is called a feature, and that is a column in the matrix X. In our coding example the features were n, n squared, n log n, and maybe a constant; we assume the X's are always padded with a constant column of ones, just to get the offset correct. Now, after that, you also have training observations.
While you are trying to learn the pattern of hypertension and heart attacks, suppose you have patients with documented histories of heart attacks, and you also have healthy patients; those go into the vector called y. The columns of X are the predictors; y holds the observed training examples, what you saw in actual patients, or in actual runs of your code, where you measure the running time and that is your y. What you are trying to learn is a predictive model called w, which linearly combines the feature values into a predictor for y, and you would like to fit w so that you make very accurate predictions. The way you do that is to minimize some kind of difference between each xi dot w and yi. The actual objective, written in non-matrix format, is the sum over instances of (xi . w minus yi) squared: xi . w is the prediction and yi is the training observation. That is why it is called the least-squares fit. That exact expression has been rewritten in matrix-vector notation here as the l2 norm of Xw minus y, squared, or equivalently as the inner product of that vector with its transpose. After that it is all just vector and matrix algebra: we wrote it down as a quadratic form, and then we said that in this case there exists a closed-form value of w, called w star, which minimizes that divergence between predictions and observations. But it turns out that w star involves inverting the large matrix X transpose X.
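In symbols, the objective and its closed-form minimizer, as recapped above (assuming $X^{\top}X$ is invertible):

```latex
J(w) \;=\; \|Xw - y\|_2^2 \;=\; (Xw - y)^{\top}(Xw - y)
      \;=\; w^{\top}X^{\top}Xw \;-\; 2\,w^{\top}X^{\top}y \;+\; y^{\top}y
\qquad
\nabla_w J \;=\; 2X^{\top}Xw \;-\; 2X^{\top}y \;=\; 0
\;\Longrightarrow\;
w^{*} \;=\; (X^{\top}X)^{-1}X^{\top}y
```

The matrix being inverted here, $X^{\top}X$, is exactly the "large matrix" the text refers to.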
Today and next lecture we will learn how to invert matrices, and many of you may know it already. But suppose you do not know how to invert a matrix, or it is too expensive; how do you solve for w star numerically without doing an inversion? That is where we use an iterative method, gradient descent: you start with a guess w, and in every step you update it. Because you want to minimize the objective, you find the gradient of the objective function and move against the gradient, in the direction opposite to it; that is this equation. You find the gradient and subtract some arbitrary constant called the step size times the gradient. Now, when you do this with an arbitrary step size eta, you always run the risk of overshooting. To give you a brief picture in one dimension: a quadratic surface to minimize looks like this, and let's say I am initially here. The gradient says the function increases to the left, so I go to the right; but suppose the step I take is too large and I land over here. I may in fact even increase the value of the objective if I go far enough. So your step size should be smallish; you do not know exactly how small yet. Suppose you land here, having overshot; you find the gradient again, it points this way, so you come back, maybe you overshoot again, but you keep doing that and you end up at the minimum.
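A single step of that update, w minus eta times the gradient, can be sketched like this for the least-squares objective. This is my illustration, not the course's code; the dense vector-of-vectors layout and the function name are assumptions.

```cpp
#include <vector>

// One gradient-descent step for the objective ||Xw - y||^2.
// The gradient is 2 X^T (Xw - y); eta is the step size discussed above.
std::vector<double> gdStep(const std::vector<std::vector<double>>& X,
                           const std::vector<double>& y,
                           std::vector<double> w, double eta) {
    std::size_t m = X.size(), d = w.size();
    std::vector<double> resid(m);           // residual vector Xw - y
    for (std::size_t i = 0; i < m; ++i) {
        double dot = 0;
        for (std::size_t j = 0; j < d; ++j) dot += X[i][j] * w[j];
        resid[i] = dot - y[i];
    }
    for (std::size_t j = 0; j < d; ++j) {   // w <- w - eta * 2 X^T resid
        double grad = 0;
        for (std::size_t i = 0; i < m; ++i) grad += 2 * X[i][j] * resid[i];
        w[j] -= eta * grad;
    }
    return w;
}
```

Iterating this step with a small enough eta converges toward the closed-form w star without ever inverting X transpose X, at the cost of choosing eta well, which is exactly the overshooting discussion that follows.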
In 2D the shapes may look a bit different: it is like a bowl, and you drop a marble at the periphery of the bowl and it rolls down to the bottom, possibly overshooting. What is the shape of the bowl? That depends on the quadratic term. Remember we have this term w transpose X transpose X w; look at X transpose X as some matrix A now, plus some linear terms, maybe of the form w transpose b, dropping the constants. So in 2D, suppose w has only two dimensions, call them w0 and w1; what does the surface look like? Let me draw some contours. If A is symmetric and there is no b term, in certain cases the bottom of the valley will look symmetric, with each ring higher up than the ones inside it, and in this case optimization is said to be a little easier. The more difficult case is where you have a minimum, but the bowl is a highly elongated ellipse: very flat in one dimension and long in the other. Why is that bowl difficult? In the symmetric case a single choice of step size will typically do okay. Here, if you are at this point and the gradient points that way, you can actually afford to take a long step and you will not overshoot by much; but if you are at this other point and you take a large step, you will overshoot by a large amount. So finding the best step size is quite a science, and there have been books and books written on it. Now, one way to look at this particular problem is: I have a parabolic bowl, and if I have already decided on a direction of descent, I can slice the bowl along that plane, and that actually gives me a parabola in 2D; it is just a parabola, so I can always hunt for its lowest point in closed form by solving a one-dimensional quadratic equation. The problem is that this is about as complicated as finding the inverse of the original matrix. So the big question in this area has been: I do not want to invert matrices, I do not want to store inverses, they are all bulky, difficult
things to compute but I would like to buy magic find the best step size large enough that I converge fast but small enough that I don't over sure that's the central problem in convex optimization and you'll some of you will take many advanced courses on optimization will learn much more about it but finding a step size is a very you know tricky art so I won't discuss that it turns out that in this problem subject to very mild conditions you can not really go down and we have seen that you choose some arbitrary eta which is not too large you'll do fine you may be converging slower than optimally but you'll be safe now one thing is you know is there some kind of an upper bound you can say okay as long as it is less than that I will not infinitely overshoot and you can it's not too difficult to derive that but we'll leave it there okay so and you sort of demo that you can find the optimal w at some point yeah which one which line this line this equation is it or something else oh so remember my update equation said w at the next time step is w at the previous time step minus a step size times the gradient at the previous time step this was already on the slides right so remember all these w's are actually vectors okay so this is a vector in some set to so those are the two axes here w is just one thing w zero and w is just a real number so if you generalize your hunting for the minimum of a quadratic surface to multiple dimensions if you draw it in multiple dimensions here the minimum is a point here and the curve is shown like this if I have to show a 3d parabolic bowl on 2d I have to draw do a contour plot so these are contours so it's like a valley and lines here are contours of equal height above the floor so because it's a parabolic surface any section through it will be a circle if the parabola is symmetric but depending on the value of x it may not be symmetric x and b you may have a parabola a bowl that's quits in one direction so in general the contours would 
be like that. Okay, so today we finally start with determinants. You have all read the schoolbook definitions of how to compute a determinant; they are exceedingly complicated, and I confess I have mostly forgotten them. Somewhere in my subconscious there are these formulas which look like minus one to the power i plus j, and then something called a cofactor, where you chop off a row and a column of the matrix and do something complicated. Determinants are never computed that way in real life. The only practical way to compute a determinant uses these three facts. Number one: if you take a square matrix and add a multiple of one row to another row, the determinant does not change; adding scaled versions of rows to each other does not change the determinant. Number two: if you exchange two rows or two columns, the determinant changes sign. Number three: if you are lucky enough to get a matrix which is upper or lower triangular, namely all elements below or above the diagonal are zeros, then you do not need to work hard at all; you just multiply up the diagonal elements and that gives you the determinant. Given these facts the strategy is very clear, and it is very similar to Gaussian elimination: keep adding or subtracting scaled versions of rows from each other so that everything below the diagonal becomes zero, then multiply the diagonal elements and report that as the determinant. So, in pictures, here is the approach. Suppose I am given this square matrix whose determinant I want to find; the first row is gray and the second row is yellow. The first row is already done; there is nothing to do. In the second row I would like to subtract some multiple of the first row so that a10 is taken to 0. In other words, I would like to find a factor f0 such that a10 minus f0 times a00 is equal to 0; so I would like to find a factor for the first row which, when the first row is multiplied by it and subtracted from the second, reduces a10 to nothing. You can solve this explicitly for f0: it is just a10 over a00. For the first half of this treatment we will assume that we never get into trouble dividing by a00, in that the diagonal elements are never zero or negligible; we will fix that assumption afterward. So in this case, once we do this subtraction, the very first element of the second row reduces to 0, and I show that as pure white; a11 and the others change, and that is fine. Now the first two rows are processed and done, and we attack the third row, which has become yellow. This time we want to zero out a20 and a21. Zeroing out a20 is easy, because only the first row can make that happen, so f0 is just a20 divided by a00. Now suppose we have already subtracted f0 times the 0th row; then a21 minus f0 a01 minus f1 a11, where f1 is the new multiple we want to find for the 1st row, should be equal to 0. Given that f0 is already known, I can solve for f1 and get the factor for the second row. So keep doing this: after this step both a20 and a21 have been zeroed out, and we attack the next row, make its first three elements 0, and go on like that. Ultimately we have an upper triangular matrix, and we just multiply out the diagonal elements and forget about the rest of the matrix. Yes? Yes, I am redefining f0 all the time: for every yellow row I need to find a new set of factors f0, f1, f2, and so on. Absolutely; at every point you should think of the f vector as bound to the specific yellow row that you are processing. After I am done with a yellow row and have moved on to the next yellow row, the old f's are all forgotten; they do not matter. Okay, so how many people are already familiar or otherwise comfortable with this algebra? Okay, it is very simple, so let us write some code for this and see it at work. The code is simple, but the index arithmetic needs a little bit 
of care, because the general stage of the algorithm pictorially looks like this. Here is my matrix; I have already zeroed out something like that; here is my yellow row, and I want to zero out this segment; everything below is untouched, all kinds of junk. This part is all zeros; this part is already done and final. Let us name this row dx in the code: it is the row where, from column 0 to column dx minus 1, I want to zero out. Now, how am I going to do that? I am going to subtract various multiples of previous rows, which I will index by sx, s for subtraction: it is the row being subtracted, therefore sx, and dx is the row up to whose diagonal I want to zero. When I do that subtraction I will need to run through columns, so the column index will be cx, as usual. Those are the three indices we will use in our code, and here is what the loops look like. The outermost loop cycles through dx, from the 0th row to size1 minus 1; size1, remember, is the number of rows. What we want to do is make a(dx, 0) through a(dx, dx minus 1) equal to 0 by suitable scaling and subtraction. How do I do that? Remember that in general I have to subtract each row indexed by sx, so I start the sx loop: sx equal to 0, sx strictly less than dx. I have to compute the factor by which to multiply the sx-th row, and that is just a(dx, sx) divided by a(sx, sx). This is the only important statement to watch: (dx, sx) divided by (sx, sx). Why? I am currently considering a particular sx; this is the diagonal element (sx, sx) and this is the element (dx, sx), and I take the ratio of those. So fac equals a(dx, sx) divided by the diagonal at sx, and then, running through the columns, I do the actual subtraction; that is fairly simple. With cx running from 0 to size2, which is the number of columns, a(dx, cx) minus equals fac times a(sx, cx). So there is a slight bit of jugglery with the indices, and with one typo your code will be wrong; but once you draw a picture like that, it is not too difficult to get all the indices right, and there is no tricky arithmetic here. Then finally we multiply the diagonal elements together. Is everyone comfortable with this code? Can I get a show of hands, please? Yes, so let us code this up and try it on some input matrices. Here is the code for determinant1. I start off declaring a matrix of doubles. Do you need the font any larger? Is this tolerable at the back? Okay. So I declare a matrix of doubles and read it in from cin; this is one way in which boost lets you read a whole matrix in one statement. What does your input need to look like? Here is an example: you state the number of rows and the number of columns, and then each row. So here are the rows and columns, and then the rows of the matrix one after the other: 1 5 2 6, 4 1 5 2, 7 4 1 5, and 3 7 4 1, a four by four matrix. I can read it in one statement from cin, and I will show you how that file can be fed into cin. After that it is exactly the code I stated on the slide earlier: the dx loop, the sx loop, and the cx loop. Finally, remember I have to multiply all the diagonal elements together, so I set a product equal to one, run down the diagonal multiplying into the product one element at a time, and print out the product as the determinant. Now, since it is not good to trust one's own coding, I am going to fire up Scilab so that we can cheat and see what the determinant is before running our code. So I compile it, and meanwhile here is Scilab. Let me look at the data file: it is 1 5 2 6 and so on, so let me enter that data. I say a equals 1 5 2 6, next row 4 1 5 2, then 7 4 1 5, and 3 7 4 1. Okay, so that is the actual 
matrix written down in row-flattened order; it looks like that. Now in Scilab we can ask for the determinant of this guy just by saying det(a); it is very simple, and the determinant of a is minus 1176. Now I will run a.out; but remember, I wanted to read the matrix from cin, and I have the matrix in this file called determinant1.dat. So all you have to do is run a.out and use the less-than symbol to say: whenever the program tries to read cin, feed it bytes from this file. This construct makes the file look like cin to the program. So let us do that, and here are the runs. See, in the first iteration it is just the original matrix. In the second iteration this four has to be zeroed out, and you can zero out the four by subtracting four times the first row: four minus four is zero, one minus twenty is minus nineteen, five minus eight is minus three, and so on. In the next round these two elements have become zeros, in the next round these three elements have become zeros, and finally the product of the diagonals does turn out to be minus 1176. So we have passed the exam. Now, one problem arises with other kinds of matrices, like 1 2 4, 2 4 6, 3 5 7. Let us feed that to Scilab and see what happens: b equals 1 2 4, 2 4 6, 3 5 7. That is a perfectly legitimate matrix as well, but observe that when I find the factor for the first row, I find that I should subtract twice the first row from the second row, and in doing that I zero out both the first element and the second element. This will be a problem afterwards, because my diagonal goes to zero, and in my code, if you remember, I was dividing by the diagonal, so I will get into trouble. Let us get a demo of that. If I run the old a.out on the second matrix, with the redirection as required, this is what I get: after the first step this two has become zero, 
but the four has also become zero as a result, and after that I get NaNs and infs because I am dividing by that zero. This is not acceptable, and I need some recovery plan. On the other hand, if you actually look at the determinant of this matrix, there is nothing wrong with it: the determinant of b is minus 2, so it is not a singular matrix. The simple first-attempt algorithm will not get it right, though, because it zeroes out the diagonal. The fix for that is well known; it is called pivoting. If a diagonal element (dx, dx) becomes too small in magnitude, then, to avoid trouble, we look for a lower row z where the element in column dx is not too small. Pictorially, if I find after finishing this row that the diagonal element is almost zero, which would endanger the computation in subsequent iterations, then I look for a lower row zx where this element is large enough, and I swap those two rows. Remember, swapping two rows only changes the sign of the determinant; I remember how many times I swapped, and that is enough to fix the problem. Now, if it turns out that all of these lower elements are zeros, you can actually prove that the determinant is zero. There are various ways to do pivoting; this is just one. So let us see the modified code: determinant2, which is the exact same code, but with a threshold of 10 to the power minus 7. I do the exact same thing here, and then I check whether pivoting is needed: if the absolute value of a(dx, dx) is less than the threshold, then the yellow row I have just processed is in trouble, and I find a lower row that does not have a near-zero in column dx. I look for z, or zx, by initializing it to an illegal value of minus 1; then I run rx from dx plus 1 to the number of rows, which is the scan down that column. This time I am not very picky: I just look for an element larger than the threshold, pick the first one I find, and break the loop. That gives me a row zx, and that is good enough. If zx is still minus one, then we cannot pivot; otherwise I swap rows dx and zx, and swapping column by column is easy: we run a column loop over cx again and use a temp to swap a(zx, cx) and a(dx, cx). So the rows dx and zx are swapped, but the important thing is that I have to reprocess that lower row: I took the trouble to zero out all these entries, but in the row that has now come into this position those entries could be non-zero, therefore I have to redo dx. So I reduce dx by one before going out of that check. Yes? But you cannot even finish the operation, because an intermediate diagonal has become zero and you cannot divide by it, right? Yes, so you can swap ahead of time; sure, you could do that. Any linear combination is okay, and you can arrange things so that you never hit this term; there are many ways to avoid this problem, and this is just one. It is not even a very good way; it has problems too that I am not mentioning. So if the pivoting happens, you reduce dx and repeat the loop; otherwise we exit the loop and find the product as usual. Now, how does determinant2 behave? g++ determinant2.cpp, and now if we give it that slightly problematic matrix, what does it do? This time, see, it runs into the 0 0, but then realizes the problem and shifts the row down: the 0 0 minus 2 row has gone to the last position, and 3 5 7 came in and was rescaled to become part of the triangular matrix. If you multiply the diagonals you get 1, minus 1, minus 2, whose product is 2; but remember I swapped a pair of rows, and that gives me the minus 2. So determinant2 is a slightly more robust algorithm, but it is not the best; you can actually do better pivoting, and in fact there are cases where determinant2 will also fail when it should not. There are better, more reliable pivoting algorithms which are globally correct. The 
other thing you noticed is that I was picking just any z which gave me a non-zero, but that does not have the best numerical properties; if you pick the largest one, then under certain settings you get a more stable algorithm. So you can improve the algorithm in various ways, but the summary is that you can now compute the determinant of a matrix. Any question about finding determinants numerically? How much time does this take, by the way; can you make an estimate? Suppose pivoting is not happening, or at least happens rarely. Then look at the loops: the outer loop is the dx loop, which runs n times; inside it is the sx loop, which runs dx minus one times, about n over two times on average; and innermost is the column loop over cx, which runs n times. Multiply those and you get an n cubed algorithm. So it is a fairly expensive algorithm; it takes more time than sorting, more even than selection sort. Now, the next problem we will look at is very simple given that we have already done this: solving a system of linear equations. How do you do that? Well, the input is: you are given a and y, where a is n by n and invertible, say, and x and y are n by 1 column vectors, and the goal is to solve for x. Now, this is quite distinct from the least squares problem, where we expected the x matrix to be quite redundant: there, the x observations had many more instances than the variables in w that you wanted to fit. For example, if you think your sorting algorithm takes time which is some function of a constant, n, n squared, and n log n, that is just four things to fit, but you can easily run your code 500 times to get a scatter plot, 
and you are trying to fit the scatter plot. So a least squares fit is by definition an over-determined problem, whereas solving a system of linear equations is exactly determined. In the least squares fit you expect redundancy and you want to minimize the divergence between x w and y; your objective was to minimize the norm of x w minus y. But here we want exact equality: I want a x to be exactly equal to y, and therefore I cannot afford just any kind of matrix, unlike the x we saw earlier. There, subject to experimental error, your x could have been anything; you could have a lot of noise in x; people do not always get hypertension if their parents are hypertensive, it is a noisy world. But here it is totally mathematical: I want an exact fit of a x to y, and therefore a has to be much more controlled; it has to be invertible, it has to be full rank. So suppose a is given as an n by n matrix and the goal is to solve for x. Now observe that whatever we did to a so far in finding the determinant was either swapping rows, which just swaps two linear constraints, or adding a multiple of one constraint to another constraint, which does not change anything either. The only new piece here is this column vector y: as I operate on row pairs of a, I also have to operate on the corresponding pairs of y elements. So we can use our earlier process to turn a into an upper triangular matrix; this is what we do in Gaussian elimination: we turn a into an upper triangular matrix and transform y simultaneously. We need to add code to manipulate rows of y in sync with rows of a. At the end we can solve backwards: we find x n minus 1, then x n minus 2, and so on back to x1 and x0, and the equations are very simple. Our final matrix looks like that upper triangular thing; here is a, here is x after the transformations, and here is y. Obviously x n minus 1 is just y n minus 1 divided by a(n minus 1, n minus 1). For the general case, what we have is a(i, i) x i plus a(i, i plus 1) x i plus 1 and so on up to a(i, n minus 1) x n minus 1, equal to y i; you run along the i-th row from column i to n minus 1 and you get y i. Solving for x i backwards is quite easy: x i is simply y i minus the sum of a(i, j) x j for j from i plus 1 to n minus 1, all divided by a(i, i); we are just directly solving the equations. So the code looks like this. If a has already been turned upper triangular, y is available, and x is allocated, this part of the code is quite easy. I run ix down from size minus 1 to 0; remember, x i depends on all x j where j is larger than i, so I solve downward from the highest index. I initialize x i to y i; then j runs from i plus 1 to size and I directly implement this equation, subtracting a(i, j) x j each time, and finally I divide by a(i, i). I am just avoiding allocating a new variable by storing all the intermediate results in x i itself. So, is everyone okay with this? The only missing piece, which I have not discussed in detail, is the red line: the code to manipulate rows of y in sync with a. Observe that when we swap two rows, that is like interchanging two linear equality constraints, and I do not have to remember anything: interchanging the positions of two linear constraints does not change the system at all; I just need to swap the corresponding y i's as well, and that is all. Now, in writing the code, it helps if I write it from scratch just so that no one gets lost; but I will have my cheat sheet handy to cut and paste, and we will discuss everything as we cut and paste. First, all the prelude stuff, which does not matter. Then the first thing we will do is create a stub function called make upper triangular. I will start with the file being essentially nothing, so that we can see that everything works out. 
Okay, so in coding using modularity, declaring functions and using them, it often helps if you start typing out the main method pretending that you already have all the supporting functions. See, earlier, when you coded using square root and log and exp, you assumed someone else had written them for you; it is good to keep coding in that style, pretending someone has already coded make upper triangular for us, and then filling in the details. So what do we do? We declare a matrix called a, we read it in from cin, and we write it out just for verification. Then we invoke make upper triangular on a and y; remember we also have to pass in y, so y is a vector of doubles, and we read in y and print it out as well. So main does the following to solve a x equal to y: we input a, we input y, we make a upper triangular while transforming y in corresponding ways, and then we write it out. We code it part by part. Now, how do we make things upper triangular? Here we go; what is this stuff? I am declaring a function called make upper triangular, where the first argument is a matrix of doubles and the second argument is a vector of doubles. Let us do one thing: let us not do anything at all inside make upper triangular; let us just compile it, and I will submit an a and a y. What does my input look like? I have written out, in boost format, a, which is a 3 by 3 non-singular matrix, and y, which is a 3 by 1 column vector: this is a, that is y. I will read them both in from that file and just print them out, and I will call make upper triangular, but nothing will be done inside. This is a good way to start developing code. So I compile that and run it on the data file, and what happens? For comparison I will also print out the original file itself. a was 1 2 4, 2 4 6, 3 5 7, and sure enough a comes out as 3 by 3, 1 2 4, 2 4 6, 3 5 7; y is 21 34 42, and y prints as 21 34 42. 
Now, I did not do anything inside make upper triangular, so the "upper triangular" matrix is exactly identical to a; nothing has happened. Let us try one thing: inside it I will just say a(0, 0) equals minus 100. I am not really doing upper triangularization; I am mangling one element of a to an arbitrary value just to check what happens. Suppose I also print, inside make upper triangular, the a as the function sees it. So all I have done in this test code, which is not the real code, is pass a and y into the function, change a(0, 0), and print a. Now if I compile this and run it, what do I find? Inside make upper triangular, a(0, 0) has become minus 100 in place of the 1 in 1 2 4, because I changed it; but when I get back outside, it is back to 1. That is because C and C++ pass parameters by value: when I call make upper triangular on a and y, the program actually makes a copy of a and a copy of y and passes those copies into the function. Whatever I do to those variables inside the function stays within the function; when the function exits, there is no direct way of communicating those changes to the outside world if I pass parameters like this. The correct way to do it in this case, if you want to change a and y for the outside world, is to pass by what is called reference. This ampersand sign tells the compiler not to make a copy; it tells the compiler that whatever changes happen to a inside this routine will be propagated outside. Even if you call it something else, like b, and say b(0, 0) equals minus 100, the ampersand makes sure the changes actually happen to the matrix outside; it is just a name change, not a copy of values. So this time, if I compile and run, you see that even after the return the so-called upper triangular matrix actually has the change reflected in that element. Because make upper triangular will obviously make changes to a and y, I should declare both with the ampersand; let me add it, because my other code is written like that, and take out the test code. Now let us put in our upper triangularization code. What do I have to do? Here is the body of the code that does the upper triangularization. Again I have the dx loop to start with, then the sx loop for the subtractions, and the cx loop for the columns as I subtract; and again, if I see that pivoting is required, I find a lower row zx. Now, the important thing is that as I run through the columns, subtracting fac times a(sx, cx) from a(dx, cx), I have to do the same thing to y. To give you the picture: here I want to zero all this out; here is sx, an earlier row above dx; I found a factor fac such that a(dx, *) should become a(dx, *) minus fac times a(sx, *); that is the operation I want. But remember, this is a, that is x, and a x equals y; so as I combine rows dx and sx of a, I also have to subtract fac times y(sx) from y(dx). I am transforming both sides of the equation; otherwise the equation would be violated. So I also do y(dx) minus equals fac times y(sx), and that is exactly what you see in this line: after I am done with the column loop, I also fix up y. Similarly, if I see that a pivoting swap is required, I swap rows dx and zx not only in a but in y as well. Those are the only two changes, and then dx has to be reduced as before. So this is the code for make upper triangular, and once the matrix has become upper triangular I can call solve upper triangular, where someone gives me an upper triangular matrix and a transformed y, and I output the solution x. This is 
the same code that we saw on the slide: this expression is directly implemented in this function. We will review it again, but let me quickly run it. What does the call sequence look like? First I make a upper triangular, transforming y alongside; then I allocate the new vector x; then I call solve upper triangular, passing it a, y, and x, where x is the output; and then I print out x. So I compile Gaussian.cpp, and now if I run it on the Gaussian data set, you see that a was 1 2 4, 2 4 6, 3 5 7, and y was 21 34 42; after upper triangularization a becomes 1 2 4, 0 minus 1 minus 5, 0 0 minus 2, and then x can be solved as 3 1 4. You can verify that; so that is Gaussian elimination.