So, this is lecture 32, and we have been looking at constrained complexity equalizers. Let me quickly summarize where we are. You have s_k entering H(z), and then noise gets added. Remember, H(z) is the complex baseband equivalent, so we think of it as a complex filter; n_k is the noise, and you get z_k. I am trying to design one filter, which I call C(z); at its output I am going to put a slicer, and then hope to get some estimate ŝ_k. Now, C(z) is constrained in complexity, in the sense that C(z) is simply

C(z) = Σ_{m=−p}^{p} c_m z^{−m},

so the order of this filter is 2p + 1. What is the order? The number of terms, so to speak, in the z-transform. So it is of finite order. The question to ask now is: what is a good choice for C(z)? This is what we have been looking at. The DFE structure would have feedback around the slicer; the linear equalizer structure does not have any such feedback, only one filter, which is the preprocessor. So this is a linear equalizer, and the question we asked was: what is the choice of C(z)? There are various ways of choosing it. One way would be the zero-forcing method, but we are not going to see that; we are going to see the mean square error criterion.

So the criterion was: call the slicer input x_k, then look at the error e_k, which I defined as e_k = x_k − s_k. Remember, once again, we are going to think of e_k, x_k, s_k, all of them, as random processes. I am not going to change my notation: typically people use capital letters for random processes and small letters for the actual samples, but I am going to simply use one notation, and depending on where they show up, hopefully you can understand which of them are random variables and which are constants. For instance, all of these are random variables, but c is not a random variable; it is a constant vector that I am going to pick. So the criterion was to minimize the mean square error, which is E[|e_k|²]. Remember, once again, all of these are complex random variables; each e_k is a complex symbol. I want to pick c such that this quantity, the mean square error in e_k, is minimized. This could be one criterion, and we saw it is a pretty good criterion for designing receivers and equalizers.

So the question now is to write e_k in terms of c. I want to minimize this mean square error over the choice of c, so I have to write it in terms of c, and write it in a suitable way so that the minimization becomes easy. For that we introduced some vector notation. For my filter I introduced a vector c, which runs from c_{−p} to c_p:

c = [c_{−p}, ..., c_p]^T.

Remember, c is once again complex; it represents all the filter coefficients, so once I give you the vector c, you can implement the filter C(z). To write x_k compactly, we defined z_k, which is once again a complex vector:

z_k = [z_{k+p}, ..., z_k, ..., z_{k−p}]^T.

So once you define these two vectors — remember, z_k is actually a vector random process, at each k you have a random vector — it is updated in a very simple way: you just push everything one step down and put the next entry on top. It is a very simple shift-register type process.
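As a quick illustration (this is not from the lecture), here is a minimal numpy sketch of the system model z_k = (s ∗ h)_k + n_k. The 3-tap channel h, the QPSK symbols, and the noise variance are all hypothetical choices just to have a concrete running example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex baseband channel h_k (3 taps) -- an example, not from the lecture.
h = np.array([0.9 + 0.1j, 0.4 - 0.2j, 0.2 + 0.05j])

# QPSK symbols s_k with energy E_s = 1, plus white complex Gaussian noise n_k.
N = 10_000
s = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
sigma2 = 0.01                                   # noise variance E[|n_k|^2]
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

# Received sequence z_k = sum_l h_l s_{k-l} + n_k: the input to the equalizer C(z).
z = np.convolve(s, h)[:N] + n
```

The later code snippets continue this same script and reuse these variables.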
So z_k can be very efficiently represented in software or hardware or anything you want to build at the receiver; it is not difficult to implement. Once you do this, we saw that x_k is simply

x_k = c^T z_k.

Once we write this, it is just simple manipulation from that point on — just linear algebra — but you do it carefully, because ultimately the goal is to minimize the MSE with respect to c, so you should write it in a way that lets you do that easily. If you do that, it turns out you get three different terms. The first term is very simple to evaluate in practice: it is simply the energy in the transmitted symbol s_k, whatever we are modeling, which is what we have been calling E_s all along. Then there will be one more term of this form:

c^{*T} E[z_k^* z_k^T] c.

Here you have to pay some attention, because depending on how you write it, things can go wrong. Notice my notation: when I put a star, I conjugate each term in the vector without doing anything else; when I put a transpose, I transpose the vector without doing any conjugation; so star-T means conjugate transpose. The notation changes from author to author — some people use star itself for conjugate transpose and then put a bar or something for conjugation. All these conventions are possible, but I like this notation. Can I write c^T (...) c^* instead? You will get the same answer, so it does not matter how you write it, but you should work out both to be sure. But this is not all; there is a third term — actually two terms which are conjugates of each other, so you can collect them together and write them as

−2 Re( c^{*T} E[s_k z_k^*] ).

If you are not used to these kinds of manipulations, this will not be trivial, so go through it: write E[|e_k|²] as E[e_k e_k^*], go through the whole expansion, and make sure you can reduce it to this form. It is something that can be very useful.

So now we have to start thinking about evaluating these expectations, and for that you need to know something about the distributions of these random processes. I do not want to get too specific to a special case, so we will try to keep it as general as possible, but one assumption we will make is that all these random processes are jointly wide-sense stationary — individually wide-sense stationary and also jointly wide-sense stationary. That will get rid of a lot of things. So let us look at this term E[s_k z_k^*], for instance. After evaluating the expectation, is it a vector or a scalar? It is a vector: z_k^* is a vector, I am taking the expectation of a vector, so I get a vector. What is its dimension? It has to be n × 1, because I am multiplying it on the left by c^{*T}, which is 1 × n, and I am expecting one number — one complex number, which becomes real after we take the real part. What about E[z_k^* z_k^T]? That will be an n × n matrix. Remember, these dimensions are all important to keep track of.
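Continuing the running example, here is a sketch of the regressor vector z_k and the filter output x_k = c^T z_k. The choice p = 2 and the placeholder coefficient vector (a pure center tap) are mine, just for illustration.

```python
p = 2
n_taps = 2 * p + 1                          # filter order n = 2p + 1

def regressor(z, k, p):
    """Return z_k = [z_{k+p}, ..., z_k, ..., z_{k-p}]^T (descending index order)."""
    return z[k - p : k + p + 1][::-1]       # valid for p <= k <= N - p - 1

c = np.zeros(n_taps, dtype=complex)         # placeholder for c_{-p}, ..., c_p
c[p] = 1.0                                  # start from the center tap c_0 = 1

xk = c @ regressor(z, 100, p)               # x_k = c^T z_k (plain transpose, no conjugate)
```

Note the index pairing: entry i of c is c_{−p+i} and entry i of z_k is z_{k+p−i}, so c^T z_k = Σ_m c_m z_{k−m}, exactly the output of C(z).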
So let me write this first n × 1 vector expectation out in full:

E[s_k z_k^*] = [ E[s_k z^*_{k+p}], ..., E[s_k z^*_{k−p}] ]^T.

This is my term, and I am going to call this vector α. Remember, it is once again an n × 1 vector. Should I put α_k — there are all kinds of k's floating around in the term — or can I just drop the k? What assumption is going to allow me to drop the k? The wide-sense stationarity assumption. Since s and z are jointly wide-sense stationary, if I take E[s_k z^*_{k+m}] for any m, it is going to be a function only of that m; the k itself does not matter. So since the joint wide-sense stationarity assumption has been made, I can drop the k; otherwise, of course, the k would show up in α also. Maybe at this point I do not even know how to evaluate it, but it is in terms of the cross-correlation of s and z, so using the cross-correlation of s and z one can evaluate it. The key point is that it is independent of k. What else can you say about these entries? It is difficult to say anything more; each of these entries can in general be a complex number — it is a cross-correlation, and there is not much more one can say about a cross-correlation in general. So that is the first term.

Now we will look at the other term, the one showing up in the quadratic expression. Remember, if you actually write the MSE down in terms of the coefficients of c, it will be what is called a quadratic form: it will have c_1 c_2 type terms — of course, the conjugates will show up here and there — but nothing of larger degree. So in general it is a quadratic form; the linear term is what we did just now, and the quadratic term is what I am going to look at next. So let us look at that matrix carefully. That matrix I am going to call Φ:

Φ = E[z_k^* z_k^T].

Let me write the matrix out in detail, and then you will see it is actually quite easy to see what it is. It is the outer product of the column [z^*_{k+p}, ..., z^*_{k−p}]^T with the row [z_{k+p}, ..., z_{k−p}]. So it is a question of pushing the expectation inside, and you notice all the products are of the form z_{k+m} z^*_{k+l} for some m and l. What is that? That is the autocorrelation function of the random process z. So, since I know I am going to write this finally in terms of the autocorrelation, let me define the autocorrelation of z formally. You should be careful here about where you put the conjugate: depending on where you put it, the plus and minus signs will show up differently, so pay some attention here.
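The lecture evaluates these expectations analytically, but — as one practical aside of mine — the WSS assumption also justifies approximating them by sample averages over k when training data is available. A sketch, continuing the running example:

```python
# Sample-average estimates of alpha = E[s_k z_k^*] and Phi = E[z_k^* z_k^T],
# treating the simulated s and z as training data. Averaging over k is
# justified by the (joint) wide-sense stationarity assumption.
Z = np.array([regressor(z, k, p) for k in range(p, N - p)])   # row k is z_k^T
S = s[p : N - p]
alpha_hat = (S[:, None] * np.conj(Z)).mean(axis=0)            # entry i: mean of s_k z*_{k+p-i}
Phi_hat = (np.conj(Z).T @ Z) / len(S)                         # entry (i,j): mean of z*_{k+p-i} z_{k+p-j}
```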
So I am going to define the autocorrelation as

r_z(m) = E[z_{k+m} z_k^*].

Once again, it is going to be independent of k by the wide-sense stationarity assumption. Before writing Φ in terms of r_z, let us quickly review one property of the autocorrelation function: what is r_z^*(m)? It is going to be r_z(−m). This is a conjugate symmetry property of the autocorrelation; you can substitute into the definition and verify it very easily. But there is also another important property of the autocorrelation. What is that? The power spectral density is real and non-negative. That is the more crucial property: if I take the DTFT of r_z and get some S_z(e^{jθ}), it is going to be real and non-negative. The real part follows from just the conjugate symmetry; the non-negative part has to be guaranteed separately — you have to check it. So only those sequences that satisfy all these properties can show up in this matrix Φ — the autocorrelation can itself be thought of as a signal, a one-dimensional sequence — and other things cannot show up. Remember that; it is important.

So let us try to write Φ in terms of r_z. Go back and check this — you can see it easily; it is just a matter of working out the various differences, and I do not want to spend too much time on it. On the diagonal you will simply get r_z(0) and nothing else. The first row is

[ r_z(0), r_z(−1), ..., r_z(−2p) ],

and the first column is

[ r_z(0), r_z(1), ..., r_z(2p) ]^T,

with each diagonal constant as you move down. So you see what this is called — it is a Toeplitz matrix: first of all, the first column and the first row are the only things that are important; everything else depends on just the first column and the first row.
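Since only the first row and column matter, Φ can be assembled directly from the autocorrelation sequence. A small helper, continuing the running example, using scipy's toeplitz constructor and the conjugate symmetry r_z(−m) = r_z^*(m):

```python
from scipy.linalg import toeplitz

def phi_from_autocorr(rz_pos):
    """Build Phi from rz_pos = [r_z(0), r_z(1), ..., r_z(2p)].

    toeplitz(c, r) takes the first column c and the first row r;
    the first row [r_z(0), r_z(-1), ..., r_z(-2p)] is just conj(rz_pos).
    """
    return toeplitz(rz_pos, np.conj(rz_pos))
```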
So another way of saying it: what is Φ_ij, the (i,j)-th element of Φ? It is r_z(i − j) — you can go through and check it, with i and j running from 0 to 2p. So if I run i and j from 0 to 2p, then Φ_ij = r_z(i − j). Any matrix which is a function of just the difference of the indices is called a Toeplitz matrix, so that makes this matrix Toeplitz. These matrices have a lot of properties; you might want to study them.

The other comment is about Hermitian symmetry: Φ is Hermitian symmetric, and that is very, very crucial — it ends up solving a lot of our problems. What is Hermitian symmetric? Φ^{*T} = Φ. And the third property, which once again comes from the power spectral density, is that one can also show Φ is positive semi-definite; it comes because the PSD is required to be non-negative — you can think about it. Positive semi-definite means the quadratic form x^{*T} Φ x is non-negative for all x. Note that Hermitian matrices need not be positive semi-definite — go back and revise your linear algebra if needed. Another way of thinking about positive semi-definiteness is non-negative real eigenvalues: Hermitian matrices only have real eigenvalues, but you can perfectly well have negative real eigenvalues and still have a Hermitian symmetric matrix; nothing stops that. Positive semi-definiteness forces the eigenvalues to also be non-negative. The "real" part comes from the Hermitian property — of course, you cannot talk about complex numbers being greater or less than 0, so the eigenvalues have to be real for this to make sense. But Hermitian symmetric does not necessarily mean positive semi-definite. So this property, the quadratic form being non-negative, is also quite important; it is one more thing we will use. It can also be taken as the definition, and you can go either way and show it works: the very fact that x^{*T} Φ x ≥ 0 for all x forces Φ to be Hermitian symmetric — the quantity has to be real first, which forces the Hermitian symmetry — and then the non-negativity says something about the eigenvalues. So those are the properties. Typically we will assume Φ is positive definite, in the strict sense that x^{*T} Φ x > 0 for all nonzero x, so all the eigenvalues are strictly positive; they cannot be 0. That is an assumption we will make, and in most cases it is true.

So that is Φ, and notice one can compute this Φ: given the statistics of z_k, you can compute it — it is, after all, the autocorrelation. One way of doing it is to take z_k, find its PSD, then do an inverse DTFT, and you get the autocorrelation; you can do it in several ways. So it is an eminently computable thing, not some abstract matrix I am defining. Given the statistics of z_k, or how it is derived — z_k is, after all, s_k convolved with h_k, plus n_k — given h_k and the statistics of n_k and of s_k, I can exactly compute all these quantities.
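The lecture only asserts that Φ and α are computable from h_k and the statistics of s_k and n_k. As one concrete worked instance — under the additional assumptions, mine rather than the lecture's, that s_k is white with energy E_s, n_k is white with variance σ², and s and n are uncorrelated — one can show r_z(m) = E_s Σ_l h_{l+m} h_l^* + σ² δ(m) and α_i = E_s h^*_{p−i}. A sketch under those assumptions, continuing the running example:

```python
# Exact Phi and alpha under the whiteness assumptions stated above.
Es = 1.0
L = len(h)

def rz_exact(m):
    acc = sum(h[l + m] * np.conj(h[l]) for l in range(L) if 0 <= l + m < L)
    return Es * acc + (sigma2 if m == 0 else 0.0)

Phi = phi_from_autocorr(np.array([rz_exact(m) for m in range(n_taps)]))

# alpha_i = E[s_k z*_{k+p-i}] = E_s conj(h_{p-i}), zero when p-i falls outside h.
alpha = np.array([Es * np.conj(h[p - i]) if 0 <= p - i < L else 0.0
                  for i in range(n_taps)])
```

These closed-form values should agree (up to sampling error) with the sample averages alpha_hat and Phi_hat computed earlier.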
So all these quantities are computable — for instance, in an exam they are computable; if you think they are not, you can actually compute them using some very simple steps. So once I write all these things, my mean square error can be written in a slightly simpler form. After all these substitutions — I have not really done any major simplification, simply substitutions — my mean square error becomes

MSE = E_s − 2 Re( c^{*T} α ) + c^{*T} Φ c.

So the problem, basically, is to find a c that minimizes this mean square error. Remember, once again, this is a quadratic form, and minimizing quadratic forms is quite easy; if you learn enough linear algebra you will know that quadratic forms are very easily minimized, and you can do it in several possible ways. I will show you two or three ways of doing it. The first way is by completing the square — just like you minimize a quadratic ax² + bx + c: you complete the square, you know the squared term can only be non-negative, you set x so that it goes to 0, and whatever you get has to be the minimum; it has a unique minimum. All those properties we can prove. It is a very similar technique here, but since it is a vector, it becomes a little more involved. So we rewrite the MSE to complete the square, and you can see that the MSE becomes

MSE = E_s − α^{*T} Φ^{−1} α + (Φ^{−1}α − c)^{*T} Φ (Φ^{−1}α − c).

The very fact that I am writing Φ^{−1} means I have already assumed positive definiteness — so we will assume Φ is positive definite. So I have completed the square, and you can see what you get: the non-negative part is this quadratic form, and since I know Φ is positive definite, it is greater than or equal to 0, and strictly greater than 0 unless its argument is the zero vector. So the only way I can make it 0 is to set Φ^{−1}α − c = 0, in which case it becomes 0, and from here it is easy to see that c_opt, the best choice of c, is

c_opt = Φ^{−1} α,

and the minimum MSE, the minimum mean square error, is

MMSE = E_s − α^{*T} Φ^{−1} α.

So that is my final result; it is important enough that we can box it. The critical thing is to show that the two expressions for the MSE are actually equal; spend some time and make sure you understand how the completing of the square happens. It is not very difficult if you pay some attention. So we readily see that to find the optimum c, the optimal filter, you have to compute Φ^{−1} times α: Φ is your autocorrelation matrix and α is the cross-correlation vector, and Φ^{−1}α gives you the best possible filter, the one which gives the minimum mean square error with constrained complexity. Note that I have also fixed the order: the best order-n filter which minimizes your mean square error is given by Φ^{−1}α. Very simple, at the end. So, there are other ways of deriving this, and I want to point out one other way, because we will use it later in the theory.
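Continuing the running example, a sketch of the boxed result: solve Φc = α, evaluate the MMSE, and numerically sanity-check the completed-square identity at an arbitrary test point.

```python
# Optimal taps and minimum MSE: c_opt = Phi^{-1} alpha, MMSE = E_s - alpha^{*T} Phi^{-1} alpha.
c_opt = np.linalg.solve(Phi, alpha)
mmse = Es - np.real(np.conj(alpha) @ c_opt)

def mse(c):
    """MSE(c) = E_s - 2 Re(c^{*T} alpha) + c^{*T} Phi c (real-valued)."""
    return np.real(Es - 2 * np.real(np.conj(c) @ alpha) + np.conj(c) @ Phi @ c)

# Completed-square identity: MSE(c) = MMSE + (c_opt - c)^{*T} Phi (c_opt - c).
c_test = rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)
d = c_opt - c_test
assert np.isclose(mse(c_test), mmse + np.real(np.conj(d) @ Phi @ d))
```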
So, the other way is this — an alternative method for doing the minimization. Look at the mean square error as a function of c. It is a vector-valued argument, and remember c is actually a complex vector, so you have 2n real variables, and you do some multivariable calculus: basically, find the gradient with respect to c and equate that gradient to 0. Since you know it has a unique minimum, because it is a quadratic form, wherever the gradient goes to 0 must be the optimum value. This is a little more difficult here because everything has been written with complex numbers, so you have to undo that a little and write the MSE properly as a multivariable function. Once you do that, you can find the gradient. In fact, you can show that the gradient of the MSE with respect to c is

∇_c MSE = 2 Φ c − 2 α.

So the gradient is interesting; remember this result, we might use it later. It just involves some careful, painful differentiation, so I am not going to go into detail here, but ultimately it is a quadratic function, so the derivative has to work out to be linear in c, and that is what you get. In fact, it has to have the factor 2Φc, because you are differentiating a c-squared type term, so eventually all these things have to work out; you can see it works out quite easily. So from here also you can quickly derive that c_opt = Φ^{−1}α. It is a simple derivation; there is nothing fundamentally challenging here, just a question of carefully rewriting the MSE. Any questions so far? Anything that threw you off completely? Seems okay. All right.

So, let me quickly summarize what we have. If you want to derive the optimal constrained complexity linear MMSE equalizer — deriving the MMSE linear equalizer of order n, with n = 2p + 1 — it is basically equivalent to solving a linear system of equations, namely

Φ c = α,

given that you know the channel. So how do you find these matrices — what are these Φ's, c's and α's? c is an n × 1 vector, α is also an n × 1 vector, Φ is an n × n matrix, and n = 2p + 1; I have written that already. And how do you find Φ and α? It is quite easy to see. You are given z_k = s_k convolved with h_k, plus n_k; assuming I know the statistics of n, which is typically assumed to be white, and I know exactly what h_k is — I do have to know exactly what h_k is — then I can find the statistics of z. So Φ and α are derived from the statistics of z. In fact, α can be written precisely as

α = [ E[s_k z^*_{k+p}], ..., E[s_k z^*_{k−p}] ]^T,

and that is what α becomes.
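Before moving on to Φ, here is a quick numerical check of the gradient formula from the alternative derivation above. The convention I use (my choice, one of several equivalent ones) is that the complex gradient stacks the ∂/∂Re and ∂/∂Im parts into one complex entry per tap; under it, ∇MSE = 2(Φc − α). Continuing the running example:

```python
# Finite-difference check of grad MSE = 2 (Phi c - alpha).
# Entry i of the complex gradient is d/dRe(c_i) + 1j * d/dIm(c_i).
eps = 1e-6
c0 = rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)
num_grad = np.zeros(n_taps, dtype=complex)
for i in range(n_taps):
    e = np.zeros(n_taps)
    e[i] = 1.0
    d_re = (mse(c0 + eps * e) - mse(c0 - eps * e)) / (2 * eps)
    d_im = (mse(c0 + 1j * eps * e) - mse(c0 - 1j * eps * e)) / (2 * eps)
    num_grad[i] = d_re + 1j * d_im
assert np.allclose(num_grad, 2 * (Phi @ c0 - alpha), atol=1e-4)
```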
And for Φ: remember i and j run from 0 to 2p, and the (i,j)-th entry is simply r_z(i − j), the autocorrelation function of z evaluated at i − j. So it is a simple linear-equation-solving problem. All right, so that is the summary for constrained complexity linear equalization. I would say it is not technically that difficult, but one needs to get used to these various matrices and eigenvalues and all that, so it might take some time from that point of view.

So the next question is: how do you solve this linear system? If you know the matrices Φ and α exactly, there are numerous numerical ways of solving it. What would be one way — the simplest, most straightforward way of doing it? Gaussian elimination, or finding the inverse. That seems like a very good way of doing it. But eventually we would like to go to a situation where we do not know Φ and α. What is the only thing that is known? We only know z. We have to know something, so we know z, and maybe we know something about the structure — that z comes from h and n — but we do not know h exactly; maybe we know the statistics of n, but of course you cannot know n itself. So you have only z. In that case, whatever method you have for solving the linear system has to work in a way in which you can estimate Φ and α and carry out the solution of the linear system at the same time. All of those things can be rolled into one nice adaptive algorithm, and to go towards the adaptive algorithm, we need to try and solve the system of linear equations in an iterative way.

So first I am going to talk about an iterative algorithm for solving this linear system given that I know Φ and α, without too many constraints. Then you will see that the iterative algorithm can be nicely modified to accommodate the situation where I do not know Φ and α exactly. It turns out that through the iterations it will also estimate Φ, and keep estimating Φ as it runs; it will also estimate α, and keep estimating α as it runs — and keep adapting as the channel changes. That solution is useful because if the channel changes tomorrow, you do not have to go and reset anything in your program; it is going to continuously adapt on its own. So that is what we are going to see next, and that algorithm is called the MSE gradient algorithm.

So, like I said, it is iterative, in that you will start with an unknown c. For now I will assume you know Φ and α — we will assume these are known; you do not know c, and I will come up with an iterative algorithm for doing the update and finding c. Iterative as in: you start with a c_0 which is arbitrary; then using c_0, Φ and α, you update it and find c_1, and then you find c_2, and so on. So in general, I need an update rule for finding c_{j+1} from c_j; that is all I need. Once I have a rule like that, I can simply apply it repeatedly and get an iterative algorithm. The eventual hope is what? That this will converge to c_opt, which is Φ^{−1}α. You might say, well, I could find Φ^{−1} directly. I could, but that is not the point: I am trying to develop an iterative algorithm so that in the case when I do not know Φ, I still have some way of running it. So that is the idea.
Alright. So this rule for finding c_{j+1} from c_j — I am going to just give it to you and then justify it later. It is going to be something like this:

c_{j+1} = c_j − (β/2) ∇_c MSE |_{c = c_j}.

So the gradient is going to play a role; it is kind of like a gradient descent — roughly, that is what is going to work. I find the gradient of the mean square error with respect to c, evaluated at c_j, and I write the mean square error explicitly rather than just MSE. There is a parameter here, β/2, where β is the step size, which is used in most gradient descent algorithms. And what are you doing? You are moving opposite to the gradient. Why am I moving opposite to the gradient rather than along it? Yes — we are trying to minimize. You are doing a minimization: the gradient points towards the direction of maximum increase, so you want to go opposite to it, in the direction of maximum decrease. That is what you want to do. And remember, this function is quadratic in c, so the gradient is very nicely defined; there is no problem. It will be a very simple gradient descent, and you know exactly what the function is — it is not very complicated, so you do not have to worry about it too much. Of course, the step size has to be chosen, so what we will do next is try to come up with a nice choice for the step size: we will analyze the convergence, see how it converges, and based on that pick a suitable step size. That is the only thing left to choose. In fact, we already know what the gradient is — the formula has already been derived: it is 2(Φ c_j − α). So you see why I have put β/2 rather than β: there is a factor of 2 in the gradient, and I want to cancel it so that the final expression is simple.

So let us do the substitution. You see that c_{j+1} becomes c_j plus β(α − Φ c_j) — the substitution flips the sign, so everything becomes plus. Playing around with it, you can write it as

c_{j+1} = (I − βΦ) c_j + β α.

So that is the update for c_{j+1}. The hope is: you start with an arbitrary c_0, choose maybe a small enough β, or a large enough β, or an optimal β, or a good enough β, and keep iterating this, keep repeating it; eventually you should get to c_opt. We have to see that, and to see it, let us look at the error after the (j+1)-th iteration, which I am going to call q_{j+1}: the vector

q_{j+1} = c_{j+1} − c_opt,

and likewise q_j = c_j − c_opt. So I am going to now derive an iteration for q_{j+1} in terms of q_j; once I do that, it becomes slightly easier to analyze. So let me see who is going to give me an expression for q_{j+1} in terms of q_j. What is c_opt? Φ^{−1}α. So I want a form where you have a matrix multiplying q_j, and indeed it is I − βΦ:

q_{j+1} = (I − βΦ) q_j.

So that is it. Go through and check that.
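A minimal sketch of the MSE gradient iteration, continuing the running example. The step size β = 1/λ_max is a choice the lecture motivates only at the very end; it is used here simply to have a concrete convergent value.

```python
# MSE gradient descent: c_{j+1} = c_j - (beta/2) * grad = (I - beta*Phi) c_j + beta*alpha.
lam = np.linalg.eigvalsh(Phi)                # eigenvalues, real since Phi is Hermitian
beta = 1.0 / lam.max()                       # a safe step size (justified later)

c_j = np.zeros(n_taps, dtype=complex)        # arbitrary starting point c_0
for j in range(200):
    c_j = c_j - (beta / 2) * (2 * (Phi @ c_j - alpha))
print(np.max(np.abs(c_j - c_opt)))           # should be small: the iterates approach c_opt
```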
Make sure that this works out; you should be able to do it very easily. It is a simple question of manipulating the variables — there are a lot of symbols for such a small problem, so you may get confused about which one to drop and which one to use, but you will get this I − βΦ. So what happens to q_{j+1} in terms of q_0?

q_{j+1} = (I − βΦ)^{j+1} q_0.

So q_0 is your initial error, and it gets multiplied by (I − βΦ) raised to the power j + 1 to give q_{j+1}. All right, so this is the error after the (j+1)-th step of my gradient descent algorithm. Now I have to figure out how this matrix is going to behave. What would I want? Ideally, I want that matrix to go to 0 as j becomes very, very large. One way of looking at it is to consider the scalar case: I becomes 1, β is β, and Φ becomes some scalar φ₀. When will (1 − βφ₀)^{j+1} tend to 0? When |1 − βφ₀| < 1. So something like that is what we want in the matrix case: we want (I − βΦ)^{j+1} to tend to the n × n zero matrix. When will that happen is the question.

So now we will start using everything we know about Φ. What do we know about Φ? Φ is a Hermitian symmetric matrix. Once you know it is Hermitian symmetric, Φ can be written using what is called the spectral decomposition. It turns out that all Hermitian matrices have what is called a complete set of orthonormal eigenvectors: a set of eigenvectors which are orthonormal, all eigenvectors of Φ, and which span the entire space. So you have a complete set of orthonormal eigenvectors for any Hermitian symmetric matrix, which means you can write

Φ = Σ_{i=1}^{n} λ_i v_i v_i^{*T},

where λ_i is an eigenvalue of Φ and the set of v_i is orthonormal. What do I mean by orthonormal? v_i^{*T} v_j = 0 if i ≠ j, and 1 if i = j. So any Hermitian symmetric matrix can be written like this. In addition, we also know Φ is positive definite, which means all the λ_i are positive. Actually, for a Hermitian symmetric matrix the λ_i are already real; in the positive definite case, the λ_i are also strictly positive, strictly greater than 0. Those are the things we can say. What people typically do is arrange the λ's in descending order,

λ_max = λ_1 ≥ λ_2 ≥ ... ≥ λ_n = λ_min,

and in the positive definite case all of these are positive — you can do the ordering in the general case too. So you usually arrange it like this. Is that all right? So now you go in and compute (I − βΦ) raised to the power j. What do you think you will get? Once you start using the orthonormality property, you will see that when you raise this matrix to a power, the eigenvector structure remains exactly as it is, and the only thing that gets modified is that the eigenvalues get raised to the corresponding power.
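A quick numerical illustration of the spectral decomposition, continuing the running example. numpy's eigh routine for Hermitian matrices returns exactly the orthonormal eigenvector set the lecture invokes.

```python
# Spectral decomposition of the Hermitian Phi: Phi = V diag(lam) V^{*T}.
lam, V = np.linalg.eigh(Phi)                            # lam real, V's columns orthonormal
assert np.allclose(V.conj().T @ V, np.eye(n_taps))      # complete orthonormal eigenvector set
assert np.allclose(Phi, (V * lam) @ V.conj().T)         # Phi = sum_i lam_i v_i v_i^{*T}
assert lam.min() > 0                                    # positive definite in this example
```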
So, eventually you will get

(I − βΦ)^j = Σ_{i=1}^{n} (1 − βλ_i)^j v_i v_i^{*T}.

Okay, once again, this needs some proving: write it down algebraically; you can do induction and quickly show it. It is not a very difficult thing to show: if you use induction, all you are doing is one more stage of multiplication, and the orthonormality property really does the work, so you quickly get the answer. So this gives you a nice handle on figuring out when this matrix will tend to 0. When will it tend to 0? When all the (1 − βλ_i) are less than 1 in magnitude. If I choose my β so that |1 − βλ_i| < 1 for all of them, eventually this thing will tend to 0. That is the simplest possible explanation you can come up with. So my MSE gradient algorithm converges to c_opt if |1 − βλ_i| < 1 for i = 1 to n. Since I have already ordered the eigenvalues, only one of them will really matter: λ_max is the only thing that matters. If you satisfy this for λ_max, it gets automatically satisfied for everything else; there is no problem. So all I have to do is make sure that 1 − βλ_max lies between +1 and −1, or equivalently

0 < β < 2/λ_max.

So in my MSE gradient descent algorithm, if I pick my step size to lie between 0 and 2/λ_max, then I am guaranteed to converge to the optimal solution of my linear system, Φc = α. So that is the final result.

Typically, what people do in practice is pick β to be a small enough value and see if they are converging. If you are not going anywhere, going around in circles, decrease it further; if you see you are converging too slowly, what do you do? Increase it. So eventually, after a few attempts, you will get to a good β, where convergence is reasonably fast and at the same time you converge to the right value. In any case, the convergence is going to be exponential in j; you can expect that to happen, so it is going to be quite fast — several exponential terms are adding up, and some might decay slowly. If you do a more careful analysis, you can show that for the fastest convergence of c_j to c_opt, you have to choose

β_opt = 2 / (λ_min + λ_max).

But be careful: fastest convergence of the coefficients may not mean much in terms of the mean square error. There is one more criterion you can use: look at the MSE at the j-th iteration, MSE_j, and ask that it decrease fastest. For that, it turns out the better choice is β = 1/λ_max. Typically this is a very good choice; you just take 1/λ_max and it will work out very nicely. All right, so I think we will stop here. Maybe in the next class we will see a simple example of how this works, and then look at more intuition as to how this relates to the channel. So we will see that also as we go along in the next class.
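To close the running example, here is a small sketch comparing step-size choices: values inside (0, 2/λ_max) converge, the 2/(λ_min + λ_max) choice is the "fastest coefficient convergence" value from the lecture, and the last value deliberately violates the bound to show divergence. It reuses lam, Phi, alpha and c_opt from above.

```python
# Error norm |c_j - c_opt| after a fixed number of iterations, for several betas.
def run(beta, iters=100):
    c_j = np.zeros(n_taps, dtype=complex)
    for _ in range(iters):
        c_j = (np.eye(n_taps) - beta * Phi) @ c_j + beta * alpha
    return np.linalg.norm(c_j - c_opt)

for beta in [0.5 / lam.max(),                 # safe but slow
             1.0 / lam.max(),                 # the lecture's "very good choice"
             2.0 / (lam.min() + lam.max()),   # fastest convergence of c_j
             2.1 / lam.max()]:                # violates 0 < beta < 2/lambda_max: diverges
    print(f"beta = {beta:.4f}: final error = {run(beta):.3e}")
```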