So, as I was just discussing, we use different types of norms in different settings. The reason the L2 norm is the most popular norm is that it can be written like this: the L2 norm squared can be written as ‖x‖_2^2 = x^T x. It has lots of good properties, so it is typical to use the L2 norm in optimization problems. Even if you want to use other norms, it is not uncommon to try to reduce the problem to an L2 norm, solve a sequence of problems involving the L2 norm, and then hope that you will be able to solve the problem involving the other norm.

The L1 norm is typically used when you want to find what are called robust estimators, and it is also used very heavily in compressed sensing, which I teach next term, because it promotes sparse solutions. Sparse solutions are vectors in which many of the entries are equal to zero. There are many applications where you want to solve, for example, an equation Ax = b which has many solutions, and you want the solution with the maximum number of zeros in x. For such things, solving

minimize ‖x‖_1 subject to Ax = b

will lead to sparse solutions for x.
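To make this concrete, here is a minimal Python sketch (not from the lecture; the sizes and data are made up for illustration) of the problem "minimize ‖x‖_1 subject to Ax = b", recast as a linear program via the standard trick of auxiliary variables t with −t ≤ x ≤ t:

```python
# Sparse recovery via L1 minimization: minimize ||x||_1 s.t. Ax = b,
# written as a linear program over stacked variables [x; t].
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 15, 30                            # underdetermined: Ax = b has many solutions
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[2, 11, 25]] = [1.5, -2.0, 0.7]   # sparse ground truth (illustrative)
b = A @ x_true

# Minimizing sum(t) subject to -t <= x <= t minimizes ||x||_1.
c = np.concatenate([np.zeros(n), np.ones(n)])
A_ub = np.block([[ np.eye(n), -np.eye(n)],    #  x - t <= 0
                 [-np.eye(n), -np.eye(n)]])   # -x - t <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])       # equality constraint Ax = b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * n + [(0, None)] * n)
x_hat = res.x[:n]
print("support of recovered x:", np.flatnonzero(np.abs(x_hat) > 1e-6))
```

With enough measurements relative to the sparsity, the recovered support typically matches the true one, whereas the least-squares (L2-minimizing) solution of Ax = b is generically fully dense.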
The L-infinity norm is useful when you care about element-by-element convergence properties. However, as I mentioned, the L2 norm is by far the most amenable to optimization. So the norm that is most natural to a given problem may not be the most mathematically convenient or tractable one, and if you use a different norm to solve the problem, ideally we want to know how it relates to the problem we originally set out to solve. For example, suppose you have a sequence of vectors output by a particular algorithm, and you monitor, say, the L2 norm of the difference between consecutive iterates, and you find that this L2 norm is converging. Does that mean the vector itself converges or not? To answer these kinds of questions there is a very strong property that norms satisfy: essentially, if a sequence of vectors converges according to a given norm, it in fact converges to the same point with respect to any other norm you wish to use. I'll discuss that aspect a little bit.

Sir, what is meant by robust estimation? Is it like performing well in the presence of noise?

Yes, certainly you want it to perform well in the presence of noise, but other norms will also give you good properties and recovery in the presence of noise. What happens is the following; I'll just go off to the side a bit here, as a side note. Suppose there is a certain point x0, and you have an algorithm that you hope will return this optimal point x0, but it returns a different point, call it x#, and now there is a distance between the two. Your algorithm returns x# because you have said, in effect, "I want to minimize some function of x subject to some constraints." If you are looking at the L2 norm, what it is doing is taking the difference between each entry of x# and the corresponding entry of x0, squaring these differences, adding them up, and finally taking a square root.

If you consider the square of this Euclidean norm, you can see that if there is one particular pair of entries in x# and x0 where the difference is very large, then because you are squaring it, the distance, the Euclidean norm, ends up becoming a very big number. So the L2 norm penalizes the most mismatched entry the most and the least mismatched entries less. But if you take ‖x# − x0‖_1, what this is doing is just summing the magnitudes of the errors between x# and x0, so it penalizes all the errors more or less equally. That is what is called robust estimation: you are not giving undue importance to incurring a large error in some components versus a small error in other components; all errors are equivalent to you.

If there are many parameters to estimate, how does that translate to the L-infinity norm? There we are optimizing with respect to only one element.

In the L-infinity norm you are looking at the largest entry of x# − x0, so it is typically used in what are called min-max types of problems, where what you want to minimize is the maximum deviation across all entries between x# and x0. When that is what is important to you, you would use the L-infinity norm.

Thank you, sir.
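Here is a small numerical illustration of the robustness point (my own example, not from the lecture): when estimating a single location parameter, the L2-optimal estimate is the mean and the L1-optimal estimate is the median, and one grossly corrupted entry drags the mean much further than the median.

```python
# Estimating one parameter c from noisy samples: the mean minimizes
# sum_i (s_i - c)^2 (squared L2 error) and the median minimizes
# sum_i |s_i - c| (L1 error). Squaring lets a single large mismatch
# dominate, so one outlier wrecks the mean but barely moves the median.
import numpy as np

rng = np.random.default_rng(1)
samples = 5.0 + 0.1 * rng.standard_normal(50)   # true parameter is 5.0
samples[0] = 500.0                              # one wildly corrupted entry

print(f"L2 estimate (mean):   {samples.mean():.3f}")      # dragged toward 500
print(f"L1 estimate (median): {np.median(samples):.3f}")  # stays near 5.0
```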
So let me define what I mean by convergence of a sequence of vectors. I'll first write down the punch line here and then discuss further. The punch line is that vector norms can be used to measure convergence of a sequence of vectors. So let me define convergence first. Let V be a vector space over R or C, and let ‖·‖ be a vector norm on V. We say the sequence {x_k} of vectors in V (this is common notation for a sequence: you write curly braces around x_k, and sometimes k ≥ 1 to say k goes from 1, 2, 3 up to infinity) converges to x, also in V, with respect to this norm if and only if ‖x_k − x‖ → 0 as k → ∞. We will write this as lim_{k→∞} x_k = x, and again I have to note that this is with respect to this norm.

Two aspects immediately come out of this definition. One is that, in order to define convergence of a sequence of vectors, it seems I need to tell you with respect to which norm I am asking for convergence. The second is that if I change the norm, it is possible that x_k will converge to a different point x', because the definition depends on which norm I specify. So the two related questions are: first, is it possible that a given sequence {x_k} converges with respect to one norm but not another? And second, can a sequence converge to two different points with respect to a given norm?

It turns out that the answer to both questions is no in finite-dimensional spaces, but in an infinite-dimensional vector space it is possible that a sequence converges with respect to one norm but not another. There is an example in Horn and Johnson which shows that a sequence can converge to two different points with respect to two different norms, but we won't discuss that here, because the focus of this course is on finite-dimensional vector spaces. So the first question is: can a sequence converge in one norm but not in another? The answer is no in a finite-dimensional vector space, and we will see why this is true in a minute. But before that, let me write down the other question, which is actually easier to settle: can a sequence converge to two different points with respect to a given norm? That is, are lim_{k→∞} x_k = x and lim_{k→∞} x_k = y, with respect to the same norm, both possible? The answer is no, and that is very easy to see; I guess some of you have already been able to figure out why. It follows from the triangle inequality. What we are told is that ‖x_k − x‖ → 0 as k → ∞, and similarly ‖x_k − y‖ → 0 as k → ∞. So if I take the norm of x − y, that is

‖x − y‖ = ‖(x − x_k) + (x_k − y)‖ ≤ ‖x − x_k‖ + ‖x_k − y‖,

where the inequality is the triangle inequality. Both terms on the right go to 0 as k → ∞, so the right-hand side goes to 0. But the left-hand side is a fixed non-negative number, so it must equal 0, and because ‖·‖ is a norm, ‖x − y‖ = 0 implies x = y. So the sequence has to converge to the same point.

Whether a particular sequence can converge to different points if we take different norms is what I want to address now. To show that a sequence cannot converge to different limits under different norms, there is one other theorem that we will need. This theorem is actually a theorem from real analysis; we will outline the proof, but there is one step from real analysis which I won't go into here. Let f1 and f2 be real-valued functions on R^n, for n finite, and suppose three properties hold. Property (a): f_i(x) ≥ 0 for every x in R^n, and f_i(x) = 0 if and only if x = 0 (we will be replacing f1 and f2 with norms later on). Property (b): f_i(αx) = |α| f_i(x) for every α in R and x in R^n. Property (c): f_i is continuous on R^n. Notice that I don't require the triangle inequality, which is part of the definition of a norm; I just need these three properties. Instead of the triangle inequality, I need this continuity property on R^n. I think all of you have some idea of what it means for a function to be continuous, so I won't go into the definition of continuity here. In fact, continuity is really used only when we invoke another famous theorem from real analysis, the Weierstrass theorem, in the proof. Other than that, let's not get into the notion of continuity right now; we'll take it on faith that we know what it means for a function to be continuous.

When these properties hold, there exist positive constants, which we'll call c_m and c_M, such that

c_m f1(x) ≤ f2(x) ≤ c_M f1(x) for every x in R^n.

Translating this into norms, what this says is that a second norm of x is sandwiched between some constant times the first norm of x and some other constant times that same first norm. That is the result.
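Before the proof, here is a quick numerical look at this conclusion, taking f1 = ‖·‖_2 and f2 = ‖·‖_1 on R^2 (an assumed choice for illustration): sampling the unit sphere and computing h(x) = f2(x)/f1(x) recovers the constants c_m = 1 and c_M = √2 in ‖x‖_2 ≤ ‖x‖_1 ≤ √2 ‖x‖_2.

```python
# Sampling h(x) = ||x||_1 / ||x||_2 on the unit sphere S = {x in R^2 : ||x||_2 = 1}.
# On S the denominator is 1, so h(x) = ||x||_1; its min and max over S
# are the sandwich constants c_m and c_M.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.standard_normal((100_000, 2))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # project onto S

h = np.abs(pts).sum(axis=1)   # ||x||_1 for points with ||x||_2 = 1
print(f"empirical c_m = {h.min():.4f}  (theory: 1, attained at x = e_i)")
print(f"empirical c_M = {h.max():.4f}  (theory: sqrt(2) = {np.sqrt(2):.4f})")
```

For this pair of norms c_M = √n in general, so the constants can grow with dimension even though they are always finite. The proof of the theorem shows exactly where these constants come from.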
The proof goes like this. Let h(x) = f2(x)/f1(x) for x in the set S = {x in R^n : ‖x‖_2 = 1}. I'm using the Euclidean norm here, but you can actually use any other norm; it doesn't matter. The only thing you need is that S is a compact set that does not include the zero vector. Again, compactness is another notion from real analysis that is beyond the scope of this course, but for your reference you can note that S is a compact set.

Then h(x) is certainly nonzero for any x in S: the zero vector is not in S, and f_i(x) is strictly positive for any x ≠ 0, so f2(x) and f1(x) are both strictly positive numbers, and their ratio is also a strictly positive number. Also, h is continuous on S because of property (c): the ratio of two continuous functions, with a nonvanishing denominator, is also continuous. Now the Weierstrass theorem says that a function h with these two properties attains a finite positive maximum c_M and minimum c_m on the compact set S. That implies c_m f1(x) ≤ f2(x) ≤ c_M f1(x) for every x in S: h(x) is bounded between c_m and c_M, h(x) is just f2(x)/f1(x), and I take f1 to the other side.

But z/‖z‖_2 always belongs to S for every nonzero z in R^n. So if I want to show that the inequality holds for every z in R^n, I just replace x with z/‖z‖_2 for any nonzero z. Then by property (b), the homogeneity property, the factor 1/‖z‖_2 comes out of each of the three expressions and cancels throughout. So from (b), the above inequality holds for every nonzero z in R^n. If z = 0, the case is trivial: all three quantities are zero, so the inequality is already true. That concludes the proof.

Now the consequence of this is the following. If ‖·‖_α and ‖·‖_β are two vector norms on R^n and {x_k} is a given sequence of vectors, then lim_{k→∞} x_k = x with respect to ‖·‖_α if and only if lim_{k→∞} x_k = x with respect to ‖·‖_β. So it does not matter which norm you consider: if the sequence converges with respect to a given norm, then it converges with respect to any other norm, and in fact it converges to the same point. The proof is one line. By the previous theorem there are constants c_m and c_M such that

c_m ‖x_k − x‖_α ≤ ‖x_k − x‖_β ≤ c_M ‖x_k − x‖_α,

and this is true for every k. The middle quantity is sandwiched between the two ends. If ‖x_k − x‖_α goes to zero, both ends of the sandwich go to zero, which forces ‖x_k − x‖_β to zero as well; conversely, if ‖x_k − x‖_α went to some nonzero quantity, ‖x_k − x‖_β could not go to zero while being sandwiched between the two. So ‖x_k − x‖_α → 0 if and only if ‖x_k − x‖_β → 0 as k → ∞. Basically this implies that in a finite-dimensional vector space all norms are equivalent, in the sense that whenever x_k converges to x with respect to one norm, it converges to the same x with respect to any other norm.
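As a one-screen check of this corollary (an illustrative sequence of my own choosing): take x_k = x + 2^(−k) d for a fixed direction d, so x_k → x, and the error goes to zero in every norm we monitor, at the same geometric rate up to constants.

```python
# x_k = x + 2**(-k) * d, so the error x_k - x shrinks geometrically;
# by norm equivalence it goes to 0 in the 1-, 2-, and inf-norms alike.
import numpy as np

x = np.array([1.0, -2.0, 3.0])
d = np.array([4.0, 1.0, -5.0])
for k in (1, 5, 10, 20):
    err = 2.0 ** (-k) * d                 # this is x_k - x
    print(f"k={k:2d}  ||err||_1={np.linalg.norm(err, 1):.2e}"
          f"  ||err||_2={np.linalg.norm(err, 2):.2e}"
          f"  ||err||_inf={np.linalg.norm(err, np.inf):.2e}")
```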
I think we are out of time for this class, so we will stop here and continue next time.