In the last module, 2.1, we reviewed the fundamental principles of vector spaces and the various concepts associated with them. In this module, 2.2, I am going to provide a quick overview of matrices. I am sure many of you have been introduced to various properties of matrices; I am going to collect all the properties we will need in one place, so that this module can serve as a one-stop shop you can come back to for all the basic principles needed to pursue most of what we have to do in data assimilation.

First, the definition and basic operations of matrices. An m by n real matrix has m rows and n columns, so there are m times n elements. Each column is an m-vector and each row is an n-vector. If m is not equal to n, it is called a rectangular matrix. A typical element is denoted a_ij, where i is the row index and j is the column index. When m equals n, it is called a square matrix of order n, or size n; order and size are used synonymously. If all the elements are 0, it is called a zero matrix or a null matrix. We need the number 0, the null vector 0, and the null matrix 0; we will use the symbol 0 for all three, and the context will tell us whether we are talking about the number 0, the vector 0, or the null matrix 0. But we need all these objects.

More often than not we will be dealing with square matrices, so A belongs to R^{n x n}. I would like to say a word or two about R^{n x n}. Please recall that we used the symbol R^n to denote the set of all n-vectors. Likewise, R^{n x n} is the set of all n by n matrices. A vector has n elements; a matrix has n^2 elements; each element is a real number. So there are infinitely many vectors and infinitely many matrices in these sets. I would like to emphasize that R^n and R^{n x n} are infinite sets, and each member is a different object: a vector is an object, a matrix is an object, and so on.

I can refer to the i-th row or the j-th column of a matrix. Going back to the previous slide, one can point to the first column, the second row, and so on; it is important to recognize a row of a matrix and a column of a matrix. A matrix can be represented as a sequence of columns or as a sequence of rows. The i-th row is denoted A_{i*} and the j-th column is denoted A_{*j}. So A_{*1}, A_{*2}, ..., A_{*n} are the n columns (on the slide the second column label should read A_{*2}, not A_{*n}), and A_{1*}, A_{2*}, ..., A_{n*} are the n rows. The first is called column partitioning and the second row partitioning, so we can talk about partitions of a matrix.

Again going back to the previous slide, the elements a_11, a_22, ..., a_nn that lie along the diagonal form a vector of size n, the vector that lies along the diagonal; it is called the diagonal of the matrix. So columns, rows, and diagonals are different cross sections of a matrix. That diagonal is also called the principal diagonal. The diagonals parallel to the principal diagonal are called superdiagonals and subdiagonals: superdiagonals lie above the principal diagonal, subdiagonals below it. This is all nomenclature that one has to remember; it is fundamental to our mathematical treatment of data assimilation.
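Since we will constantly refer to these cross sections, here is a minimal NumPy sketch (my own illustration, not from the lecture slides; the matrix values are arbitrary) showing how rows, columns, and diagonals of a matrix are extracted:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

row_2 = A[1, :]           # A_{2*}: the second row, an n-vector
col_1 = A[:, 0]           # A_{*1}: the first column, an m-vector
diag = np.diag(A)         # principal diagonal (a_11, a_22, a_33)
super1 = np.diag(A, k=1)  # first superdiagonal
sub1 = np.diag(A, k=-1)   # first subdiagonal

print(row_2, col_1, diag, super1, sub1)
```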
Now I am going to quickly define several operations on matrices. Let me back up a little and explain. If you have the set of integers, you need to define operations: addition, multiplication, subtraction. If you have real numbers, you talk about addition, multiplication, subtraction, division. If you have vectors, you talk about addition, scalar multiplication, the outer product, and so on. So what does this tell you? Whenever you have a set of mathematical objects, you have to define a consistent set of operations for that set. Numbers have their operations, complex numbers have theirs, vectors have theirs, polynomials have theirs. Likewise, for matrices we have to have the appropriate set of operations, and I am going to quickly define the fundamental ones.

So let A, B, C be three matrices, x, y, z three vectors, and a, b, c three scalars. Look at this: I now have elements from three different animal kingdoms. Matrices are one class of animals, vectors another, scalars yet another, and I am going to combine all of them to do what I want to do. This is where the notion of vector space comes into play.

The sum and the difference of matrices are matrices: C = A + B is the sum, C = A - B is the difference, and these are computed element-wise. If A is a matrix and a is a scalar, I can define B = aA; that is called scalar multiplication of a matrix: you multiply each element of the matrix A by the scalar a.

I can also combine matrices and vectors; this is called matrix-vector multiplication. I can define a vector y as the product of a matrix A and a vector x, where y_i, the i-th component of y, is given by the inner product of the i-th row of A with the vector x. Picture y, A, and x side by side: to compute y_i, I take the i-th row of A, whose elements are a_ij for j running from 1 to n, and the elements x_j of x for j running from 1 to n; I multiply the first element with the first element, the second with the second, up to the n-th with the n-th, and sum them up. That scalar product of the i-th row with the vector x is the element y_i. Doing this for every i defines the vector y; that is the matrix-vector product.

Now I can define the matrix-matrix product. Look at what we have so far: sum of matrices, difference of matrices, multiplication of a matrix by a scalar, multiplication of a matrix by a vector. Now I am going to talk about multiplication of a matrix by a matrix, which is also a matrix. If C = AB, then the element c_ij is essentially the inner product of the i-th row of A with the j-th column of B: c_ij = sum over k of a_ik b_kj. There are other ways of looking at the matrix product; one is called the saxpy way.
Another is called the outer product way. I have given these definitions on the slides, and I would like you to verify that the matrix product defined via inner products, via saxpy operations, and via outer products all give rise to the same result; I am going to leave that as a homework problem. I think it will be an illuminating exercise to verify that the matrix product can be defined in any one of these three ways.

I would like now to emphasize a fundamental property of the matrix product: it is not commutative. That means AB is in general not equal to BA. Let us compare. If you take two numbers a and b, then ab = ba and a + b = b + a. If you take two matrices, A + B = B + A still holds, but AB in general is not equal to BA. So what does this mean? The algebra of real numbers is commutative; matrix algebra is non-commutative. The matrix product is not commutative, and that is a very fundamental restriction one has to be cognizant of in going from real algebra to matrix algebra.

Now I am going to define several other operations; the set of operations on matrices is very rich. If I have a matrix A which is m by n, I can define a matrix called the transpose of A, denoted A^T, which is n by m: the rows of A are the columns of A^T and vice versa.

So what are the properties of the transpose operation? The transpose is a very fundamental and basic operation, and it is a unary operation. Let me distinguish between two types of operations. A binary operation needs two operands: to add I need two numbers, to multiply I need two numbers, to divide I need two numbers. A unary operation, on the other hand, needs only one operand. What are examples of unary operations? The transpose of A, the negative of A, the inverse of A: transpose, negative, and inverse are all unary operations. Addition, subtraction, and multiplication are binary operations. So please be cognizant of the fundamental difference between the two types: binary operations and unary operations.

The unary operation of transposition has several properties: the transpose of a transpose is the matrix itself, (A^T)^T = A; the transpose of a sum is the sum of the transposes, (A + B)^T = A^T + B^T; and the transpose of a product is the product of the transposes taken in the reverse order, (AB)^T = B^T A^T. These are all basic properties, and I am not going to prove them; many of the books I mentioned at the end of module 2.1 have the proofs. Even if you do not want to prove them, you should at least be able to verify them. How do you verify? Take two matrices A and B, carry out the operations, and check. I very strongly recommend that you verify these properties; it is fundamental to see why and how they work.
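As a hint toward that homework, here is a small NumPy sketch (my own illustration, not from the lecture slides) computing C = AB in the three ways just described, checking that they agree, and confirming non-commutativity and the transpose rules:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# 1. Inner-product way: c_ij = <i-th row of A, j-th column of B>
C1 = np.empty((n, n))
for i in range(n):
    for j in range(n):
        C1[i, j] = A[i, :] @ B[:, j]

# 2. Saxpy way: the j-th column of C is a linear combination of the
#    columns of A, with coefficients taken from the j-th column of B
C2 = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        C2[:, j] += B[k, j] * A[:, k]

# 3. Outer-product way: C is the sum over k of the outer products of
#    the k-th column of A with the k-th row of B
C3 = sum(np.outer(A[:, k], B[k, :]) for k in range(n))

assert np.allclose(C1, C2) and np.allclose(C2, C3)

# Non-commutativity: AB != BA in general (almost surely for random A, B)
print(np.allclose(A @ B, B @ A))           # False

# Transpose rules
assert np.allclose((A.T).T, A)             # (A^T)^T = A
assert np.allclose((A + B).T, A.T + B.T)   # (A+B)^T = A^T + B^T
assert np.allclose((A @ B).T, B.T @ A.T)   # (AB)^T = B^T A^T
```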
The next unary operation is the trace of a matrix, defined as the sum of the elements of the diagonal: trace(A) = sum of a_ii for i = 1 to n, that is, a_11 + a_22 + ... + a_nn. So the trace is a functional: you can think of it as a function from R^{n x n} to R. The trace has many important properties. (a) trace(A) = trace(A^T), where I am assuming A is an n by n matrix. (b) trace(A + B) = trace(A) + trace(B). (c) trace(alpha A) = alpha trace(A). (d) trace(AB) = trace(BA): the trace is the same whether you compute the product AB or BA. (e) For a triple product, trace(ABC) = trace(BCA) = trace(CAB); you can think of this as a circular property. Place A, B, and C around a circle: you can run around the circle starting at A, starting at B, or starting at C, and the property essentially tells you that no matter where you start, the triple product has the same trace. (f) trace(A B A^{-1}) = trace(B), which essentially comes from applying the circular property. Again, I am going to leave all of these as homework problems. In other words, these are simply definitions and stated properties, and I would like you to verify them using simple examples. It is absolutely essential that we all have a good understanding of these properties.

Then there is the notion of the determinant of a matrix; I am continuing to list the basic quantities that a matrix possesses. The determinant is again a function, mapping R^{n x n} to R: the determinant of a matrix is a number. It is defined as the sum of the products of the elements a_ij with their cofactors, where a cofactor is a signed minor; everybody should know the definition of a cofactor. The determinant of a matrix is a fundamental quantity, and I am sure most of you have been introduced to the notion of a determinant.

Now I am going to introduce some properties of the determinant. If A is non-singular, det(A) is not 0; if det(A) = 0, the matrix is called singular. det(A) = det(A^T). det(AB) = det(A) det(B). det(A^{-1}) = 1/det(A) if A is non-singular. Again, these are properties I am going to ask you to verify. Ultimately you should know how to prove them, but the first step towards proving is to verify: at least you should be confident that these properties hold because you have verified them using examples. But verification by example is not a proof. A proof is a little more abstract: a proof deals with all cases, while verification deals only with specific instances. That is the difference between verifying and proving. Proving is the ultimate goal, but to get to a proof you need to verify first; build your expertise by verifying, and then prove.
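In that spirit of verify-before-prove, here is a quick numerical check (my own sketch, with random matrices) of the trace and determinant properties just listed:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

# Trace properties
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(2.5 * A), 2.5 * np.trace(A))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# Circular property: trace(ABC) = trace(BCA) = trace(CAB)
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
# trace(A B A^{-1}) = trace(B)
assert np.isclose(np.trace(A @ B @ np.linalg.inv(A)), np.trace(B))

# Determinant properties
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)),
                  1.0 / np.linalg.det(A))
```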
Now I am going to list the properties of several special matrices. The first property is symmetry: a matrix A is said to be symmetric if A^T = A. So what does this mean? Take the diagonal as the mirror line: the upper triangular part and the lower triangular part must be mirror images of each other. If there is a 1 at some position above the diagonal, there must be a 1 at the mirrored position below it; a 4 opposite a 4; a -1 opposite a -1. That is what A^T = A refers to: the upper half and the lower half are mirror images of each other. So symmetric matrices are a special class of matrices; the restriction comes from the fact that the upper half must be a mirror image of the lower half.

A matrix is a diagonal matrix when only its diagonal elements may be non-zero: all the off-diagonal elements are 0. An example of a diagonal matrix is the 3 by 3 matrix with diagonal entries 1, 2, 3 and zeros everywhere else. The unit (identity) matrix is a special kind of diagonal matrix where all the elements along the diagonal are 1. So the matrix with diagonal 1, 2, 3 is a diagonal matrix of size 3, and the matrix with diagonal 1, 1, 1 is also a diagonal matrix of size 3; the first is a non-unit diagonal matrix, the second is the unit matrix.

Then we can talk about triangular matrices. An upper triangular matrix is one in which only the diagonal and the elements above it may be non-zero: everything below the diagonal is 0. A lower triangular matrix is one in which only the diagonal and the elements below it may be non-zero: everything above the diagonal is 0. A matrix can also be tridiagonal: a tridiagonal matrix is one in which the principal diagonal, the first superdiagonal, and the first subdiagonal may be non-zero, and every other element is 0.

A matrix is said to be orthogonal if A^T = A^{-1}. So orthogonal matrices have the extremely nice special property that the inverse equals the transpose. These are examples of basic properties of special classes of matrices.

The special matrices continue. A matrix is said to be skew-symmetric if A^T = -A. So for symmetric matrices A^T = A; for skew-symmetric matrices A^T = -A. In a skew-symmetric matrix, a_ij = -a_ji, and setting i = j gives a_ii = -a_ii, so a_ii = 0: the diagonal elements of a skew-symmetric matrix are all 0. What is an example of a skew-symmetric matrix? A 3 by 3 matrix with zeros on the diagonal and entries 1, 2, 3 above the diagonal mirrored by -1, -2, -3 below: the diagonal elements are 0 and there is reflection with a sign change, a_ij = -a_ji.

Given any matrix A, I can separate it into two parts: one is called the symmetric part of A and the other the skew-symmetric part of A. The symmetric part is A_s = (A + A^T)/2, and the skew-symmetric part is A_ss = (A - A^T)/2. You can easily verify that A = A_s + A_ss, the symmetric part plus the skew-symmetric part. This is called the additive decomposition of A: every matrix can be expressed as the sum of a symmetric matrix, its symmetric part, and a skew-symmetric matrix, its skew-symmetric part.

Given any matrix A, the products A A^T and A^T A are two matrices one can always generate out of A. In other words, given a matrix A, compute A^T; having A and A^T, I can multiply them in either order. It turns out that A A^T and A^T A are both symmetric, and they have a special name: they are called Gramian matrices, after Gram, who first introduced them. The Gramians of A are always symmetric, for whatever A is.
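Before moving on to rank, here is a small check (again my own sketch) of the additive decomposition and of the symmetry of the two Gramians:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

# Additive decomposition: A = symmetric part + skew-symmetric part
A_s = 0.5 * (A + A.T)     # symmetric: A_s == A_s^T
A_ss = 0.5 * (A - A.T)    # skew-symmetric: A_ss == -A_ss^T, zero diagonal
assert np.allclose(A, A_s + A_ss)
assert np.allclose(A_s, A_s.T)
assert np.allclose(A_ss, -A_ss.T)
assert np.allclose(np.diag(A_ss), 0.0)

# Gramians: A A^T and A^T A are symmetric for any A (even rectangular)
R = rng.standard_normal((4, 2))
assert np.allclose(R @ R.T, (R @ R.T).T)
assert np.allclose(R.T @ R, (R.T @ R).T)
```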
The next concept is the rank of a matrix. The rank of a matrix is essentially the number of linearly independent rows or columns. The rows of a matrix are vectors, and if I have a bunch of vectors, I can talk about the linear independence of that set of vectors; the number of linearly independent rows is called the row rank, and likewise the number of linearly independent columns is called the column rank. It can be shown that the column rank is equal to the row rank, and the common value is called the rank of A.

If A is an m by n matrix, the rank of A is less than or equal to the minimum of m and n. The rank of A is equal to the rank of A^T. The rank of a sum is at most the sum of the ranks: rank(A + B) <= rank(A) + rank(B). The rank of a product is at most the minimum of the two ranks: rank(AB) <= min(rank(A), rank(B)). We have seen earlier that the outer product of two vectors is a matrix; if a matrix arises as the outer product of two non-zero vectors, that matrix always has rank 1. If an n by n matrix is non-singular, then the determinant is not equal to 0 and the rank of the matrix is n. So you can characterize the set of all non-singular matrices as those with non-zero determinant, equivalently those of full rank: the full-rank condition, non-singularity, and det(A) != 0 are one and the same condition.

I am now going to review the concept of the inverse of a matrix, denoted A^{-1}. How do I define A^{-1}? The same way that in arithmetic we say a times 1/a equals 1 and call 1/a the reciprocal; in matrix theory we say A times A^{-1} equals I and call A^{-1} the inverse. So inverse and reciprocal are pretty much synonymous. The role the number 1 plays among numbers is the role the identity matrix I plays among matrices: they are the unit elements. If you multiply any number by 1 it stays the same; if you multiply any matrix by the identity it stays the same. What 1 is for numbers, the identity matrix is for matrices.

So I would like you to know the properties of the inverse. A A^{-1} = I. Inversion is a unary operation, and the inverse of the inverse is A itself; that should not be surprising, because the reciprocal of the reciprocal is the given number: 1/(1/a) = a. The inverse of a product is the product of the inverses taken in the reverse order, (AB)^{-1} = B^{-1} A^{-1}, assuming the matrices are non-singular. These are fundamental properties which will be used repeatedly in data assimilation. The inverse of the transpose is the transpose of the inverse: inversion is one unary operation, transposition is another, and here I am talking about the conjunction of the two. The transpose of the inverse equals the inverse of the transpose, and it is denoted A^{-T}: I can perform either operation first, transpose then invert or invert then transpose, and both give the same result; the two operations commute.
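A quick numerical confirmation (my own sketch) of the rank and inverse properties above:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
u = rng.standard_normal(4)
v = rng.standard_normal(4)

# The outer product of two non-zero vectors has rank 1
assert np.linalg.matrix_rank(np.outer(u, v)) == 1
# Rank inequalities
assert np.linalg.matrix_rank(A + B) <= (np.linalg.matrix_rank(A)
                                        + np.linalg.matrix_rank(B))
assert np.linalg.matrix_rank(A @ B) <= min(np.linalg.matrix_rank(A),
                                           np.linalg.matrix_rank(B))

# Inverse properties (A, B are non-singular almost surely)
Ainv = np.linalg.inv(A)
assert np.allclose(A @ Ainv, np.eye(4))          # A A^{-1} = I
assert np.allclose(np.linalg.inv(Ainv), A)       # (A^{-1})^{-1} = A
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ Ainv)      # reverse order
assert np.allclose(np.linalg.inv(A.T), Ainv.T)   # A^{-T}
```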
Once you have the concept of the inverse, I am going to introduce several formulas relating to inverses: the Sherman-Morrison formulas, which describe the inverse under perturbation. The inverse of the identity matrix is the identity itself; that matches the general property we all know, that the reciprocal of 1 is 1. But if I add an outer product matrix to the identity, it is no longer the identity. So the question is this: if I perturb the identity matrix by an outer product matrix (we have already seen that an outer product matrix has rank 1), can I express the inverse of the sum? There is a formula for it, called the Sherman-Morrison-Woodbury formula. Here c and d are vectors, and c d^T is the outer product matrix, a rank-1 perturbation of I. So if I perturb the matrix and want the inverse, I do not have to compute it from the ground up: I can simply update the inverse of I with a correction. The formula generalizes: the identity I can be replaced by a non-singular matrix A while c and d remain vectors; and in the most general form, the vectors c and d are replaced by matrices C and D, with A and B both assumed non-singular. Look at the progression: in the simple form c and d are vectors, in the most general form C and D are matrices. It is this general version that we will use repeatedly in Kalman filtering techniques. The Sherman-Morrison-Woodbury formula, version (d) on the slide, is one of the most fundamental relations used in data assimilation, especially in the derivation of Kalman filters.

So what does it tell you? If you know a matrix A is non-singular and you add a correction to it, you can compute the inverse of the corrected matrix by simply adding a correction term to the inverse of the original matrix. That is a very beautiful formula. These formulas have been known since the 1930s; mathematicians worked them out just for the fun of it, and they find great use in many derivations, especially the ones relating to Kalman filters.

I have given a proof of the Sherman-Morrison-Woodbury formula, specifically of the generalized version (d). I am not going to go over the proof in class, because it is written out in extremely simple steps; I am going to leave it as a reading assignment, and I would like you to read it. All the derivations are given in extreme detail, across three pages. This proves formula (d) on slide 12; then, by replacing B^{-1} by -B, and by letting B equal 1, I can specialize the parameters and derive formulas (c), (b), and (a). So you can see that by proving the one single formula (d), I obtain (a), (b), and (c) as special cases: (d) is a generalization of all of them. The proof is elementary, and I would like you to try your hand at it. That gives you the general picture of the Sherman-Morrison-Woodbury formula.
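The transcript does not reproduce the slide's exact statement, so as a minimal sketch I use the standard Sherman-Morrison form of the rank-1 case (assuming the scalar denominator is non-zero) and check it numerically:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # safely non-singular
c = rng.standard_normal(n)
d = rng.standard_normal(n)

# Rank-1 perturbation of the identity:
# (I + c d^T)^{-1} = I - (c d^T) / (1 + d^T c)
lhs = np.linalg.inv(np.eye(n) + np.outer(c, d))
rhs = np.eye(n) - np.outer(c, d) / (1.0 + d @ c)
assert np.allclose(lhs, rhs)

# General non-singular A in place of I (Sherman-Morrison):
# (A + c d^T)^{-1} = A^{-1} - (A^{-1} c d^T A^{-1}) / (1 + d^T A^{-1} c)
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(c, d))
rhs = Ainv - (Ainv @ np.outer(c, d) @ Ainv) / (1.0 + d @ Ainv @ c)
assert np.allclose(lhs, rhs)
```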
The next topic is the notion of the generalized inverse. We have already talked about inverses of matrices which are non-singular. Mathematicians have long been occupied with the question of how to define inverses of matrices which are not square. So let us think through the basic situation. Matrices are square or rectangular, and a square matrix can be singular or non-singular. For a non-singular matrix I can define A^{-1}; for a singular matrix I cannot; and for rectangular matrices this definition does not apply either. So mathematicians challenged themselves: how do I extend the notion of the inverse of a non-singular matrix to the general case of a rectangular matrix, and to the general case of a singular matrix? That is what is called the generalized inverse.

The generalized inverse is the extension of the concept of inverse to matrices that are not necessarily square, and the notion is very fundamental. If A is a matrix of size m by n, A^+ denotes the generalized inverse of A, while A^{-1} denotes the ordinary inverse. So look at the symbolism: A^{-1} is the ordinary inverse, A^+ is the generalized inverse. Moore and Penrose were the first to define the notion; they said that any matrix A^+ satisfying the four properties (a) through (d) with respect to A is called the generalized inverse. In other words: (a) A A^+ A must equal A; (b) A^+ A A^+ must equal A^+; (c) (A^+ A)^T = A^+ A, that is, A^+ A must be symmetric; (d) (A A^+)^T = A A^+, that is, A A^+ must be symmetric. Any matrix that satisfies these four properties is regarded as the generalized inverse of A. It turns out that if m = n and A is non-singular, all of this reduces to the definition of A^{-1} we already know; therefore the notion of the generalized inverse includes the ordinary inverse as a special case.

In the case of rectangular matrices, when A has full rank (what does that mean? it means rank(A) equals the minimum of m and n), we can give specific formulas for the generalized inverse. When m > n and A has full rank n, the generalized inverse is given by A^+ = (A^T A)^{-1} A^T; in the other case, when m < n and A has full rank m, it is given by A^+ = A^T (A A^T)^{-1}. We will talk about the occurrence and properties of the generalized inverse when we discuss the inverse problem in module 3. I have already mentioned this, but it is worth repeating: when m = n and A is non-singular, A^+ becomes A^{-1}, and in that case A^+ A and A A^+ both become the identity matrix. So this is a very beautiful, mathematically consistent way of extending the notion of inverse from non-singular matrices to rectangular matrices. It can also be extended to singular matrices in much the same fashion, but for singular matrices we do not have explicit formulas, whereas for full-rank rectangular matrices we have the very specific formulas above.

These are the basis on which we will deal with the least squares theme: generalized inverses occur very naturally in the theory of least squares. We saw in yesterday's lecture that Gauss invented the theory of least squares. Gauss did not know the notion of the generalized inverse at that time; it was introduced much later, and it turns out that the generalized inverse and least squares theory are intimately associated with each other. So it is absolutely necessary that we have at least a nodding acquaintance with the Moore-Penrose inverse and its properties.
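A small sketch (mine, using NumPy's pinv as a cross-check) verifying the four Moore-Penrose conditions and the full-rank formula for the case m > n:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))   # m > n; full rank n almost surely

# Full-rank formula for m > n: A^+ = (A^T A)^{-1} A^T
A_plus = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_plus, np.linalg.pinv(A))

# The four Moore-Penrose conditions
assert np.allclose(A @ A_plus @ A, A)              # (a)
assert np.allclose(A_plus @ A @ A_plus, A_plus)    # (b)
assert np.allclose((A_plus @ A).T, A_plus @ A)     # (c)
assert np.allclose((A @ A_plus).T, A @ A_plus)     # (d)

# For m > n with full rank, A^+ A = I (n by n), but A A^+ != I in general
assert np.allclose(A_plus @ A, np.eye(3))
```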
Thus far we have seen several properties of operations on matrices and of special matrices. A matrix can also be thought of as a linear transformation from one vector space to another. Let A be a matrix of size m by n; then A, as an operator, maps the space R^n to R^m via y = A x. Here is an illustration: this is R^n, this is R^m, and A is the m by n matrix. If you take an n-vector and multiply it by an m by n matrix, you get an m-vector. So A transforms n-vectors into m-vectors, and that transformation is induced by the matrix A; we call A an operator or a transformation, and the words operator and transformation are used synonymously. We call an operator, or transformation, from R^n to R^m a linear transformation or linear operator if it satisfies two properties: A(x + y) = A x + A y, and A(alpha x) = alpha A x. The first property is called additivity, the second homogeneity. If a given transformation satisfies these two properties, it is called a linear transformation. So transformations; linear transformations; and if there are linear transformations, there should also be nonlinear transformations: transformations in general are of two types, linear and nonlinear. It is a general result that every linear transformation can be represented by a matrix; that is a theorem in operator theory which I am not going to go into, but it is good to know.

Given a transformation A, there are two spaces associated with it: one is called the range space and the other the null space. Given A, the range space consists of all those vectors y in R^m obtained as y = A x for some x belonging to R^n. Looking at the picture: A is known; I pick every x in R^n and send each one through A, and the set of all vectors y so obtained is called the range of A. The null space of A, on the other hand, also called the kernel of A (these are different names for the same thing), is the set of all vectors annihilated by A: the set of x in R^n such that A x = 0. So the kernel of A is the set of all vectors which the matrix A maps to zero. I would like to emphasize that given a linear transformation A, there are essentially two subspaces associated with it: the range space and the null space. The null space is a subspace of R^n, and the range space is a subspace of R^m. So you can see that I am associating two spaces with every given linear transformation.

Now I am going to give an example of a special operator. Let Q be a matrix; Q is called orthogonal if Q^T = Q^{-1}, and then Q, mapping R^n to R^n, is called an orthogonal operator: an orthogonal operator is represented by an orthogonal matrix. A simple example is the 2 by 2 matrix Q with rows (cos theta, sin theta) and (-sin theta, cos theta); this matrix represents a rotation. So what does it mean? If you have a vector x and you multiply it by Q to get y = Q x, the length of x and the length of y are the same. So this is called a rotation operator; rotation operators are generally represented by orthogonal matrices. If you multiply a vector in R^2 by Q, you rotate the vector by the angle theta, the angle that appears in the entries cos theta, sin theta, -sin theta, cos theta of the 2 by 2 matrix.
Let us verify the claim. Let y = Q x, where Q is an orthogonal matrix. You already know that the square of the norm of x is x^T x; likewise the square of the norm of y is y^T y. But y = Q x, so y^T = (Q x)^T, and the transpose of a product is the product of the transposes taken in the reverse order: y^T = x^T Q^T. Therefore the square of the norm of y is (Q x)^T (Q x) = x^T Q^T Q x. But by the property of an orthogonal matrix, Q^T Q = I, so this equals x^T x, the square of the norm of x itself. So if y = Q x and Q is orthogonal, even though y is different from x, they share the same length. Orthogonal transformations preserve the length of the vector: the length of x is invariant under an orthogonal transformation, and that is the fundamental property of orthogonal matrices.

Now I am going to quickly look at coordinate transformations, again a property of matrices as linear transformations. Consider R^n. R^n is a space, and every space has a basis; the standard basis for R^n is e_1, e_2, ..., e_n, the unit vectors. Given a space, there are multiple bases for it. For example, in R^2 one basis is e_1 = (1, 0) and e_2 = (0, 1); I could also consider g_1 = (1, 1) and g_2 = (-1, 1), which is another basis. So given a space, I can have multiple bases, and each basis has the same number of vectors.

If I am considering the space R^n, I can take the standard basis, listed as e_1 through e_n, and a new basis, listed as g_1 through g_n. From a set of n vectors I can create a matrix: let E be the matrix with columns e_1 through e_n, and G the matrix with columns g_1 through g_n. So E is the matrix consisting of the standard basis vectors and G the matrix consisting of the new basis vectors; both sets are bases, and both span the same space, R^n. So it behooves us to ask: how are the two bases E and G related to each other? We are going to show that they are related by a linear transformation. And how do we show that? Every element of the new basis can be expressed as a linear combination of the elements of the old basis, because every vector in R^n can be expanded in the standard basis. So I can express each g_i as a linear combination of e_1 through e_n; doing this for every one of g_1, g_2, ..., g_n gives the general expression for g_i, and I can now collate all of them. Arranging g_1, g_2, ..., g_n as the columns of the matrix G (please understand each g_i is a vector: first column, second column, up to the n-th column), the matrices G and E are related through the coefficients t_ij. Written for all the g_i simultaneously, rather than for one g_i at a time, the resulting relation is G = E T, and this relation is very fundamental. So G is related to E through the matrix T; you can think of T as the transformation that relates the new basis to the old basis. We denote it as: the new basis equals the old basis times T, and it can be shown that the transformation T is non-singular. So one of the roles of a linear transformation is that it transfers one set of basis vectors of a given vector space into another set, the two bases being related linearly through the transformation T. This essentially tells you that the coordinates with respect to the new basis and the coordinates with respect to the old basis are related.
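Here is a small sketch (my own values) checking the norm-preservation property of the rotation matrix and the basis relation G = E T from the R^2 example above:

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # rotation, Q^T = Q^{-1}
assert np.allclose(Q.T @ Q, np.eye(2))

x = np.array([3.0, 4.0])
y = Q @ x
# Orthogonal transformations preserve length
assert np.isclose(np.linalg.norm(x), np.linalg.norm(y))

# Change of basis: standard basis E; new basis G = {(1,1), (-1,1)}
E = np.eye(2)                       # columns e_1, e_2
G = np.array([[1.0, -1.0],
              [1.0,  1.0]])         # columns g_1, g_2
# G = E T, so T = E^{-1} G (here simply G itself, since E = I)
T = np.linalg.inv(E) @ G
assert np.allclose(E @ T, G)
assert not np.isclose(np.linalg.det(T), 0.0)   # T is non-singular
```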
That brings us to a special class of transformations called similarity transformations. I am now going to work in the space R^n with the standard basis, which, instead of calling E, I am simply going to call b1. Let x and y in R^n be expressed in the standard basis b1, and let A be a map from R^n to R^n; we can think of it as a linear operator, with y = A x. Now think of it this way: I have R^n in one basis and the same R^n in another basis. The basis for the first is b1 and the basis for the second is b2, and we know there is a linear transformation T that maps the basis b1 to the basis b2; we have already seen this, where b1 was E and b2 was G. So T is the linear transformation from the basis b1 to the new basis b2.

Both copies of R^n are the same space, even though I have drawn them differently: if there is a vector x in one, the same vector x is in the other. This vector has a representation in b1 and a representation in b2, and since the bases are related, I can also relate the representations of x in the two bases. So let x* be the representation of the vector x in the new basis b2, and y* the representation of the vector y in the new basis; then x = T x* and y = T y*. I hope you can keep track, because there are lots of animals here: x and y are two vectors in R^n in the standard basis; A is the linear operator that transforms x into y; T is the linear transformation from the basis b1 to the new basis b2; and x*, y* are the representations of x and y in the new basis. If I put them all together: x = T x*, y = T y*, and y = A x. Substituting, T y* = A T x*, and therefore y* = T^{-1} A T x*. So you can readily see the relation between y* and x*. We have not changed anything: we are simply concerned with two vectors x and y, representing them in two different bases of R^n, with A an operator and T the basis transformation. All of this leads to the fundamental result marked with the star: the new matrix T^{-1} A T is related to A, and T^{-1} A T is the representation of A in the new basis b2. This transformation of the matrix A into T^{-1} A T is called a similarity transformation, and it plays a fundamental role in linear algebra. A similarity transformation is the special kind of transformation that arises when two vectors in a given vector space are related by an operator A and I change the basis of the space from b1 to b2; then the basis transformation T comes into play, and T and A together help us define
the representation of the operator in the new basis. So this is called the representation of matrices in different bases: A is the representation in one basis, and T^{-1} A T is the representation of the same matrix, or the same operator, in another basis; the two are related by the similarity transformation.

The next transformation is called the congruent transformation. Let A be a matrix and let B be a matrix which I require to be non-singular; A can be any matrix. The transformation from A to B^T A B is called a congruent transformation. Please remember the difference: A to T^{-1} A T is the similarity transformation; A to B^T A B is the congruent transformation. These are two transformations of matrices that occur very naturally in linear algebra, and the reason we are talking about the congruent transformation and the similarity transformation is that they are special cases of linear transformations.

Now I am going to define another concept: the adjoint operator. Any of you who have done 4D-Var will know that 4D-Var is also called the adjoint method. The adjoint is a property of operators that comes essentially from matrix theory and operator theory, so I am going to quickly define the properties of adjoint operators, which are fundamental to understanding the data assimilation method called 4D-Var.

Let A be a matrix that denotes a linear operator on R^n. A matrix can be treated as an entity in itself, with operations defined on it; a matrix can also be thought of as an operator on vectors in R^n. So the same object plays two different roles, as a matrix and as an operator: a matrix is a representation of an operator. Now define a new linear operator A*, and the definition goes like this. I have R^n to start with, and an operator A on R^n; what does that mean? It takes vectors in R^n and maps them to vectors in R^n. Pick two vectors x and y belonging to R^n. Given A, x, and y, I can compute the matrix-vector product A x, which is a vector; y is a vector; so I can compute their inner product. The defining requirement is that the inner product of A x with y is equal to the inner product of x with A* y. So what does this mean? A x is a transformation of x, and A* y is a transformation of y; the matrix A* that forces this equality, <A x, y> = <x, A* y>, is called the adjoint of A. That is the definition, the characteristic property of the adjoint.

This adjoint A* is not unknown to us. Look at the standard definition of the inner product: the inner product of A x and y is, by definition, (A x)^T y. But (A x)^T = x^T A^T, so (A x)^T y = x^T A^T y, and I can associate this as x^T (A^T y), which can be expressed as the inner product of the two vectors x and A^T y. Comparing with the definition, A* = A^T: in general, the adjoint of a real matrix is its transpose; the transpose is the adjoint. That is the fundamental conclusion that comes from this analysis. So for finite-dimensional vector spaces, if you are considering matrices of finite dimensions, the adjoint is realized by the transpose operation. The transpose is a unary operation we have already defined; the adjoint is another concept. It turns out that the adjoint can be represented by the transpose in this particular case of matrices, but the adjoint in general is a much deeper, more fundamental concept in operator theory.
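A compact numerical illustration (my own sketch) of the similarity transformation, the congruent transformation, and the defining property of the adjoint:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n))                    # operator in basis b1
T = rng.standard_normal((n, n)) + n * np.eye(n)    # basis change, non-singular
x_star = rng.standard_normal(n)                    # representation of x in b2

# Similarity: if x = T x*, y = T y*, and y = A x, then y* = (T^{-1} A T) x*
x = T @ x_star
y = A @ x
y_star = np.linalg.inv(T) @ y
assert np.allclose(y_star, np.linalg.inv(T) @ A @ T @ x_star)

# Congruent transformation: A -> B^T A B for non-singular B;
# it preserves symmetry (if S is symmetric, so is B^T S B)
S = A + A.T
S_congruent = T.T @ S @ T
assert np.allclose(S_congruent, S_congruent.T)

# Adjoint: <Ax, y> = <x, A^T y>, so for real matrices the adjoint
# is the transpose
u = rng.standard_normal(n)
v = rng.standard_normal(n)
assert np.isclose((A @ u) @ v, u @ (A.T @ v))
```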
So the adjoint of a matrix, like the transpose of a matrix, is a unary operation, and here our operators are matrices. The adjoint of an adjoint is the original matrix: (A*)* = A. The adjoint of a scalar multiple is the scalar times the adjoint. The adjoint of a sum is the sum of the adjoints; the adjoint of a product is the product of the adjoints taken in the reverse order; and if A^{-1} exists, the adjoint of the inverse is the inverse of the adjoint. These are very fundamental properties of the adjoint with respect to the other operations: how the adjoint behaves with respect to itself, with scalar multiplication, with matrix addition, with the matrix product, and with the inverse. The interaction of two different operations is the topic of discussion here, along with the notion of the adjoint operator and its close relation to the transpose.

Now we come to one of the fundamental questions in linear algebra. Why do we do all these things? Ultimately, we would like to be able to solve equations. Given an equation A x = b, under what conditions does A x = b have a solution? We are interested in analyzing the existence of solutions of linear systems. Let A be an m by n matrix; m = n is a special case, so we are going to start with general matrices. In A x = b, A is known and b is known, and I want to solve the inverse problem: I want to find an x. But before you compute an x, you have to assure yourself that a solution exists. In this case, A x = b has a solution only when b lies in the range of A; you remember the range of A, which we have already defined: the set of all vectors to which A maps the domain, sitting inside the codomain.

We have also talked about the null space of A, and just as I can talk about the null space of A, I can talk about the null space of A^T: the set of all y belonging to R^m such that A^T y = 0. Now, if b belongs to the range of A, so that b = A x for some x, and y belongs to the null space of A^T, then b^T y = (A x)^T y = x^T A^T y = 0, which follows from the fact that A^T y = 0. So what does this tell you? b belongs to the range of A, y belongs to the null space of A^T, and the inner product of these two vectors, one from the range and one from the null space, is 0; a zero inner product means orthogonality. So this essentially tells you that the range of A and the null space of A^T are mutually orthogonal. Please remember this: it is a fundamental property. Given a matrix A of size m by n, we have associated these two spaces with it, and this tells you the intrinsic behaviour of their vectors: a vector from the range and a vector from the null space of the transpose are always mutually orthogonal.

Now, coming back to the existence question, there is a famous result due to Fredholm, called the Fredholm alternative. The Fredholm alternative essentially says the following: given an m by n matrix A, exactly one of the following two statements is true: either (a) A x = b has a solution, or (b) A^T y = 0 has a solution y such that y^T b is not equal to 0. These are the only two possibilities that can happen in the general case of rectangular matrices. When m = n, with A belonging to R^{n x n} and b to R^n, the non-homogeneous system of equations A x = b has the unique solution x = A^{-1} b whenever A is non-singular; that again follows from alternative (a) of the Fredholm alternative. The homogeneous system A x = 0, on the other hand, has a non-trivial solution only when A is singular. These are the two basic, fundamental facts. So what does this say? If I want to solve A x = b, I have a unique solution x = A^{-1} b when A is non-singular, that is, when the determinant of A is not 0. And in the homogeneous case, where b = 0, the system A x = 0 has a non-trivial solution only when A is singular, that is, when the determinant of A is 0. These are the two fundamental cases: a homogeneous system has a non-trivial solution when the matrix A is singular, and a non-homogeneous system has a unique solution when the matrix is non-singular. Both are consequences of the fundamental property called the Fredholm alternative, and these two together provide the conditions for the existence of solutions of linear systems. I am not going to prove the uniqueness, but if the matrix A is non-singular, then for A x = b not only does the solution exist, we can also show that the solution is unique. Once you know the solution exists and is unique, you can then develop computations to actually construct it: to show that something exists is one thing, to be able to derive or build or compute it is something else, but to be able to compute, I must first be assured that the solution exists.
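A numerical illustration (my own sketch; it uses SciPy's null_space helper) of the orthogonality of the range of A and the null space of A^T, and of the unique solution in the square non-singular case:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 2))   # m = 4, n = 2; rank 2 almost surely

# Range of A: spanned by the columns of A.
# Null space of A^T: all y with A^T y = 0.
N = null_space(A.T)               # columns span null(A^T)

# Every vector in the range is orthogonal to every vector in null(A^T)
x = rng.standard_normal(2)
b = A @ x                         # b lies in the range of A
assert np.allclose(N.T @ b, 0.0)

# Square non-singular case: A x = b has the unique solution x = A^{-1} b
M = rng.standard_normal((3, 3)) + 3 * np.eye(3)
rhs = rng.standard_normal(3)
sol = np.linalg.solve(M, rhs)
assert np.allclose(M @ sol, rhs)
```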
The next set of ideas from matrix theory is bilinear and quadratic forms. Let A be a matrix of size m by n, and let x be a vector in R^m and y a vector in R^n. I can define a functional associated with A by f_A(x, y) = x^T A y; that is called a bilinear form. It is linear in x and it is also linear in y: because it is linear in each of the two variables taken one at a time, it is called bilinear. When A is, instead of a rectangular matrix, a square matrix, n by n, I can define q_A(x) = x^T A x for x in R^n, and this is called the quadratic form associated with A. So a bilinear form is of first degree in x and in y separately, while a quadratic form is of second degree in x: bilinear forms are linear in each of the variables, quadratic forms are quadratic in the components of x.

Here is an example of a quadratic form. Let n = 2, let x be the vector with components x_1 and x_2, and let A be a 2 by 2 matrix. Then q_A(x) = a_11 x_1^2 + (a_12 + a_21) x_1 x_2 + a_22 x_2^2. You can see the first term is quadratic in x_1, the middle term is quadratic in the product x_1 x_2, and the last term is quadratic in x_2. So this is an example of what is called a quadratic form; quadratic forms are special cases of bilinear forms.

What is the basic property of a quadratic form? We already know that q_A(x) = x^T A x is a scalar. What does x^T A x mean dimensionally? x^T is a row vector, 1 by n; A is a matrix, n by n; x is a column vector, n by 1; so the whole thing is 1 by 1, and a 1 by 1 quantity is a scalar. A scalar is its own transpose. But the transpose of a product is the product of the transposes taken in the reverse order (we have already seen this for the unary operation of transposition), so q_A(x) = (x^T A x)^T = x^T A^T x. And x^T A^T x is precisely the quadratic form of A^T. So what have we shown? The quadratic form of A is the same as the quadratic form of A^T: q_A(x) = q_{A^T}(x). That is the fundamental property. Since the two are equal, I can write q_A(x) as one half of their sum: q_A(x) = (x^T A x + x^T A^T x)/2. Now a little bit of algebra: x^T is a common factor on the left and x is a common factor on the right, so I can take them out and arrange the inner matrix as (A + A^T)/2. You will quickly recall, from when we talked about the decomposition of a matrix into its symmetric and skew-symmetric parts, that this is exactly the symmetric part A_s. So q_A(x) is the same as x^T A_s x: the quadratic form of A is the quadratic form of the symmetric part of A. Therefore, if you are interested in quadratic forms, you can without loss of generality assume the matrix A is always symmetric: if it is not, I can replace A by its symmetric part, and I have not changed anything, because the symmetric part of a matrix is always symmetric, A_s = A_s^T.
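A quick check (my own sketch, with arbitrary values) of the 2 by 2 expansion and of the identity q_A(x) = q_{A^T}(x) = q_{A_s}(x) just derived, plus a bilinear form for a rectangular matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])   # arbitrary non-symmetric 2x2 matrix
x = np.array([1.5, -2.0])

# Quadratic form two ways: directly, and from the expanded 2x2 formula
q = x @ A @ x
q_expanded = (A[0, 0] * x[0]**2
              + (A[0, 1] + A[1, 0]) * x[0] * x[1]
              + A[1, 1] * x[1]**2)
assert np.isclose(q, q_expanded)

# A scalar is its own transpose, hence x^T A x = x^T A^T x ...
assert np.isclose(q, x @ A.T @ x)
# ... and therefore q_A(x) = q_{A_s}(x), with A_s the symmetric part
A_s = 0.5 * (A + A.T)
assert np.isclose(q, x @ A_s @ x)

# Bilinear form for a rectangular A: f(x, y) = x^T A y
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])      # 2 x 3
u = np.array([1.0, 2.0])             # x in R^2
v = np.array([0.5, -1.0, 2.0])       # y in R^3
f = u @ B @ v
print(f)
```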
This property is routinely used in data assimilation techniques; again, these are all fundamental properties that come from matrix theory. To summarize: a quadratic form in a vector consists of second-degree terms, as we saw in the previous example with x_1^2, x_1 x_2, and x_2^2, and it has the special property that the quadratic form of A is the same as the quadratic form of the symmetric part of A. Because of this, from now on we will assume, without loss of generality, that the matrices appearing in quadratic forms are symmetric.

Once you have the notion of a quadratic form, I can then refine it to the idea of a positive definite quadratic form. This is a fundamental concept. Let A be a matrix; A is said to be positive definite (this is a definition) if x^T A x > 0 for all x not equal to 0, with x^T A x = 0 only when x = 0. So what does it mean? In general, the quadratic form of a matrix need not be positive; but if A is positive definite, the quadratic form is always positive. I hope the definition is very clear, but let us go back over it. We defined the quadratic form to be the product x^T A x; there was no condition on the sign of this quantity, only the fact that it is a scalar. What we are saying now is that this scalar not only exists but takes a positive value for every non-zero x: x can have positive and negative components, and A can have positive and negative elements, but the product x^T A x is always positive when x is not 0, and it equals 0 only when x = 0. Positive definite quadratic forms can be characterized in different ways: there are equivalent definitions. One definition is the one we just gave, but this definition is not very useful to apply: if somebody gives you a quadratic form and I want to apply this definition, I have to test it for infinitely many x's, and that is not possible.
So this is a very nice definition, but it is not useful computationally. There are equivalent characterizations which are computationally meaningful, and tests have been developed to decide under what conditions a matrix is positive definite. One condition: if the eigenvalues of A are all positive, then A is positive definite. Another: if the principal minors of all orders of A are positive, the matrix is positive definite. A principal minor is the determinant of a sub-matrix, and a matrix has several different principal minors; if all the principal minors of a given matrix are positive, that is, if the determinants of all the relevant sub-matrices are positive, the matrix is positive definite. These two characterizations give you an algorithmic way to test for positive definiteness. The first definition is simply the fundamental one and is not very useful in terms of computation; the second view is derived from the first, but it is very useful computationally.

To get an understanding of the constraints that positive definiteness places on the elements of A, consider a symmetric 2 by 2 matrix with entries a, b, b, c. Please understand that, with respect to matrices in the context of positive definiteness, we need to consider only symmetric matrices, so we may take a symmetric matrix like this. For this matrix, the quadratic form takes the form q_A(x) = a x_1^2 + 2 b x_1 x_2 + c x_2^2, and a simple piece of algebra lets me rewrite this by completing the square: q_A(x) = a (x_1 + (b/a) x_2)^2 + ((a c - b^2)/a) x_2^2. This is called the method of completing the square. Now I would like to examine this expression: what conditions are necessary in order to make it positive, as required by the definition? The square of any number is non-negative, so for the first term to be positive I have to have a positive; x_2^2 is non-negative, so for the second term to be positive I have to ensure that the coefficient (a c - b^2)/a is positive, that is, a c > b^2. I could also have completed the square the other way around, and that would give c positive. So we can state that a matrix of this type is positive definite if a is positive, c is positive, and a c > b^2. This is an example, with explicit conditions on the elements, of a positive definite matrix. So you can see that not every matrix is positive definite: positive definiteness brings constraints on the values of the elements of the matrix.
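To close, a sketch (my own, with assumed example values a = 2, b = 1, c = 3) of the computational tests just described, applied to the symmetric 2 by 2 case:

```python
import numpy as np

# Symmetric 2x2 matrix [[a, b], [b, c]]; the lecture's conditions:
# positive definite iff a > 0, c > 0, and a*c > b^2
a, b, c = 2.0, 1.0, 3.0
A = np.array([[a, b],
              [b, c]])
print(a > 0 and c > 0 and a * c > b**2)   # True

# Test 1: all eigenvalues positive
eigvals = np.linalg.eigvalsh(A)           # eigvalsh: for symmetric matrices
print(np.all(eigvals > 0))                # True

# Test 2: leading principal minors positive (Sylvester-style check)
minor1 = a                                # det of the 1x1 leading block
minor2 = np.linalg.det(A)                 # det of the full matrix = ac - b^2
print(minor1 > 0 and minor2 > 0)          # True

# The definition itself can only be spot-checked, never tested for all x:
rng = np.random.default_rng(8)
for _ in range(1000):
    x = rng.standard_normal(2)
    assert x @ A @ x > 0
```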