Alright, today we're going to talk about a problem in system identification, which has to do with extending our state estimation to a different domain. The problem I want to tell you about is called subspace identification. It's a method that was developed in the late 90s, so it's fairly new, and it lets us solve the following problem: now that we've done state estimation, how do we uncover the structure of our system? So far in class, for a system of this variety, we've learned how to estimate the state: if we knew A, B, C, D, the noises epsilon_x and epsilon_y, and some prior belief x-hat of 1 given 0 with its uncertainty, we would be able to estimate the state of the system. What I want to talk about today is: how do we get all of that? How do we get the structure of the system from our data? The problem we face is as follows. We have an input vector u that runs from u1 to up, and we have our observations y, which run from y1 to yp. What we want to know is: how do we estimate A, B, C, D, the noises, and the initial conditions? That's our problem. To get there, I want to motivate it with an example from robotics. It involves a little nomenclature you haven't seen before, but nothing you can't handle. Suppose I have an arm that moves in space: it has a link here and a link there, and I'm going to define some angles q1 and q2.
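Before the robotics example, the linear setup can be made concrete with a small simulation. This is a hypothetical NumPy sketch (the matrices A, B, C, D and all the sizes are made up for illustration); the point is that an identification algorithm would be handed only u and y, not A, B, C, D, or the state x.

```python
import numpy as np

# A made-up linear system of the kind described above. In the identification
# problem we would see only u and y -- never A, B, C, D, or the state x.
rng = np.random.default_rng(1)
A = np.array([[0.95, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

p = 50
u = rng.standard_normal((1, p))      # inputs u1..up
x = np.zeros((2, 1))                 # x_1: initial state (here assumed zero)
y = np.zeros((1, p))
for t in range(p):
    y[:, t:t+1] = C @ x + D @ u[:, t:t+1]   # y_t = C x_t + D u_t
    x = A @ x + B @ u[:, t:t+1]             # x_{t+1} = A x_t + B u_t

# (u, y) is all that subspace identification gets to work with
print(u.shape, y.shape)              # → (1, 50) (1, 50)
```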
If I write the dynamics of this system moving in, say, two-dimensional space (two dimensions just makes it easy to write; in principle this holds anyway), and I represent my torques as tau1 and tau2, they're going to be related to my angular positions, velocities, and accelerations via some matrix we'll call inertia. That's a matrix that depends on some parameters a and on the vector q, the position of my arm, multiplying q1-double-dot and q2-double-dot. To that we add another set of forces, the Coriolis and centripetal forces, represented by another matrix, C, that depends on the parameters a and on the velocities, multiplying q1-dot and q2-dot. The reason I write its argument as q-dot is that for Coriolis forces the velocities end up squared, so there's that relationship in there. The point is that this system has some description of how forces are related to velocities and positions. Now suppose I write the same problem for a slightly different system: say my arm is holding a tennis racket, and I want to control that system as well.
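Those equations can be sketched in code. This is a hedged sketch of a planar two-link arm with no gravity; the parameter values a1, a2, a3 are made up and lump together the masses, link lengths, and centers of mass that the lecture collects into the vector a.

```python
import numpy as np

# Hypothetical inertial parameters (lumped masses/lengths/centers of mass)
a1, a2, a3 = 0.40, 0.12, 0.09

def inverse_dynamics(q, qd, qdd):
    """tau = H(a, q) @ qdd + C(a, q, qd) @ qd for a planar two-link arm."""
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    H = np.array([[a1 + 2*a2*c2, a3 + a2*c2],
                  [a3 + a2*c2,   a3]])                    # inertia matrix
    Cmat = np.array([[-a2*s2*qd[1], -a2*s2*(qd[0]+qd[1])],
                     [ a2*s2*qd[0],  0.0]])               # Coriolis/centripetal
    return H @ qdd + Cmat @ qd

tau = inverse_dynamics(q=np.array([0.3, 1.0]),
                       qd=np.array([1.0, -0.5]),
                       qdd=np.array([0.2, 0.4]))
print(tau)
```

Holding a racket or a paddle would change only a1, a2, a3; the function itself, the structure, stays the same.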
So what has changed in the structure of this equation? What I'm going to show you is that nothing interesting changes except the parameter a. That parameter describes things like the length of this link, the location of its center of mass, the length of that link, the location of its center of mass. When I hold the racket, the center of mass changes: it becomes a vector at some angle, call it gamma, with respect to my forearm. But these things, the lengths, the locations of the centers of mass, these angles, all live in the parameter a. So if I knew the structure of the dynamics, I could estimate a as I go from holding nothing to holding a tennis racket. What changes is a parameter in this equation; the structure of the equation doesn't change. The point is, if I knew the structure of the system, I could do state estimation, find the parameter I'm looking for, and use the usual Kalman filter. If I didn't know the structure of the system, the problem would be much harder, because it's not linear: it has all these terms and functions I don't know about. So let me show you how this problem could be solved if you knew the structure of the system: how you would use state estimation techniques, even though this is not a linear problem, to find the parameters you're interested in, so you could control the arm holding a tennis racket just as well as the arm holding nothing. What I need to do is write this equation, which is called inverse dynamics. Why is it called inverse dynamics?
Because forward dynamics is the relationship from inputs (forces) to outputs (changes in state). If we wrote this equation in terms of q-double-dot, as some function of a, q-dot, q, and torque, that would be forward dynamics; the way we've been writing equations of motion, those are forward dynamics equations. This, instead, is inverse dynamics. F = ma, that is, F = m times x-double-dot, written this way is inverse dynamics. And when I wrote torque as some function of angular accelerations plus some function of angular velocities, that's exactly F = ma, but written in the coordinate system of this complicated system: it's not a point mass, it's a multi-link system, but really that's what I'm writing, in joint coordinates. That's inverse dynamics. If I now solve for q-double-dot, I'm writing forward dynamics. So let's solve for q-double-dot: it equals H-inverse (this matrix here, a function of a and q) times the quantity torque minus C (a function of a and q-dot) times q-dot. That's forward dynamics. It's a nonlinear equation, because in the matrix H I have cosines and sines, and of course similarly in C. So how would I write this so I could estimate things? What I'm interested in is a: how do I estimate this vector a? Why do I want to estimate it? Because a holds the physical properties of my system. In this vector a I have things like the mass of the upper arm, the length to the center of mass of the upper arm, the mass of the tennis racket I'm holding, and the vector giving the position of its center of mass with respect to my elbow. The physical parameters of the system are in this vector a; the rest of these equations are about how those physical parameters get translated into equations of motion. So when I'm holding a
tennis racket, versus a ping-pong paddle, versus nothing, what changes is a. So how do I do state estimation so that I can compute a? What I want to do is write this nonlinear equation in a linear form and then do state estimation. Typically the way you take a nonlinear equation and write it in linear form is with a Taylor series expansion. This is my function g, a nonlinear function, and I have some estimate of the state. What is the state here? Let me write x as the state: it is the position, the velocity, and a; each of these is a vector. And I have some estimate x-hat of that state. So I can write q-double-dot as follows: the function g evaluated at my state estimate x-hat and some torque tau (that's a constant), plus the first derivative of g with respect to x, evaluated at x-hat, multiplied by (x minus x-hat). That's my linearization of the system: a constant, plus a linear function of the difference between the actual state x and my estimate x-hat. What I do now is write my state update equation; let's see if I can fit it here. Start with q: q at time t plus delta, my next time step, equals q at time t plus q-dot at time t times delta. Then q-dot at t plus delta equals q-dot at t plus q-double-dot at t times delta, and here's q-double-dot from above. Finally, my vector a at time t plus delta equals a at time t, plus maybe some noise epsilon_a, and let's put a matrix A in front of it to add some stability, so that whatever parameter values I have tend to be very consistent from trial to trial. So there's my state update equation. And my observation y at time t equals some matrix L times x at time t, plus epsilon_y, where
maybe L is such that I can see position and velocity but nothing else: ones in those entries and a zero for a, so I can observe position and velocity but I can't observe a. That's a reasonable description of a dynamical system: the first three lines are my state update equation, and here's my observation equation. If I knew the structure of the system, if I knew H, if I knew the derivatives and so forth, then I could do this. But who gives me that? How do I find these matrices? The problem we're going to talk about today is in the context of linear systems, but the problem is more general, because if I understand the structure of the dynamics, it helps my learning considerably. If I knew that holding a tennis racket is very similar to holding a ping-pong paddle, because all it does is change this parameter, then I could learn a heck of a lot faster than learning de novo. That's the benefit of knowing the structure of the system. But in principle our problem remains to find A, B, C, and D. I want to solve this problem today for a simplified version of the task: we're going to assume there is no noise in our system. Your textbook has the version with noise at the end; this is chapter 9, and today I think I'll cover through 9.6. So our problem is as follows. We've been given U and we've observed Y, and we need to find X at time point 1, the initial condition, which we don't know, and we need to find A, B, C, and D. The critical thing here is that I not only don't know these matrices, I don't even know their size: I don't know the size of the state space X, whether it's a vector of size 2 or of size 20. So how do I find A, B, C, D and the initial condition X? The critical thing in today's lecture is to begin with the understanding that there is no unique solution to our
problem. If all I know is the input to the system and the output Y, there is no unique A, B, C, D; but regardless, I can reproduce your data precisely even though there is no single solution. So I'm going to pick, from these infinitely many solutions, one that's particularly easy for us to compute, and by doing so capture the relationship between inputs and outputs. Let's begin with understanding why there is no unique solution: why this particular input and this particular output could have been generated by an infinite number of A's, B's, C's, and D's. To show you that, I'll write the problem as follows. Suppose I collect my unknowns in matrix form, [A B; C D]: a matrix made up of matrices. On the right of it I multiply the states and the inputs: A multiplies X1 and B multiplies U1, so A times X1 plus B times U1 gives me X2, and C times X1 plus D times U1 gives me Y1. In the next column I have X2 and U2, giving me X3 and Y2, and this goes on: Xp and Up at the end give me X of p plus 1 and Yp. I know the inputs and I know the outputs; I don't know anything else. Now, what if I multiply the left-hand side of this equation by a block matrix made up of an unknown matrix T: the matrix [T 0; 0 I]? What happens is that I end up with T times X2 (then T times X3, and so on), while 0 times X2 plus I times Y1 leaves Y1 where it was; at the end I have T times X of p plus 1 and Yp. The same matrix also has to multiply the right side of the equation, where it becomes T times A plus 0 times C, T times B plus 0 times D, 0 times A plus I times C, and D. So I've multiplied the left side and right side of the equation by this matrix, and now what I can
do is notice what has happened: I have transformed my X space by some invertible matrix T. If I now take the right side of the equation and multiply it by [T-inverse 0; 0 I] times [T 0; 0 I], that product is the identity, so I haven't changed anything; but now I can multiply it through. The point of this exercise is to show you that I have Y from 1 to p, I have U from 1 to p, and if I just multiply my unknown X's by some unknown matrix T, I get precisely the same outputs from the same inputs, but with different A's and B's. So I can produce for you the same outputs from the same inputs with a very different A, B, C, D: there is no unique solution. And that's actually nice, because it says there are many, many solutions to our problem. What subspace analysis does is find one of those solutions, and one of them is all we need; once we have it, we have a system that's equivalent to whichever one generated our data. That's what we're going to do today: identify one of those solutions. To do that, let me give you the intuition about what we're going to do. Our problem is that we have input U and output Y, and we know the output Y depends on some hidden state X. That hidden state X is evident in Y, but the previous inputs U are also evident in Y. The outputs we've seen are some linear combination of the state X and the inputs U, the current input and all the previous ones. If we could project our observations Y in such a way as to eliminate the influence of U, we would end up with something that's just proportional to the hidden state X. So what subspace analysis says is: project the data you have measured, Y, onto a space that's perpendicular
to the inputs U. But that sounds pretty abstract: find something perpendicular to a space U? What the heck does that mean? U is some vector of inputs, and how do I find this perpendicular space? It can't involve just the last input, because my inputs U from the very beginning have influenced the current X. The objective is to write our equations so that I can project a matrix holding all my data onto a matrix that lives in a space perpendicular to the space spanned by the inputs U. So I need to define these terms for us. We'll begin with something simple: just vectors. What do we mean by projecting a vector onto another vector? Let me start with two vectors, a and b. When I say project a onto b, I'll write it with a slash to denote projection, and it means the usual thing: you take the magnitude of a, times the cosine of the angle between the two vectors (there's one vector here, one vector there, and some angle alpha between them), times the vector b normalized by its length, that is, divided by its magnitude. So when I project a onto b, I get a vector in the direction of b, with a length that depends on the magnitude of a and the cosine of the angle between them. That's what we mean by projecting something. Now, this cosine alpha, what does it mean? Well, if I write a-transpose b, that's just a scalar, right?
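The geometric projection just defined can be checked numerically. A hypothetical NumPy sketch with made-up vectors; the second form is the standard algebraic equivalent, b times (b-transpose b)-inverse times (b-transpose a).

```python
import numpy as np

# Made-up vectors for illustration
a = np.array([3.0, 4.0])
b = np.array([2.0, 0.0])

# Geometric form: |a| * cos(alpha) * b / |b|
cos_alpha = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
proj_geometric = np.linalg.norm(a) * cos_alpha * b / np.linalg.norm(b)

# Algebraic form: b * (b^T b)^{-1} * (b^T a) -- the same projection
proj_algebraic = b * (1.0 / (b @ b)) * (b @ a)

print(proj_geometric, proj_algebraic)   # both come out [3. 0.]
```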
What this is, is cosine of alpha times the length of a times the length of b. So cosine of alpha equals a-transpose b divided by those lengths, and that means a projected onto b equals (a-transpose b) times (b-transpose b)-inverse times the vector b. This is how I project a vector onto another vector: a-transpose b is a scalar, (b-transpose b)-inverse is a scalar, and the vector b gives the direction in which the projection lies. Now, what do I mean by projecting a matrix? Take a matrix, capital A; say it's 3 by 3, made up of the row vectors a1-transpose, a2-transpose, a3-transpose. And say I have a matrix B that lives in a two-dimensional space: it has rows b1-transpose and b2-transpose, with entries B11 B12 B13 and B21 B22 B23, and these two rows span a plane. They don't need to be perpendicular to each other; they just need to be linearly independent. Projecting the matrix A onto the space described by B means: you take the vector a1 and project it onto this plane, you take a2 and project it onto this plane, and so forth. You take the individual rows of the matrix and find the components they end up with in the plane described by B. Writing this in matrix form, let me do the vector form first: a vector a projected onto the space described by matrix B is B-transpose times (B B-transpose)-inverse times B times a; and the matrix A projected onto B is the collection of those row projections, which is A times B-transpose times (B B-transpose)-inverse times B. That's the definition of projection. Now, what's important from our perspective isn't projecting A onto B; what we're much more interested in is projecting A onto the space perpendicular to B. What does that mean? If B describes a plane, B-perpendicular is
a line perpendicular to that plane. If I want to project a matrix onto the perpendicular of this matrix B, each of the rows of A gets projected onto that line, and the result equals A multiplied by (the identity matrix minus the projector we just wrote). That's what I mean by projecting A onto B-perpendicular. And of course, B projected onto its own perpendicular is just 0. Those are our preliminaries. The reason we introduced this concept is that I want to write all my observations Y in terms of all the things I don't know (the X's and their past history) and all the things I do know (the inputs U and their past history). Define a matrix of all the observations Y: you'll notice that Y lives in the space defined by X and U. In a way, X is one of these basis elements, U is another, and Y is a projection onto the space they span. So if I want to recover just the X part, I can take my data Y, which depends on X and U, and project it onto a space that's perpendicular to U. The way I'm going to do it is to write a matrix describing all my observations Y and a matrix of all my inputs U, and project my observations Y onto a space perpendicular to all my inputs U. When I do that, I end up with something that depends only on X; it's proportional to X. And that proportionality (hold on just a second) is what I care about, because remember, I don't need to recover X itself; all I need is something proportional to it, and if I can get that, then I can find A, B, C, and D. [Student question.] Yes, as long as they're linearly independent. Y depends on a basis set that has two things in it, X and U; it's a combination of the two, and as long as I can project away one of them, what remains is proportional to the X's. So the key idea: we don't need to recover X; all we need to recover is something proportional to X,
and if we can find something proportional to X, that's all we need, because we can then find an equivalent A, B, C, D for the problem. Now there are a few matrix definitions I need to show you, because to do this with real data you need to build up histories. Here's how it's done, with matrices called Hankel matrices: H-A-N-K-E-L. Basically they are histories of the data you've had. You take the matrix Y-sub-1-bar-i and write it like this: it collects all the observations you've had, with the first column running from y1 down to yi. Here i is a small number, but it should be larger than the size of X; you don't know the size of X, but you must have some guess of how big it is, and you want i to be a bit larger than what you think that size is. The second column runs from y2 down to y(i+1), and the columns continue until the last one, which starts at yj. It's just a representation of my data, the observations I've made. Typically i is much less than j: I have many more data points than I have rows in this matrix, so the matrix has few rows and very many columns. Similarly, I represent my inputs in a Hankel matrix U-sub-1-bar-i, built from u1 through ui in the first column and shifted one step per column. Intuitively, each column is a short-term history, and the question is how long a history you want; i is tiny compared to the number of samples j, which is probably in the hundreds. [Student question.] Yes; in a system that has noise you need to average a little bit, and that's where this comes from. So these are two matrices made of things we know, Y and U; we can define them. Then there are the things we don't know. The matrix X-sub-k is the history of the states I want to estimate: x_k, x(k+1), running until x(k+j-1). And there's a matrix gamma-i that tells us how to go from states to observations; it's made up of C, CA, and on down to CA
raised to the power i minus 1, stacked. Then I have another matrix, H, whose diagonal terms are D, whose terms above the diagonal are 0, and below the diagonal: C times B, then C times A times B, then C times B again on the next subdiagonal, and so forth. These matrices H and gamma are used to describe the history of the Y's and X's, as follows: the matrix Y-sub-1-bar-i equals gamma-i times X-sub-1 plus H-i times U-sub-1-bar-i. All I've done is take our state equation and measurement equation and write them so that I have more than one observation at a time, the current observation and the previous observations, in terms of the hidden states X and the inputs U. That's my new observation equation. Now I need a state update equation that works on this matrix X instead of the vector x. X is the matrix representing my states, and I want to write it in terms of its past history. What does that depend on? Well, x2 equals A times x1 plus B times u1; x3 equals A times x2 plus B times u2, which equals A-squared times x1 plus A B u1 plus B u2. So I can define a new matrix delta-i to be [A to the (i-1) times B, A to the (i-2) times B, ..., B], and I can write the matrix X-sub-(i+1) as A raised to the power i times X-sub-1, plus delta-i times U-sub-1-bar-i. So what I've done is write the state update equation and the observation equation with the state X and the observation Y extended so that each column holds some history. This is the definition of Y, which we have; U we also have; X we don't know, and we want to recover something proportional to it. That's our goal. So let me rewrite that over here: Y-sub-1-bar-i equals gamma-i times X-sub-1 plus H-i times U-sub-1-bar-i, and I can solve for X-sub-1, which becomes gamma-i's pseudo-inverse; I'm
going to put a star on top of it to denote the pseudo-inverse of the matrix gamma. So X-sub-1 equals gamma-i-star times (Y-sub-1-bar-i minus H-i U-sub-1-bar-i). Then I have the state update equation: X-sub-(i+1) equals A to the power i times X-sub-1 plus delta-i times U-sub-1-bar-i. Now let me write down the things we know and the things we don't. I'll call W the matrix of things we know: W is made up of U-sub-1-bar-i and Y-sub-1-bar-i, stacked. The things I don't know go into a matrix L-i: substituting X-sub-1 into the state update, X-sub-(i+1) equals (delta-i minus A-to-the-i gamma-star H-i) times U-sub-1-bar-i, plus A-to-the-i gamma-star times Y-sub-1-bar-i. Collecting those unknown blocks into L-i, that means X-sub-(i+1) equals L-i times W-sub-1-bar-i. Now I write my later observations in terms of X-sub-(i+1) and the inputs: Y-sub-(i+1)-bar-2i equals gamma-i times X-sub-(i+1) plus H-i times U-sub-(i+1)-bar-2i. What I can now do is project my Y onto a space perpendicular to U: take Y-sub-(i+1)-bar-2i and project it onto the perpendicular of U-sub-(i+1)-bar-2i, because I want to get rid of that input term. If I do that (let me check I did that right... yep), I get gamma-i L-i times the projection of W-sub-1-bar-i. I can solve for gamma-i L-i: it equals the projected Y multiplied by the pseudo-inverse of the projected W. Now if I multiply both sides by W-sub-1-bar-i, I get gamma-i L-i W-sub-1-bar-i on the left, and gamma-i L-i W-sub-1-bar-i is gamma-i times X-sub-(i+1). Isn't that interesting? Look what we just did: I know everything on the right side of this equation. I know Y, I know U-perpendicular, and I know W; the whole right side is known, and it's just a matrix. I'll call it O. The left side is a matrix times the states I've been trying to estimate: here's X, and here's the matrix multiplying it. So I've reduced my problem to one where something multiplying my hidden states equals something I know. I can't recover X itself, but I can recover something proportional to it, and
it's kind of a magical thing, I tell you, when I first came across this: the idea that you can recover something proportional to the hidden states even though you don't know A, B, C, and D. We don't have to know A, B, C, D; all we need to know is that if we project our observations so as to eliminate the influence of the inputs U, we end up with something proportional to the hidden states. The idea really comes about because, in our linear set of equations, X depends on its past history and on the input U, and Y depends on X. So if I take out all the inputs and how they've influenced X, the Y that I see is only proportional to X. If we take this non-autonomous system that we manipulated via the inputs U and remove all the influence of U, what we're left with is just the influence of the hidden state X. Again, what I end up with is this matrix O, which I can compute, because I know the Y's, the W's, and the U's; and if I can compute O, I can get an estimate of X. Let's do that next. What is this matrix O? It equals gamma-i times X-sub-(i+1). And what's gamma? It's the matrix made up of C, CA, CA-squared, down to C times A raised to the power i minus 1. And what's X? It's the matrix of states from time point i plus 1, through i plus 2, to time point i plus j. I know O, but I don't know gamma and I don't know X: I can compute their product but not the individual factors. So we use something we can handle: singular value decomposition. We decompose this matrix into P, S, and V matrices, where P is made up of columns p1, p2, and so on; the S matrix, the matrix of singular values, is diagonal with scalars s1, s2, and so on, for however many singular values there are; let's
say there are n of them, with zeros elsewhere. And V holds the vectors v1 to vj associated with the singular values. It turns out that the number of singular values, n, is the size of the hidden state X: you recover the size of the hidden state from the number of singular values in the singular value decomposition of this matrix O. If there's no noise in your system (and the simulations you're going to run for your homework tonight have no noise), you'll get exactly that integer number of nonzero singular values. In reality, when you do this with a real-world system, you get as many singular values as the dimensions allow, but when you look at them you'll see a few that are large while the rest are tiny, and the large ones are the part that matters: the distribution of the singular values tells you the size of the hidden state of your system. In our simulations, with no noise, n will be exactly the size of the hidden state X. So: we computed O, and we found its singular value decomposition, P S V. We can always write that as P times S-one-half times S-one-half times V, where S-one-half means the square root of the diagonal matrix S, just the square root of each element. And that's also equal to (P S-one-half T-inverse) times (T S-one-half V), where T is some arbitrary matrix with an inverse. So if I now say that my estimate X-hat equals S-one-half times V, then my estimate of the hidden states is related to the true state X by just some linear transformation T. So the procedure is to find this O matrix, and then it just follows: first, you take your data Y and your inputs U and you write their Hankel matrices; next, you compute the O matrix, which is made up of Y, U, and U-perpendicular; next, you find the O matrix's
singular values: you decompose it into P, S, and V, and then you pick your estimate of the state, X-hat, to be S-one-half times V, from the singular values you computed. If you do that, you are within a matrix T of the true X, where T is just some arbitrary invertible transformation. And once you've estimated the hidden state X, of course, you can go back and estimate A, B, C, D, because you know the hidden states, the inputs U, and the measurements Y. That's what you're going to do for your homework. See you Wednesday.
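The whole recipe above can be sketched in code. This is a hypothetical NumPy sketch of the noise-free case (the system, the sizes i and p, and the tolerances are all made up for illustration): it generates data from a small system, builds the block-Hankel matrices, forms O by projecting the later outputs perpendicular to the later inputs, reads the state size off the singular values, and recovers an equivalent A, B, C, D by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- a small "true" system, used only to generate (u, y) ---
A = np.array([[0.9, 0.2], [-0.1, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.1]])
m, l, p = 1, 1, 80

u = rng.standard_normal((m, p))
x = np.zeros((2, 1))
y = np.zeros((l, p))
for t in range(p):
    y[:, t:t+1] = C @ x + D @ u[:, t:t+1]
    x = A @ x + B @ u[:, t:t+1]

# --- block-Hankel matrices: each column is a length-i history ---
def hankel(z, first, i, j):
    return np.vstack([z[:, first+k : first+k+j] for k in range(i)])

i = 4                         # guess: a bit larger than the unknown state size
j = p - 2*i + 1
Up, Yp = hankel(u, 0, i, j), hankel(y, 0, i, j)   # U_{1|i}, Y_{1|i}
Uf, Yf = hankel(u, i, i, j), hankel(y, i, i, j)   # U_{i+1|2i}, Y_{i+1|2i}
Wp = np.vstack([Up, Yp])                          # W_{1|i}: the things we know

# --- project later outputs perpendicular to later inputs, then form O ---
Pi = np.eye(j) - Uf.T @ np.linalg.pinv(Uf @ Uf.T) @ Uf
# rcond truncates the numerically-zero directions of the rank-deficient W
O = (Yf @ Pi) @ np.linalg.pinv(Wp @ Pi, rcond=1e-8) @ Wp   # O = Gamma_i X_{i+1}

# --- SVD: number of significant singular values = size of hidden state ---
P, s, Vt = np.linalg.svd(O, full_matrices=False)
n = int(np.sum(s > 1e-8 * s[0]))
Xh = np.diag(np.sqrt(s[:n])) @ Vt[:n]             # X-hat = S^(1/2) V = T X

# --- least squares for an equivalent (A, B, C, D) ---
Zc = np.vstack([Xh[:, :-1], u[:, i:i+j-1]])
Zn = np.vstack([Xh[:, 1:],  y[:, i:i+j-1]])
M = Zn @ np.linalg.pinv(Zc)
Ah, Bh = M[:n, :n], M[:n, n:]
Ch, Dh = M[n:, :n], M[n:, n:]

# --- check: the identified system reproduces the observed outputs ---
xh = Xh[:, :1]                                    # estimated state at time i+1
yh = np.zeros((l, j))
for t in range(j):
    yh[:, t:t+1] = Ch @ xh + Dh @ u[:, i+t:i+t+1]
    xh = Ah @ xh + Bh @ u[:, i+t:i+t+1]
print(n, np.max(np.abs(yh - y[:, i:i+j])))        # state size, output error
```

With no noise, the recovered system reproduces the observed outputs essentially to machine precision, even though Ah, Bh, Ch, Dh differ from the original A, B, C, D by the similarity transformation T discussed in the lecture.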