So, let us get started. Today I have two topics to cover. One is least squares. We are going to look at the meaning of least squares, which I plan to do on this paper, and I am going to use this desk to explain. Then I am going to present a case study on the control of an internet application. I am going to show you how least squares and the modeling that we discussed in the last class fit in, and how they are relevant to this course. Some details are at the bottom: the lecture number and today's date, which we have to put in. So, let me just put 26. I am Kannan Moudgalya.
So, the least squares method is something that we use when we have more equations than unknowns. Let us start with a simple example. We have a set of equations of this form. Why do we do this? Because we want to measure, or we already have information on, these values, namely the a's and b's, and we may have got them experimentally. x 1 and x 2 are the unknowns. They make up the model. That is, if you know x 1 and x 2, you know the model. Now, the problem is that this model may be only approximate, and there could be some measurement noise and things like that. You might say: I have two unknowns, x 1 and x 2. Why do I not just take two measurements? Two equations, two unknowns. Let me solve. The problem is that there could be lots of error, noise and so on. In order to even it out over lots of readings, you go on adding equations, and then you arrive at a system of this form. It is an overdetermined system. So, what is the meaning of this? We understand how to solve this. You can form the objective function, differentiate it and minimize it, but what does that mean? That is what I want to explain.
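The equations written on paper are not captured in the transcript. Going by the symbols named in the discussion (a 1 1, a 2 1, a 3 1, b 1, b 2, b 3, x 1, x 2), and assuming three measurements for two unknowns, the overdetermined system and its column form would look roughly like this:

```latex
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &= b_1 \\
a_{21} x_1 + a_{22} x_2 &= b_2 \\
a_{31} x_1 + a_{32} x_2 &= b_3
\end{aligned}
\qquad\Longleftrightarrow\qquad
x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ a_{31} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ a_{32} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
```

The right-hand form is the one used in the geometric argument: each column is a vector in R 3, and x 1 and x 2 scale those vectors.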
First of all, I want you to see that the left hand side can be written in the following form. Do you see that? Have you seen this way of expressing the left hand side? A book that explains this very well is Linear Algebra and Its Applications. It is an outstanding book. In case anyone is interested in linear algebra and wants to get started, this is the book you should read. It is available in an Indian edition for about 300 rupees or less. Every one of you should have a copy. The author discusses a lot of this.
Let us come back to this equation. What this says is that instead of this, I can write it like this. Now, view each one of these. What is a 1 1, a 2 1, a 3 1? Geometrically, what does it represent? Any guess? Yes, it is a vector. A vector in which space? R 3. So, essentially we are saying: find a combination of these two vectors that gives a third vector, which is also in R 3. Now, two vectors lie in a plane. Think of this desk as that plane. The vector a 1 1, a 2 1, a 3 1 is lying on it, and the other vector is also lying on it. There are two vectors and they form a plane. Let that plane be this one. Now, when I multiply a vector by a scalar x 1, what I am doing is making that vector longer or shorter.
So, in other words, I can elongate this vector as required, elongate the other vector also as required, and take the vector sum, which means I can essentially produce any vector on this plane. Is that ok? Because two independent vectors form a basis for this plane, it is possible to produce any vector lying on that plane. Now, under what conditions can I solve this problem? The answer is that if the vector b 1, b 2, b 3 also lies in the same plane, then by suitably choosing x 1 and x 2, I can make the left hand side equal to it, and the problem is solved. Does it make sense? The question is, will it happen? Just see how b 1, b 2, b 3 are obtained. We observe the experiment and collect the data. There is no guarantee that such a thing will happen. You only know that all three vectors lie in R 3. We have taken the two column vectors, they form a plane, and I have taken that plane to be this desk. So, in general, the vector that we have on the right hand side is going to be outside this plane. Let me call this plane A, because it is made up of the columns of A, and then I have this vector b, which lies outside. Does it make sense to you? So, does this problem have a solution? In general it does not, and this is what is going to happen most of the time. Because in R 3 a plane has zero volume, any arbitrary vector b is likely to be outside this plane. The probability of a randomly selected b lying in this plane is 0.
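The solvability condition above (b must lie in the plane spanned by the two columns) can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names and the example vectors are hypothetical, and it assumes the two columns are independent vectors in R 3. It uses the fact that b lies in the plane exactly when b is perpendicular to the plane's normal, the cross product of the two columns.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def lies_in_plane(a1, a2, b, tol=1e-9):
    """Return True if b is a linear combination of the independent vectors a1, a2."""
    normal = cross(a1, a2)
    if dot(normal, normal) <= tol:
        raise ValueError("a1 and a2 are not independent, so they do not span a plane")
    # b is in the plane exactly when it has no component along the normal
    return abs(dot(normal, b)) <= tol


# Hypothetical columns of A
a1 = (1.0, 1.0, 1.0)
a2 = (1.0, 2.0, 3.0)

print(lies_in_plane(a1, a2, (3.0, 5.0, 7.0)))  # b = 1*a1 + 2*a2, exactly solvable
print(lies_in_plane(a1, a2, (3.0, 5.0, 8.0)))  # one reading perturbed, no exact solution
```

Perturbing a single measurement is enough to push b off the plane, which is the situation the lecture says occurs almost always with experimental data.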
So, this is going to lie outside. Is that okay? So, this is the b vector. Are you with me? Is there anything not clear? If you keep quiet, I do not know whether you have understood or not. All right. So, how do we solve this? It does not have an exact solution, so it can only have an approximate solution. Can you give a suggestion? Take a projection. If only this vector b were to lie on this plane, then I could solve the problem exactly. So, I want to replace b with an equivalent vector lying on the plane. What is a good candidate for this dotted line? You have said projection. When we project, in what way do we project? What is the property of the projection? Geometrically, if you have to explain to somebody what a projection is, what will you say? Drop a perpendicular, right? Drop a perpendicular and get the vector in the plane that is closest to the vector lying outside. That is what projection is. So, let me draw this line and drop a perpendicular from here. That will get me the vector closest to the vector that is lying outside. Closest means we are talking about the shortest distance, in the Euclidean sense. That is why least squares comes in. The word squares comes in because dropping a perpendicular gives the minimum distance in the Euclidean norm, that is, the smallest sum of squared errors. So, the error is e, which is equal to A x naught minus b. I put x naught because I say that is the solution.
A x naught is a vector that lies in this A plane. This is the dotted line. b is the vector lying outside. The error is the difference between these two: A x naught minus b. Is it okay? Now, I want to use the fact that this vector e is perpendicular to the A plane. Perpendicular means that e should be perpendicular to every vector lying in that plane. How do I do that? Instead of making the error perpendicular to every vector lying in the plane, I can make it orthogonal to every basis element that makes up that plane. Because, after all, what is the plane? It is a collection of vectors. How do you get them? By linear combination of the basis elements. If the error is perpendicular to every basis element, it is perpendicular to the whole plane. Does it make sense? All right. So, what we have is e equals A x naught minus b, and A is made up of two column vectors. The first column is a 1 1, a 2 1, a 3 1 and the second column is a 1 2, a 2 2, a 3 2. We want to make e perpendicular to the first one. That means take the dot product: the row a 1 1, a 2 1, a 3 1 times A x naught minus b equals zero. With this equation, I have made the error A x naught minus b perpendicular to the first vector. Similarly, I make it perpendicular to the second vector: the row a 1 2, a 2 2, a 3 2 times A x naught minus b equals zero. So, I have made the error perpendicular to both the basis elements. As a result, I have made it perpendicular to the whole plane. Is that okay?
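The two perpendicularity conditions above are two linear equations in the two unknowns x 1 and x 2, so they can be solved directly. The sketch below is illustrative only: the function names and the numbers are hypothetical, and it assumes the two columns are independent. It solves the two conditions by Cramer's rule and then confirms that the resulting error vector is perpendicular to both columns.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def least_squares_two_columns(a1, a2, b):
    """Find (x1, x2) so that e = x1*a1 + x2*a2 - b is perpendicular to a1 and a2."""
    # a1 . e = 0  ->  (a1.a1) x1 + (a1.a2) x2 = a1.b
    # a2 . e = 0  ->  (a2.a1) x1 + (a2.a2) x2 = a2.b
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("columns are not independent")
    x1 = (r1 * g22 - g12 * r2) / det
    x2 = (g11 * r2 - g12 * r1) / det
    return x1, x2


def error_vector(a1, a2, b, x1, x2):
    """Error e = A x - b, written out column by column."""
    return [x1 * p + x2 * q - r for p, q, r in zip(a1, a2, b)]


# Hypothetical data: three readings, two unknowns
a1 = (1.0, 1.0, 1.0)
a2 = (1.0, 2.0, 3.0)
b = (1.0, 2.0, 2.0)

x1, x2 = least_squares_two_columns(a1, a2, b)
e = error_vector(a1, a2, b, x1, x2)
print(x1, x2)
print(dot(a1, e), dot(a2, e))  # both dot products vanish up to rounding
```

The error itself is not zero here, since b is off the plane; only its components along the two columns vanish, which is exactly the projection property.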
Then what I can do is stack these one below the other. These two together are equivalent to one matrix equation, and what is the matrix on the left hand side? A transpose. So, this gives A transpose times A x naught minus b equals zero, that is, A transpose A x naught equals A transpose b, and you have the least squares solution. Is that okay? Are you with me? Is something not clear? So, what we have done is this: we found the error; we projected, and projection means taking the shortest distance, so we dropped a perpendicular, which is why we are talking about the least squares method; and then, to make the error perpendicular to the plane, we made it perpendicular to every basis element. That results in this equation, and this gives the least squares solution. This is something exceedingly important. We do this all the time. If you have the geometric meaning, then it is very easy: the vector is outside, you project it, make the error perpendicular, and then just derive it. Is that okay? This is the first topic that I wanted to tell you about, namely, a recall of least squares.
All right. Now, let us go to the next topic, which is the case study. We are going to use equations of this form. We will shortly see that equations of this form come naturally in systems that are difficult to model, for example, economic systems or rain prediction, because we do not have detailed models. That is one reason. But these are also used in systems where you have good models, where you have a good understanding. For example, I talked about spacecraft in the last class. I talked about distillation columns and so on that appear in the Jamnagar refinery. You take a big structure, or an example from any branch of engineering, and you are capable of constructing a detailed model. In order to construct a detailed model, however, you will require experts. You would need expertise in the domain.
You need somebody who has done a master's degree or a PhD, who has spent a lot of time on this and is really able to understand and write down the equations. That is number one. You need somebody who can solve the resulting mathematical equations correctly and quickly; more importantly correctly, and then quickly. Then you need somebody who can understand the material properties, thermodynamics, and so on. And you need somebody else who has to validate that this model actually matches the experimental data, the plant data, because you can arbitrarily choose some numbers that may not be correct. So, this fundamental modeling, namely writing balance equations and so on, could take a lot of time. It could take three months, four months, six months and a team of really good experts. Even for modeling a single unit, you may have to do that. On the other hand, people could ask: what is the application? Suppose the application is only to take some decision that can work with approximate models. For example, we talked about an approximate model for cycling down to the exam from your hostel. We said that it may be very difficult to come up with an exact model so as to predict exactly what time you will reach. You go on estimating based on the progress that you make. This is a feedback methodology. So, supposing the model is anyway going to be used in a feedback format, an approximate model will do. Then you can ask this question: why do I have to derive this rigorous model? Use the experimental data that I have. Fit an approximate model. Do a curve fit to experimentally obtained data. Do not build the fundamental model, because that will take a lot of time. I do not have the time. I want to do it immediately, and an approximate model will do. So, let me just go and collect data and fit a model.
So, this fitting of a model to experimentally collected data is applicable not only to difficult to model systems such as economic systems, rain prediction, weather prediction and so on. It is also useful for systems that have rigorous models. If you have the time, money and effort, you can actually derive the rigorous model. In order to save on those, you may say, let me use an experimentally determined model. These are known as time series models. The process of building such models is known as identification. It is a very important area. We will see it in the example that I am going to present. That is how I am going to link this.
So, what we will do is switch over to feedback control in internet applications. This is based on work that some of our students did here, in both the computer science and chemical engineering departments. The outline of this case study is as follows: problem statement, previous work, feedback controllers, time series model, checking performance and validating that this approach indeed works.
So, here is an example that deals with dynamic data, the internet problem that we talked about. Look at the last one. In fact, this is given as the motivation. There is a stock broker; actually, he is a portfolio manager. The portfolio manager is told: any time the stock quote increases by, let us say, 10 percent, sell 10 percent of the holding. He has some instruction on how to maintain the portfolio, make lots of money and give it back to the client. So, this fellow has to predict what the stock value is. He may want to predict when it changes and then take a decision. So, this portfolio manager or stock broker wants to know how the stock is changing. You may ask, how does one solve this?
Suppose we do not really get into the problem and I just tell the portfolio manager, do this. What will the portfolio manager say? Let me tell you the constraint. The stock broker does not want to keep collecting the data. This fellow wants to poll only when required, for whatever reason; say, the internet is slow. Instead of collecting 100 samples, if he collects only one sample, that sample will come very fast. So, he can go to an expensive channel, where he has to pay lots of money for every bit of data downloaded, and he is willing to pay that because he has minimized the polls. He does not want to poll every time. Just to give an example, if you look at some of the online cricket commentary, where the data keep coming, the page will keep refreshing. Even at tea time, it will keep saying tea time, or it will keep saying stumps, but it will keep refreshing. It does not need to, because nothing has changed. After how many seconds does it refresh? Does anyone know? 60 seconds? 65? So, that is the HTTP protocol. As a result, it will keep refreshing. The stock broker says, I do not want to do that. I want to download only when the value changes by so much.
So, let me just summarize this. The objective is to develop a model to support queries involving dynamic data items and to provide good quality of service as specified by the clients. What is good quality of service? He wants to predict and use that prediction. At the same time, he should not miss out. Suppose the price of the stock has gone up by 10 percent and he should have sold 10 percent of the holding, but because he did not predict it properly, he lost the chance. This may not appear so serious, but it will be more important in situations such as remote health services. Somebody monitors the health of a patient and says, now give the drug.
If the temperature has gone beyond some value, then there is a serious problem. So, it has applications like this. Now, you would say that, after all, the server has that information. Where does the portfolio manager get this information from? There is somebody who knows that value. We call it the server, which pushes the value. So, we can ask, why can this fellow not take it from that server? Tell the server: when the value changes by 10 percent, you tell me. But there are some problems with that. Number one, there could be many such people requesting data, and the server may not be able to service all these requests. Number two, this portfolio manager may use information from more than one server, because the instruction may be: when the portfolio changes by 10 percent, sell off 10 percent. Now we are talking about a portfolio, not one stock. He may be taking one stock value from the Bombay Stock Exchange, some other stock value from the National Stock Exchange, another one from a commodity exchange, something from a real estate exchange, whatever. It may not be possible for any one server to tell him that the value has changed by 10 percent now. As a result, this will not work, and he has no option but to predict. So, this is the push technology, where the server pushes the information, and I am saying it is not adequate here. Instead, you pull only the required information, and this fellow has to come up with some heuristics. So, we say that the pull strategy is preferred over push because of the reasons that I explained, but it is more difficult, because now you have to predict. Do you understand this? So, what this fellow has to do is say: now the value should have changed. Pull the data and verify that the model is okay. If the model is not okay, correct the model. You would still take some action. That is how it goes.
You might ask: if you already know when it is going to change, why do you want to pull? You have a great model, so do not even pull. Just take action. The point is that the model may not be accurate. So, you would pull the value, correct your model and improve it, so that next time it will work better, and so on. And, of course, the model could be changing, the universe could be changing, and so on. Some of the slides I will go through quickly, just to give a flavor of the approach. When I started this, some heuristic methods were being applied, and those were actually not very good. One is called adaptive TTR, the second one is adaptive pattern matching and the third one is the control theoretic approach. I am going to concentrate on the control theoretic approach, but I will just give a flavor of the other two. Do not worry about all the symbols. TTR is time to refresh: not the fixed 65 seconds, but I decide when to refresh. The adaptive TTR is the maximum of TTR min and the minimum of TTR max and a times this plus 1 minus a times TTR dyn, where TTR dyn is once again calculated by some formula. So, this is a heuristic method. Some formula was given, and how does one arrive at it? Just by trial and error: I put in 10 parameters and then I go on fiddling around. This is adaptive TTR. The other method is adaptive pattern matching. What does this say? I saw in the past, five times, that whenever three successive samples went up at this rate, the fourth one came down.
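As a side note on the adaptive TTR heuristic mentioned earlier in this passage, its clamped, weighted form can be sketched in a few lines. This is only an illustration of the spoken formula: the argument `ttr_other` is a hypothetical stand-in for the term the lecture refers to as "this" on the slide, `ttr_dyn` is taken as given, and the numbers are made up.

```python
def adaptive_ttr(ttr_min, ttr_max, a, ttr_other, ttr_dyn):
    """Blend two refresh-time estimates with weight a, then clamp to [ttr_min, ttr_max]."""
    # ttr_other is a placeholder for the slide term that the transcript does not name
    blended = a * ttr_other + (1.0 - a) * ttr_dyn
    return max(ttr_min, min(ttr_max, blended))


# Hypothetical values, in seconds
print(adaptive_ttr(1.0, 60.0, 0.5, 10.0, 30.0))    # blend lies inside the bounds
print(adaptive_ttr(1.0, 60.0, 0.5, 100.0, 100.0))  # blend is capped at ttr_max
print(adaptive_ttr(5.0, 60.0, 0.5, 1.0, 1.0))      # blend is raised to ttr_min
```

Even this small form already exposes four tuning quantities (the two bounds, the weight and the inner estimates), which is the lecture's complaint about heuristic methods.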
So, you say, that is my logic. I will list all such patterns in a database, and next time when I get something, I will ask, does it belong to the database? If it does, then I will predict that the stock quote will come down or go up, by 5 percent or by 10 percent, whatever. I predict this way. But what is the guarantee that the new data will fit a stored pattern? There is no guarantee. If it does not fit, then I create a new rule. So, I go on building this database. This is the second heuristic methodology that was used at that time.
I work in the area of control, so I proposed, let us try a control strategy. What is the logic? The problems in heuristic approaches are: too many tuning parameters, no sound basis, and not extendable to multiple streams. So, a control theoretic solution was attempted. Look at this very simple example. Suppose you want to heat a room. Choose some heating. If the error is positive, which means that the temperature is lower than the required value, you heat more. If it is negative, you heat less. A very simple logic. Let us say we do the same thing here. Choose some sampling interval based on previous calculations. I should pull when the data value has changed by 10 percent. If the error is positive, that means I pulled when the value had changed by only 9 percent, so I have not waited long enough. So, if the error is positive, I increase the interval; if the error is negative, I decrease the interval. In this way, I go on tuning. Do you follow this? Of course, I assume a monotonicity property: the interval is so chosen that within it the value does not go up and then come down. If I make this assumption, this will work. The same monotonicity property is used here also.
So, this is how it is modeled. This is my process. What is this process? It is a sampling process. u is the sampling time, because I want to decide when I should pull the next value, and y is the value obtained at that time. What is the stock quote at 9:05? What is the quote at 9:06? So, this axis is the time and this is the stock value. Let one sample be taken at t i minus 1 and let its value be q i minus 1; that is, at time t i minus 1 the stock value is q i minus 1. The coherency is c. In this case, I am interested in the value changing by 10 percent, so c is 0.1. Coherency refers to the change that I am looking for: by how much the value is allowed to change. So, I want to find the next time instant t i such that the absolute value of q i minus q i minus 1 equals c. The absolute value, because the value could go up or come down: if it increases by c I will take one action, and if it decreases by c I will take another, for example, buying in one case. Of course, I have made it symmetric. Did you follow this? This is my control problem. I am at t i minus 1 and I want to find t i. So, what is the logic I should use to pick t i? In other words, t i should satisfy this equality. This is how I define my input and output values. Do not worry about u r; the second line is not required. u i is t i minus t i minus 1, the change I should effect. That is my control value. In the heating example, should I heat now or should I cool now, and by how much? That is the control value.
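Written out with the symbols used in this passage (t for poll times, q for the polled values, c for the coherency, u for the control value), the control problem reads as follows. The slide itself is not in the transcript, so this is a reconstruction from the spoken description only.

```latex
\text{Given } (t_{i-1},\, q_{i-1}), \text{ find } t_i > t_{i-1} \text{ such that }
\lvert q_i - q_{i-1} \rvert = c,
\qquad\text{with control value}\qquad
u_i = t_i - t_{i-1}.
```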
So, here the control value is this small u i. It says when I should poll next. What is the output I want to control? The output is this, and I want it to be equal to 0. In the heater problem, the output is the error and I want to make the error 0. Here, I want the numerator to be equal to 0, in other words, this absolute difference to be equal to c. You see the analogy? I want to make this y equal to 0 at every time, and accordingly I should choose this u. In the heater problem, I want to make the error 0, and accordingly I choose heating or cooling and its quantity. This is how the control logic is arrived at. The process, with u and y, is something that I have already explained. This is the controller. What does this controller do? It just checks whether delta y is greater than 0. Delta y is y minus the set point, and the set point is 0, because I want the output y to be equal to 0. So, delta y and y are the same. Suppose this is greater than 0. I am told that c is greater than 0, because it is the coherency: 0.1, 0.2, 0.01, 0.05, whatever. As a result, I find that the absolute value of q i minus q i minus 1 is greater than c. In other words, the error, which is nothing but 0 minus y, is negative when y is positive, and that results in the change being greater than c. That means the value has changed too much. It should actually have been equal to c, and at the most I can live with less than or equal to c, but it has changed too much. So, what should I do? e less than 0 means the change is too much: I should have sampled earlier, so reduce the time to refresh. So, I want TTR new to be less than TTR old.
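One way to turn this sign reasoning into an update rule is a proportional correction of the refresh time. The sketch below is a hedged illustration, not the authors' implementation: the exact definition of the output y is on a slide that the transcript does not capture, so `output_y` assumes a normalized form whose numerator is the absolute change minus c, and the function names, gain and numbers are all hypothetical.

```python
def output_y(q_prev, q_new, c):
    """Assumed output: zero when the absolute change equals c, positive when it exceeds c."""
    return (abs(q_new - q_prev) - c) / c


def next_ttr(ttr_old, q_prev, q_new, c, gain):
    """Proportional update: error e = set point (0) minus output y."""
    e = 0.0 - output_y(q_prev, q_new, c)
    # e < 0: value changed too much, poll sooner; e > 0: changed too little, wait longer
    return ttr_old + gain * e


# Hypothetical numbers: coherency c = 0.1 in the units of the quote, positive gain
print(next_ttr(10.0, 100.0, 100.2, 0.1, 2.0))   # changed by more than c: shorter interval
print(next_ttr(10.0, 100.0, 100.05, 0.1, 2.0))  # changed by less than c: longer interval
```

A practical version would also keep the refresh time within sensible bounds, since a large negative error could otherwise drive it below zero.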
So, for TTR new I use this logic: TTR new equals TTR old plus K e. This is what the controller does. The controller has this value K. We started with the error being negative; if the error is negative and K is positive, then TTR new will be less than TTR old. The same logic works for positive error also. So, this is how we design this controller. This is known as the control loop or feedback control loop. It is very similar to the room heating problem. Is that ok?
Now, the question is, where does the model come in? Where does the time series come in? Where is least squares useful? Where are you using the projection? That is the next slide. In order to find this K, you need to know about the process. In the heating problem, you should know that if I put in so much heat, the temperature will go up by so much. That means you have a model in your mind. So, you have to model the process. How do you model it? In this case, you model it like this. Yesterday I showed you only the right hand side, but in the general case you have the difference terms that you see here. a 1 through a n and b 1 through b n are all model parameters, and e is there because this is an approximate model. So, what do you do? You write this equation at this time instant. At the next time instant, you write it again, one equation below the other. You go on doing this, stacking the equations one below another, and you arrive at an equation of this form. Notice that theta has a 1 through a n and b 1 through b n. Theta is the unknown vector to be determined, whereas all other quantities are obtained experimentally.
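The stacking step described above can be sketched as follows. The model equation itself is on a slide that the transcript does not capture, so this sketch assumes a common time series form, y(k) + a1 y(k-1) + ... + an y(k-n) = b1 u(k-1) + ... + bn u(k-n) + e(k); that sign convention, the function names and the test data are assumptions. One equation is written per time instant, the rows are stacked, and theta is obtained from the normal equations (A transpose A) theta = A transpose y.

```python
def solve_linear(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; data not informative enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_time_series_model(u, y, order):
    """Least squares estimate of theta = [a1..an, b1..bn] from input u and output y."""
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    rows, targets = [], []
    for k in range(order, len(y)):
        # One stacked equation per time instant k (assumed sign convention)
        past_y = [-y[k - j] for j in range(1, order + 1)]
        past_u = [u[k - j] for j in range(1, order + 1)]
        rows.append(past_y + past_u)
        targets.append(y[k])
    p = 2 * order
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve_linear(ata, atb)


# Synthetic, noise-free data from y(k) = 0.5 y(k-1) + 2 u(k-1), i.e. a1 = -0.5, b1 = 2
u = [1.0, 0.0, 2.0, -1.0, 3.0, 0.0, 1.0, 2.0]
y = [0.0]
for k in range(1, len(u)):
    y.append(0.5 * y[k - 1] + 2.0 * u[k - 1])

theta = fit_time_series_model(u, y, order=1)
print(theta)  # close to [-0.5, 2.0]
```

With real data the fit is not exact; the residual plays the role of e in the model, and the estimate is the projection discussed in the first half of the lecture.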
So, you have the least squares solution. By doing this, you get theta. Theta is nothing but the model parameters. Once you know the model parameters, you know what will happen if I poll at this point or at that point. I have constructed the model based on the data, because there is no fundamental model available. This is an example where we collected lots of data and then plotted it. We used a moving window. That means I do the estimation using the values obtained over the last 20 samples, and then that window itself keeps moving, so that I always get the latest model. Then we used a proportional controller as the control algorithm. How does one choose K? I used a procedure called the Ziegler-Nichols tuning procedure, which uses the model to arrive at this K. We used three stock quotes: IBM, Intel computer and Veritas. These were collected in the first week of June 2000. We used two metrics. One is, by using this process, by how much have I reduced the network load? How many fewer polls am I doing now? The other one is loss of fidelity: did I lose something important? I want to keep both of these small. These are the two metrics. I will show the results. The first one is network overhead, that is, I want to reduce the load, and this one is the loss of fidelity. In both of them, the lowest value is the best. I am not sure whether you can read these; the lettering is somewhat small. It turns out that the order 1, 2, 3, 4 is the same as the order that you have here. This is adaptive TTR and this is pattern matching. These are the two heuristic methods that I told you about right in the beginning, developed by the researchers and in use at that time. And this is with the control approach. You can see that the network overheads are much lower than
the other two. We found that sometimes you could get away with one fourth of the sampling. That means the network load has come down to one fourth compared to the previous heuristic methods, and the loss in fidelity is still comparable. This is the fast trace. We actually have three traces: fast changing, medium changing and slow changing. In the medium trace and the slow trace also, this worked very well. Then we said, let us try it on temperature data. It worked there also. So, this is a summary. Of course, there are other problems we studied: not just one piece of data but two or three quotes, where the sum total has to be minimized. This is multi input, single output; this is the portfolio query. Then the same stock broker has to handle the stocks of various people. All these things resulted in various problems, and of course we published a lot of papers. This work was done with Professor Krithi Ramamritham of computer science. Shah was a student in chemical engineering, and R. K. Majumdar did a Ph.D. in computer science on this topic. We have now applied it to lots of other problems. A couple of Ph.D. students in computer science are working on this now under faculty guidance.
I have come to the end of this talk. In this talk, we saw the importance of models, specifically time series models. We discussed the meaning of the least squares solution and applied it to a practical control problem. These modeling issues are actually very important, and that is how this course fits in. You have to ask: what is this error? Is it stochastic? What kind of distribution does it have? How many samples should you take? How long should the moving window be? All of those are extremely important. If you see any probability based modeling, you will quickly end up with least squares prediction. With this, I will end the talk. Thank you.