Hello everyone. Welcome to yet another session of our NPTEL course on nonlinear and adaptive control. I am Srikant Sukumar, Systems and Control, IIT Bombay. We have just started the twelfth week of lectures in this course. This being the final week, I started off with a little bit of history of the adaptive control world, and that is what we were looking at until now. We were looking at this historical survey paper, which of course I will post, and we were essentially tracing the chronology of how things developed. We saw how it started with the requirement for adaptation; then we had the optimality-based MIT rule; and then there was a need for stable adaptive control. Beyond that, we also had all the developments in linear adaptive control: the e- and σ-modifications, parameter learning, and so on. But then there was also this parallel world of developments: identification of stochastic discrete-time systems; then a notion of learning through neural networks and Adaline filters, which again involved parameter estimation, or weight estimation; and then there was more from reinforcement learning, which involved the use of neural networks to estimate value functions and so on. So overall there were a lot of parallel developments in which the notion of parameter learning was playing a rather big role. This is what we have seen so far.

I quickly want to wrap up our discussion of this paper. Of course, I will put it up. We have not even discussed half of the article, and it is difficult to discuss it in its entirety, but a lot of it is material that we have covered in this course, so it should not be very difficult for you to follow. So this is where I start lecture 12.2.

One of the problems that started to be discussed in parallel with the evolution of adaptive control was the pattern recognition and classification problem. In this problem there are usually two classes, A and B. There are features, which are denoted by x_k, and an output, denoted by y_k, and these outputs take the values +1 or -1, in the form given in equation (21) of the paper. This framework appears in Yakubovich and Novikov as well. What we want to do is classify these images, and the question that was asked is: is it possible to separate these classes by a hyperplane? And what does it mean to classify objects, or these images, using a hyperplane? It means asking whether there exists some parameter vector θ₀ such that you can write the output in a linear relationship with the parameters through a regressor φ(x_k), where φ is some suitable kernel function, or regressor in our adaptive control parlance. How this works is that you expose the model to a lot of features, and the goal is to learn this value θ₀ so that, after the learning process, you can classify any further images into class A or class B. And how do you separate them into classes? You look at this output value: if you get +1 you are in class A, and if you get -1 you are in class B. That is how you do this classification problem. So this is the image classification problem.
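Just to make the connection to our adaptive update laws concrete, here is a small sketch of this hyperplane classification idea: a classifier of the form y = sign(θᵀφ(x)) trained with a perceptron-style update. This is purely my own illustration, not something taken from the paper; the feature map phi, the step size, and the data are all assumed for the example.

```python
import numpy as np

def phi(x):
    # Assumed regressor / feature map: affine features [1, x1, x2].
    return np.array([1.0, x[0], x[1]])

def train_classifier(features, labels, gamma=0.1, epochs=50):
    """Perceptron-style update theta <- theta + gamma*(y - y_hat)*phi(x),
    with labels y = +1 for class A and y = -1 for class B."""
    theta = np.zeros(3)
    for _ in range(epochs):
        for x, y in zip(features, labels):
            y_hat = 1.0 if theta @ phi(x) >= 0 else -1.0
            theta += gamma * (y - y_hat) * phi(x)
    return theta

# Assumed, linearly separable data: class A lies above the line x2 = x1, class B below it.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = np.where(X[:, 1] > X[:, 0], 1.0, -1.0)

theta0 = train_classifier(X, y)
y_new = 1.0 if theta0 @ phi(np.array([0.2, 0.8])) >= 0 else -1.0
print(theta0, "class A" if y_new > 0 else "class B")
```

You can see that the weight update is driven by the output mismatch, very much like the discrete-time adaptive update laws we talk about next.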
In recent times you will have seen data classification problems of this sort as well. The interesting thing is that, once again, this problem reduces to identifying some parameters, and of course you can use gradient-type algorithms to find the values of these parameters to a specified accuracy. There are many different solutions, such as the one you see here, and you will notice that it is not very different from your adaptive feedback loop, or adaptive update law. It is just in discrete time, but it has a very similar feel to the adaptive update.

There is also an interesting alternative approach, where you choose a vector of weights such that the corresponding hyperplane is a supporting hyperplane to the convex hull of the vectors y_k φ(x_k). So you construct these vectors and then find a supporting hyperplane to their convex hull such that the minimum distance from the hyperplane to the convex hull is maximized. This is essentially how the support vector machine methods came about. But you also see that the first method we discussed is much closer to what we do in adaptive control.

Then there was of course also the notion of reinforcement learning, or adaptive dynamic programming. I want to highlight the fact that reinforcement learning is essentially adaptive dynamic programming. Why is that? The idea is that you have some nonlinear evolution, and then you have what is called a control law, which is called a policy in this discrete dynamic programming framework. A policy is essentially a choice of control values at the different instants of time: u_1, u_2, and so on. Then there is the notion of a performance index that you want to minimize. This performance index is an infinite series, an infinite sum, of a stage cost g, and J_π is called the cost function. The idea is to find the optimum, and if you look at the cost you can see that it is indexed by π, and this π is the policy, so it is an optimum with respect to the policy. You find the control law, or control policy if you like, so that this infinite sum is minimized. So it is not just one choice of controller in discrete time; it is actually infinitely many choices, u_1, u_2, u_3, all the way to infinity, because you can see that this is an infinite sum.

So how does dynamic programming look at this problem? This is the expression for J*: you can see it is the infimum of J_π with respect to the control policy. Using Bellman's optimality principle, the value iteration process is organized like this. The way you read it is that the optimal cost at the (k+1)-th step is the infimum over all admissible control values, where U(x) is the admissible control set. Admissible means, for example, that if your control is required to be bounded, say the absolute value of the control is less than 1, then that constraint defines the set.
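Before we continue, here is a minimal sketch of the value iteration recursion just described, J_{k+1}(x) = min over admissible u of [g(x, u) + J_k(f(x, u))], on a discretized scalar example with the admissible set |u| ≤ 1. Again, this is only my own illustration: the dynamics f, the stage cost g, and the grids are assumed, not taken from the paper.

```python
import numpy as np

# Assumed scalar dynamics and stage cost, chosen only for illustration.
def f(x, u):
    return 0.9 * x + u            # state evolution x_{k+1} = f(x_k, u_k)

def g(x, u):
    return x**2 + 0.1 * u**2      # stage cost appearing in the infinite sum

x_grid = np.linspace(-5.0, 5.0, 201)     # discretized state space
u_grid = np.linspace(-1.0, 1.0, 21)      # admissible control set: |u| <= 1
J = np.zeros_like(x_grid)                # initial guess J_0(x) = 0

for k in range(100):                     # value iteration sweeps
    J_next = np.empty_like(J)
    for i, x in enumerate(x_grid):
        # J_{k+1}(x) = min over admissible u of [ g(x,u) + J_k(f(x,u)) ]
        # (np.interp evaluates J_k off-grid and clamps at the grid edges)
        costs = [g(x, u) + np.interp(f(x, u), x_grid, J) for u in u_grid]
        J_next[i] = min(costs)
    J = J_next

print("approximate optimal cost-to-go at x = 2:", np.interp(2.0, x_grid, J))
```

Even this toy example makes the point: the number of function evaluations grows very quickly with the state dimension, which is why the value function ends up being approximated by a parameterized structure, as we see next.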
So you essentially take the infimum, over all admissible control values, of the function g plus the optimal cost at the k-th step. This is how you progress from one step to another in order to find the optimal control for that step, and the minimizer gives the optimal value of u at that step. Of course, to solve this exactly there is a very, very large number of functional computations to be done. So the idea here is to use a neural approximation for the function J_k, which is approximated, as always, using a regressor φ and a set of parameters w_k. You again see that there is a set of unknown vectors w_k, which are the weights in this case, and then there are the φ's, which form some truncated basis, and these are used to approximate the value function. This is the basis for doing reinforcement learning, and identifying these weights is the adaptive part. This is the connection with adaptive control, because you see that again I have some parameters that need to be learned. You can call them weights, but eventually they are parameters that need to be learned.

The only problem here is that if there are disturbances, then talking about stability is rather difficult, because if you want to look at stability and the evolution of trajectories, you have to integrate over all such trajectories, and some kind of Monte Carlo method has to be introduced. Therefore it is very difficult to come up with stability conditions for the closed-loop system. There are of course some results out there that address this, but that is not a part of this article, at least not here.

So anyway, this is where we stop our little bit of a history lesson. Subsequently there are many other sections in this paper, on solutions in adaptive control and so on and so forth, a lot of other interesting topics that have been covered in adaptive control, so it gives a little bit of a flavor of everything there is in adaptive control. Some of this we have covered in our curriculum here, and some of it we have not. I would strongly encourage you to look at this article to get a good idea of where we stand in terms of adaptive control and how it is connected to learning.

What I want to do now is to go to another article, if you may. Okay, I am not sure why it is doing this; I apologize. All right, so I want to look at this article here. It is from 1996, by Frank Lewis and coauthors, and it talks about the implementation of an adaptive controller via multilayer neural networks, applied to a robot. This is probably one of the first multilayer neural network implementations of adaptive control with nice tracking and stability performance results. That is why I wanted to look at this article in this last week, so that you can get an idea of how stability guarantees for neural network controllers can actually be provided. Again, you see that this is an old article, and it is a rather specific one: it looks at a real-time application of neural networks. In a lot of the literature and applications that you see now, most learning algorithms are offline implementations, in the sense that the entire training of the algorithm happens offline.
But this is one of the few results you will see, especially connected to control, where you have online learning within the adaptive controller as well. Now, one can of course question the advantages and disadvantages of either approach. It is true that if you do offline training you can take in a lot more data, you can do much better learning of the parameters, and you can probably learn for a larger variety of cases, and so on. But one of the problems with offline learning followed by online implementation is this: if the initial conditions and the operating conditions of your dynamical system change a lot, performance suffers. For example, suppose I want to survey a sector and fly my unmanned vehicle using image processing, and I build a map using training data collected at dawn, so I did all my training with early-morning flights and early-morning images. But then suppose I actually fly my UAV at a time when the light is a little less, or maybe a little more; the results become significantly different. So if the initial or operational conditions differ from the training data, your performance can significantly deteriorate. On the other hand, if you are actually using some kind of online learning for this neural network, then you can continue to learn even as things change, and your performance does not deteriorate significantly. That is one of the advantages. Of course, the limitation is that you can only have a few layers, for computational efficiency and in order to be able to implement things in real time.

So that is the idea: we are looking at a real-time implementation of a multilayer neural network, and we do this in the adaptive control framework that we have seen. A lot of results have been written about adaptive neural networks, but there was little about the use of neural networks in direct closed-loop control, at least until 1996. There is a little bit more now, but again not a lot; this is still an open area of research, especially the part about proving stability. One of the issues that needs to be addressed adequately is the inclusion of ad hoc controller structures and the inability to guarantee satisfactory performance. One of the things required in a lot of the articles these authors mention is that you need good initial estimates for the neural network weights in order for the performance to be good, and identifying such stabilizing weights may not be easy before you even start running the system. So the idea in this article, as the authors say, is to confront these deficiencies for a full nonlinear three-layer neural network with arbitrary activation functions. Of course, a bit of robot control gets discussed here, because the application is a robotic manipulator, and the tracking performance is guaranteed using the Lyapunov-based approach, even though no ideal weights are identified beforehand. Typically, when you use a standard adaptive controller for a robot, the regressor matrix that appears in the design needs to be very carefully precomputed for every specific robot.
For example, if I go from a three-link robot to a five-link robot, or from a prismatic (linear) joint to a rotary joint, things change significantly, so you have to compute the regressor matrix very carefully in order to apply an adaptive controller. The advantage of using a neural-network-based adaptive controller is that it will automatically learn these parameters through the activation functions, so it can be applied to any serial-link robot arm. That is the advantage: one controller works for any serial-link robot arm, just like a package.

One of the things demonstrated in this article is that the standard tuning using backpropagation-based methods can yield unbounded neural network weights, especially when there are disturbances and the robot arm has more than one link, which is a very, very basic property a robot arm may have. So there are modified weight-tuning methods also proposed here, which resemble the e-modification. With these modified weight-tuning approaches, the authors also claim that they can avoid the PE (persistence of excitation) condition, which is also something rather strong and rather good.

Now let us start looking at what the problem setup is. We know that we are doing learning using adaptive updates. Typical neural network parameter estimation algorithms rely on some kind of gradient-descent-type formula, some kind of optimization-based formula. But here everything is based on adaptive control and Lyapunov-based results, which help us get stability and closed-loop performance, something that is not usually treated in the optimization-based framework, the standard framework for offline identification of these parameters. The other thing is that we do online identification and online tuning of the neural network; that is another thing we are looking to do. And we do this in a particular framework, which is what we are going to look at now.

We already know what the sets of real numbers ℝⁿ and real matrices ℝ^(m×n) are. One important thing is that we will be looking at functions f that go from some compact, connected set S in ℝⁿ to ℝᵐ, and the set of continuous such functions is denoted C^m(S). We have also already seen the notion of the infinity norm; this is essentially the supremum norm. You take the vector norm of f(x) and then take the supremum of that over all x in S: over all x in the domain, you take the supremum of the vector norm. This is what is called the supremum norm. Then there is the Frobenius norm. We also saw this when we were looking at model reference adaptive control. The Frobenius norm of a matrix is defined through its square: the square of the Frobenius norm is the trace of AᵀA. It is essentially treating the matrix as a vector, stacking the columns of the matrix into one long vector and taking the 2-norm of that vector. It is not an induced norm, in the sense in which we have defined induced norms (I hope all of you remember), but it is in fact compatible with the induced 2-norm: if you take the 2-norm of Ax, it is less than or equal to the Frobenius norm of A times the 2-norm of x.
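As a quick numerical sanity check (my own snippet, not something from the paper), here is a verification of the definition ‖A‖_F² = tr(AᵀA), of the "stack the entries into a vector" interpretation, and of the compatibility inequality ‖Ax‖₂ ≤ ‖A‖_F ‖x‖₂ on random data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

fro = np.sqrt(np.trace(A.T @ A))          # ||A||_F^2 = tr(A^T A)
assert np.isclose(fro, np.linalg.norm(A, 'fro'))

# Frobenius norm = 2-norm of all the matrix entries stacked into one vector
assert np.isclose(fro, np.linalg.norm(A.reshape(-1)))

# Compatibility with the induced 2-norm: ||A x||_2 <= ||A||_F * ||x||_2
print(np.linalg.norm(A @ x), "<=", fro * np.linalg.norm(x))

# The induced 2-norm (largest singular value) is never larger than ||A||_F
print(np.linalg.norm(A, 2), "<=", fro)
```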
These sorts of inequalities are rather useful: whenever we do completion-of-squares kind of decompositions, or use Cauchy-Schwarz type inequalities in the Lyapunov analysis, these kinds of inequalities become very useful tools. Of course, we say that a time-varying matrix A(t) is bounded if the supremum over time of its norm is bounded. Excellent.

Now we of course want to see what a typical neural network looks like. In this case we are looking purely at the three-layer neural network. There are three layers in the sense that, let me be careful here, there is an input layer, there is a hidden layer, and there is an output layer: input, hidden layer, output, whichever way you want to look at it. Anything in between the input and the output is called a hidden layer; it is as simple as that, because it is neither the input nor the output, so you do not see it directly. Okay, so this is the three-layer neural network structure. The v_ij are the weights of the first layer, the w_ij are the weights of the second layer, and then you have these σ's, which are basically the activation functions.

So this is how the neural network works. You have some n₁ inputs going in; they are scaled by weights here; and then they are summed up, because you can see there is a summing action here: every input feeds into every node, if you like. So there is the scaling, then the summing over the n₁ inputs, then an offset (threshold) is added, and then the result passes through the activation function. I think it is better to write it that way: the activation function acts on this sum. Once the activation function σ has acted on it, the result goes through the second layer: there are again weights, a summation over the hidden nodes, and again an offset, and that is the output. So you can see that you have an input layer, a hidden layer, and then an output layer. That is the three-layer neural network: one input layer, one output layer, and a hidden layer in between, with one activation function σ.

You can see that several things are unknown here. What a typical neural network implementation would try to identify are these offsets and these weights, because the numbers n₁ and n₂ are pre-chosen quantities. Once I identify the offsets and the weights, I have an almost exact relationship between the input and the output. And do not get confused by our control terminology: this input is not the control input, and this output is not the output of the system. This is for any nonlinear function; any nonlinear function can be approximated in this form, and that is the whole idea. Now, in this case we are doing a real-time neural network implementation. Therefore we want it to exhibit learning while controlling. We are not okay with learning first, finding some good parameter values, and then using them for control as a separate problem. No, we are looking at joint learning and control.
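To fix ideas, here is a small numerical sketch of this three-layer forward pass with explicit thresholds, together with the compact rewriting y = Wᵀσ(Vᵀx), obtained by augmenting x and σ with a leading 1, which is exactly the rewriting we discuss next. This is my own illustration: the dimensions, the sigmoid activation, and the random weights are assumed values, not anything from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n1, n2, m = 3, 5, 2                     # inputs, hidden neurons, outputs (arbitrary choices)
rng = np.random.default_rng(2)
V = rng.standard_normal((n1, n2))       # first-layer weights  v_jk
W = rng.standard_normal((n2, m))        # second-layer weights w_ij
theta_v = rng.standard_normal(n2)       # hidden-layer thresholds (offsets)
theta_w = rng.standard_normal(m)        # output-layer thresholds (offsets)
x = rng.standard_normal(n1)             # an input vector

# Explicit three-layer forward pass: scale, sum, add offset, activate, repeat.
hidden = sigmoid(V.T @ x + theta_v)     # n2 hidden-layer outputs
y_explicit = W.T @ hidden + theta_w     # m network outputs

# Compact form: absorb the thresholds by augmenting x and sigma with a leading 1,
# so that y = W_aug^T sigma_aug(V_aug^T x_aug).
x_aug = np.concatenate(([1.0], x))                      # x0 = 1
V_aug = np.vstack((theta_v, V))                         # first row of V_aug holds theta_v
sigma_aug = np.concatenate(([1.0], sigmoid(V_aug.T @ x_aug)))
W_aug = np.vstack((theta_w, W))                         # first row of W_aug holds theta_w
y_compact = W_aug.T @ sigma_aug

print(np.allclose(y_explicit, y_compact))               # True: the two forms agree
```

The printed result confirms that absorbing the thresholds into the augmented weight matrices changes nothing about the input-output map.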
Now, the standard choices of these activation functions are things like sigmoidal functions, which are like smooth signum functions; the hyperbolic tangent function, which we already know, since we saw it for the purpose of projection-based adaptive controllers; and radial basis functions, which are essentially like the bell curves you see in normal distributions and things like that. All of these are very nice functions for function approximation; that is why these choices are made. They are not just arbitrary functions, and I would not take, say, a linear function here. All these functions somehow help you do good function approximation.

Now, the way we want to write this whole equation (1) is in a very compact parameter-regressor form, a nonlinear parameter-regressor form, I would say. Why nonlinear? Let us see how we do this. First of all, we augment x with an x₀: there are n₁ x's here, but I add one more, x₀. This lets you accommodate the thresholds in V: the first column of Vᵀ contains the threshold vector θ_v. If x₀ is chosen as 1, then you can include this threshold vector in the weights. If you then compute Vᵀx, it should be evident, because x₀ is 1, that if I choose the first column of Vᵀ to be the θ_v's, then this is exactly what I have inside the activation function. Similarly, I can do this for the W's: I take the σ activation vector, but the first element of this activation vector is simply set to 1. The activation function σ(z), for a vector z, is assumed to operate element-wise: σ(z₁), σ(z₂), and so on. But if I make the first element 1, then I can incorporate the offsets θ_w into W. Once I do that, I can actually write y = Wᵀσ(Vᵀx), which is essentially a nonlinear parameter-regressor form. It is nonlinear because both W and V are parameters; it is not a simple linear-in-parameters y = θᵀφ(x) kind of form, but a little bit more complicated than that. So any tuning of W and V automatically includes all the thresholds and offsets. That is the idea.

Great. So what did we see in this session? We finished our discussion of that very interesting article, which chronologically discusses how adaptive control developed, how it got connected with learning, and how reinforcement learning and deep neural networks are essentially doing parameter identification in a different framework. And we have started to look at the basic problem of doing real-time neural-network-based control of a robotic manipulator, where the neural network weights are going to be adjusted using Lyapunov-based adaptive control theory. That is the framework we are trying to set up, and in the subsequent session we will of course continue to look at this. I hope to see you again soon.