Good morning. Welcome to the Purdue Engineering Distinguished Lecture Series. I'm Wayne Chen, Associate Dean for Research and Innovation in the College of Engineering. This series was established in 2018. Since then, we have been inviting world-renowned faculty and professionals to Purdue Engineering to encourage thought-provoking discussions and ideas with our faculty and students regarding the grand challenges and opportunities in their fields. We've been doing this once every few months, and it's a distinguished series. Traditionally, the Dean introduces our distinguished speaker. So it is my pleasure to introduce our new Dean, Dr. Arvind Raman, the new John A. Edwardson Dean of Purdue University's College of Engineering. He is also the Robert Adams Professor of Mechanical Engineering. Dean Raman.

Hi. Good morning, everyone, everyone who is in this great atrium and all those online. This lecture is being broadcast live, so welcome to all of you. It's really an honor and a privilege to introduce our speaker today. He is one of the true giants in computer vision. Professor Jitendra Malik is the Arthur J. Chick Professor of EECS at UC Berkeley. He also has a part-time appointment at Meta. His research has spanned many areas: computer vision, AI, machine learning, robotics, human visual perception, and more. He and his research group have been known for tremendous impact in the community. They are known for anisotropic diffusion, the Perona-Malik diffusion algorithm; it is not often that you get an algorithm named after yourself. Also normalized cuts for image segmentation, shape contexts, object detection algorithms, high dynamic range imaging, and many more. These have had a tremendous impact in the field. He has been recognized for that impact in many different ways. He was the 2013 IEEE PAMI Distinguished Researcher in Computer Vision. In 2014, and this is something we at Purdue are very proud of, he won the King-Sun Fu Prize from the IAPR, the International Association for Pattern Recognition. I don't know if many of you know the history, but Professor King-Sun Fu was the Goss Distinguished Professor here in Electrical and Computer Engineering, and he helped establish the IAPR. So that's a great connection. Dr. Malik, if you have a chance, we'd love to show you the mural in the Fu Conference Room dedicated to Professor King-Sun Fu, so you're welcome to that as well. In 2016, he won the ACM/AAAI Allen Newell Award, and in 2018, the IJCAI Award for Research Excellence. In 2019, this was followed by the IEEE Computer Society's Computer Pioneer Award. Dr. Malik is a member of the National Academy of Engineering as well as the National Academy of Sciences, and he is, of course, a member of the American Academy of Arts and Sciences as well. Please join me in welcoming Professor Malik.

Thank you. Thank you very much for this generous introduction. It's my pleasure to be here. Professor King-Sun Fu's name was mentioned; he is one of the pioneers of our field. In the field of pattern recognition, there was for a long time this tussle between pattern recognition and symbolic approaches to AI. Of course, the final answer is going to be a bit of both, but what we are currently seeing is actually the victory of the pattern recognition school of AI. A lot of the machine learning and neural network techniques in some sense go back to the pattern recognition tradition, which Professor Fu pioneered.
So it's a great pleasure for me to be here at Purdue and speaking at this event. I'm going to talk today about robots that learn and adapt, and it's a high-level talk for a general audience.

I want to start by talking about natural intelligence before we talk about artificial intelligence, and the way we can think about natural intelligence is through the lens of evolution. Something like 550 million years ago is when you had the first multicellular animals that could move, and moving gave them an advantage because they could find food in different places. But if you can go to food in different places, you need to know where to go, and for that you need perception, something like a vision system. So the earliest animals with perception had this combination of the ability to move and the ability to see. The psychologist James J. Gibson put it this way: we see in order to move, and we move in order to see. This combination of movement and perception, I would argue, is the most central aspect of intelligence.

Moving down the evolutionary tree, you have the hominids, the early humans, when they branched off from the other primates. Then you have the development of bipedalism: you walk on two legs, so now your hands are free for tool use and so on. The development of tools gave us the ability to modify our environment in a more significant way, and the brain developed in response, to exploit that. There is this quote from the Greek philosopher Anaxagoras: it is because of his being armed with hands that man is the most intelligent animal. The development of the hand preceded the development of increased brain capacity. And then of course we can go on to modern humans coming out of Africa maybe about 60,000 years ago, and somewhere before that, maybe a million years ago, maybe half a million, who knows, we have the development of language and so on. So things like language and symbolic behavior are relatively recent in our evolutionary history. Against those 500 million years, if you think of the whole history of intelligence as 24 hours, then this is the last two or three minutes.

Now, of course, that is what we care about a lot today, because everybody is talking about ChatGPT, and I think we will have this discussion at greater length in the panel this afternoon. It's incredible what these systems can do. They have captured the popular imagination, and they do well on the LSAT and all these various kinds of tests which the general person in the street associates with intelligence. On the other hand, I want to talk about what we can't do, or have not been able to do. Self-driving cars are a similar capability in the popular imagination, and the first self-driving cars were in the 1980s. The person whose photo is here is Ernst Dickmanns. Dickmanns demonstrated cars driving on the Autobahn in Germany in the 1980s, solving the control problem of staying in the lane and so on. And we still don't have self-driving cars. Elon Musk promised us in 2018 that by the middle of the next year we would have a million Teslas all driving themselves. Not quite there, right? Okay. And let me make it even more mundane: driving is something a 16-year-old kid with 20 hours of driving experience can do, whereas becoming a lawyer supposedly takes years of training. We can solve one problem, but we can't solve the other.
And I'll pick something even more mundane, what a 12-year-old can do, right? In a kitchen with some implements, we can do all these things. This is a collection of verbs: if you want to make an omelet, you need to be able to stir, to chop, to slice, things like this. A kid of 12 can do this. No robot today can do this.

So this is what is known as Moravec's paradox in our business. Hans Moravec said this in 1988, but I think it was actually folk wisdom; lots of people knew it. He articulated it, and so he deserves credit for it: it is comparatively easy to make computers exhibit adult-level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility. And Steven Pinker has a nice slogan: the main lesson of thirty-five years of AI research is that the hard problems are easy and the easy problems are hard. He goes on to say that the gardeners, receptionists, and cooks are secure in their jobs for decades to come.

So the question is: why? I think this is an interesting question, and again, I think we'll get to it in the panel this afternoon, so think of this as anticipation that we will discuss it more. Why is this so? One argument, I think Moravec's original argument, is that it is more difficult to reverse-engineer skills which are older in the evolutionary process and have had much more time to be optimized and perfected. Another argument could be in terms of data, but I don't want to go into it, because that is for the afternoon panel.

My focus is that I really want to build the two-year-old, or potentially the home robot. I think it would be wonderful if all of us could have a robot in the home which could assist with various chores. Or, more seriously: a very large fraction of the population is elderly, and we need them taken care of, and a robot in the home could help out in many ways. So this is really a societal need, I would argue, and it's what I feel I want to work on. If we can have a home robot in the next 10 years, I'll be happy. It would be just like home computers, which took off in the 1980s. Before that, a computer was a big, expensive thing. It took off because there were applications like spreadsheets and video games and word processing which made using a computer worthwhile, and it became cheap enough and useful enough. When will that happen for robots? So that's the application, and now let's get into it.

Okay, so I'm going to start with some examples. This is a robot that we have, and it's walking in this terrain, autonomously; I'm not controlling it with anything. This robot is fairly cheap. It's manufactured by a Chinese company called Unitree, and they will sell it to you for $10,000 or less; I think there are versions for much less now. So that is the kind of price point. When it's a few thousand dollars, you can imagine a machine like that in every home. But robots have to walk in all these different situations. Very importantly, this robot is blind: it has no vision right now. I'm going to add vision later, but right now it's blind, and it has to walk in all these different kinds of terrain, which it doesn't know in advance. And importantly, it's going to use the same controller.
I'm not allowed to say, oh, swap in controller A, controller B, controller C, controller 7.3. No, it's the same controller, which has to work in all these conditions. Okay, more examples. I think that is the problem of robotics: we need to be able to work in a wide variety of conditions. In the walking case, this is about walking on different terrain. If you're talking about manipulating an object, if I have to slice a tomato, I should be able to slice different kinds of tomatoes with different kinds of knives.

I want to connect this with the field of pattern recognition versus control theory, as a big, grand idea. Pattern recognition, as I said, is a field pioneered by Professor Fu, and in pattern recognition, what we regard as the central issue is generalization. What is a cat? What is a chair? What is a dog, visually? We realized that mathematically we cannot come up with necessary and sufficient conditions for defining a chair, or the appearance of a chair. We do it by examples, and then we train some machine learning system, some classifier, to do this. So generalization over aspects like pose, lighting, and so on is critical. If you think about it in the context of action, in the context of control, what do we need? We need robustness, the standard concern of control theory: you're trying to do something, maybe maintain the temperature of the house at, whatever, 75 degrees Fahrenheit, and there will be some perturbation, maybe some window is open and cold air comes in, so you do something to compensate for the disturbance. That's the central concern of control theory. But I would argue that there's an equally important aspect, which is adaptation: adaptation to different terrains, and I'm going to go into that in more detail.

I will not have too many equations in this talk, but this one equation I have to have, because it defines the problem of control theory. Control theory developed around 1960, and this is very important. We give a lot of credit to the aerospace people, and they should get credit; after all, we are in the Armstrong building, so how can I not give credit to aerospace people? But you should also give credit to the control theorists, because without control theory you would not have accurate orbits. John F. Kennedy could say we will send a man to the moon and bring him back safely, but the "safely" part depends on having accurate orbits, and that's control theory, right? And 1960 is a crucial date here, because in the period just before and after, you had all these brilliant pieces of work: in the U.S., people like Bellman and Kalman; in the Soviet Union, people like Pontryagin; and then, less appreciated in the control theory world but a progenitor of a lot of machine learning, Arthur Samuel's work on reinforcement learning, also around 1960. These are all related, because reinforcement learning is just a trial-and-error way of solving the control problem, which is an optimal control problem. So in this equation, x-dot = Ax + Bu, x is the state; in a physics setting it would correspond to, say, positions and velocities of particles, or in the case of my robot, positions and joint angles and their velocities. u is the control input, some extra input I can apply. And A and B here are matrices in this linear formulation, though you can imagine a nonlinear version of the same setup.
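To put the spoken equation in display form, here is a minimal restatement. The second equation is my own paraphrase of the adaptation problem the talk goes on to set up, not notation taken from the speaker's slides:

```latex
% Classical linear state-space model, circa 1960:
%   x = state (positions, joint angles, velocities), u = control input
\dot{x} = A\,x + B\,u

% The adaptation problem as described in the talk (paraphrase):
% the dynamics matrix is not constant; it depends on the environment e
% (terrain, friction, payload, ...), and the controller must cope as e changes.
\dot{x} = A(e)\,x + B\,u
```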
Okay, so I'm using very classic terminology here, circa 1960. Look at that matrix A. That matrix A is what captures the system dynamics: how the system changes from one time instant to the next. This is Newtonian mechanics, for example. But I want to use this matrix A to capture the effects of different kinds of terrain and so on. What that tells me is that this matrix A is actually not a constant, and it can change very significantly. That is the problem I'm going to focus on, and that's what I'm going to call adaptive control. It's related to the classical idea of adaptive control that electrical engineers have studied, going back to the 1960s, but I'm going to do it using the tools of deep learning: machine learning, neural networks, and so on. So if you want to understand philosophically where all of my work is going, it is a classic problem approached with the tools of neural networks; I'm bringing these two together. That's the setup: this matrix A is not constant but varies, and I have to try to do the right thing for different choices of system dynamics. That sets up the problem formulation.

I'm going to talk about this walking robot as a running example, but the techniques we have developed apply in other cases, and I'll show you a few other examples. We call this rapid motor adaptation. Think of it mentally as: that matrix A is changing around, and I have to adapt to it. And that's what we are going to do for legged robots.

So how do we do this? We start by solving the problem of legged locomotion in simulation. Simulation technology has just gotten better and better over time. Of course, control theorists have always tried to model the system and write down the equations, but the equations can get tricky. In particular, in legged locomotion, contact is made and broken, and the differential equations you must use when a leg is in contact versus in the air change a lot, so you get into hybrid systems. The whole thing requires a lot of mental gymnastics to make work. On the other hand, the physics itself is straightforward, so putting it into a simulator is actually quite easy, and the simulators run faster and faster just because computers are getting faster. So simulators work.

What we are going to do is train this robot to walk in simulation using the machinery of reinforcement learning, which, if you don't know it, has a very simple mental model: trial and error. The robot is going to try to walk. It gives some commands to all its joints, and then we see what happens: if it walks, good, plus one; if it falls, bad, minus one. Over time it tries to produce the kind of behavior that gets it more plus ones than minus ones. You can put this in a mathematical way, but that's it. There are some details, but there are no pre-programmed gaits. We don't start out by saying whether it's a walk or a gallop or a trot. Traditionally, walking controllers take the gait as an input; it turns out this is not necessary. All you need is some kind of cost function specifying what is desired, and what is desired is that you walk without falling, using minimum energy. Very simple.
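As a concrete illustration of "walk without falling, minimum energy", here is a minimal sketch of such a reward in Python. The exact terms and weights in the published rapid-motor-adaptation work differ; every name and coefficient here is illustrative:

```python
import numpy as np

def locomotion_reward(base_velocity, target_velocity,
                      joint_torques, joint_velocities, fallen):
    """Illustrative reward: track a commanded speed, penalize energy use.

    A paraphrase of the talk's 'walk without falling, minimum energy',
    not the exact reward from the RMA paper.
    """
    if fallen:
        return -1.0                        # "if it falls, bad: minus one"
    # move at the commanded forward speed
    speed_term = -abs(base_velocity - target_velocity)
    # mechanical power |torque * joint velocity|, summed over the joints
    energy_term = -1e-3 * np.sum(np.abs(joint_torques * joint_velocities))
    return 1.0 + speed_term + energy_term  # "if it walks, good: plus one"
```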
And now you just do trial and error, and you try to find some controller which is optimal in that sense. That's the basic idea.

So in simulation we have these parameters, like mass, friction, terrain height, and you can vary all of them. We actually have a kind of fractal terrain generator, and the robot has to walk on fractal terrain, which prepares it for small steps, large steps, lots of variation. Then you vary things like friction. In a simulator you can create lots and lots of conditions, and we know all those variables, because it's a simulator: I have access to all of that. And the robot has to walk.

This box, the base policy, the red one, is the controller. The job of the controller is to issue commands, indicated here by a. The commands are the desired values for the joint angles and velocities, and you give those commands to a low-level PD controller, which actually manages the motor currents and so on. Now, any controller needs some input. The input here is x_t, the state, which is all the joint angles and so on, and a, the action, for which control theory would use the symbol u. Reinforcement learning and control theory do the same thing, but they use different symbols: one uses x and u, the other uses s and a. I have tried to keep both sides happy by using x, which control theorists like, and a, which reinforcement learning people like. So a is the action at the previous time step.

But here is one idea: we know that the same policy is not going to work in all conditions. When I'm walking on hard ground versus walking down a slope versus walking in sand, I need to issue slightly different commands to all my motors. How do I capture that? You make your policy take an extra argument, z, and this z captures some aspect of the terrain. z is going to be a fairly low-dimensional thing, maybe five, maybe eight dimensions; concretely we used eight. These eight dimensions capture the variability in the terrain. So the policy gets the usual inputs of any controller, the state and the previous action, plus this variable z. If you go back to my previous equation, x-dot = Ax + Bu, where A is the system dynamics: that A changes depending on z, and now you're supplying z. And z in this case is known. Why? Because it's a simulator: I choose all those conditions, I choose the mass, I choose the terrain height, et cetera.

We can have lots of environment variables, but they don't all matter independently, and you can do something equivalent to PCA, principal component analysis, to reduce the dimensionality. That is captured by the red box called the environmental factor encoder, which compresses everything down to some eight numbers. In this case it's not PCA but the nonlinear version of it: a neural network encoder with a few layers that compresses the input down. And we don't have to specify exactly what the eight dimensions should be. The training process is end to end, and it figures out a way to compress the relevant physical variables. We call this the extrinsics.
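Here is a minimal PyTorch sketch of the two pieces just described: an environmental factor encoder that compresses privileged simulator parameters into the extrinsics z, and a base policy that takes the state, the previous action, and z. All layer sizes and input dimensions are illustrative stand-ins, not the published architecture:

```python
import torch
import torch.nn as nn

class EnvFactorEncoder(nn.Module):
    """Privileged simulator parameters e (mass, friction, terrain...) -> extrinsics z."""
    def __init__(self, env_dim=17, z_dim=8):     # dimensions are illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(env_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, z_dim),                # compress down to ~8 numbers
        )
    def forward(self, e):
        return self.net(e)

class BasePolicy(nn.Module):
    """pi(x_t, a_{t-1}, z_t) -> a_t, target joint angles for the low-level PD controller."""
    def __init__(self, state_dim=30, action_dim=12, z_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )
    def forward(self, x, a_prev, z):
        return self.net(torch.cat([x, a_prev, z], dim=-1))
```

In phase one, both networks would be trained end to end in simulation with reinforcement learning, which is why the encoder is free to discover whatever eight-dimensional compression best serves the walking objective.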
And what is the robot trying to do? It's trying to walk, and there's a reward function, or, if you want, a cost function. Here again reinforcement learning people and control theorists do the opposite thing: reinforcement learning people maximize reward, control theory people minimize cost, but it's just a minus sign. The reward function rewards walking while minimizing energy consumption. So we do this, and our robot can walk; this is the robot walking in the simulator.

Now we're going to take this and make it work in the real world, and it seems straightforward, right? But the problem is that I no longer have this z. The z in the simulator was known, because I knew the mass, the terrain height, et cetera, and the environmental factor encoder just took this privileged information and compressed it. In the real world, I don't know it. So what am I going to do? I have to figure out some way of estimating it. That's my problem.

And what's the solution? Let me give you the intuition, and we'll do the mathematical aspect a little bit later. Remember, this is a blind robot. Imagine I'm walking on a surface which is very hard, versus walking on a beach. On a beach, when I put my foot down and then lift it up, it doesn't lift up as much, because it sinks in. So even if I am blind, just by applying exactly the same commands to my legs and feet, I get different behavior, which reveals to me that I am in different conditions. That's the insight: the sequence of past states and actions, say my last one second of history, what actions I commanded and what states I achieved, is diagnostic of the kind of terrain I'm walking on.

So we have this thing called the adaptation module. The idea is that if we give the adaptation module the past history as input, it may be able to figure out the z, the latent variables corresponding to the terrain. Now, I have just postulated this magic module which will estimate z. How do I train it? Let me reveal how that can be solved. We do a phase one and a phase two. Phase one is training in simulation. Phase two is also in simulation, and now what we do is train this adaptation module: in simulation I actually know what z is, because I have access to the ground-truth values of the terrain, and from those I have the encoder for z. So I tell my adaptation module: you have access to the past history; predict that same z. This is a regression problem, the most straightforward supervised learning problem. The adaptation module is trying to mimic what you would produce if you had access to the ground-truth variables. So that's how we can train the adaptation module, yet another neural network. And now we have a system which can work autonomously, because the trained adaptation module uses only the past history. And now we have this slogan: one policy to walk them all.
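A minimal sketch of phase two, again with illustrative names and sizes: the adaptation module regresses from a short window of past states and actions to the z produced by the privileged encoder, a plain supervised objective:

```python
import torch
import torch.nn as nn

class AdaptationModule(nn.Module):
    """Recent (state, action) history -> estimate z_hat of the extrinsics.

    The 50-step window (roughly one second of history) and the layer
    sizes are illustrative, not the architecture from the paper.
    """
    def __init__(self, state_dim=30, action_dim=12, z_dim=8, history_len=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),  # (batch, T, state+action) -> (batch, T*(state+action))
            nn.Linear(history_len * (state_dim + action_dim), 256), nn.ReLU(),
            nn.Linear(256, z_dim),
        )
    def forward(self, history):
        return self.net(history)

def phase2_loss(adaptation_module, env_factor_encoder, history, env_params):
    """Supervised regression: mimic the privileged encoder's z from history alone."""
    z_true = env_factor_encoder(env_params).detach()  # ground truth, known in simulation
    z_hat = adaptation_module(history)
    return nn.functional.mse_loss(z_hat, z_true)
```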
So it's exactly the same controller, and it can walk in all these conditions. I'll give you an example to explain this adaptation module idea. These experiments were done, by the way, during Covid times, so I told my student: take the robot home. So he's doing these experiments at his home. His name is Ashish. He's got this mattress, and he's going to pour some olive oil on a plastic sheet, and he's put some plastic socks on the robot, and now the robot is going to try to walk. Notice what happens: it's about to slip, and then it recovers; you can see it in slow motion.

What should happen in theory? The robot is always recording this history of observations, and what we want is that, very quickly, in about half a second, it estimates the new z. That is basically what you see in these plots. The top four rows show the footfall pattern, which feet are on the ground, for the four legs: right rear, front left, and so on. The two curves at the bottom, a red curve and a blue curve, are two of the eight dimensions of this latent variable z. What you see is that as the robot walks and stumbles, the estimates of z1 and z5 change. It stumbles, but now it has the correct z after recovering, and its policy adapts to that. The secret is that you adapt, but you adapt quickly, in about half a second, and that saves you from falling; if it took you five seconds to adapt, you would have fallen. You can't do it in 0.01 seconds, because you don't have the data. It's like how we all recover from stumbles. Here is another example, a payload dropped on the robot, in slow motion, and you see the same behavior: the estimate changes, the policy adapts, it recovers. This algorithm is called RMA, rapid motor adaptation, and this estimation is what enables the recovery. And here are more examples without adaptation, for comparison.
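Putting the pieces together, here is a sketch of how the trained components might run at deployment time, with no privileged information: a rolling history buffer feeds the adaptation module, whose estimate z_hat conditions the base policy. The robot interface (read_state, send_joint_targets) is a hypothetical stand-in for real hardware APIs:

```python
from collections import deque
import torch

def deployment_loop(robot, policy, adaptation_module,
                    state_dim=30, action_dim=12, history_len=50):
    """Run the policy with online extrinsics estimation (sketch only).

    `robot.read_state()` and `robot.send_joint_targets()` are hypothetical;
    substitute whatever hardware interface you actually have.
    """
    buf = deque([torch.zeros(state_dim + action_dim)] * history_len,
                maxlen=history_len)
    a_prev = torch.zeros(action_dim)
    while True:
        x = robot.read_state()                 # joint angles, velocities, IMU, ...
        buf.append(torch.cat([x, a_prev]))     # keep ~1 s of (state, action) history
        history = torch.stack(list(buf)).unsqueeze(0)  # (1, T, state+action)
        with torch.no_grad():
            # z_hat is re-estimated every step: this is what changes within
            # ~0.5 s when the robot steps onto oil or takes a payload
            z_hat = adaptation_module(history)
            a = policy(x.unsqueeze(0), a_prev.unsqueeze(0), z_hat).squeeze(0)
        robot.send_joint_targets(a)            # low-level PD controller takes over
        a_prev = a
```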
Okay, I think I've now covered the main technical insight in this work, and now I'll show you lots of ways in which it makes various problems easy to solve.

One problem is that we want to walk at different speeds. We did not pre-program anything. We said to the robot: walk at 0.375 meters per second, and this is what it did; this is the gait that emerged. What you see at the bottom are the four feet. Then we said: walk at a speed of 0.9 meters per second. Notice that these are quite different gaits, but we did not program them; we just gave the robot a speed. And it turns out there's an explanation for this, from people in biomechanics. The hypothesis is that an animal always tries to move so as to minimize energy, because whether you're a tiger or a deer, it's important for both of you to use your energy efficiently. This is also true for humans: at slow speeds walking is efficient, and at high speeds running is efficient. For horses you have these gaits, walk, trot, canter, gallop, and people have these curves which show where each gait is efficient. We get exactly the same behavior with the robot, and the beauty is that it just emerges; you didn't have to do anything, you just commanded a speed. And using RMA, this just works in different settings.

So this was the blind robot: it could go at any speed, different gaits emerged, all of that was nice. But I've spent most of my career in vision, and here I am working on a blind robot, so there's some disconnect. The question is: do we need vision to walk? Clearly blind people can walk, so we had to start there. But there are settings where vision is needed: if you have to cross a river on stepping stones, then you probably want vision. So this is the setting where we developed the role of vision. I'm going to go fast here. A traditional approach would be to build up a terrain map by combining data from multiple views, and that turns out to be very noisy. A much better strategy is to go directly from vision to control; you don't have to first build a map of the terrain. For the robotics people: there's an approach called SLAM, by which you take multiple views and build a map, and I'm saying it's not necessary; you want to go from vision to control.

And the philosophy I showed earlier works here too. In a simulator you can have privileged information: the robot knows in advance exactly what terrain is coming up, because it's a simulator. So you start by training it to work with that kind of knowledge, which it will never have in the real world. Then, imagine you have some kind of camera on the head of the robot which sees egocentric depth, and you try to use that to approximate the privileged information; again we capture it in a small number of dimensions. Previously we had the variable z; now we have a variable gamma (a sketch of this estimator follows below). It's the same philosophy, but now it's the terrain that you're reducing to a small number of variables. And now you can deploy it in the real world. I'll show you some demos here. It does not know the terrain; there is nothing external; everything is on board, on-board sensing and on-board compute. We have also done some experiments on bipedal robots... but okay, wait, something has happened with my display... thank you... okay. So you got the flavor, but I'll show you some other examples of the same idea.
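For concreteness, here is what such a vision-side estimator could look like: a small network mapping an egocentric depth image to the low-dimensional terrain latent gamma, which in simulation can be supervised against a latent computed from the privileged terrain, just as z was. The architecture is purely illustrative:

```python
import torch
import torch.nn as nn

class TerrainLatentFromDepth(nn.Module):
    """Egocentric depth image -> low-dimensional terrain latent gamma (sketch)."""
    def __init__(self, gamma_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, gamma_dim),   # compress the view to a few numbers
        )
    def forward(self, depth):           # depth: (batch, 1, H, W)
        return self.net(depth)
```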
So here we have a robot doing a manipulation task, with different kinds of objects of different sizes and shapes, and basically the same idea works. Many people design controllers, but they design them for a specific object with a specific shape; here, what is happening is exactly the same philosophy as before. This slide shows the range of conditions: we have very light objects, like a shuttlecock, and very heavy objects, and it all works for the same reason. We first train in simulation, where we allow for a lot of variation; we have this variable z which captures some parameters of the condition, in this case perhaps the size and shape of the object; and then we can deploy in the real world.

I want to conclude with some general remarks about how we should think about learning broadly, and I like to think of children's development as an example. This is an idea that actually goes back to Turing. In his famous 1950 paper, Turing said that instead of trying to build an adult brain, we should try to build a child's brain and then subject it to a program of education, of learning. And children have this very rich process of learning: they are constantly experimenting with the world. They take an object and put it in their mouth; they have the touch signal, the vision signal, the auditory signal; and they proceed in stages. Psychologists have characterized this, notably Professor Linda Smith, who is at Indiana University: learning should be multimodal, incremental, physical, exploratory, and only finally use language. These ideas are equally relevant for us in computer vision and robotics.

I'll conclude with this last example, which is learning visual locomotion with cross-modal supervision. We decided to see whether we can train a robot to learn to walk in the real world when it has only an RGB camera, not RGB-D. We said it's going to start out blind, and this is how a blind robot walks. Do you see what behavior it discovered? It's very similar to a blind person using a stick to poke at obstacles. Now, the problem that I pose to you is: how do you make use of vision? So now we put a camera on it, and with a camera it obviously walks better, but I want to train it in the real world how to use the camera. And it turns out there's a simple idea you can use. Think of this robot on steps. At any point, the robot can calculate the height of the ground underneath each of its four feet, because it knows all its joint angles and so on; using proprioception, its own internal knowledge, that's easy for it to calculate. So proprioception tells me the depth when I get there, and vision gives me images. What I do is tell my vision system: predict, 1.5 seconds in advance, what the depth will be when I get there. It's a self-supervised training setup; I have both training signals. I train my vision system on what my proprioception will sense 1.5 seconds later, and that's it; that gives you a totally self-supervised system. And the beauty is that with this we were able to train a system which learned from day to day, and every day its performance went up.
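Here is the cross-modal objective as a minimal sketch. vision_net is a hypothetical network that maps the current RGB frame to predicted ground heights under the four feet; the label arrives about 1.5 seconds later from proprioception, so no human annotation is involved:

```python
import torch
import torch.nn as nn

def cross_modal_loss(vision_net, image_t, future_foot_heights):
    """Self-supervised cross-modal objective (sketch).

    image_t:             RGB frame at time t, shape (batch, 3, H, W).
    future_foot_heights: ground height under each of the 4 feet, measured
        ~1.5 s after t via proprioception (joint angles + kinematics);
        the robot computes this label for itself as it walks.
    """
    predicted_heights = vision_net(image_t)    # predict the four future heights
    return nn.functional.mse_loss(predicted_heights, future_foot_heights)

# Every step of real-world walking yields a (past image, later label) pair,
# so the dataset grows continuously; this is why the system can keep
# improving day to day and can re-adapt after the camera is disturbed.
```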
And you can now do funny things, like messing with the camera in real time. We took this camera, and what Antonio is going to do is rotate it. If you do this in a robotics lab, your colleagues will hate you, because you have messed up the camera calibration. What will happen is that initially the robot will stumble, but it's collecting data, visual data and proprioception data, and in due course of time it will recover. Let me show you: this is what happens before adaptation, and this is what happens after one minute of training. So I believe that this is the future: we should have robots which are constantly adapting, and we have a lot to learn from the strategies used in biology. Thank you very much.

Thank you, Professor Malik, for a very thought-provoking and fascinating lecture. Now we're open to questions from the audience. Louder, please.

Hi, Professor, thank you for the talk. I have a question. Typically when we do neural networks, we have to train them using backpropagation. With a reinforcement learning model, we don't have such an expression that tells us, if I change this particular weight, exactly how the output changes. So how do we actually train it? Is it just pure trial and error?

Yes, so you have other kinds of algorithms; it is harder than with supervised learning. Here you have to use algorithms called policy gradient methods. What they are trying to do is this: there will be some trials which are successful and some which are not, where success means more reward versus less reward, and you try to make the network which predicts an action give more of the actions that lead to good rewards. This is called a policy gradient; the gradient is on the policy. There are optimization algorithms called PPO and TRPO which enable you to solve that problem. It does take longer.

I see, so the way we compute the gradient is to evaluate it in the real world.
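To make that answer concrete, here is a minimal REINFORCE-style policy gradient step; PPO and TRPO, the algorithms named above, add trust-region machinery on top of this same basic idea. This sketch is illustrative, not the training code used in the work:

```python
import torch

def policy_gradient_step(optimizer, log_probs, rewards, gamma=0.99):
    """One REINFORCE-style update from a single rollout (sketch).

    log_probs: list of log pi(a_t | s_t) tensors for the actions taken.
    rewards:   list of per-step rewards from the same rollout.
    Actions on high-return trajectories are made more probable; no gradient
    through the environment is needed, only rollouts to evaluate.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):              # discounted return-to-go
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    loss = -(torch.stack(log_probs) * returns).sum()  # push up log-prob of good actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```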
Thank you for the fantastic talk. I have a question about providing guarantees for stability or robustness in the theoretical sense. I think the generalization capabilities of learning-based approaches are beyond what model-based approaches can achieve for locomotion control, but do you think providing theoretical guarantees about stability is important? And if it is important, what are your thoughts about approaching that problem?

So the question is about theoretical guarantees. It's a fair question, and I think of it as a philosophical question which engineers have confronted for years, or decades, or centuries really. Often we have practices in engineering which are not yet theoretically well founded. People in Europe were building cathedrals in the 1300s and 1400s without the benefit of Galileo and Newton, and once in a while the cathedrals did fall; it's actually not true that they always succeeded. But they could build them. The other example, which is more recent, is steam engines versus the science of thermodynamics: the theory came after steam engines were actually being used in practice. So I think of this as somewhat similar. Sometimes theory is ahead of practice, and sometimes theory is behind practice, and what we have right now, with the advent of neural networks and deep learning, is practice running way ahead of theory. It is still desirable to develop that theory, and it will come along, but I, as a practitioner, don't want to stop practice until I have theory, because I want to do all these cool things. So in the meanwhile, I'm going to have tests which are experimental in nature, a bit similar to what doctors do. In medicine they don't understand the system well, so what do they do? They do controlled trials: there will be a set of patients who get a drug and a similar set of patients who don't, and if you see that the drug does well, we give it to more people, even though we don't necessarily understand theoretically the mechanism of how the drug functions. So I am advocating that kind of experimental approach for now, while in parallel people pursue developing the theory. Yeah, over there.

Thank you, Professor, for the wonderful talk. I had a question regarding the policy, the reinforcement learning in particular, and how you propose to transfer what it has learned in one environment to another. You train a policy on a particular set of problems, a particular set of environments; if you change the environment altogether, will the same policy work in the different environment?

Okay, so the answer is that I don't train in just one environment. I train in many different environments, all in simulation, where I have a lot of variation, captured by this variable Z, the extrinsics. I train in situations equivalent to walking on flat ground, situations equivalent to walking in sand, and so on; it's trained in a very, very wide variety of situations. The goal is that when you are walking in the real world, it is kind of like being in one of the situations I had seen in simulation. I need to know which one, though, and that is through this variable Z, which I estimate in the real world. That's the spirit. So I do need a lot of variety in training. If there is some set of conditions not encountered in training, and then I encounter them in the real world, and they are not like anything I had seen in training, then I'm in trouble.

So is it just one policy, or do you...

No, and that's why the policy has an extra argument, which I called Z. It's like a function: if you have a function of one variable, f(x), then I can have different functions f(x), g(x), h(x), or I could have that same function f with an extra argument, f(x, y). With that extra argument I am capturing the functions f, g, h, et cetera. That's what I'm doing.

We have a question from online: I'm interested in the self-supervised training strategy you mentioned. I'm wondering whether there is a memory-like concept in the self-supervised vision system. For example, after we rotate the camera, the robot will soon learn to walk under the new camera configuration, but will it forget how to walk under the previous camera configuration? And if true, how long does this take to happen?

Yes, it does forget, because what it has now is the new connection between proprioception and the vision system, so it forgets the old one. But then what I can do is fix the camera back, and it will recover; it's always learning. And by the way, we did this experiment based on what has been done with humans.
With humans, there's an experiment called the prism adaptation experiment. I have my glasses, and I put a prism in front of one of my eyes, so all the rays are now bent by, say, 10 degrees. If I try to reach an object, I'm going to miss it, but if I keep trying to reach it, within 10 minutes I'm able to do it well. Now, if you remove the prism, I again try to reach it, and I miss it again, right? But after a few minutes I recover. There are these experiments that psychologists have done; you can take a pair of glasses which invert the world, so everything is seen upside down. It's horrible initially; you can't do anything. But people have shown that after a while you can in fact learn to ride a bike. So our system is very adaptive in the same way, but if you remove those glasses, you are again in trouble, and you have to go back and relearn.

Hello, thank you for the beautiful talk; it's very inspiring. I have one question about the Z parameter that you learn, which is the result of the learning, right? It's not a physical parameter. Do you have correlations between Z and the physical parameters?

It's not a single parameter, but I think of it as a compression of maybe 15 parameters, the 15 parameters which capture aspects of the terrain, and those are being compressed down to, say, some 5 or 8 numbers, which are some kind of mixture. This can be compared to what in control theory is called system ID, system identification. There, they would try to estimate all those 15 parameters; here, I am saying I am going to estimate only the smaller number, but the smaller number is good enough, and that compression I also learn.

So do you have a model that, given the physical parameters, maps them to Z?

We have this black-box neural network. Now, I conjecture that it is capturing physical invariances. I'll give an example from, say, fluid mechanics. In fluid mechanics we have these numbers like the Reynolds number, the Froude number, and, I forget my fluid mechanics, there must be people in the audience who remember these things, the Prandtl number, right? These are dimensionless constants which capture certain ratios, and those are the important variables; that's why we can do experiments in wind tunnels, at a much smaller scale, and then carry them over to the real world. So I am conjecturing that these parameters which I learn correspond to those kinds of dimensionless constants, the ones which are most important for the physics of the system.

The last question I have is about adaptive control. Traditionally it's not really fast, and basically you're doing adaptive control too, right, but using a neural network rather than a model-based approach. Is there any kind of comparative study of the convergence of your method?

No, I have no convergence proofs. Adaptive control was developed in the 1960s, and the computers of the 1960s and the computers of today are very different, so I can do a lot more. I have computers which are much faster; that's why I can do much more now.

Hi, a very wonderful talk, Professor. I have one question regarding the parameter Z. If you have one policy which learns to control in diverse situations, let's say walking on the beach versus walking on a hard surface, will there be interference in learning the Zs from these diverse conditions, and if so, what would be the steps to deal with it?
No, the network has enough capacity that, in a sense, it has kept all those policies in the same network, and it's like an indexing scheme: it chooses which one to use based on the Z.

So it's basically like a vector, and you just do some search across that vector?

I mean, it's a neural network, but you can think of it more metaphorically as indexing the policy by Z; it's actually all stored together.

Thank you. One online? Yes, another online question: how well do you speculate such RL-based systems will perform when adapting to dynamically changing environments?

Well, that is our hope, because the environments I showed were dynamically changing. The answer is that you always need some time. We call our approach rapid motor adaptation, and the systems we have shown adapt on the order of a fraction of a second, like half a second. For the kind of system I'm talking about, like walking, where you would otherwise fall down, I think half a second is good enough to recover. There may be settings where half a second is not good enough, so I can't make a general guarantee that this will work, but usually physical conditions don't change so drastically so immediately. My claim is that 0.2 to 0.5 seconds is good enough, and that's what we see in our examples.

Thank you very much. Time is up, so I have to cut it here. If you have more questions, we have a more interactive panel session this afternoon, and you're welcome to come back and join it. This concludes the morning lecture. Thank you again, Professor, for this fantastic lecture.

Thank you.