little microphones there. It's so dangerous, so easy to destroy. Okay, thank you, everybody. Before we start, let me just update you on the program. As you may have noticed, there's been a shift. After the coffee break, we will have the hands-on tutorial by David Altman on reinforcement learning. It will be a review of the basic theory and some hands-on exercises. The second tutorial, on deep reinforcement learning, has been moved to Thursday afternoon, so that's going to be the last event of the day for Thursday. Now, let me introduce Professor Jan Peters. He will deliver three lectures on robot learning. As you may know, it's one of the fields of applied machine learning that has been experiencing very rapid, impressive development in the last few years. And with that, let me just leave him the floor. Okay, all righty. Sorry for all the confusion this morning; I just couldn't be at the airport and here at the same time, I guess. So, I'll be talking today about robot learning, and obviously, this is all your dream. What you see here is basically one of the reasons why any of us got into computer science, right? Hollywood movies. In there, we had all the dreams of what a future robot should be capable of doing. It should be able to do the chores we don't like, like taking out these trash cans. It should be able to take care of our dog-sitting. And most importantly, it should be our friend. Now, obviously, you directly recognize how hard this is going to be, right? I mean, think quickly about all the uncertainty in this environment and the, well, safety concerns for that little kid. And finally, well, the programming complexity of this particular task. So, the question is, for us in robotics: can we create humanoid robots like that? And I'll start a little bit historically, since it's been now nearly a century since the word robot actually came up.
And since the first movies showed a humanoid robot. Well, since the 1960s, we have had industrial robot arms, like Engelberger's Unimate. And since the 70s, everybody has been convinced, thanks to Star Wars. But it really took up to the 1990s until we had the first full humanoid robots which would look sufficiently interesting for more complex tasks outside of factory environments. Let's have a little look at what's out there in the zoo before we get into the whole learning thing, so that I can persuade you that this is a good task to follow. So what you see here is obviously a German robot: it's upright, it's very squared. And yes, this was built by the University of Karlsruhe, these days called KIT. Now, this one here is actually not a humanoid, it's a centaur; so we are thinking about a horse with a human torso already. Then you have here two things which look exactly like astronauts. This used to be one of the nicest robots you could actually buy, for just 150K dollars. And it's not an astronaut; both of them are not astronauts. And the ASIMO, the one built by Honda, officially terminated, unofficially still being rebranded into a new project. Well, that's been in the news for quite a while; you've probably seen it play soccer with Obama. Then there's way more in Japan. The Kawada Industries ones are built pretty much like the ASIMO, but they move a little bit nicer. Can you imagine how much they cost? The country of France rented two of them. Any idea how much they would cost per year? Oh, sorry, I thought you were... okay. Well, about half a million per year. So pretty expensive. This is not a robot, by the way. Finally, here you see two historically famous examples: the robot Jack from ETL, and the robot Cog at MIT. One should say Cog actually never really worked.
It obviously required two PhD students with soldering irons in order for it to execute even a single experiment. Then we are getting to the level now where people start to reproduce themselves. So one of these two here is Professor Ishiguro, who managed to, well, recreate himself in such a way that he can teleoperate his humanoid and have this humanoid give his lectures for him. Now, I can't tell which of the two is Professor Ishiguro, but I can also not tell who's the scarier-looking dude of the two. Finally, here, a famous German robot, that's Justin at DLR, one of the first really professionally built half-torsos. And here's probably the most frequently built humanoid from a research lab, the iCub, actually made not so far from here, here in Italy. Then, personally, I like the Sarcos robots a lot, which have really been some of the most pioneering robots. They're hydraulic, and they're a very, very nice first generation; well, I had the pleasure of working with them in the past. Most impressive is maybe what you see coming out of the company Boston Dynamics. Now, Boston Dynamics brings you all of these little creatures, like the Spot you see here. By the way, this will soon be available commercially; they cost about 100K, so from a robotics perspective very cheap for a quadruped. And they're amazingly impressive. But you've got to be very careful: the company Boston Dynamics is of course in the business of turning military money into nice videos. And that's a very good business model. And I think it's good for the world, too, if the US fights fewer wars and spends more on videos. But generically, well, this is the one which you can hopefully buy soon. And generically, this means we are basically seeing an emerging technology. Let's skip through these more boring ones. And this one I liked a lot too, where you basically see, well, a robot with wheels at the bottom of its legs.
And well, balancing is again a task classical engineers can do quite well. And, well, you see that this is actually doing pretty amazing things, acrobatics. Again, let's fast-forward a bit through this. And then here you see WildCat. Now, this is a quadruped which actually gets, I think, close to... I think it was close to 92 kilometers per hour. Oh, sorry, yes, 32. Damn it, where did I get the 92? Okay, 32 kilometers per hour. So it's a pretty rapid speed; you could be, well, you should be a little bit scared of that already. If you're around it, you will nevertheless notice it's really, really noisy, and it actually guzzles gallons. That's four liters per mile; it doesn't do miles per gallon. So it won't get very far with a regular tank. It's pretty robust already, what they do. And mainly they do trotting gaits, as these are the most stable ones that you can build among the gaits. And to some extent, all this looks like, well, two men in tights, right? Holding a block together. I lost my mouse again. There it is. So now let's look at the one part which has actually only been very recently fully released. This is all pretty much from this year, from Boston Dynamics. You see things like somersaulting. Or again, a bit... the lifting part is maybe not that interesting. How much time do I have here? This is actually, I think... where's the core part? In that case, I have to quickly add one slide to show you one more video. I thought this was in there. That is embarrassing. But that's a video which just shows you at what state we currently are in robotics, the one which I wanted to still show you. Now, you'll notice already this jogging-like gait. And now, this here is the most impressive we have so far seen in terms of humanoid robot behavior, and in terms of robots being built up to now. Now that, obviously... well, I think we can all safely agree this qualifies as amazing, right?
We've now seen several hundreds of different robots at different levels being designed. And nevertheless, we're far away from this dream, right? This dream of having the iRobot at our home. And when you look at it, well, you really start to recognize that for all of these Boston Dynamics demos, for example, we've had some really smart engineers. In fact, Boston Dynamics hired some of the smartest professors and managed to convince them to give up their jobs and join Boston Dynamics, so that they would engineer these robots, engineer the corresponding trajectories, and really get all of this to work. And this is a far cry from the humanoid we actually want to have: the one which can do many different tasks at our home, in our hospitals, in our rehabilitation institutions, you name it. Even the cobot working in industry together with you is a far cry. It requires substantially more programming, which is in robotics much, much more work, because in the end you need to formalize it, you need to try it on a real system; trying it in simulation is never enough. And really, when you look at this, you would think that, despite all of these amazing robots, we're actually kind of stuck. So what should we do? Well, what I propose, and many other people have proposed it before me, is that learning is the only way to get autonomous robots out of the research labs, away from the factory floors, into everybody's life. And I think by now this has kind of arrived in the classical robotics community. In fact, even the old, outgoing big shots seem to be recognizing this. Oussama Khatib at some point said to me: I've always said that the time for robot learning would come later, but analytical robotics has barely moved for 10 years; the time for learning is now. And, well, John Hollerbach at some point put it this way: robot learning is the single most important problem in robotics. And I think this is really the case. Yes, sir?
They don't have any learning, nothing. It's usually a pre-programmed trajectory which they're following, and they have a stabilizing controller around it. In some cases, this pre-programmed trajectory is a little bit deformed, they need a little bit of re-planning, but there's absolutely no learning. And even in the parkour case, this was very, very well done, well prepared. So the state of the art in robotics is very much: a smart engineer sits down, models the task up to 150 micrometers, then comes up with a really good controller around a corresponding trajectory. Boston Dynamics is already a bit better, because these are really robustly built control laws, but nevertheless, they're really centered around this one task of executing a gait, which is again just one type of trajectory generation. You're welcome. Yes, sir? No, nearly none of these robots has been teleoperated. I mean, I think Ishiguro teleoperates his robot, but that's the crazy Japanese professor. Everything else here is, well, I'm not sure whether I should call it autonomous, but it's program controlled. Yes, program controlled, I would call it. And in fact, we've started bringing learning into a variety of areas. Despite all the amazing successes you've now seen with these amazing robots, it's not like there hasn't been learning. In fact, we've been learning basic motor skills. We've been learning object manipulation. There is a lot of learning happening at the rehabilitation level, as well as at the prosthetics level, and in airplane control. Well, people have learned pretty heavy helicopter acrobatics, and things like SLAM, so simultaneous localization and mapping, have been a classic learning application. So is robot locomotion. And we all want to bring it, of course, to this guy. So let's have a look at what has been done so far in learning. This is a task from my lab, about eight or nine years ago.
We took the robot by the hand and we showed it this ball-on-a-string ball-bouncing scenario. And what you see here is, well, what people who study human motor control call kinesthetic teaching. And the robot actually managed to do this out of the box. This was actually just one afternoon of an experiment, but we nevertheless got to write a paper about it. How does this compare to classical programming? I had a really smart student who sat down, and we tried for six months. We did the imitation experiment; he sat down and for six months he tried to program this behavior. And the robot always hit the ball exactly two or three times, and then it would just miss it, and it would never recover. Even though in the simulator it always looked perfect. The reason behind this is, of course, that there are sufficiently many, well, hard-to-model non-linearities. When you have a slightly elastic but not fully elastic string, when you have a paddle which has rubber on it, and these do not always behave the same way, it's very, very hard to get a good behavior. You're in trouble if you just have a smart human do this within a simulator: it doesn't transfer to the robot. It has four cameras seeing the ball, and it has proprioceptive sensing through which it gets joint angles and joint velocities. These are classical industrial cameras; you would not use a standard RGB camera here, because you want this to be relatively fast, and with a normal camera, if a table tennis ball moved through the picture, you would probably not even see it. Since you need to open the shutter only very, very briefly and close it again, and you also want to have a higher sampling rate, we use industrial cameras which run at about 200 hertz here. Now, a task which actually preceded us, but is actually somewhat easier, is this one here. What you see here is a humanoid robot juggling. Now, why is juggling easier?
Juggling has the property that you only have to hit the right rhythm, and if you're always at the right location and you throw the ball off right, it's something you can reproduce by imitation learning amazingly well with an open-loop trajectory. But of course, you have to learn this trajectory first, and you have to be able to do the accelerations which you require in order to execute it. So it's actually not trivial at all, but you don't need the ball observations. In fact, for humans you can show that humans juggle best if they have learned not to look at the balls: when you put a screen in front of them so that they only see the balls when they come by at the top, they can lock onto the synchrony of their juggling behavior. If they train this, then they get the juggling right after a while. Then you can learn things where you do start with imitation but subsequently go for reinforcement learning. So here, again, we take the robot by the hand and we show the robot the ball-in-a-cup behavior, and, well, after imitation it fails. Now it gets a little reward based on, well, the proximity of ball and cup. Here it's already at 15 trials, 25 trials; it gets more reward, still not quite there. After 40-something trials it can actually occasionally get the ball into the cup, but what you see here, it just hit the rim. It becomes even better in a moment, or not, actually not. But it actually becomes perfect after about 90 trials, and you can have it execute for days and days and days, and it will perfectly get the ball into the cup. This is compared to humans.
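As an aside, the imitation-then-reinforcement improvement just described, a reward based on ball-cup proximity driving self-improvement over roughly 90 trials, can be caricatured as a reward-weighted update of the policy parameters. Everything below (the target, the reward function, the update rule) is an invented toy sketch, not the method actually used on the robot:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ball-in-a-cup roll-out: the "policy" is a small
# parameter vector (think: weights of a movement primitive), and reward
# peaks when the parameters reach an unknown target (ball lands in cup).
target = np.array([0.7, -0.3, 0.5])          # invented for this example

def rollout_reward(theta):
    # Reward based on "proximity of ball and cup" (distance to target).
    return np.exp(-np.sum((theta - target) ** 2))

def reward_weighted_update(theta, n_rollouts=20, sigma=0.1):
    # Explore with Gaussian perturbations, then average the perturbations
    # weighted by the reward each perturbed roll-out obtained.
    eps = sigma * rng.standard_normal((n_rollouts, theta.size))
    rewards = np.array([rollout_reward(theta + e) for e in eps])
    weights = rewards / rewards.sum()
    return theta + weights @ eps             # drift toward high reward

theta = np.zeros(3)                          # e.g. initialized by imitation
for trial in range(90):                      # "about 90 trials"
    theta = reward_weighted_update(theta)
```

The weighting step carries the intuition: perturbations that earned more reward pull the parameters harder, which is the same idea behind the episodic policy search methods used for robot skill learning.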
Well, first of all, for humans you need to use chocolate as rewards, and humans cheat. But we tried it on humans: one of my students created a human set (it's not a data set, it's a human set) out of family members, and the six-to-eight-year-olds did not manage to learn it at all; the eight-to-ten, sorry, the ten-to-twelve-year-olds take about 30 trials, 30 to 35 trials, so slightly better than the algorithm; and the grown-ups take three, four trials. So it seems to be only me who took three months to learn it. So, let's continue. You can also learn more complex things with actually more different behaviors. You can, for example, take a robot by the arm and show it different forehands; by imitation learning it could extract from this that there are 25 basic behaviors, and these basic behaviors, well, all give you different responses to even the same or to different incoming balls. You can subsequently use these behaviors to generalize, for example by re-weighted combination of these different behaviors, and already imitation learning gives you pretty good answers, since you get something close to between 60 and 70% success rate against a ball gun. (Today I'm cursed; don't get too close to me, I think it's maybe dangerous.) Then you notice, well, imitation learning is not enough: you'll always find regions where the robot does not successfully return. So you aim the ball gun at these, you do some reinforcement learning, the robot self-improves, and after some trials it'll actually get, for that particular region, a success rate which is substantially higher; in this case it moved from 0% to about 80%. All over, against the ball gun, you can actually get it that far that it can return 97% of all balls correctly. Finally, you see the robot playing against its teacher here, and I should highlight: Katharina learned table tennis for her PhD, as she's very proud to say; she's a computer scientist, and as
an advisor I always say, well, she taught the robot to become about as good as she is. Then you can, well, again by a mixture, in this case a mixture of imitation and reinforcement learning, actually teach air hockey to a robot, and this humanoid managed to actually learn this task, despite that actually only one arm was involved. And then, finally, they also innovate when you train them with reinforcement learning. This actually happens occasionally. We've had this in grasping in particular: we would teach a couple of grasps, and then we would see totally different grasps emerging. These here are grasps no human would ever teach a robot. But most robot hands are very awkward hands, and so many robots very quickly learn to remove the thumb, since it can only break things, and then hold the object differently, in this case. What also happens quite frequently is that when a finger in a hand breaks, you will suddenly see new grasps emerging. You will actually not understand why until much later, when you first realize that there is something broken and then repair the hand; then you will recognize that, well, some joint was broken, and for that reason the learning agent became creative. That's what we've observed. Obviously, we're searching in a very, very high-dimensional state-action space, so in other words, it's not that unlikely that we find completely new paths in it. That's obviously the holy grail of reinforcement learning: to find the right paths, new paths, through the state-action space of a very, very high-dimensional system. We always talk about continuous systems here, so not about... many people talk about high-dimensional discrete systems, but sorry, when you discretize a continuous system, that's high-dimensional, and you really don't want to do that. Yes, sir? And this is a good question: this is the sim-to-real question. How much can I just...
No, I hope not. Okay. It doesn't seem to be broken, I hope; if so, I'm sorry. Scary. So, there's a big general question about simulation in robotics, because we all know how to build a graphically beautiful simulator, and we all know they're wrong. There are some parts you can simulate quite well. For example, one of the reasons why with Boston Dynamics you usually see things in flight phases, with only instantaneous contact, is that during a flight phase a rigid-body system can be simulated very well. However, when a system makes contact, then it gets really difficult. That's, for example, the reason why Boston Dynamics' stuff usually has point feet: so that you only get a very short contact, which you can still only simulate badly, but at least much better. So it's very rare that you can train something in simulation and directly transfer it; in most cases we actually need learning on the real robot. There's even something much worse to it. That's a question we've only started to understand very, very recently, and it is that when you are using a simulator in order to optimize behavior, you're prone to something which is called the optimization bias. This basically means: when your simulator has a single error, and this error allows you to get more energy, more reward, more you-name-it, the optimizer will directly jump on it. And this is a very dangerous one. I did not know about it when I started my lab, and I tried to learn a relatively simple task. It was just a figure-eight task, which was supposed to get a lot of reward and look beautiful in the simulator, and I turned it on the robot, and the first thing it did on the robot was: bang. So what had happened?
Well, the simulator, which I had actually learned from data in this case, had a slight error: only in a small region, due to regularization, the sign of the friction was somewhat wrong, with the result that the optimizer thought, hey, I can actually get an energy pump here. So it totally fell for this optimization bias and went for getting into this region and oscillating in it real fast, so it would get all the energy it would later need, and, well, it broke its wrist. And my first job when I had just become a group leader was to go to my boss and say: hey, you bought me this nice little robot, and I'm sorry, but you actually need to pay for the first repair in my first week, too. But that's robotics for you: you actually have to deal with real systems, and these break, and sometimes there is a fairly dirty component to it, too. More questions? All righty, then let me show you one more video before I'm through with the pep talk. This one here went all over the net a couple of years back, but it's really very beautiful. This little robot has actually learned to traverse terrain, and you need to put that in perspective: this robot is the little brother to the big ones. It's also made by Boston Dynamics, but it's really the little brother to all of the other ones, because it's really stiff and it really has too few degrees of freedom. It doesn't have hydraulic actuation like all the other Boston Dynamics things, but has gearboxes in there, so it doesn't have the same speeds and force generation. And people have been having competitions with this for several years, actually for more than a decade, and the first competitions with it looked like: robot sits down, robot rolls over, robot has a leg broken. The reality behind it is that it's actually really hard to move when you have just three degrees of freedom in each of your legs. And, well, what you see here is what a robot can do when it has
learned good footholds, when it has learned special movements for extreme terrain. Even with robots which are not so beautifully built as the hydraulic machines you've seen before, you can actually get amazingly far. These videos show you recovery behaviors. I'll let it play out for the fun of it... well, let me quickly check; yeah, I think it's fine to let it play out. This is how normal robotics control versus a learned controller looks. This is a good point to stop. So, what I want to take you on in the next three days is a very fast journey through robot learning, and if I'm going too fast, do slow me down; if you think it's all boring and you want me to speed up, let me know as well. I will start today with the topic of model learning, and if we have enough time, we may even get into the first part of reinforcement learning for robotics. Subsequently, on Tuesday, I will continue with reinforcement learning, and on Wednesday I'm planning to go into imitation learning. These three types of learning are really the core topics of robot learning. As robot learners, we are an emerging field. We're not yet as far as, let's say, machine learning on data sets; we're not even as far as reinforcement learning on simulators, simply because we are obviously held back by hardware. But I think the explosion is coming now. Up to now, when you look at classical robotics, robots usually had exactly one task, and what we see now, thanks to learning, is that it becomes more than one, and that the number of tasks will grow pretty rapidly from now on. And you could really underline this with the statement of the CEO of KUKA, who said: well, up to now, robots had to do one task millions of times; the robots of the foreseeable future will have to do millions of tasks, just several times each, and that's just for production. Now imagine what this means for everything else. Now, there are three core technologies we do need for this. One is learning models. Then
using these models for reinforcement learning, in the sense of optimal control on learned models. Then, well, what you can do with value function methods, and then we'll look into policy search. And then we look into two ways of imitation learning. The first way of imitation learning, which we call behavioral cloning, is basically trying to mimic or copy the teacher. In the second way, called inverse reinforcement learning, we try to recover the reward function of the teacher, or some surrogate of it, and from it try to obtain behavior. Questions? Yes? So, these days you could use the same learning framework, the same algorithmic framework around it, but you would actually have a hard time gaining anything out of air hockey with it. We at some point tried to go from ping pong to badminton, which to us seemed to be two relatively similar sports. It turned out that we were totally wrong there. With the framework we could actually learn both, but we hadn't realized that in badminton you need a much stronger hit in order to get the right acceleration of this feather ball, of the shuttlecock, than you need in order to accelerate the ping pong ball, which deforms and actually can restore energy much better. So in a way, when it comes to robotics, you need to be prepared to always be surprised. One other example: we tried to throw hammers to the ceiling. That turned out to be really, really difficult, since what we humans do out of our wrist, just with these wonderful actuators called muscles, which are kind of catapult-like: we can store energy and we can release it at will, and we can release it at really high accelerations, while robots at the moment have electrical motors or hydraulic actuation, which just has a much slower build-up of force and cannot yield similar behavior. What works much better is if you have these soft robots: if you have less weight, less inertia, all this works better, because you can accelerate better. That's
one thing you can do. Another thing which you can do is obviously to build actuators with variable stiffness, where you could now have several springs and kind of tighten these springs and then release one of them, and you suddenly get a really, really big response. So in a way, a mixture of different actuation and less weight will solve a lot of problems. I mean, just think about it: the wrist of the arms you see here, of these Barrett arms, is about two kilograms. If I were to cut off your arm, your whole arm would be about two kilograms, just to give you a feeling. Most of our actuation actually sits in our base, and that has very practical reasons. It's actually the same insight which Boston Dynamics uses: at Boston Dynamics they make the body of the robot very, very heavy, so that they can control the legs of these robots with classical methods, while the body, which is obviously non-linear, stays nearly in the same regime, so it doesn't matter that much. So the hardware unfortunately matters in robotics much more than we are willing to think. Now, let me switch to this second slide set and start with learning a model. Maybe, to give you a bit of a big picture: why model learning, and what are the core differences between the different lectures? Generically, when you're interacting with a robot, you can get different data sets. One data set is really, really useful: it's the one where we get a state of the robot, like joint angles and velocities; we somehow have actions, which in the worst case are the actual torques we send to the motors, or the muscle activations we would send to muscles; in slightly better cases, this could also be, for example, the acceleration which we command the system to do, but that of course requires that we have a low-level control policy. And then what we get is basically the next state. And the first thing which we are generically lacking, of course, is a good model. Well, what does a good model do?
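In code terms, a good model is a function fitted to exactly those logged (state, action, next state) tuples. Here is a minimal sketch of fitting such a forward model by supervised regression; the linear toy plant, its matrices, and all names below are invented stand-ins for a real robot's unknown dynamics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented linear toy plant standing in for the robot's unknown dynamics;
# on a real system the tuples below would come from logs of the hardware.
A_true = np.array([[1.0, 0.1],
                   [0.0, 0.95]])
B_true = np.array([[0.0],
                   [0.1]])

states = rng.standard_normal((500, 2))       # e.g. joint angles, velocities
actions = rng.standard_normal((500, 1))      # e.g. commanded torques
next_states = (states @ A_true.T + actions @ B_true.T
               + 0.01 * rng.standard_normal((500, 2)))  # sensor noise

# Supervised model learning: regress the next state on (state, action).
X = np.hstack([states, actions])
coeffs, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def forward_model(s, a):
    # Learned forward model: predicts the next state from state and action.
    return np.concatenate([s, a]) @ coeffs
```

On a real robot the linear regressor would be replaced by something non-linear, but the supervised structure, inputs (state, action) and target next state, stays the same.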
It predicts the next state as a function of state and action. Now, this model will have many purposes, and one purpose is of course to learn a policy. Now, policy learning can come with, well, two kinds of data sets. One kind of data set looks like the one before, where we have just state, action, next state, in which case we can do exactly one thing, and that is imitation learning. Now, if we augment our data set by one more variable, by an immediate cost, or a reward if you are a more positively thinking person (let's actually make this one in orange), then, by doing the orange one, well, you could also get a policy, but you would get it by reinforcement learning. And as I've told you already, my goal for today is that we get into model learning. Now, "model" for us in robotics actually has more connotations than for people who work in classical reinforcement learning on simulators, because we can already do some things with a model which are useful for us. For example, Boston Dynamics will sit down and manually compute an inverse model of their body, look-up-table style, telephone books' worth of work; they very frequently do this. For us, that is one instance of model learning. In fact, the classical Raibert hoppers (these were kind of pogo-stick, one-legged versions of the robots Boston Dynamics makes, built by Marc Raibert before he founded Boston Dynamics, when he was still a professor at MIT) were completely built on a look-up table representation of all the computations you needed as an inverse model of the body. So clearly, models can be super useful by themselves: already knowing where your finger is given your joint angles is a super important question. Then you can use inverse models for control directly, and, well, finally as a simulator. Now, again, it may happen what happened to me: if you do use them as simulators, you will break
your robot. But nevertheless, a learned simulator is frequently much better than a simulator which you program by hand, because there is just so much in the real world which we don't understand, even at the physical level. We always claim physics is solved, which is only true for a freely floating object which doesn't make contact with the environment and has no friction. Immediately when you go away from that, you are losing physics as a helper and friend. It doesn't mean that physics doesn't remain useful, but it is no longer the solution to everything. Yes? This is a very good point. It depends. This is a really interesting question; maybe I should put this in a slide, in a tree. So you can have the data set without the reward (let's make this the first data set) and you could have the data set with the reward; I'll just put a "no reward" here. When you have the one without the reward, you have two roads. One road is: well, you directly use supervised learning and directly get a policy. In the other case, you plug in a model, or you learn a model from this data set first, then use this model in order to solve the reinforcement learning problem which comes along; so you need to reconstruct the reward, and only from that you actually get a policy. And this here is of course a loop. So this is the kind of learning you mean. But for imitation you don't always need to understand what the other person is doing. Just take how a kid frequently learns things directly after being born. I just repeated this experiment with my twins a year ago: many kids are capable of directly imitating their first behavior, like an hour or two after birth. It's really like: you make a funny face, and they'll make a funny face. It doesn't happen for every kid, so don't be disappointed if it doesn't happen for the one you're trying it on; it only worked for one of my twins. It's a very classical psychologist's experiment, and in this case I don't think the kid has any
model of the parent which it uses; in fact, it will just plainly copy. If you have a model, though, that is the only way you can get a reward-like surrogate out of your data without having a reward. I should also highlight that when you have a reward, there are actually three strands. In the first strand, you would first obtain a model, then obtain what we call a value function, and then from this obtain the policy. This one here we call optimal control over learned models, and I'll just call it RL 1 here, since it will be the first part of our lecture. This here, by the way, is IL, imitation learning; and since this one is inverse reinforcement learning, maybe I should write inverse RL. I'll give everything proper names. By the way, you've got to tell me if you can't read things, since I have terrible handwriting, I know this. So this one we call behavioral cloning. This step down here is called optimal control: you first get a model, then a value function, then a policy. If you follow the road of directly getting a value function, without actually trying to get a model, and then getting a policy from it, that's what we usually call value function methods. And finally, there's a whole area of things where you directly try to get the policy, which we call policy search. In a way, these really differ in what space the core approximation happens: in optimal control, the core approximation is to the model; in value function methods, the core step is the approximation to the value function; in policy search, you're actually searching in policy space. Again, this will be, I guess, tomorrow. Am I in a hurry already, or not? I have until 4:15, right? Okay. Are you guys happy with me so far?
So let's now start on this particular step, which is a key ingredient to two of our approaches: the part of getting the model. It's basically the key application of supervised learning in robot learning. The key thing for us is that when you're learning a model in robotics, you're trying to model something which always holds true. This is quite different from learning a policy or a value function, which obviously changes the moment you want to change your objective or your behavior. The model should be something like physics, which ideally doesn't change unless you move at light speed, and we can observe a lot of information and make much more efficient use of it if we learn a model. Now, learning a model can nearly always be easier than classical physical modeling. Try to do what you've learned in high school about mechanics, and instead of doing it for the small linear example of the car which you learned in high school, try to do it for a 3 degrees of freedom arm. You will see it's a medium-scale nightmare to just compute things for 3 degrees of freedom. We automate this, obviously; we have some nice mathematical scripts which automatically, symbolically compute the physics for us. But for a 6 degrees of freedom arm, you already get a telephone book of equations if you really want to print them all out in their raw form. So it's quite hard to get a good physical model even at the computational level, and using learned models for a policy can be a super data-efficient thing for us to do. So part of this lecture is going to be: we're going to start with an example, then I will show you different kinds of models and learning architectures which we use in robotics, and then we're going to do three different case studies. Oops, what happened to my third case study? That's interesting. Two case studies, actually. Kind of sad. Let me show you the power of robot learning; let's start with an example. Now, this
here is a Mars rover. You have all heard about it, since by now it has actually landed on Mars and was a successful mission; it even stayed alive for much longer than planned. What you probably haven't learned about is that there was a heavy amount of learning involved in making this happen, since this is a pretty difficult problem. Just think about it: it's a teleoperated system, a robot, you would initially say, and you're totally right until you recognize that it is 1.5 astronomical units away, which is 8 light-minutes times 1.5, so 12 minutes away from us. So you actually have to wait 12 minutes to see what happened after you joystick a command. And obviously NASA wants to keep all intelligence on Earth, since sending a human being along is kind of costly in comparison to robotics. For that reason they have a guy with a joystick somewhere in Houston, and this robot has to act on its own in the meantime. If you always have to go 10 centimeters, wait, 10 centimeters, wait, that's not a very good mode, so it needs to be autonomous for, well, 12 minutes altogether. And this brings us to the two key problems: it can get stuck, it can drive into a rock, and we have to cope with the delays of the human action. We really have to fill in for the human action so that the behavior remains good, and for that you need good models. And what people did immediately was to create a sand dune. Well, actually, no, sorry, this is the real sand dune on Mars, the Purgatory dune, but you directly see how difficult it is to operate there, because you can get stuck in this sand right away. So in this case we want to learn a model, and how would you do this? In this case we were still in the classical world before deep learning, so you would obviously work on stereo imagery, and you would have an IMU. Who of you doesn't know what an IMU is? Okay, in that case I
have to explain it to you. An IMU basically measures accelerations, and with it you also get one acceleration for free, and that is the acceleration of gravity, so the gravity vector. We humans have one too, and it's quite crucial to all of our motor control: it's the so-called vestibular organ, which sits in the middle of your ear. If you numbed it, you would never be able to walk properly again. So, quite crucial. And we actually need this for relative orientation to the gravity vector, for example, or also for training data. Now, from stereo imagery we can get a lot of different features: we can get some 2D map features, we can get some appearance features. And from the IMU we can get the direction of the gravity vector at the current moment. What we really want to have, though, are things like predicted slip: will I just slide if I go in this direction, or can I safely continue, or will I get stuck? That is the core problem, and in there we have two sub-variables which make a big difference. For that you could assume a graphical model: the terrain type depends on both the appearance and the geometry, and slip in turn depends on the terrain type, on the appearance features and on the geometry features, and gravity probably comes into this too. So what do we actually have as inputs here? Well, only these two. What do we have as outputs? We would like to have, for example, this slip here, or the terrain type. So now let's see how this looks technically. When you look through the eyes of the robot, still on Earth, you would see things like this path here: sandy terrain with some brush on the sides. These would be different camera pictures. And for the IMU, you first need to look at this one here, since IMUs have not been around only since yesterday; IMUs were actually quite important for fighter-plane stabilization and for rockets in the past. This is the kind of IMU they put into intercontinental rockets in order to control how a rocket would make it from the Soviet Union to the USA or from the USA to the Soviet Union, and you could be accurate to within two kilometers after going once around the world with it, just thanks to the accumulation of the IMU signals. So how does such an IMU work? It usually consists of several of these gimbals nested in each other, which take up the accelerations. Today they're tiny; they look like this here. What do we do next? Well, we obviously have to treat the terrain slope as a predictive model in here, and then learn the terrain type, which on Mars matters for some classes that are super useful, like sand and soil, while some are totally useless, like grass or asphalt or wood chips. And here you see that it does a fairly good job. So what do we need in the end? The simplification would be that we do this by clustering, with nearest neighbor for the terrain types; then you treat the prediction of slip as a regression problem. And this was evaluated quite successfully. Here you see basically how well it does in terms of slip prediction. Sometimes there are outliers, where the robot would have to adapt its behavior in order to have a fail-safe, but overall you see that if you know the terrain type, it would be nearly perfect, so this is really a terrain-type mistake; if the terrain type is known, predictions are spot on. Now, this system actually made it, I think in a slightly modified form as far as I understand, to Mars, which is pretty cool. And it shows you that you already have enormous power in robotics by doing things which from a machine learning point of view seem ultra simple in comparison to the kinds of models we very frequently create in machine learning. But obviously, figuring them out and putting them into the real world, the
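The two-stage scheme described above, nearest-neighbour terrain classification followed by a per-terrain slip-versus-slope regression, can be sketched roughly as follows. All features, cluster centers, and slip gains here are invented purely for illustration; the real system used richer appearance and geometry features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cluster centers in a 2-D (appearance, geometry) feature space,
# and a per-terrain slip gain: sand slips far more per unit slope than soil.
centers = {"sand": np.array([0.8, 0.2]), "soil": np.array([0.2, 0.7])}
slip_gain = {"sand": 0.9, "soil": 0.3}

def terrain_type(features):
    """Stage 1: 1-nearest-neighbour classification against cluster centers."""
    return min(centers, key=lambda t: np.linalg.norm(features - centers[t]))

# Stage 2: one linear slip-vs-slope regressor per terrain type.
models = {}
for t, gain in slip_gain.items():
    slope = rng.uniform(0, 30, size=200)                   # slope in degrees
    slip = gain * slope + rng.normal(scale=1.0, size=200)  # observed slip (%)
    w, *_ = np.linalg.lstsq(slope[:, None], slip, rcond=None)
    models[t] = w[0]

def predict_slip(features, slope):
    """Classify the terrain first, then regress slip for that terrain."""
    return models[terrain_type(features)] * slope

print(predict_slip(np.array([0.75, 0.25]), 10.0))  # a sandy patch, 10 deg slope
```

The point of splitting classification from regression is exactly the one made in the lecture: once the terrain type is known, slip prediction becomes an easy, nearly deterministic regression problem.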
robotics understanding which you sometimes need, well, there is a lot of mileage in that too. So now I want to take you into the next part of the journey. I want to make clear to you that we can actually learn many different kinds of models, and that we also have to be very, very careful in robotics, since not everything we do is actually that straightforward a regression problem, as people thought initially. So let's look at the types of models. And yes, so you cannot access any form of GPS there. You could perhaps get a GPS-like position, which they didn't have in the experiments that they published; I think on the robot itself they probably have something relative to the satellites, but I don't think it's very accurate, so most likely they know very, very little about the position; it gives you a very bad signal. We used to do this when we did robot locomotion, and I think we got a maximum of something like 19 seconds during which we had a position estimate that was not somewhere on the moon. The errors accumulate, unfortunately; in most cases, unless you have an additional observation, they accumulate very, very fast. So it only helps you when you intermittently have other kinds of observations that you can integrate with it. What does help, though, is that the IMU usually gives you a gravity vector, since in the end today's IMUs are implemented as a couple of springs, and that always gives you a gravity vector; that part is absolute, and it is actually a really helpful one. So now let me take you on the journey. Are there more questions?
Let me take you on the journey towards models. Now, classical robotics nearly always works in continuous time. For robot learning, we actually prefer discrete time, for the simple reason that we like to implement things on a computer. There are a lot of theoretical advantages to continuous time, because you don't get any discretization artifacts, but practically we lose them through the overhead of having to work in continuous time with a learning system, and through having to deal with the problem of integrating the system, which in turn creates new errors. Now, there are four types of models which we have seen to be useful in robotics. The first one is pretty obvious: if you want to do something, predict the future state. That's a very classical problem. I'm pretty sure that every one of you has tried at some point in their lives to predict the stock market; I mean, everybody who has done machine learning in their life has tried this at least once: get a data set, figure out whether there isn't something inherent in it, and whether you couldn't predict the stock market. And I think nearly every one of us has failed. With a direct cable to the stock exchange, we could maybe predict the next couple of minutes quite successfully, but definitely not the long-term trend. Predicting such a future state is what we call learning a forward model. We, though, have a system which we can change, and this gives us a slightly different problem too: it gives us the idea of learning an inverse model. An inverse model, applied to the stock market, answers: how much money would you have to invest in order to change the stock market? And that's obviously a much, much more important quantity for us. We would like to know how much force I have to push with so that this monitor falls over (I won't do it, don't worry), but that's obviously important. And then it turns out, as you will see in a moment, that inverse models are not that trivial, while forward models nearly always make us happy in one
form or another. Inverse models don't always exist, and when they do exist, you sometimes need to mix in forward models in order to get an inverse model. And then, finally, there are things called, these days, multi-step models; in the old days we also called them operator models. Basically: what will happen if I take a long sequence of actions and throw it into a system? How many steps ahead can I actually predict in one compound prediction? These are the so-called multi-step models. Now, the first one is easy, right? Predicting the next state given the current state and action, plus some noise, obviously requires a data set consisting of states and actions, with the labels being the next states. Such a prediction is usually like learning a simulator, since we can use it for long-term prediction, but we can also already use it for action generation: for a desired next state, we could search for the one action which minimizes the distance to that desired next state. But this would obviously be a costly process, since we are searching in a relatively high-dimensional space in real time. Now, the next kind of model are inverse models, and with inverse models we are directly trying to get to this result: given the current state and a desired next state, we would like to know the corresponding useful action. That poses a slightly different problem: we take the pair we have, so state and next state, take these two as inputs, and try to predict from them the action which would lead to exactly this transition. It's probably the problem which has been around the longest in robot learning, and still, every time there is a new machine learning method, people directly try to write a new paper about it and manage to do somewhat better than the previous supervised learning method in one respect or another. Importantly, this allows us to learn control: if you have such a model, taking joint positions, velocities and
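The "search for the action that minimizes the distance to the desired next state" idea can be written down directly. This is a deliberately naive sketch with an invented one-dimensional stand-in for a learned forward model; in a real, high-dimensional action space this brute-force search is exactly the costly process the lecture warns about.

```python
import numpy as np

# Stand-in for a learned forward model f(s, a); the coefficients are invented.
def forward_model(s, a):
    return 0.9 * s + 0.5 * a

def select_action(s, s_desired, candidates):
    """Pick the candidate action whose predicted next state lies closest
    to the desired one: action generation via a forward model."""
    return min(candidates, key=lambda a: abs(forward_model(s, a) - s_desired))

candidates = np.linspace(-1.0, 1.0, 201)   # discretized 1-D action set
a_star = select_action(s=1.0, s_desired=1.0, candidates=candidates)
print(a_star)  # the action that holds the state at 1.0
```

For this linear stand-in the answer can be checked by hand: 0.9 + 0.5a = 1.0 gives a = 0.2, which is what the search returns.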
the joint accelerations (in this case in continuous-time form), you could just add a PD controller for the desired trajectory, and you would be able to control your robot really, really efficiently. The big question, of course, is: do these inverse models actually exist? Well, no, not always. If your system is an invertible function, then yes, they do exist, but that's not always the case. Look at my arm here: if my goal is just to move my hand along a trajectory, there is a ton of things I could do with the rest of the degrees of freedom. This is called redundancy, and redundancy of course means that I have infinitely many solutions. So clearly not everything can be done with an inverse model. One example which we had already was inverse kinematics; you saw the redundancy there. Inverse kinematics was actually a big topic in robot learning the last time neural networks were hip, in the second neural wave you could say, when Michael Jordan wrote some of the key papers which made him famous back then, on the distal teacher. Similarly, we studied a lot of systems with hysteresis, like actuator dynamics, aerodynamics, or friction, which usually has a hysteresis. In both of these cases, though, in order to get something right, you first need to use the last state and action pair to predict what kind of regime you are in, in order to subsequently choose an action to affect your future regime. If you have a hysteresis where in one direction or the other you would have very different forces, predicting first along this path helps. Similarly, if you have an inverse kinematics problem, knowing that you want to be close to a certain posture obviously helps. So predicting what happens next gives you a latent variable z, which then again allows you to choose an inverse model
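The "inverse model plus PD controller" structure mentioned above can be sketched on a toy system. This is an illustration with invented numbers, not the lecture's robot: a point mass with a deliberately slightly wrong learned inverse model, where the PD feedback silently covers the model error.

```python
# Hypothetical point mass: m * qdd = u. The exact inverse dynamics would be
# u = m * qdd; pretend f_inv is a learned approximation with a small error.
m = 2.0

def f_inv(q, qd, qdd_des):
    return 1.9 * qdd_des   # learned inverse model, slightly off (true gain: 2.0)

def controller(q, qd, q_des, qd_des, qdd_des, kp=100.0, kd=20.0):
    """Inverse-model feedforward plus PD feedback. If the inverse model
    were perfect, the PD term would stay completely silent."""
    u_ff = f_inv(q, qd, qdd_des)
    u_fb = kp * (q_des - q) + kd * (qd_des - qd)
    return u_ff + u_fb

# Regulate q towards q_des = 0 from a small initial error (Euler integration).
dt, q, qd = 0.001, 0.1, 0.0
for _ in range(5000):
    u = controller(q, qd, q_des=0.0, qd_des=0.0, qdd_des=0.0)
    qdd = u / m
    qd += qdd * dt
    q += qd * dt
print(abs(q))  # tracking error has decayed essentially to zero
```

The division of labor is the key point: the feedforward term does the bulk of the work when the model is good, and the feedback term both stabilizes the system and, as discussed later, signals where the model is wrong.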
which at least locally is right. And interestingly, locally we can always find some, not necessarily unique, but some inverse models which are actually useful. Now, the next step, of course, is: what can we do if we have a sequence of actions and want to know what it actually does? That brings us to the multi-step models, or operator models, and the Mars rover is basically one of the best examples: you actually want to learn a simulator of a system which you cannot simulate or model yourself anymore, in order to cope with the delays. Multi-step models are really, really useful for open-loop control, since if you just took many single-step predictions after each other and chained them, the error of your model would actually explode; you can't tie it down that efficiently. Now this is interesting, PowerPoint; these slides used to be in Keynote before I started to teach a class together with a friend of mine, and since then they have interesting new animations which I did not put in. The first thing which is common in robotics is to decompose a problem into two subproblems. One problem is to cope with the inverse geometry of the body; this we call inverse kinematics. Basically, the question is: if I want to have my finger somewhere, let's say here, what joint angles do I need? The second big problem which we have in robotics is that of inverse dynamics: living as a physical system, I need to create the right forces in order to really achieve the geometric behavior which I would like to have. Kinematics we can actually solve by engineering, if you want, since we can basically always measure joint angles very well; it's only the inverse kinematics step which is very expensive. That's why it was such a big topic in the late 1980s and early 1990s, when for example the distal teacher method by Jordan was proposed. Dynamics, on the other hand, remains an
inherently important problem, especially dynamics with contact. Due to actuator dynamics, due to friction, due to soft bodies, errors are still normal. The dynamics models of industrial robots have a 50% error, just to give you a feel for that. Most of that is friction and actuator dynamics, but some of it is probably also problems with the CAD data. Let me, I could actually do this, I'll skip these example problems. Oh, damn it. So, these are the classical equations, and you obviously recognize that forward kinematics is always a function, since your joint angles have exactly one geometric interpretation. Similarly, if you take the derivatives, with the Jacobian in there, this still remains a function of these variables. Importantly, you need both q and q-dot in order to predict the velocity at your finger: you need both joint position and joint velocity. It gets even more tedious when you want to predict the acceleration at your finger: you need the acceleration at your joints, the velocity at your joints and the position at your joints. And for inverse kinematics, you recognize that this is not always a function. For dynamics it is somewhat harder, because in this case we are actually doing two steps. Most of the time when we are dealing with a system, we pretend we get things at a nice, perfect sampling rate. In reality there is uncertainty about when your signals really are from, and a little bit of stochasticity in your signals, automatically, when you're implementing things on a computer. For that reason we normally try to learn discrete-time models, which are not affected by the problems of integration, but this really only works when we sample with the same time step. The first kind of model which you try to learn is the inverse dynamics model, but sometimes you even want to learn operational space control models, which tell us how to move a finger along a reference trajectory and give us the
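The point that the finger velocity needs both q and q-dot can be checked numerically. Here is a small sketch for a hypothetical planar two-link arm (link lengths invented): the analytic Jacobian gives the end-effector velocity as xdot = J(q) qdot, and a finite-difference check against the forward kinematics confirms it.

```python
import numpy as np

l1, l2 = 1.0, 0.8   # invented link lengths of a planar 2-DoF arm

def fkin(q):
    """Forward kinematics: always a unique function of the joint angles q."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic Jacobian J(q), so that xdot = J(q) @ qdot."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

q = np.array([0.3, 0.5])
qdot = np.array([0.2, -0.1])

xdot = jacobian(q) @ qdot   # needs both the position q and the velocity qdot
eps = 1e-6
xdot_num = (fkin(q + eps * qdot) - fkin(q)) / eps   # finite-difference check
print(xdot, xdot_num)
```

Going one level further, the finger acceleration picks up the extra term Jdot(q, qdot) qdot, which is why it needs positions, velocities and accelerations at the joints, exactly as stated above.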
right forces for that. And again, can you maybe recognize which of these two you think is solvable by supervised learning off the shelf, and what about the other one? Okay, fine. Who of you thinks we can solve this by supervised learning? Second, election time: who of you doesn't think we can solve this by supervised learning? My god, now I know why Trump made it into the White House: none of you guys went voting, or for whatever the current government is of, I don't know, at least 15 countries at the moment where we as academics are unhappy with this. Okay, shall we try it one more time? One more time. I'm making you guys sleepy, I have noticed that. So, okay: who thinks this here is a supervised learning problem? Who thinks this isn't a supervised learning problem? So the first group was right: this here is a supervised learning problem, because we have the full joint state from which we are trying to predict the quantity we are using here, so we can actually learn an inverse model. So let's vote on the second one: who thinks this here is a supervised learning problem? Who thinks it isn't? Okay, the second group is right this time. There's only two people; my god, I think even our politicians would be ashamed of that voter participation. So why is this? Well, as we all notice, we are dealing in task space, so in this dynamics model we implicitly still have a kinematics model, and this implicit kinematics model is obviously not a regression problem per se anymore. Now, we of course want to do learning in real time and online, and even for that there are slightly different architectures in robot model learning than you would normally see in supervised learning. The easiest, of course: you just want to learn, in this case, an inverse model of the robot body. You would grab a signal at the entry of the robot, which is the action, and you would also grab at the encoders, you would try to
grab the state, and you would try to learn a model. Now, this sounds trivial; it's the kind of data acquisition we do all the time. What about this one here? Now, this should make you feel funny: we are grabbing at a very funny location. We are using the model for control, so if you have a perfect model, this feedback control law will stay completely silent; it will not do anything, because the model does everything in that case. But if this feedback control law turns on, then something is wrong with the model, and its command (this could be a PD controller, for example) can directly be used to train our robot model. And finally, we have something really funny: this could be used for the so-called distal teacher by Jordan, which you can use in order to train a mixed model. In a mixed model, we usually have the big problem of trying to figure out how to learn an inverse model, and you have an ill-posed inverse model learning problem. In this case, though, you can usually still learn a forward model, and you can use the output of the forward model to create a surrogate error for your inverse model, or to create fill-in information for your inverse model, and that is very, very helpful for many things. Generically, in robot learning, model learning obviously has a lot of problems when it comes to high dimensionality, smoothness, discontinuities, noise and missing data. Our data sets are at the same time too large and too small. On the one hand, we have a continuously growing data set which grows at a rate of between 500 hertz and 8 kilohertz, so it grows really, really fast. On the other hand, all this data is highly correlated, so you actually need to subsample to get meaningful data out of it if you want to use it for supervised learning. And when you look at a task like this ball-in-a-cup task, there's only one data point which actually matters: the whole trajectory together. That's
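The architecture just described, where the feedback command itself becomes the training signal for the model, is known as feedback-error learning. Here is a rough sketch on a one-dimensional point mass, with invented gains and learning rate: the inverse model supplies a feedforward term, and whenever the PD controller has to turn on, its command is used to adapt the model.

```python
import numpy as np

# Feedback-error learning on a point mass (m * qdd = u), a rough sketch:
# the learned inverse model is u_ff = w * qdd_des, and the PD command u_fb
# serves as its surrogate training error.
m_true = 2.0
w = 0.0                       # learned inverse-model parameter (true value: 2.0)
kp, kd, lr, dt = 100.0, 20.0, 0.001, 0.001

q, qd = 0.0, 0.0
for k in range(20000):
    t = k * dt
    q_des, qd_des, qdd_des = np.sin(t), np.cos(t), -np.sin(t)
    u_ff = w * qdd_des
    u_fb = kp * (q_des - q) + kd * (qd_des - qd)  # silent iff the model is perfect
    u = u_ff + u_fb
    # The feedback command is the surrogate training signal for the model:
    w += lr * u_fb * qdd_des
    qdd = u / m_true
    qd += qdd * dt
    q += qd * dt

print(w)  # approaches the true inertia parameter, 2.0
```

Note that no labeled "correct torque" is ever observed; the PD command fills in as the error signal, which is exactly why this trick resolves the otherwise ill-posed inverse model learning problem in this simple setting.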
the reason why we usually also have too-small data sets: they are also episodic. Then we want online updates for model learning, and in some cases we can incorporate prior knowledge like physics, which is obviously super helpful. And then there comes an issue we normally never have in machine learning: we actually need to worry about the safety of our robot and the safety of all humans involved, which is also kind of non-trivial. So let me take you to two examples. The first example is one where we want to learn inverse dynamics. For inverse dynamics, you could take the best possible model from the manufacturer, which is based on CAD data and some experimental values. You see here a desired trajectory, the green one, which comes from a human drawing data set; the one the robot achieves is in red. You notice that if you just train offline on a static data set with supervised learning, you can do somewhat better than you would by blindly using the physics model. If you additionally allow the system to train online, in real time, you can actually get near-perfect control performance. Now, this is a non-trivial function approximation problem. Just look at it quickly: for a robot arm, this means you're mapping 21 dimensions to 7 dimensions, so that's pretty high-dimensional, given that you want to be really good at it. Or even worse, if you look at a humanoid, you're mapping 90 dimensions to 30 dimensions, and that is for a humanoid with just 30 degrees of freedom. Then you want to learn in real time, so online adaptation is crucial, and you have an unlimited stream of data. For most of our work, up to recently, I would have said you're never going to be able to do GPs in the classical sense there, and that neural networks will never be effective there either. My PhD student Michael Lutter just proved me wrong on that; he even did it in real time, with a neural net working quite well. But in
terms of real robot results, up to recently there were really just the LWPR approaches by Stefan Schaal, as well as the local Gaussian process approximations which we did. So how would this work? If you want to make a GP work in online learning, you obviously can't afford the N-cubed computational cost of computing the inverse. Instead, you need some form of activation function which allows you to take a subset of the data points and create a local GP model from each subset. Let me show you how this looks for a data set where you can do this offline. Offline, of course, you can use more data points; you see the comparison between the offline-trained local GP and full Gaussian process regression, and then the online version, which can obviously do much better. And now let me show you, PowerPoint automatically jumped back in the slides, let's skip directly ahead. The first one is with the classical physics model, and you already notice there's a little bit of error. Even worse, if you now start to perturb it, it will not be able to recover afterwards. You need to notice one thing: this is very different from the industrial robots people use, because we have made this robot totally squishy. If you want to have safe robots around humans, well, classical industrial robots would be like this table, I could not push against them, and they also would have a totally stiff feedback controller, while we make ours so soft, like humans, that you can always push them aside, since they don't invest a lot of energy in resisting. This here is with online learning and a learned GP model, where all of the inaccuracies are basically learned away, and you can perturb the robot very heavily but nevertheless still get good behavior. So now let me take you to the second case study. I really shouldn't have eliminated the third; I don't know when I did that. I want to now
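The local GP idea sketched above, avoiding the cubic cost of one big GP by maintaining many small ones, can be illustrated as follows. This is a rough, simplified sketch (1-D inputs, invented distance threshold and hyperparameters), not the published LGP algorithm: incoming points are assigned to the nearest local model or start a new one, so each prediction only inverts a small kernel matrix.

```python
import numpy as np

def rbf(X1, X2, ls=0.5):
    """Squared-exponential kernel for 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

class LocalGPs:
    """Sketch of local GP regression: each local model holds a small subset
    of the data, so no full N x N kernel matrix is ever inverted."""
    def __init__(self, max_dist=1.0, noise=1e-2):
        self.models = []  # list of (center, xs, ys)
        self.max_dist, self.noise = max_dist, noise

    def add(self, x, y):
        if self.models:
            i = int(np.argmin([abs(x - c) for c, _, _ in self.models]))
            c, xs, ys = self.models[i]
            if abs(x - c) <= self.max_dist:       # assign to nearest model
                xs.append(x); ys.append(y)
                self.models[i] = (np.mean(xs), xs, ys)
                return
        self.models.append((x, [x], [y]))         # otherwise start a new one

    def predict(self, x):
        i = int(np.argmin([abs(x - c) for c, _, _ in self.models]))
        _, xs, ys = self.models[i]
        X, y = np.array(xs), np.array(ys)
        K = rbf(X, X) + self.noise * np.eye(len(X))
        k = rbf(np.array([x]), X)[0]
        return k @ np.linalg.solve(K, y)          # standard GP posterior mean

rng = np.random.default_rng(2)
lgp = LocalGPs()
for x in rng.uniform(0, 6, size=300):
    lgp.add(x, np.sin(x) + rng.normal(scale=0.05))
print(lgp.predict(1.5))  # roughly sin(1.5)
```

Each update and prediction touches only one small model, which is what makes the real-time, online operation described in the lecture feasible.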
learn how to move directly in finger space, and that's obviously a super important problem. Thank you, PowerPoint. Of this super important problem, you can see two instances. The first instance is a robot that has only an IMU vector and is standing on a pretty rough surface; in this case we want to control the center of gravity of the robot so that it always stays above its foot polygon. That's a nice problem for operational space control, where you want to modify the reference: we're going to follow a reference acceleration. In the upper problem, you see a robot arm which has a glass of water on the end effector, and it should follow a figure 8 without losing the glass, kind of unlike me. It turns out that getting these to work with classical methods is really, really hard, but of course we would like to use learning for that. And that requires, of course, solving a non-regression problem by treating it like a regression problem: both this point and this point would give you a physically sensible prediction, but averaging them could give you physical nonsense somewhere in the middle. There are different ways of dealing with this. One is to only focus on one of the modes, by training your system to have a preference, or you could regularize your system so that it always remains physically plausible. And there our knowledge of physics helps us, because physics actually tells us that torques are minimized in a squared sense, where the metric is usually the inertia metric. And this is an instantaneous reward, not even a long-term reward, so you can use it directly as a regularizer in a supervised learning problem; you don't even have to make it a long-term
reward as in reinforcement learning. Once you take this assumption and pull it into your system in order to get a regression task, your model learning basically becomes a weighted regression problem, where the weighting becomes smaller for data which lies further away from physics, and only one mode remains in the end. Importantly, it's an exponential transformation of this reward, with some temperature in there, and subsequently you're pushed into the banana, and sometimes you actually create a preference for one of the modes. We use this within locally weighted linear regression, where the weights enter here into the regression problem. I saw there was weighted regression on a slide this morning, so I'm not going to review it for you, unless you want me to. No? Perfect. So we basically take the regression in this case and use these physics-based weights as the weights; in this case it's just the squared torque. And what happens? We can actually get a control law which follows nearly perfectly, in simulation as well as on the real robot. Let me show you how all this looks on the real robot. For this particular robot, it's following kind of an imaginary trajectory, with these red dots here; it's following the red dot, or the red ball on a stick, as well as an inverse kinematics task, a learned operational space control task. So, the key conclusion for us in robotics: when you can learn a model, learn the model. Learning an inverse model especially is a useful thing, but it sometimes really requires dealing with multiple non-convex solutions rather than averaging over them. Inverse models are super useful if you can have them; in fact, they're more useful than forward models, because they're not so prone to the optimization bias. But learning good models generically is also super hard. So now I'm actually done with what I wanted to
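The reward-weighted regression idea above can be demonstrated on a toy redundancy problem. This is an invented example, not the lecture's robot data: the same task outcome can be produced by a low-torque branch and a high-torque branch, plain regression averages the two into a physically meaningless middle solution, and the exponentiated squared-torque weight collapses the fit onto the low-effort mode.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two redundant solution branches for the same task (numbers invented):
# a low-torque mode u = 0.5*x and a high-torque mode u = 2.0*x.
x = rng.uniform(0.5, 1.5, size=400)
branch = rng.integers(0, 2, size=400)
u = np.where(branch == 0, 0.5 * x, 2.0 * x) + rng.normal(scale=0.02, size=400)

lam = 0.5
w = np.exp(-u ** 2 / lam)   # physics-inspired weight: exp(-torque^2 / temperature)

# Regression through the origin, unweighted vs. weighted least squares.
beta_plain = (x @ u) / (x @ x)                    # averages the two branches
beta_weighted = ((x * w) @ u) / ((x * w) @ x)     # keeps the low-torque mode
print(beta_plain, beta_weighted)
```

The unweighted slope lands near 1.25, between the two branches, which corresponds to no physically valid solution; the weighted slope stays near 0.5, the low-torque mode, which is the preference the temperature parameter controls.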
do today, and we have only seven minutes left, right? So I'd best stop here, and you guys get your coffee break and then start with the hands-on tutorial after that, right? Okay. And then tomorrow we do reinforcement learning.