All right, so last class of the semester, right? I mean, last lab, last time you see me. Well, not necessarily — you can always see me; I have a YouTube channel, so you can come and see me whenever you want. So what do we talk about today? Today we finally get to the third part of this three-part story we started two weeks ago. Why am I excited? Because today we're going to be talking about something I've done a little bit of research on — something I contributed to, okay? So let me restart with the same slide for the third time, so that we start from the same point. We had an action plan. The first time we talked about model predictive control: we saw how we can perform backpropagation through the kinematic equations of a vehicle, and then we saw that we minimise the cost — the energy — with respect to the latent. Actually, let me show you what I was playing with these days. Remember I told you about the Kelley-Bryson algorithm, and that it was used for sending things into orbit — for sending a spacecraft to the Moon, I think. So I asked myself: can I do the same? I don't know, so I just started. How are planets attracted to each other? By the gravitational law — this is the gravitational law. Then I just started playing a little bit with physics. No big deal, I just wrote a few equations, starting with this one. This is how eager you should be to just try things out: unless you sit down at the computer and try something, you really cannot see whether an idea goes anywhere, right? So here I define a body, which has an initial position, an initial velocity, and a mass. And then I define these equations here, which basically define the state equations, right?
The state transition equations for my object, right? In this case I initialised two planets. The blue one is a hundred times more massive than the red one, and the red one has an initial velocity. I can show you here: the blue one has a mass of 100 — there are no units here, I will change this — an initial position of (0, 0), and velocity (0, 0). The other one instead has an initial position of (10, 0), so it starts at 10, and I gave it an initial velocity. So it starts orbiting around, and it basically drags the blue one around. Anyway, then I was trying to control something within this kind of environment — no, I haven't started yet. But the point is that we gave you a vocabulary, right? A toolkit of things you can play with. So you should be curious; you should try to do things that you like. Why things that you like? Because then you're going to put in effort, and eventually something is going to come out. Anyway, this is just a side project; I haven't done much there. So, back to the recap. I was going to play with the Kelley-Bryson algorithm for these two things here. Then in the second lesson, last week, we saw the truck backer-upper: how we were learning an emulator of the kinematics from observations. This is because we don't necessarily have those update equations, right? For this gravitational problem I started from physics again because I wanted to see whether I remembered how things work — and they actually do work — but those equations are always going to be approximate. What you observe, instead, is by definition what actually happens. That's why we may need to learn the kinematics, which has all these additional parameters that I might have neglected, right?
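The little gravity demo just described can be sketched in a few lines: a minimal two-body simulation with semi-implicit Euler integration, using the same unitless setup (a blue body of mass 100 at rest at the origin, a red body of mass 1 starting at (10, 0) with a tangential kick). The value of `G`, the time step, and the red body's initial speed are made up for illustration:

```python
import numpy as np

G = 1.0  # toy gravitational constant (no units, as in the demo)

def step(pos, vel, mass, dt=1e-3):
    """One semi-implicit Euler step of Newtonian gravity for two bodies.
    pos, vel: (2, 2) arrays of positions/velocities; mass: (2,) array."""
    r = pos[1] - pos[0]                      # vector from body 0 to body 1
    d = np.linalg.norm(r)
    f = G * mass[0] * mass[1] * r / d**3     # force on body 0, toward body 1
    acc = np.stack([f / mass[0], -f / mass[1]])
    vel = vel + dt * acc                     # update velocity first...
    pos = pos + dt * vel                     # ...then position (symplectic)
    return pos, vel

# blue body: mass 100, at rest at the origin; red body: mass 1 at (10, 0)
mass = np.array([100.0, 1.0])
pos = np.array([[0.0, 0.0], [10.0, 0.0]])
vel = np.array([[0.0, 0.0], [0.0, 3.0]])     # tangential kick for the red one

for _ in range(10_000):                      # simulate 10 time units
    pos, vel = step(pos, vel, mass)
```

With these numbers the red body stays on a bound orbit and drags the blue one around, exactly as in the demo; total momentum is conserved by construction, since the two forces are equal and opposite.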
Let's say friction, let's say, I don't know, whatever — you can always add additional terms. But if you actually learn something from observations, it's going to be exactly that. You don't miss things just because you didn't think of them; it doesn't require you to have domain knowledge. That's important. Then we saw that we were able to train a policy — I told you we can. I didn't actually show you it working last time, but I showed you the other example where someone made it work on GitHub. And finally, today, the third part: PPUU, Prediction and Policy learning Under Uncertainty. This combines all the different things we've talked about across the semester. Actually, one thing I haven't even told you is how DALL·E works. During the generative networks lesson I told you there is this DALL·E which generates very realistic — or very cute, if you want — images given a caption, but I didn't tell you how it works. And no one complained. You should complain more — intellectually complain. If I don't tell you how things work, you should be eager to know; you should ask, how does it work? And then I can tell you, because I also forget things — not on purpose. Anyway, I'll tell you later. So today we're going to be talking about a stochastic environment. What do I mean? There is other stuff moving around out there; we are not alone. We are not just alone — yeah, okay, double meaning, right? The point is that there is other stuff we don't control, and if other agents come into play in our environment, then you cannot really know what's going to happen. They are not necessarily predictable, if they have free will or whatever. So we're going to introduce that.
Then we're going to have uncertainty minimisation, basically to avoid going into regions where we don't know how things work — meaning, be careful if you don't know what you're doing, kind of. And the third item is an important part we've already mentioned a few times, this latent decoupling: how to avoid the latent capturing too much of the final output, okay? Ready? Go — no, okay, let's switch to the next deck of slides. I'm going to be talking about this article that we wrote — well, that Mikael wrote together with myself and, of course, Yann. This is "Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic". Again, maybe weird terms, but we're going to understand them for sure by the end of the class, okay? On the right-hand side I'm going to show you a slightly different notation, because these slides are a few years old now, so they don't match the latest ones; it's an already-made presentation about something I've done, so we didn't change it. In this case I'm going to call the state s_t — what we called x_t; in control it's called x, bold pink x, right? And we have an action a_t, which we called u_t — bold orange u, a function of time. And we know that since it's orange, it's a latent, right? So u the control, or z the latent, or in this case a for action: they are all interchangeable symbols used by different fields in different circumstances. This is the notation you would encounter if you're doing reinforcement learning: s for state, a for action, c for cost — or actually they use a reward r; we don't use rewards, we don't do reinforcement learning, but we are borrowing a bit of notation from that field, okay? Anyway, we have this agent, no? This agent that can take — so what is the agent?
An agent is something that can take decisions — actions a_t — given that it finds itself in a given circumstance, a given state s_t. And given that it performs an action, it's going to observe some consequences, some cost c_t, okay? How do we observe this stuff? Well, we are basically acting in the real world: you perform an action, and given the state the world was in — say you move forward — the world will change its state correspondingly, which gives you the next state and also the cost associated with your action. Again, this is something I guess you've already seen, but it's usually explained in a different manner, so I'll still go through it. So, in our area there is this distinction between model-based and model-free. What's the difference? In the model-free case, you learn by experiencing the real world directly. In the model-based case, instead, you don't experience things — you actually think things through. You should always think things through, right? Don't just act and then go, oh, I got burned. You shouldn't have to get burned first. If you know that fire is hot, just don't put your finger in the fire; otherwise you'll burn. Anyway, the point is that here we're going to be using a model of the world in order to do what we've been doing for the last two lessons, which is backpropagation through time through this chain between the agent — or policy, or controller — and the world model — or forward model, or state update equations, right? Cool. So, model-based. How do we learn this model? We already saw this last week. We have the real world. We share the initial condition, the initial state s_t at t = 0, and then we provide a given action to both the real world and the emulator of the world, right?
These actions, last time, were basically random — random sequences of steering angles for the wheels. They can also be actions performed by some expert agent — expert drivers, perhaps, if we are driving. And then on the right-hand side we collect both the state — the outcome — and the consequences, the cost c_t. All we are doing is regression: trying to match the outcome of the real world with our own emulator of the real world. And so here we are learning the world model. We already know this stuff; I'm just repeating, so I'll keep up the pace. Now, model-based: in our case we remove this cost c_t from the diagram — the cost is not provided by the world, but is instead a function of the output state itself. So we now assume that the cost is a differentiable function of the state. This gives us model predictive control, which we've been talking about for the last two weeks: we have a given state on the left-hand side and the action that's going to be taken by the agent; we provide the action to the world model, and then we compute a cost c_t on top of whatever state we find ourselves in, okay? That's our current configuration. Cool, so let's talk a little bit about the data. We said we are now introducing a stochastic environment — what does that mean? Here we basically have seven cameras mounted on top of a 30-storey building facing a stretch of highway — the I-80 highway. You can see these different cameras pointing in different orientations, so you also get some distortion from the acquisition. Then I perform a rectification: I change the point of view from a side view to a top-down view.
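The "learning the world model is just regression" point can be made concrete with a minimal sketch: a toy "real world" with linear dynamics that we query with random actions, and a least-squares regressor that recovers those dynamics from the resulting (state, action, next state) triples. The linear system and all numbers are invented for illustration — the lecture's emulator is of course a neural network trained by gradient descent, not a closed-form least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown "real world": linear dynamics we pretend we can only query.
A_true = np.array([[1.0, 0.1],
                   [0.0, 0.9]])
B_true = np.array([[0.0],
                   [0.1]])

def real_world(s, a):
    return A_true @ s + B_true @ a

# Collect (state, action, next state) triples with random actions,
# the same way the lecture feeds random steering angles to both worlds.
states, acts, nexts = [], [], []
s = np.zeros(2)
for _ in range(200):
    a = rng.normal(size=1)
    s_next = real_world(s, a)
    states.append(s)
    acts.append(a)
    nexts.append(s_next)
    s = s_next

# Regression: find W such that [s, a] W ≈ s_next (here in closed form).
X = np.hstack([np.array(states), np.array(acts)])   # (200, 3)
Y = np.array(nexts)                                  # (200, 2)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T                      # recovered dynamics
```

Because the toy world is noiseless and linear, the regressor recovers the dynamics essentially exactly; with a real system one would instead fit a neural network to the same triples.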
Of course there are distortions: everything is assumed to lie on the street plane, so buses and other tall things get distorted, but that's what we get. And I perform localisation of these vehicles — you can see there are bounding boxes around them — and not only localisation, but also regression of the length and width of each vehicle. You can see the bus, the red truck on the left-hand side, and in the middle the pickup truck, no? So now, given that I found these bounding boxes, I can create a virtual simulator in which I basically replace the images — the pixels — with this kind of synthetic image. This is the first time in this course, I think, that we see a synthetic image. The ones at the top are natural images; they have specific statistics. The images I'm showing you now are not natural images. How do we get from the camera projection to the top-down view? That's simply computer vision: you can go into OpenCV, and there are projections — it's just a deformation, you just apply a matrix, an affine transformation, no big deal. You can recover the coefficients of the affine transformation given that you know the orientation of the camera — the camera parameters, okay? Cool. So I converted this into this view, in which every car has its own label. I can track all the positions over time. On the left-hand side you still have this red truck; in the middle you can see the pickup truck as well; and on the right-hand side you can see the bus, right? Cool, so now I'm going to introduce two new variables. For every discrete time step t, I'm going to have a vector p_t. I know these vectors are not bold here — this is already bothering me; I know I wrote this stuff, but yeah, we decided not to use bold.
We also decided to use RL-style notation, which I don't like too much right now. But anyway, you know this stuff already — I mean, you know what I like and what I don't. So p_t is going to be the position of each vehicle: each vehicle at discrete temporal index t has a vector p_t. How many components does p_t have? Answer my question — I'd like to see whether you're following so far. It's a straightforward question, no tricks. Someone type something, please. Yeah: there is going to be x and y, so two components. And then v_t is going to be the velocity at that same discrete temporal index t: v_x in the x direction and v_y in the y direction. So over time you end up with a trajectory of locations and velocities — vectors telling you in which direction you're going and at what speed. So we now have trajectories for each vehicle, cool. Moreover — and this is interesting — I also get the action a_t. Can you guess how I got this a_t? a_t is going to be the actions that the (possibly expert) drivers took in the car. So how did I get the actions from these camera views? Did I go in the car? Change in x and y, yeah. Basically, I inverted the kinematic model of the tricycle we saw two weeks ago. By observing the whole trajectory, I can extract what actions were taken in order to get to each position. You can tell what the acceleration is: if the magnitude of the velocity changes, you know there is some acceleration; if the orientation of that vector changes, there is some turning, exactly. Cool — so these a_t come from the inversion of the kinematic model.
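The idea of inverting the kinematics can be sketched with finite differences: given only a trajectory of positions, a change in the speed magnitude gives the acceleration, and a change in the heading of the velocity vector gives the turning. A minimal version (the sampling interval `dt` and the test trajectories are made up; the real pipeline inverts the tricycle model and adds Kalman filtering, as described next):

```python
import numpy as np

def actions_from_trajectory(p, dt=0.1):
    """Recover (acceleration, turning rate) from positions p of shape (T, 2)."""
    v = np.diff(p, axis=0) / dt                  # finite-difference velocities
    speed = np.linalg.norm(v, axis=1)            # |v_t|
    heading = np.arctan2(v[:, 1], v[:, 0])       # orientation of v_t
    accel = np.diff(speed) / dt                  # d|v|/dt
    turn = np.diff(np.unwrap(heading)) / dt      # d(heading)/dt
    return accel, turn

t = np.arange(100) * 0.1
# straight line at constant speed: no acceleration, no turning
straight = np.stack([3.0 * t, np.zeros_like(t)], axis=1)
# circle of radius 5 at angular rate 0.2 rad/s: no acceleration, constant turn
circle = 5.0 * np.stack([np.cos(0.2 * t), np.sin(0.2 * t)], axis=1)

a_s, w_s = actions_from_trajectory(straight)   # expect ~0, ~0
a_c, w_c = actions_from_trajectory(circle)     # expect ~0, ~0.2 rad/s
```

This is also where the jitter problem shows up: one pixel of sideways noise in the bounding box makes the finite-difference heading jump, which is why the real pipeline needs filtering.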
Of course, this was not too straightforward, because there were jitters. What are jitters? Sometimes the bounding boxes were moving — the car is moving, say, from left to right, but then sometimes the box jumps one pixel up or one pixel down. So what happens if your car moves sideways and your model is the tricycle model? Under that model it simply cannot happen, right? So the kinematic model will give you bad numbers, like plus infinities or something like that. So then you have to introduce some Kalman filtering and so on. There is so much more engineering that goes on behind the scenes — there are things I tell you and things I don't, because there is so much involved in getting things to actually work. But anyway, moving on. The intermediate representation was the user-friendly one, with numbers and so on. Finally, I create the data that I'm actually going to feed to my model. In this case I decided to go with an RGB representation. Why RGB? Because I didn't use the right tools, and so I only had three RGB channels — which is wrong; I should have used a better encoding. I just drew things with a game library — Pygame, it's called, I think — which only lets me draw in three colours. If you want to encode one more layer, you shouldn't be using that. Anyway, here each car is represented by this green box — you have the pickup truck in the centre, or the bus over there — and the lanes are shown as red lines. So each channel has a meaning. And then I crop a context image, which is that rectangle you can see here around each vehicle, right?
And so each vehicle now has not only its position and its velocity, but also this window of pixels, oriented with the vehicle and following it, which tells me the configuration of the lanes in that specific region and the configuration of the traffic — the other vehicles around me. So per vehicle, at each time step, you have a context image. I also move the ego vehicle from the green channel to, let's say, the blue channel, so I can differentiate myself from the others. And so I have all these context images; they tell me what the situation is around each car. You can see, for example, in the centre we had the pickup truck, and since it was slightly turned to the left, all the lanes appear slightly tilted to the right. Or you can see the bus in the top-right corner there. These are my images i_t — or observations, perhaps, if you want to call them that. And so, finally, the collection of position, velocity, and context image makes up my state at discrete temporal index t, okay? This is the stochastic environment we're going to be dealing with now. The state is made of a collection of tensors — it's a set; it's not even a vector or a tensor any more. Anyway, so far I've told you about the state. Now I have to tell you how I compute the cost on top of this state. Let's say you're driving: what are you going to try to do? What do you want to do when you drive from point A to point B? Tell me. Okay — no crashes, reach the destination, correct. And also try to stay within the lane, perhaps. Don't go crazy, exactly.
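The state just described can be pictured as a little record per vehicle per time step: position, velocity, and the ego-centred context image. A sketch — the image resolution here is an arbitrary placeholder, not the paper's actual crop size, and the channel convention (lanes red, others green, ego blue) follows the lecture:

```python
import numpy as np
from collections import namedtuple

# One record per vehicle per discrete time step t.
State = namedtuple("State", ["position", "velocity", "context"])

s_t = State(
    position=np.array([120.5, 14.2]),   # p_t = (x, y)
    velocity=np.array([18.0, 0.1]),     # v_t = (vx, vy)
    context=np.zeros((3, 117, 24)),     # 3-channel crop; H x W is illustrative
)
```

Calling the state a "set of tensors" is exactly this: a 2-vector, another 2-vector, and an image tensor bundled together, rather than one flat vector.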
So now I'm going to compute a differentiable cost of the state, which tells me how costly a given state is — basically, how close I am to other vehicles and how far I am from the centre of the lane. This is what I came up with; it's not necessarily right, but it actually did work, so we'll go with it. Cost, first part. On the left-hand side I'll tell you about the lane cost, and on the right-hand side about the proximity cost. On the left side you see this lane-width envelope centred on my own blue vehicle. Now, if you embed this within the lane, you see that if I am not at the centre of the lane, there is going to be some overlap, and the overlap is going to be my cost. In this case I just do a pixel-wise multiplication of this envelope with the red channel — the two shapes — and then I take the max. So you have 0, 0, 0, 0, then you get this value, then 0, 0, 0 again, and I just take the max, which is this x here. If my blue vehicle is exactly at the centre of the lane, there is no overlap between the red colour and the aqua colour, okay? I hope that makes sense. On the right-hand side we have something similar, but for other vehicles. There is a transverse cost, whose width is more or less the lane width — basically saying: if you're one lane apart from your neighbours, the cars on your left and right, then you're okay. And then I also have a longitudinal cost, which tells me how close I can be to the preceding and following vehicles, right?
In this case the safety distance — the extent to which this cost propagates in the x direction, the direction of motion — depends on the speed; here the dependence is linear. If I go very slowly, I have a smaller safety distance; if I go faster, I need more time to brake, so I have a longer safety distance. The number used is something like one second of travel at the given speed, cool. So how do we embed this in the environment? Say I put myself here, and there is this vehicle in front of me. You can see there is a very high cost associated with the transverse cost, but on the longitudinal cost maybe it's further down the slope. So what is my final cost? My final cost is the product of the two — and then I take the max of that product. That way, if either factor is zero, the product is zero: if I am exactly to the side of this vehicle, I have zero transverse cost, so even if the longitudinal cost is at its maximum, it's the maximum value times zero — it doesn't matter. In the other case, if I am exactly behind this vehicle, my transverse cost is at its maximum, but if I'm far enough away the longitudinal cost is zero, so the product is still zero. Also fine, okay? So I take the max of the product of the two, cool. So how do I actually do this in practice? Let's have a look. On the left-hand side you see a situation where my blue car — my ego car — is moving at 20 kilometres per hour. On the right-hand side you see a similar situation where my speed is 50 kilometres per hour; as you can tell, there is a larger distance between all the vehicles.
On the left-hand side I'm showing you the product of the longitudinal and transverse proximity costs, which comes out as that small yellow blob — the highest value, one, is in the yellow region, and the purple tells you it's zero. On the right-hand side, since we go at a faster speed, the safety distance is longer, and so the extent of this mask — this shape, whatever you want to call it — is longer, such that it now picks up things that are further away. Nevertheless, the extent in the transverse direction stays the same: regardless of the speed, I still want to be one lane away from my neighbours. Cool. So here you can see the difference in extension. And it is differentiable, right? What is differentiable? Whenever I compute the cost, I multiply this mask — the purple-to-yellow mask, which is a zero-to-one, single-channel grayscale layer — with the green channel of the context image. The green channel represents all the other vehicles: green is one, non-green is basically zero. You multiply by the mask, take the max, and that's my cost. If you are very, very close to a vehicle, you get a very high cost; if I'm far from everyone, then it's okay, low cost. So this is the cost, and previously we saw the state. So you already know everything, more or less — you should be able to put this stuff together, right? Let's see. What are the necessary steps to put everything together? There are three major steps, and this is the outline of today's talk.
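The mask-multiply-max recipe for the proximity cost can be sketched as follows: a profile that is the product of a longitudinal part (whose extent grows linearly with speed — the safety distance) and a transverse part (fixed at roughly a lane width), multiplied pixel-wise with the green channel and reduced with a max. The linear ramps and all the constants (`lane_w`, `dt_safe`, `px_per_m`) are invented for illustration; the paper's exact profile differs:

```python
import numpy as np

def proximity_cost(green, speed, lane_w=6.0, dt_safe=1.0, px_per_m=1.0):
    """Sketch of the proximity cost on a single-channel 'other cars' image.
    green: (H, W) array, 1 where another vehicle is, 0 elsewhere;
    rows are the longitudinal direction, columns the transverse one."""
    H, W = green.shape
    y, x = np.mgrid[0:H, 0:W]
    cy, cx = H // 2, W // 2                          # ego car at the centre
    safety = max(1.0, speed * dt_safe * px_per_m)    # linear in speed
    long_part = np.clip(1 - np.abs(y - cy) / safety, 0, 1)
    trans_part = np.clip(1 - np.abs(x - cx) / lane_w, 0, 1)
    mask = long_part * trans_part                    # product of the two costs
    return (mask * green).max()                      # max of pixel-wise product

# example: another car a few pixels ahead of the ego, in the same lane
g = np.zeros((41, 21))
g[15:18, 9:12] = 1.0
low = proximity_cost(g, speed=2.0)    # slow: car lies outside the safety distance
high = proximity_cost(g, speed=10.0)  # fast: the longer mask reaches the car
```

Note that every operation here (clip, multiply, max) is (sub)differentiable, which is the whole point: the cost can be backpropagated through.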
So we took the first half hour to introduce the talk and everything; now we'll have the talk proper, maybe for the next 30 minutes if I finish on time. First, we're going to try to learn the world model from what we called the real world — whatever I defined it to be just now — by mimicking this real world with a regressor. So we go from the real green world to this blue cartoon. The second point is going to be transferring — extracting — knowledge from the cartoon in order to train this policy, this brain, the agent. Finally, guess what? We close the circle: we put the brain in the environment and we interact. So there is a full cycle — green to blue, blue to pink, pink to green — and we can do the evaluation. Each of these points can be worked on further, and I'm going to go through each of them right now. Ready? Go. The world model: predicting what's next given history and action. So we have this world model, and again I have to apologise, because I'm going to be using a different notation from the one we've been introducing — this stuff is old and my understanding has only recently improved, so the graphs and charts are slightly inconsistent today. I apologise. So we have this world model, which should be drawn as a bubble diagram — it's still a circle, right? Okay, so this one is the same. So we have states going from 1 to t. What does 1 to t mean? In order to capture the whole situation: given that we have speed and position, we would need just one time point. If you only have access to positions, then you need two points — with two points you can get location and speed. But now we also have other vehicles, and if we want to tell what the speed of those vehicles is, we need two images, right?
The image now and the image before. But if you want to tell whether they are accelerating or braking, then you need three images. So, if you want to tell acceleration, you may want three images. Here we just take a fixed number of previous frames, in order to have a complete — or at least reasonably complete — view of the history. This is a very common approach. And then the world model is also going to get this action a_t, the control. On the other side, what we ask it for is a prediction, our s tilde — which is called s hat here, but again, it's our prediction; and the colours are orange, don't mind the colours, it should be violet. Anyway, these are my predictions. The real world, what does it do? Well, the real world gives me the observations — those are the blue y's, the blue targets, okay? All right, cool. So let's train this. Why should it be hard, right? So what am I using? I have my state here and the action. I'm going to be using a predictor — that's what we've been calling it. The predictor gives me the hidden representation of the future, and then I have a decoder, which gives me my prediction, this s tilde, the predicted version. On the other side I have the target, my blue y. And in between I have this energy block — which should be a red square — which is our spring, trying to pull the prediction close to the target. Okay, cool. Is it going to work? Tell me: does it work, yes or no, just using a predictor-decoder network to predict the future? What's your guess? No one says anything — just play with me. "Yes, but the noise may be an issue." Yes, exactly, that's a very big point; you're very correct.
So let's see whether it works or not. Here we go. The direction of motion is from bottom to top. On the left-hand side you see the actual future — what actually happens — and on the right-hand side you see this deterministic model, okay? The frame rate is 10 frames per second, and you can see the timestamp — the discrete time index t — in the top-right corner. Here we go. You can see that after basically four seconds things get very blurry, and by around ten seconds things are completely unrealistic and nothing works. And so Alfredo sucks. Maybe, okay? Maybe it's not my fault, maybe it is, I don't know. But nothing works, right? So what's the problem here? As was pointed out, the noise is an issue — more than the noise, the uncertainty. If the future has multiple possible outcomes and you train with MSE, you will not be able to capture multiple outcomes. So how do we fix this? I've told you this like ten times already. How can we possibly have multiple outputs given one single input? We're going to be using an energy model — which type of energy model? You already know the answer; that was the whole point. Your answer is correct, Camilla, but you need to add two more words. What's in orange? Latent variable, yes, correct. So we need a latent-variable energy-based model — we just need latent variables. Here we were trying to learn a multimodal distribution without a latent; well, of course nothing works. There we go. So that's the issue, but we already know everything we need.
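This collapse-to-the-average is easy to verify numerically: when the target is drawn uniformly on a circle, the single prediction that minimises the mean squared error over all outcomes is the mean of the outcomes — the centre, a point that is never actually observed. A tiny sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# multimodal target: a point landing uniformly on the unit circle
theta = rng.uniform(0.0, 2.0 * np.pi, size=10_000)
tips = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# the MSE-optimal single prediction is the mean of the targets...
mse_optimum = tips.mean(axis=0)
# ...which sits near the origin, while every actual outcome is at distance 1
```

In pixel space the same averaging shows up as blur: the predicted frame is the mean of many possible sharp futures.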
So: this is a pencil; you hold a pencil and let it fall — what happens? Let me draw it top-down, because I cannot draw things in projection. I have a pencil, and I let it fall in many directions; each time, the tip is going to end up pointing in a different direction. If you try to learn this with MSE, you just learn the average, right? We've already seen this: in the limit where the temperature goes to plus infinity — beta goes to zero — you get the average of all those outcomes, which doesn't work; we know that. Cool. So what's the average image? Tell me — it's a blurred image, right? It has everything inside. On the right-hand side here I just show you the average coordinate, which is (0, 0) in this case. But in the image case, since we have pixels, you just get a blur: blur is average. I hope you know this stuff by now, okay. All right, so how do we fix it? Well, you told me: we add latent variables. So we can have a variational predictive network, yay. So what do we do? We add a z_t — which, of course, should be in orange; all the colours here are wrong, okay? At least the letters are correct. z_t is a low-dimensional latent variable — it should be a bold orange z_t — which goes inside an expander, which is just a linear module that makes the dimensions match. Then I sum this latent to the hidden representation given to me by the predictor. Despite the wrong colours, everything is fine here. Cool. So where do we get this latent z_t? Well, as we saw a few lessons ago, we can use a variational something, right? So here I encode my s_{t+1} with this encoder, which gives me mu and sigma squared — the mean and the variance of the distribution of z — and then we sample with those values, right?
And so the right-hand side here is gonna be a variational autoencoder, right? I have the input, I have the encoder for the parameters of the distribution, here I sample from that Gaussian, I get the z that goes inside the expander, and then it goes inside the decoder. But it's not just a variational autoencoder. It's a conditional variational autoencoder, right? Because there is also x. So this is my y, this is my y tilde, and here is my x. So this is not just a variational autoencoder, this is a conditional variational autoencoder. Can you elaborate on the intuition behind the design of the encoder for the latent variable? This one is simply going to be a convolutional net, right? There are two branches. Inside this guy there is a convolutional net which is fed with these context images from the states, and there is also an MLP which is fed with the position and velocity, right? So there is a four-dimensional vector, x, y, v_x, v_y, and there are RGB context images, which shouldn't really need to be RGB, but they are, and these go through a convnet. And this basically looks almost like a classification network which outputs, I think, 16 values for the mu and 16 values for the log sigma squared, the log variance. So that's the architecture inside, if that's what you're asking; I'm not sure if I answered your question. And this is gonna give me the parameters from which I sample this z, right? So z comes from this Gaussian, which has been parameterized by the encoder. Moreover, we need something more. This is gonna be what's usually called the posterior distribution, blah, blah, blah. I didn't talk about this terminology because we don't really need it. And also, of course, there is gonna be an additional term, which is the relative entropy, no? The KL with the prior.
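A hedged sketch of that encoder-and-sampling step (the target state is a flat vector here for simplicity; the real encoder has the two branches described above): the encoder produces 16 means and 16 log-variances, and z is drawn with the reparameterization trick.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the conditional-VAE encoder: s_{t+1} is encoded into
# mu and log-variance, and z is sampled with the reparameterization trick.
# Dimensions are illustrative, not the project's exact ones.
state_dim, latent_dim = 64, 16

encoder = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 2 * latent_dim),              # 16 for mu, 16 for log sigma^2
)

s_next = torch.randn(1, state_dim)              # the future state s_{t+1}
stats = encoder(s_next)
mu, log_var = stats.chunk(2, dim=-1)

eps = torch.randn_like(mu)
z = mu + eps * torch.exp(0.5 * log_var)         # z ~ N(mu, sigma^2)
print(z.shape)                                  # torch.Size([1, 16])
```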
Why do we need the prior? Well, because we have to actually sample later, no? At inference time, right? Because we don't have access to the future at inference time. So we're gonna be sampling from this one over here. So inference: how does it work? As we said, since we have a relative entropy with the prior, we can simply sample from the prior. Then we get the latent here, we get a prediction, this ŝ_{t+1}, then we feed this one back into the input and we get the second prediction, then we feed that to the input, and so on. So let's look at how this works. On the left-hand side you see the actual future. The second column is gonna be the deterministic future. And in the other four columns you're gonna see the following: I sample 200 latents for the first one, 200 for the second, and 200 each, still from a Gaussian, a normal distribution, for the third and the fourth. So overall we have 800 latents, 200 per column. And this is how it looks. Pay attention to these two vehicles, right? I put them in a circle and a square. You can see how they display different behaviors, and everything is consistent. The deterministic model already died, while the conditional variational autoencoder just works very well. And you can see that the configuration of the other vehicles in traffic changes without any issue. This stuff just works. This was great. Why did we use a convolutional network? Because pixels were the easiest thing we could think of to encode the positions of an arbitrary number of vehicles, okay? When we wrote this, about three years ago, we didn't know how to deal with sets of inputs. We didn't have transformers yet; transformers are relatively recent, right?
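The autoregressive inference loop just described can be sketched as follows (the world model here is a stand-in linear module, purely for illustration): at each step a latent is drawn from the standard-normal prior, and each predicted state is fed back in as the next input.

```python
import torch
import torch.nn as nn

# Illustrative rollout at inference time: with no access to the future, each
# latent comes from the prior N(0, I), and each prediction is fed back in.
state_dim, latent_dim, horizon = 8, 4, 10

world_model = nn.Linear(state_dim + latent_dim, state_dim)  # toy f(s, z) -> s'

s = torch.zeros(1, state_dim)
trajectory = []
for t in range(horizon):
    z = torch.randn(1, latent_dim)              # sample from the prior
    s = world_model(torch.cat([s, z], dim=-1))  # predict s_{t+1}
    trajectory.append(s)                        # feed the prediction back in

print(len(trajectory))  # 10 predicted states
```

Sampling z several hundred times per step, as in the video, just means repeating this rollout with fresh prior draws.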
So if we were doing the same thing these days, we would probably be using a transformer, which allows us to deal with a set of nearby vehicles, right? This connects back to the question Yann asked. We used a convolutional net because we used images to encode the information. These are not natural images, these are synthetic images, because that was the most straightforward thing for us to do; there were basically no other options. I mean, we tried to build some sort of permutation-invariant network, but we were not smart enough to come up with a solution. So that's the answer, adding to the answer of the question from before. Cool. Everything works. So what's next? Well, things are a little bit annoying, because now there is a path that goes from the future to my prediction. And unless this z_t has some sort of constraint, we're gonna figure out that we're turning to the right just because the latent has captured that information. Like, we see all the lanes turning, and so the latent will necessarily reflect that everything turns, right? And then the model doesn't care, unfortunately, about what action was actually taken by the agent. This basically makes our model insensitive to the actions that we are taking, which makes it completely useless, right? If your conditional model is effectively unconditional, well, you can't use it in a conditional way. If there is no connection between your condition and the output, if you cannot move the output by moving your actions, your condition, then this stuff becomes useless, right? And this issue arises from the fact that the latent encodes too much information. So now we have to, what's the English word, throttle: we have to squeeze out, to limit the information capacity of the latent, right? Otherwise the model cheats.
All right, so let me show you the issue here. In this first part we're gonna see the real latents and the real actions, right? So this is basically reality, and this is how the car steers a few times; you can see the lanes moving all around. This is sped up four times. In one case you're gonna see the real actions but a different set of latents, and you can see that things don't change accordingly. In the other case, on the left-hand side, you see the real set of latents but random actions. You can see that when I use the real set of latents, things keep turning, right? Even in this last frame: in the last case we were turning, so everything rotated, while in these two images, although the actions are the same, the lanes are not reflecting that rotation. On the left-hand side, instead, you can see that the latent is actually encoding the fact that we are turning, which is not what we want, right? The latent should only encode what the other vehicles are doing, not my actions. I want to decouple these. So in the next few slides we'll see how to avoid the latents encoding my actions, right? Let me show you this video again, maybe it'll be a bit clearer. Here is reality; I steer a few times so you can see things turning. In the other case you see the same actual actions, so I would expect a similar type of behavior, but things don't change the same way. And in the first column you see the same latents but random actions, and you can see how the motion of the lanes is very correlated with my original one, right? So the actions basically leaked into the latents, which was not good news. This stuff took eight months, right, to make it work.
I mean, eight months to train the predictive model and then six months to train the policies. It took over a year, two people, right? So no, it was not easy, but nevertheless we ended up making it work. So again, we have to find a way to cut this path from the target to my prediction. How do we do that? Well, we introduced this symbol here, which is basically a switch. We already learned about this drawing. We switch back and forth between the prior and the posterior, which means we basically have latent variable dropout. During training we sometimes pick the latent from the posterior and sometimes from the prior, which means the model cannot rely on the encoder, the posterior, to make predictions that reflect different actions. The actions are always there, right? But the latent might or might not be available. And so the predictor will now have to rely instead on the conditional input to make its predictions, right? It can no longer rely on that signal; before there was a signal, and now we've broken it. Cool, does it work? Yeah. So in this case I'm gonna show you, in the rightmost two columns, two random samples of latents but the same real actions. You can see here how the lanes are all moving synchronously, right? All of them are moving because I'm steering, whereas on the left-hand side we didn't see that kind of steering. Cool. So that was the first part: we learned how to train a model of the world, given that we have a stochastic environment. Now, second part of the talk, no? Maybe I have to speed up a little bit, maybe not, because maybe things are easier now.
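The switch, the latent variable dropout, can be sketched in a few lines (the drop probability and function name are my own illustrative choices): with some probability the posterior latent is replaced by a prior sample, so the predictor cannot count on the future-derived signal and must lean on the action input instead.

```python
import torch

# Sketch of "latent dropout": with probability p_drop, the posterior latent is
# replaced by a sample from the prior, cutting the path from the future to the
# prediction. p_drop is an illustrative assumption, not the project's value.
def pick_latent(z_posterior: torch.Tensor, p_drop: float = 0.5) -> torch.Tensor:
    if torch.rand(()) < p_drop:
        return torch.randn_like(z_posterior)    # prior sample: signal is cut
    return z_posterior                          # posterior sample: signal intact

z_post = torch.zeros(1, 16)
z_used = pick_latent(z_post, p_drop=0.5)
print(z_used.shape)                             # torch.Size([1, 16])
```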
So, the agent: we're gonna try to distill the knowledge of how to move around all these cars without crashing, right? So far we learned how to predict the next state, how to learn about the future. Now we're gonna have to deal with using the future and the cost to navigate this environment. We've already seen this left-hand side about planning, so I'll just skip it completely. So how does it work? We have an agent, which should also be in bold, sometimes called π; π stands for policy. This is a controller. It's fed with a state, which is position, velocity, and the context images for the last t time steps, and then it produces this action, a tilde, which should really be a check, ǎ, because it's the action that minimizes the energy, right? This is gonna be my acceleration, braking, or steering right and left, okay? Remember: in optimal control, the actions are basically the latent input, the control, and we were minimizing the energy with respect to the latents, the actions. So these actions, which in theory are the optimal actions, should be the ǎ, the minimizers of the energy. Here it's written â because I didn't know about this notation back then. All right, so, training the agent: we have a state that goes inside the policy, we get this prediction of the action, and we provide state and action to the world model. We also input a latent that comes from the prior; we've already trained the world model, right? So we get the next state, which goes inside the loss, which is gonna be, for example, in this case, my task cost. What is the task cost? Well, for now it's just a linear combination of the proximity cost and this lane cost, cool.
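The task cost as just described is simply a weighted sum; here's a minimal sketch (the weights and the scalar cost inputs are illustrative assumptions, not the project's actual values):

```python
import torch

# Hedged sketch of the task cost: a linear combination of a proximity cost
# (being close to other cars) and a lane cost (straying from the lane).
def task_cost(proximity: torch.Tensor, lane: torch.Tensor,
              w_proximity: float = 1.0, w_lane: float = 0.2) -> torch.Tensor:
    return w_proximity * proximity + w_lane * lane

c = task_cost(torch.tensor(0.8), torch.tensor(0.5))
print(c.item())  # 0.9
```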
Then we go again inside the policy with this next state, we get the next action, it goes inside the predictive model, we get a new latent from the Gaussian, the normal distribution, we predict the next state, which goes inside the loss. Then you provide that state to the policy, get the new action that goes inside the world model, you pick another latent, and so on, all right? So, does it work? Answer: no. Why doesn't it work? Because of course things don't work immediately, right? So you put all of that in the loss: we have a cost at each temporal index, right? We saw this two weeks ago already; I made a pretty diagram for you. So that's what happened: we basically tried to crash or kill ourselves here. All these trajectories end up crashing into other vehicles, or we run off the road, right? Why is that? What's happening? Well, yeah, there is a kind of suicidal intent in our policy, right? The minimization of the cost was happening whenever we were taking crazy actions. Basically the policy was breaking the predictive model by coming up with adversarial actions which were never observed in reality, and then the predictive model was just outputting black. Black is low cost, so: awesome, you achieved the lowest energy, the lowest cost. The policy was breaking my predictive model, right? So how do we fix that? Well, we need to be able to avoid crashing and going off the road, so perhaps we should try to imitate the expert, right? Not just minimize the cost, which means staying away from other vehicles and avoiding going off-road, but maybe also try to stay close to what the expert driver did, right? So this is gonna be my loss now: the task cost plus an expert regularization, right?
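The unrolling of the policy through the frozen world model can be sketched like this (all modules are toy stand-ins; the real state includes images and the real cost is the proximity-plus-lane cost):

```python
import torch
import torch.nn as nn

# Illustrative unroll: at each step the policy proposes an action, a latent is
# drawn from the prior, the frozen world model predicts the next state, and the
# per-step cost accumulates. Gradients flow back into the policy only.
state_dim, action_dim, latent_dim, horizon = 8, 2, 4, 5

policy = nn.Linear(state_dim, action_dim)
world_model = nn.Linear(state_dim + action_dim + latent_dim, state_dim)
for p in world_model.parameters():
    p.requires_grad_(False)                     # world model is pre-trained, frozen

s = torch.zeros(1, state_dim)
total_cost = 0.0
for t in range(horizon):
    a = policy(s)                               # action from the policy
    z = torch.randn(1, latent_dim)              # latent from the prior
    s = world_model(torch.cat([s, a, z], dim=-1))
    total_cost = total_cost + s.pow(2).mean()   # stand-in per-step cost

total_cost.backward()                           # trains the policy through the model
print(policy.weight.grad is not None)           # True
```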
So we also try to not deviate too much from the original trajectory of that given vehicle, okay? How do we do that? Well, we additionally add this target to my future predictions, right? So now I'm saying the new predictions should not go too far from the actual next states taken by the expert. So I add another spring, another cost: another red square, another energy term, which doesn't let me go too far from where the original vehicle went. But we still have this additional latent. The latent lets us go for a specific option, while now we are trying to stay as close as possible to what actually happened. And the way to make the predictions ŝ_{t+1} closest to the actual targets is to remove the latent, right? Because if you don't have a latent, you get the average prediction, which is, on average, the closest to all possible things that could have happened. If you pick a latent, you're gonna diverge towards some particular solution, but the easiest way to stay close to the one thing that actually happened is to not diverge at all, right? And so this was annoying, right? We spent eight months training this conditional variational autoencoder with its latents, and then I had to remove the latents. My heart broke, right? Did it work? Yeah, it did work. So now we don't kill ourselves anymore, but it's sad, no? Do you agree with me? Isn't it sad? We had all these latents, and then we turned up the temperature in the oven and now they're all dead, right? Do you understand what's happening right now? Yes, no? Yeah, heartbreaking. So, we did manage to train that amazing predictive network which allows us to predict any type of future, right?
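The expert-regularized objective can be sketched as follows (the weight `lambda_expert` and the squared-error form of the imitation term are illustrative assumptions):

```python
import torch

# Sketch of the expert-regularized loss: the task cost plus a squared penalty
# keeping predicted states close to the states the expert actually reached.
def regularized_loss(cost: torch.Tensor,
                     s_pred: torch.Tensor, s_expert: torch.Tensor,
                     lambda_expert: float = 1.0) -> torch.Tensor:
    imitation = (s_pred - s_expert).pow(2).mean()   # the extra "spring"
    return cost + lambda_expert * imitation

loss = regularized_loss(torch.tensor(0.5), torch.ones(1, 4), torch.zeros(1, 4))
print(loss.item())  # 1.5
```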
You sample different z's, you get different types of futures, right? But in order to actually make the policy work, this future is too flexible; I need to constrain the future to be the closest to the thing that actually happened. I cannot really take other options into consideration. So this is a limitation, I think. Can we do better? Yeah, we can, right? Because we have Yann on our side. No, I'm just kidding. Because we should do better, right? So let's try a different manifold attractor. What am I talking about? The point here was that we were falling off the manifold. The policy was basically taking actions that were making us leave the expert region, right? If this is the region where we can make nice predictions, our policy was coming up with actions which were far away, which were breaking the system, right? And so here we tried to keep the actions within the realm of actions taken by the expert, and therefore within our confident area, our confident space. But can I do something similar without necessarily being tied to the actual action from the expert? Can I still stay within this region without having to copy the expert, without doing this imitation learning? Can we? What did I tell you about last week? Could there be some benefit from using a policy that breaks our model, to help train the model to learn hard areas? Well, I don't have access to the real environment, right? I don't have a simulator of reality. That would work if you had RL, but here I'm actually training my world model from observations. I have observations only from expert drivers. I don't have the ability to take new actions and observe the new outcomes, okay? We are not doing RL here. We are not trying things in reality. We only observe things, and we try to train a policy from observational data only. This is very important. We are doing machine learning, we are doing deep learning.
We are not doing trial and error, we are not doing RL, okay? So what did I teach you last week? Do you remember the last five minutes of class? Tell me, what did I tell you about yesterday, well, last week? Yeah, dropout variance. What is that? It's something that allows us to understand how far you are from your training manifold, your data manifold. So let's use that here. Let's introduce a notion of distance in the state space, okay? This is gonna be the solution to our problem, and it works amazingly well, okay? So let's see how we can introduce this forward-model uncertainty. We have our forward model, we have our specific cost. The point is that within the region where I'm training, all the points are the red points, but outside of it the predictions are arbitrary, right? So we can compute the variance across the predictions, and we can automatically estimate the uncertainty with which a prediction is made. We've seen this: we can do it with an ensemble of models, or we can do it by computing the variance over the dropout outputs during inference. So we introduce this uncertainty regularization. Cool. So we have our policy, which is feeding our world model, and then we have one latent, right? Which gives me my prediction ŝ_{t+1}, and then we have the cost there, which is my proximity and lane cost. But inside this dashed line I have multiple dropout masks, right? I forward the same latent and the same state and the same action multiple times with different dropout masks, which of course gives me different predictions, right? Then I can compute the variance, and I have an estimate of the uncertainty with which a prediction is made, right? And then I have a multiplier, lambda.
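The dropout-variance estimate can be sketched in a few lines (the model here is a toy MLP standing in for the world model; the number of passes and dropout rate are illustrative): push the same input through the network several times with dropout kept active, and use the variance across the outputs as the uncertainty signal.

```python
import torch
import torch.nn as nn

# Illustrative uncertainty estimate via dropout: the same input is forwarded
# several times with dropout active; the variance across the predictions
# measures how far we are from the data manifold.
torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),
    nn.Dropout(p=0.5),                          # a different mask on every pass
    nn.Linear(32, 8),
)
model.train()                                   # keep dropout ON at inference

x = torch.randn(1, 8)
preds = torch.stack([model(x) for _ in range(20)])   # 20 different dropout masks
uncertainty = preds.var(dim=0).mean()                # variance across passes
print(uncertainty.item() > 0)                        # True: the masks disagree
```

Scaling this variance by a multiplier lambda and adding it to the task cost gives the uncertainty-regularized loss discussed next.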
And that, in total, is gonna be modeling my uncertainty. Then I sum the two, and I have my loss to optimize in order to find the parameters of the policy π. So, all together, same diagram: we have latents, we don't kill them, we don't turn up the temperature anymore, they're all alive, they're thriving. And now my loss is gonna be my task cost, that is, the proximity and lane cost, plus the uncertainty regularization. Does it work? Here we go. The pink ones show you that, basically, the controller has more freedom, right? Now we can decide to take actions which are completely different from the actions that the expert took for that particular trajectory. Before, with the yellow one, given a given vehicle, we were basically tied to taking actions more or less consistent with the actions taken by the expert. Why was that? We were forcing our next predictions to be as close as possible to the next states observed in the trajectory, and that automatically pushed our controller, our policy, to take actions that bring us to similar states, limiting ourselves basically to safe actions only. Well, not just to safe actions, but to a subset of the safe actions: the safe actions that were actually observed for that specific trajectory. Now we enlarge the bubble of safe actions to all possible actions that are not going to mess with my predictive model, right? So now we basically have a spring, right? You stay within this bubble, and the spring is given to you by this variance, right? If the variance is large, then you can try to go as far as you want, but you're gonna exit the comfort zone of my predictive model, right? Zero variance is gonna be at the center of the comfort zone of the predictive model.
And then, the more you try to take different actions, the more you're gonna increase this discomfort of the predictive model, right? I think it's super clear. At least now, from what we've explained so far, it should be very clear, although some symbols are wrong in this lesson, but whatever, it's the last lesson, okay? Forgive me. All right, final part: the evaluation. Given that we have the policy trained, the controller, we're gonna see how well or badly it performs, right? So how do we evaluate this stuff? I'm gonna show you this configuration: in yellow is the original driver, the original observation; in blue is my controller, which is taking its own actions; and in green are the other vehicles. I'm gonna run this at three times the frame rate. Here you can see that we don't necessarily follow the yellow one. Actually, we go forward, and all the other green vehicles don't see us, right? So we get pushed down there, and we survive. In the center part, you see that we always try to stay in the middle between two vehicles, and then we accelerate because there is no one in front. And in the last part, again, a vehicle at the bottom pushes us up there, right? And all the other green vehicles don't see us anymore, because all the green vehicles think that we are the yellow guy, right? So in the first and third cases, we end up diverging from the original trajectory by taking our own actions, which were still safe, but that brought us to locations which are now unsafe, because the other vehicles no longer see us; we are replaying what was seen during recording. So this is a more adversarial type of situation, right?
The other vehicles are just driving as they were, and we try to survive this multitude of vehicles that no longer see us. Cool. So finally, a recap from the beginning, just to see whether we understood everything, right? We talked about model-predictive policy learning with uncertainty regularization for driving in dense traffic. There is this minimization of the variance of the predictions with respect to the action space, right? We try to take actions that minimize the variance. There is latent dropout for improving action sensitivity, because otherwise there was information leaking from the future to our prediction. And we are using a large-scale dataset of driving behavior from traffic cameras. So this is completely from observations: we don't have simulators, we don't have any kind of fake stuff. We just observe real people driving, and then we try to learn how to drive within that kind of environment. Finally, I also told you how we can copy the expert with this expert regularization, right? Where you try to also reach the same states that were reached by the expert. And then, what's the question here? How do you think the scale of the dataset affects this method? Well, definitely, having larger datasets will help you have a more robust predictive model, the forward model. It's basically always the case that things will happen that you never observed, and the point is that you will end up in situations where your predictive model will be unable to clearly tell you what may happen, right? So the larger the observational dataset, the more refined your forward model is gonna be. Given that your forward model is what you use to distill your policy, your controller, I think it's rather important that it's well trained, right? I don't know if I answered your question; maybe I did. And so, okay.
And so, finally, just a recap of everything. You can find more information about this project, which is, again, prediction and policy learning under uncertainty. That was me. You can find more stuff on Twitter, of course. This was a paper written basically by Mikael Henaff, a project we worked on together for over a year, with Yann, of course. The slides, if you want them, are available over here. There is the paper, of course, from ICLR, the code is available in PyTorch on my GitHub, and there is a website as well, okay? And there is also a poster. With this, we've reached the end of class. What should I recommend, right? So now there are the final notes, closing notes. The final notes are: you should always be hungry. Stay hungry, stay foolish, right? Always be eager to do more; don't get lazy, right? Always want to grow more. There's always space for growth. How do I do that? Well, I always stay on Twitter, for example, to try to learn the latest things, right? How do you learn about the latest papers and the latest things? Well, there is Yannic, right? Yannic Kilcher makes videos about the latest papers, and there is this DINO, no? A very recent paper from Facebook, right? They even include the link to the Yannic video on the official GitHub repo, right? So you should always be up to date with the latest things if you want to succeed in this field, I guess. The easiest way is to hang out with the community. If you need anything, if you don't understand something, I'm always here to try to give an answer. You can always reach out. Oh, I didn't tell you how DALL·E was working.
That's basically a variational autoencoder which condenses an image into a smaller representation, and that's concatenated with the tokens of the given caption, and then everything is just sent through a transformer which is trained as a language model. That's pretty much it; there's no magic. Now that sentence, I think, should even make sense to you. And so, you know, if you don't understand something, ask. First, look it up yourself, such that you show you actually tried. If you still don't get it, ask. There is no shame in not knowing. I don't know many things. As you can tell, I've been changing things throughout the class because I didn't know them before, right? But it's okay, right? No one is perfect. We are here to improve, we are here to learn. I'm learning because of you, right? If I have to explain things to you, I do have to understand them, and I cannot pretend I understand them, because otherwise I don't make sense, right? I hope I do make sense. So let's keep pushing each other forward so that we can all move forward, right? All right, that was it. I hope to see you around, well, virtually, or in person, I don't know, depending on whether this pandemic ever stops. That's it, okay? Thanks again. It's been a pleasure, an honor, being here with you this semester. I'll follow up again with the website contributions; we can always keep working together on more content, right? I love education, as you can tell, I hope. So again, if you're interested in this stuff, I'm always more than happy to partner up and do something together, okay? All right, take care. I've been overtime again, but it's the last time. Let's finish here. Take care. Bye.